* [PATCH 00/11] new cgroup controller for gpu/drm subsystem
@ 2020-02-14 15:56 Kenny Ho
  2020-02-14 15:56 ` [PATCH 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

This is a submission for the introduction of a new cgroup controller for the drm subsystem, following a series of RFCs [v1, v2, v3, v4].

Changes from the RFC based on the feedback:
* dropped all drm.memory.* related implementation to focus only on buffer and lgpu
* added a weight resource type for logical gpu (lgpu)
* decoupled drmcg device iteration from drm_minor

I'd also like to highlight that these patches are currently released under the MIT/X11 license, in line with the norm of the drm subsystem, but I am working to have the cgroup parts released under GPLv2 to align with the norm of the cgroup subsystem.

[v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
[v2]: https://www.spinics.net/lists/cgroups/msg22074.html
[v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
[v4]: https://patchwork.kernel.org/cover/11120371/

Changes since the start of the RFC are as follows:

v4:
Unchanged (no review needed):
* drm.memory.*/ttm resources (Patches 9-13; I am still working on memory
  bandwidth and shrinker)
Based on feedback on v3:
* updated nomenclature to drmcg
* embedded per device drmcg properties into drm_device
* split GEM buffer related commits into stats and limit
* renamed functions to align with convention
* combined buffer accounting and checking into a try_charge function
* added support for buffer stats without limit enforcement
* removed GEM buffer sharing limitation
* updated documentation
New features:
* introduced the logical GPU (lgpu) concept
* added an example implementation with AMD KFD

v3:
Based on feedback on v2:
* removed .help type file from v2
* conformed to cgroup convention for default and max handling
* conformed to cgroup convention for addressing device specific limits (with major:minor)
New functionality:
* adopted memparse for memory size related attributes
* added macros to marshal drmcgrp cftype private data (DRMCG_CTF_PRIV, etc.)
* added ttm buffer usage stats (per cgroup; for system, tt and vram)
* added ttm buffer usage limit (per cgroup, for vram)
* added per cgroup bandwidth stats and limiting (burst and average bandwidth)

v2:
* removed the vendoring concepts
* added a limit on total buffer allocation
* added a limit on the maximum size of a single buffer allocation

v1: cover letter

The purpose of this patch series is to start a discussion for a generic cgroup
controller for the drm subsystem.  The design proposed here is a very early 
one.  We are hoping to engage the community as we develop the idea.

Background
==========
Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with specialized
behaviour, such as accounting/limiting the resources which processes in a 
cgroup can access [1].  Weights, limits, protections and allocations are the
main resource distribution models.  Existing cgroup controllers include cpu,
memory, io, rdma, and more.  cgroup is one of the foundational technologies
that enable the popular container application deployment and management methods.

Direct Rendering Manager/drm contains code intended to support the needs of
complex graphics devices. Graphics drivers in the kernel may make use of DRM
functions to make tasks like memory management, interrupt handling and DMA
easier, and provide a uniform interface to applications.  The DRM has also
developed beyond traditional graphics applications to support compute/GPGPU
applications.

Motivations
===========
As GPUs grow beyond the realm of desktop/workstation graphics into areas like
data center clusters and IoT, there is an increasing need to monitor and
regulate GPUs as a resource, like cpu, memory and io.

Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
purpose of managing GPU priority using the cgroup hierarchy.  While that
particular use case may not warrant a standalone drm cgroup controller, there
are other use cases where having one can be useful [3].  Monitoring GPU
resources such as VRAM and buffers, CU (compute unit, AMD's nomenclature)/EU
(execution unit, Intel's nomenclature) and GPU job scheduling [4] can help
sysadmins get a better understanding of an application's usage profile.
Further regulation of the aforementioned resources can also help sysadmins
optimize workload deployment on limited GPU resources.

With the increased importance of machine learning, data science and other
cloud-based applications, GPUs are already in production use in data centers
today [5,6,7].  Existing GPU resource management is very coarse-grained, however,
as sysadmins are only able to distribute workload on a per-GPU basis [8].  An
alternative is to use GPU virtualization (with or without SRIOV) but it
generally acts on the entire GPU instead of the specific resources in a GPU.
With a drm cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization.)

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a means to artificially limit the
DRM resources available to the applications.


Challenges
==========
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4], for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism that cgroup uses to
handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific, and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from the RDMA cgroup controller.

Approach
========
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.
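
To make this concrete, here is a rough sketch of how the controller might
be exercised from userspace once the series below is applied (paths and
cgroup names are illustrative only; the controller name "drm" comes from
the subsystem registration in patch 1):

  # enable the drm controller for child cgroups (cgroup v2 hierarchy)
  echo "+drm" > /sys/fs/cgroup/cgroup.subtree_control
  mkdir /sys/fs/cgroup/gpu-jobs
  echo $APP_PID > /sys/fs/cgroup/gpu-jobs/cgroup.procs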

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (11):
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Bind drm and cgroup subsystem
  drm, cgroup: Initialize drmcg properties
  drm, cgroup: Add total GEM buffer allocation stats
  drm, cgroup: Add peak GEM buffer allocation stats
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Introduce lgpu as DRM cgroup resource
  drm, cgroup: add update trigger after limit change
  drm/amdgpu: Integrate with DRM cgroup

 Documentation/admin-guide/cgroup-v2.rst       |  197 ++-
 Documentation/cgroup-v1/drm.rst               |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   48 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |    6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |    6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |    3 +
 .../amd/amdkfd/kfd_process_queue_manager.c    |  153 +++
 drivers/gpu/drm/drm_drv.c                     |   12 +
 drivers/gpu/drm/drm_gem.c                     |   16 +-
 include/drm/drm_cgroup.h                      |   81 ++
 include/drm/drm_device.h                      |    7 +
 include/drm/drm_drv.h                         |   19 +
 include/drm/drm_gem.h                         |   12 +-
 include/linux/cgroup_drm.h                    |  144 +++
 include/linux/cgroup_subsys.h                 |    4 +
 init/Kconfig                                  |    5 +
 kernel/cgroup/Makefile                        |    1 +
 kernel/cgroup/drm.c                           | 1059 +++++++++++++++++
 19 files changed, 1773 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.25.0


* [PATCH 01/11] cgroup: Introduce cgroup for drm subsystem
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 02/11] drm, cgroup: Bind drm and cgroup subsystem Kenny Ho
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

With the increased importance of machine learning, data science and
other cloud-based applications, GPUs are already in production use in
data centers today.  Existing GPU resource management is very coarse-grained,
however, as sysadmins are only able to distribute workload on a
per-GPU basis.  An alternative is to use GPU virtualization (with or
without SRIOV) but it generally acts on the entire GPU instead of the
specific resources in a GPU.  With a drm cgroup controller, we can
enable alternate, fine-grained, sub-GPU resource management (in addition
to what may be available via GPU virtualization.)
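
As a usage illustration, kernel code can pin and release the drm cgroup of
the current task with the helpers introduced here.  A minimal sketch (the
surrounding context is hypothetical, though a later patch in this series
uses the same pattern when charging GEM buffer allocations):

  struct drmcg *drmcg = drmcg_get(current); /* takes a css reference */

  if (drmcg) {
          /* ... account a resource against drmcg and its ancestors ... */
  }
  drmcg_put(drmcg);                         /* drops the reference */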

Change-Id: Ia90aed8c4cb89ff20d8216a903a765655b44fc9a
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++-
 Documentation/cgroup-v1/drm.rst         |  1 +
 include/linux/cgroup_drm.h              | 92 +++++++++++++++++++++++++
 include/linux/cgroup_subsys.h           |  4 ++
 init/Kconfig                            |  5 ++
 kernel/cgroup/Makefile                  |  1 +
 kernel/cgroup/drm.c                     | 42 +++++++++++
 7 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 5361ebec3361..384db8df0f30 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -61,8 +61,10 @@ v1 is available under Documentation/admin-guide/cgroup-v1/.
      5-6. Device
      5-7. RDMA
        5-7-1. RDMA Interface Files
-     5-8. Misc
-       5-8-1. perf_event
+     5-8. DRM
+       5-8-1. DRM Interface Files
+     5-9. Misc
+       5-9-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -2051,6 +2053,18 @@ RDMA Interface Files
 	  ocrdma1 hca_handle=1 hca_object=23
 
 
+DRM
+---
+
+The "drm" controller regulates the distribution and accounting
+of DRM (Direct Rendering Manager) and GPU-related resources.
+
+DRM Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+TODO
+
+
 Misc
 ----
 
diff --git a/Documentation/cgroup-v1/drm.rst b/Documentation/cgroup-v1/drm.rst
new file mode 100644
index 000000000000..5f5658e1f5ed
--- /dev/null
+++ b/Documentation/cgroup-v1/drm.rst
@@ -0,0 +1 @@
+Please see ../cgroup-v2.rst for details
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index 000000000000..ba7981ac3afc
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#include <linux/cgroup.h>
+
+#ifdef CONFIG_CGROUP_DRM
+
+/**
+ * The DRM cgroup controller data structure.
+ */
+struct drmcg {
+	struct cgroup_subsys_state	css;
+};
+
+/**
+ * css_to_drmcg - get the corresponding drmcg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Return: DRM cgroup that contains the @css
+ */
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct drmcg, css) : NULL;
+}
+
+/**
+ * drmcg_get - get the drmcg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to
+ *
+ * Return: reference to the DRM cgroup the task belongs to
+ */
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+	return css_to_drmcg(task_get_css(task, drm_cgrp_id));
+}
+
+/**
+ * drmcg_put - put a drmcg reference
+ * @drmcg: the target drmcg
+ *
+ * Put a reference obtained via drmcg_get
+ */
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+	if (drmcg)
+		css_put(&drmcg->css);
+}
+
+/**
+ * drmcg_parent - find the parent of a drm cgroup
+ * @cg: the target drmcg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Return: parent DRM cgroup of @cg
+ */
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+	return css_to_drmcg(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcg {
+};
+
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+}
+
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+	return NULL;
+}
+
+#endif	/* CONFIG_CGROUP_DRM */
+#endif	/* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..ddedad809e8b 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(drm)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index b4daad2bac23..8ea6bcfbe5e9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -981,6 +981,11 @@ config CGROUP_RDMA
 	  Attaching processes with active RDMA resources to the cgroup
 	  hierarchy is allowed even if can cross the hierarchy's limit.
 
+config CGROUP_DRM
+	bool "DRM controller (EXPERIMENTAL)"
+	help
+	  Provides accounting and enforcement of resources in the DRM subsystem.
+
 config CGROUP_FREEZER
 	bool "Freezer controller"
 	help
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 5d7a76bfbbb7..31f186f58121 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,5 +4,6 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o freezer.o
 obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
+obj-$(CONFIG_CGROUP_DRM) += drm.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
new file mode 100644
index 000000000000..e97861b3cb30
--- /dev/null
+++ b/kernel/cgroup/drm.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/slab.h>
+#include <linux/cgroup.h>
+#include <linux/cgroup_drm.h>
+
+static struct drmcg *root_drmcg __read_mostly;
+
+static void drmcg_css_free(struct cgroup_subsys_state *css)
+{
+	struct drmcg *drmcg = css_to_drmcg(css);
+
+	kfree(drmcg);
+}
+
+static struct cgroup_subsys_state *
+drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct drmcg *parent = css_to_drmcg(parent_css);
+	struct drmcg *drmcg;
+
+	drmcg = kzalloc(sizeof(struct drmcg), GFP_KERNEL);
+	if (!drmcg)
+		return ERR_PTR(-ENOMEM);
+
+	if (!parent)
+		root_drmcg = drmcg;
+
+	return &drmcg->css;
+}
+
+struct cftype files[] = {
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys drm_cgrp_subsys = {
+	.css_alloc	= drmcg_css_alloc,
+	.css_free	= drmcg_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
-- 
2.25.0


* [PATCH 02/11] drm, cgroup: Bind drm and cgroup subsystem
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
  2020-02-14 15:56 ` [PATCH 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 03/11] drm, cgroup: Initialize drmcg properties Kenny Ho
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

Since the drm subsystem can be compiled as a module and drm devices can
be added and removed during run time, add several functions to bind the
drm subsystem, as well as drm devices, to drmcg.

Two pairs of functions:
drmcg_bind/drmcg_unbind - used to bind/unbind the drm subsystem to the
cgroup subsystem as the drm core initializes/exits.

drmcg_register_dev/drmcg_unregister_dev - used to register/unregister
drm devices to the cgroup subsystem as the devices are presented to/removed
from userspace.
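
The iteration helper added below, drm_minor_for_each, takes a callback of
the form int (*fn)(int id, void *p, void *data), where p is the acquired
drm_minor.  A minimal, hypothetical callback might look like this:

  /* hypothetical callback: log every minor that drmcg knows about */
  static int log_minor_fn(int id, void *p, void *data)
  {
          struct drm_minor *minor = p;

          pr_info("drmcg: registered minor %d on device %s\n",
                  id, dev_name(minor->dev->dev));

          return 0; /* a non-zero return stops the iteration */
  }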

Change-Id: I1cb6b2080fc7d27979d886ef23e784341efafb41
---
 drivers/gpu/drm/drm_drv.c  |   8 +++
 include/drm/drm_cgroup.h   |  39 +++++++++++
 include/linux/cgroup_drm.h |   4 ++
 kernel/cgroup/drm.c        | 131 +++++++++++++++++++++++++++++++++++++
 4 files changed, 182 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 1b9b40a1c7c9..8e59cc5a5bde 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -41,6 +41,7 @@
 #include <drm/drm_file.h>
 #include <drm/drm_mode_object.h>
 #include <drm/drm_print.h>
+#include <drm/drm_cgroup.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -972,6 +973,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
 
 	ret = 0;
 
+	drmcg_register_dev(dev);
+
 	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
 		 driver->name, driver->major, driver->minor,
 		 driver->patchlevel, driver->date,
@@ -1006,6 +1009,8 @@ EXPORT_SYMBOL(drm_dev_register);
  */
 void drm_dev_unregister(struct drm_device *dev)
 {
+	drmcg_unregister_dev(dev);
+
 	if (drm_core_check_feature(dev, DRIVER_LEGACY))
 		drm_lastclose(dev);
 
@@ -1112,6 +1117,7 @@ static const struct file_operations drm_stub_fops = {
 
 static void drm_core_exit(void)
 {
+	drmcg_unbind();
 	unregister_chrdev(DRM_MAJOR, "drm");
 	debugfs_remove(drm_debugfs_root);
 	drm_sysfs_destroy();
@@ -1138,6 +1144,8 @@ static int __init drm_core_init(void)
 	if (ret < 0)
 		goto error;
 
+	drmcg_bind(&drm_minor_acquire, &drm_dev_put);
+
 	drm_core_init_complete = true;
 
 	DRM_DEBUG("Initialized\n");
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index 000000000000..530c9a0b3238
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+#ifdef CONFIG_CGROUP_DRM
+
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+		void (*put_ddev)(struct drm_device *dev));
+
+void drmcg_unbind(void);
+
+void drmcg_register_dev(struct drm_device *dev);
+
+void drmcg_unregister_dev(struct drm_device *dev);
+
+#else
+
+static inline void drmcg_bind(
+		struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+		void (*put_ddev)(struct drm_device *dev))
+{
+}
+
+static inline void drmcg_unbind(void)
+{
+}
+
+static inline void drmcg_register_dev(struct drm_device *dev)
+{
+}
+
+static inline void drmcg_unregister_dev(struct drm_device *dev)
+{
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index ba7981ac3afc..854591bbb430 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -5,6 +5,10 @@
 #define _CGROUP_DRM_H
 
 #include <linux/cgroup.h>
+#include <drm/drm_file.h>
+
+/* limit defined per the way drm_minor_alloc operates */
+#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
 #ifdef CONFIG_CGROUP_DRM
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index e97861b3cb30..37f98dc47268 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,11 +1,142 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/bitmap.h>
+#include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/cgroup.h>
 #include <linux/cgroup_drm.h>
+#include <drm/drm_file.h>
+#include <drm/drm_device.h>
+#include <drm/drm_cgroup.h>
 
 static struct drmcg *root_drmcg __read_mostly;
 
+/* global mutex for drmcg across all devices */
+static DEFINE_MUTEX(drmcg_mutex);
+
+static DECLARE_BITMAP(known_devs, MAX_DRM_DEV);
+
+static struct drm_minor (*(*acquire_drm_minor)(unsigned int minor_id));
+
+static void (*put_drm_dev)(struct drm_device *dev);
+
+/**
+ * drmcg_bind - Bind DRM subsystem to cgroup subsystem
+ * @acq_dm: function pointer to the drm_minor_acquire function
+ * @put_ddev: function pointer to the drm_dev_put function
+ *
+ * This function binds some functions from the DRM subsystem and makes
+ * them available to the drmcg subsystem.
+ *
+ * drmcg_unbind does the opposite of this function
+ */
+void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
+		void (*put_ddev)(struct drm_device *dev))
+{
+	mutex_lock(&drmcg_mutex);
+	acquire_drm_minor = acq_dm;
+	put_drm_dev = put_ddev;
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_bind);
+
+/**
+ * drmcg_unbind - Unbind DRM subsystem from cgroup subsystem
+ *
+ * drmcg_bind does the opposite of this function
+ */
+void drmcg_unbind(void)
+{
+	mutex_lock(&drmcg_mutex);
+	acquire_drm_minor = NULL;
+	put_drm_dev = NULL;
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unbind);
+
+/**
+ * drmcg_register_dev - register a DRM device for usage in drm cgroup
+ * @dev: DRM device
+ *
+ * This function makes a DRM device visible to the cgroup subsystem.
+ * Once the drmcg is aware of the device, drmcg can start tracking and
+ * controlling resource usage for said device.
+ *
+ * drmcg_unregister_dev reverses the operation of this function
+ */
+void drmcg_register_dev(struct drm_device *dev)
+{
+	if (WARN_ON(dev->primary->index >= MAX_DRM_DEV))
+		return;
+
+	mutex_lock(&drmcg_mutex);
+	set_bit(dev->primary->index, known_devs);
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_register_dev);
+
+/**
+ * drmcg_unregister_dev - unregister a DRM device from the drm cgroup
+ * @dev: DRM device
+ *
+ * Unregister @dev so that drmcg no longer controls resource usage
+ * of @dev.  The @dev must have been registered with drmcg via the
+ * drmcg_register_dev function
+ */
+void drmcg_unregister_dev(struct drm_device *dev)
+{
+	if (WARN_ON(dev->primary->index >= MAX_DRM_DEV))
+		return;
+
+	mutex_lock(&drmcg_mutex);
+	clear_bit(dev->primary->index, known_devs);
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unregister_dev);
+
+/**
+ * drm_minor_for_each - Iterate through all stored DRM minors
+ * @fn: Function to be called for each registered minor.
+ * @data: Data passed to the callback function.
+ *
+ * The callback function will be called for each registered device, passing
+ * the minor number, the drm_minor pointer and @data.
+ *
+ * If @fn returns anything other than %0, the iteration stops and that
+ * value is returned from this function.
+ */
+static int drm_minor_for_each(int (*fn)(int id, void *p, void *data),
+		void *data)
+{
+	int rc = 0;
+
+	mutex_lock(&drmcg_mutex);
+	if (acquire_drm_minor) {
+		unsigned int minor;
+		struct drm_minor *dm;
+
+		minor = find_next_bit(known_devs, MAX_DRM_DEV, 0);
+		while (minor < MAX_DRM_DEV) {
+			dm = acquire_drm_minor(minor);
+
+			if (IS_ERR(dm)) {
+				/* advance past a minor we failed to acquire,
+				 * otherwise the loop never terminates */
+				minor = find_next_bit(known_devs,
+						MAX_DRM_DEV, minor + 1);
+				continue;
+			}
+
+			rc = fn(minor, (void *)dm, data);
+
+			put_drm_dev(dm->dev); /* release from acquire_drm_minor */
+
+			if (rc)
+				break;
+
+			minor = find_next_bit(known_devs, MAX_DRM_DEV, minor+1);
+		}
+	}
+	mutex_unlock(&drmcg_mutex);
+
+	return rc;
+}
+
 static void drmcg_css_free(struct cgroup_subsys_state *css)
 {
 	struct drmcg *drmcg = css_to_drmcg(css);
-- 
2.25.0


* [PATCH 03/11] drm, cgroup: Initialize drmcg properties
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
  2020-02-14 15:56 ` [PATCH 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
  2020-02-14 15:56 ` [PATCH 02/11] drm, cgroup: Bind drm and cgroup subsystem Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 04/11] drm, cgroup: Add total GEM buffer allocation stats Kenny Ho
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

drmcg initialization involves allocating a per cgroup, per device data
structure and setting the defaults.  There are two entry points for
drmcg init:

1) When struct drmcg is created via css_alloc, initialization is done
for each device

2) When DRM devices are created after drmcgs are created
  a) Per device drmcg data structure is allocated at the beginning of
  DRM device creation such that drmcg can begin tracking usage
  statistics
  b) At the end of DRM device creation, drmcg_register_dev will update the
  cgroup tree in case device specific defaults need to be applied.

Entry point #2 usually applies to the root cgroup since it can be
created before DRM devices are available.  The drmcg controller will go
through all existing drm cgroups and initialize them with the new device
accordingly.
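
For illustration, a driver opts into device specific defaults by providing
the new drmcg_custom_init callback.  A hypothetical sketch (drmcg_props
gains real members in later patches of this series):

  /* hypothetical driver hook filling in per device drmcg defaults */
  static void foo_drmcg_custom_init(struct drm_device *dev,
                                    struct drmcg_props *props)
  {
          /* set device specific defaults on props here */
  }

  static struct drm_driver foo_driver = {
          /* ... */
          .drmcg_custom_init = foo_drmcg_custom_init,
  };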

Change-Id: I64e421d8dfcc22ee8282cc1305960e20c2704db7
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/drm_drv.c  |   4 ++
 include/drm/drm_cgroup.h   |  18 +++++++
 include/drm/drm_device.h   |   7 +++
 include/drm/drm_drv.h      |   9 ++++
 include/linux/cgroup_drm.h |  12 +++++
 kernel/cgroup/drm.c        | 105 +++++++++++++++++++++++++++++++++++++
 6 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 8e59cc5a5bde..44a66edc81c2 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -643,6 +643,7 @@ int drm_dev_init(struct drm_device *dev,
 	mutex_init(&dev->filelist_mutex);
 	mutex_init(&dev->clientlist_mutex);
 	mutex_init(&dev->master_mutex);
+	mutex_init(&dev->drmcg_mutex);
 
 	dev->anon_inode = drm_fs_inode_new();
 	if (IS_ERR(dev->anon_inode)) {
@@ -679,6 +680,7 @@ int drm_dev_init(struct drm_device *dev,
 	if (ret)
 		goto err_setunique;
 
+	drmcg_device_early_init(dev);
 	return 0;
 
 err_setunique:
@@ -693,6 +695,7 @@ int drm_dev_init(struct drm_device *dev,
 	drm_fs_inode_free(dev->anon_inode);
 err_free:
 	put_device(dev->dev);
+	mutex_destroy(&dev->drmcg_mutex);
 	mutex_destroy(&dev->master_mutex);
 	mutex_destroy(&dev->clientlist_mutex);
 	mutex_destroy(&dev->filelist_mutex);
@@ -769,6 +772,7 @@ void drm_dev_fini(struct drm_device *dev)
 
 	put_device(dev->dev);
 
+	mutex_destroy(&dev->drmcg_mutex);
 	mutex_destroy(&dev->master_mutex);
 	mutex_destroy(&dev->clientlist_mutex);
 	mutex_destroy(&dev->filelist_mutex);
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 530c9a0b3238..fda426fba035 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -4,8 +4,17 @@
 #ifndef __DRM_CGROUP_H__
 #define __DRM_CGROUP_H__
 
+#include <linux/cgroup_drm.h>
+
 #ifdef CONFIG_CGROUP_DRM
 
+/**
+ * Per DRM device properties for DRM cgroup controller for the purpose
+ * of storing per device defaults
+ */
+struct drmcg_props {
+};
+
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
 		void (*put_ddev)(struct drm_device *dev));
 
@@ -15,8 +24,13 @@ void drmcg_register_dev(struct drm_device *dev);
 
 void drmcg_unregister_dev(struct drm_device *dev);
 
+void drmcg_device_early_init(struct drm_device *device);
+
 #else
 
+struct drmcg_props {
+};
+
 static inline void drmcg_bind(
 		struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
 		void (*put_ddev)(struct drm_device *dev))
@@ -35,5 +49,9 @@ static inline void drmcg_unregister_dev(struct drm_device *dev)
 {
 }
 
+static inline void drmcg_device_early_init(struct drm_device *device)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 1acfc3bbd3fb..a94598b8f670 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@
 
 #include <drm/drm_hashtab.h>
 #include <drm/drm_mode_config.h>
+#include <drm/drm_cgroup.h>
 
 struct drm_driver;
 struct drm_minor;
@@ -308,6 +309,12 @@ struct drm_device {
 	 */
 	struct drm_fb_helper *fb_helper;
 
+	/** \name DRM Cgroup */
+	/*@{ */
+	struct mutex drmcg_mutex;
+	struct drmcg_props drmcg_props;
+	/*@} */
+
 	/* Everything below here is for legacy driver, never use! */
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index cf13470810a5..1f65ac4d9bbf 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -715,6 +715,15 @@ struct drm_driver {
 			    struct drm_device *dev,
 			    uint32_t handle);
 
+	/**
+	 * @drmcg_custom_init
+	 *
+	 * Optional callback used to initialize drm cgroup per device properties
+	 * such as resource limit defaults.
+	 */
+	void (*drmcg_custom_init)(struct drm_device *dev,
+			struct drmcg_props *props);
+
 	/**
 	 * @gem_vm_ops: Driver private ops for this object
 	 *
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 854591bbb430..2014097eb75c 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -4,6 +4,7 @@
 #ifndef _CGROUP_DRM_H
 #define _CGROUP_DRM_H
 
+#include <linux/mutex.h>
 #include <linux/cgroup.h>
 #include <drm/drm_file.h>
 
@@ -12,11 +13,19 @@
 
 #ifdef CONFIG_CGROUP_DRM
 
+/**
+ * Per DRM cgroup, per device resources (such as statistics and limits)
+ */
+struct drmcg_device_resource {
+	/* for per device stats */
+};
+
 /**
  * The DRM cgroup controller data structure.
  */
 struct drmcg {
 	struct cgroup_subsys_state	css;
+	struct drmcg_device_resource	*dev_resources[MAX_DRM_DEV];
 };
 
 /**
@@ -70,6 +79,9 @@ static inline struct drmcg *drmcg_parent(struct drmcg *cg)
 
 #else /* CONFIG_CGROUP_DRM */
 
+struct drmcg_device_resource {
+};
+
 struct drmcg {
 };
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 37f98dc47268..30fd9aeffbe7 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,11 +1,17 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
 #include <linux/bitmap.h>
+#include <linux/export.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/cgroup.h>
+#include <linux/fs.h>
+#include <linux/seq_file.h>
+#include <linux/mutex.h>
+#include <linux/kernel.h>
 #include <linux/cgroup_drm.h>
 #include <drm/drm_file.h>
+#include <drm/drm_drv.h>
 #include <drm/drm_device.h>
 #include <drm/drm_cgroup.h>
 
@@ -54,6 +60,47 @@ void drmcg_unbind(void)
 }
 EXPORT_SYMBOL(drmcg_unbind);
 
+/* caller must hold dev->drmcg_mutex */
+static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
+{
+	int minor = dev->primary->index;
+	struct drmcg_device_resource *ddr = drmcg->dev_resources[minor];
+
+	if (ddr == NULL) {
+		ddr = kzalloc(sizeof(struct drmcg_device_resource),
+			GFP_KERNEL);
+
+		if (!ddr)
+			return -ENOMEM;
+	}
+
+	drmcg->dev_resources[minor] = ddr;
+
+	/* set defaults here */
+
+	return 0;
+}
+
+static inline void drmcg_update_cg_tree(struct drm_device *dev)
+{
+	struct cgroup_subsys_state *pos;
+	struct drmcg *child;
+
+	if (root_drmcg == NULL)
+		return;
+
+	/* init cgroups created before registration (i.e. root cgroup) */
+
+	/* use cgroup_mutex instead of rcu_read_lock because
+	 * init_drmcg_single allocates memory and may sleep */
+	mutex_lock(&cgroup_mutex);
+	css_for_each_descendant_pre(pos, &root_drmcg->css) {
+		child = css_to_drmcg(pos);
+		init_drmcg_single(child, dev);
+	}
+	mutex_unlock(&cgroup_mutex);
+}
+
 /**
  * drmcg_register_dev - register a DRM device for usage in drm cgroup
  * @dev: DRM device
@@ -71,6 +118,13 @@ void drmcg_register_dev(struct drm_device *dev)
 
 	mutex_lock(&drmcg_mutex);
 	set_bit(dev->primary->index, known_devs);
+
+	if (dev->driver->drmcg_custom_init) {
+		dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
+
+		drmcg_update_cg_tree(dev);
+	}
 	mutex_unlock(&drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_register_dev);
@@ -137,23 +191,61 @@ static int drm_minor_for_each(int (*fn)(int id, void *p, void *data),
 	return rc;
 }
 
+static int drmcg_css_free_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct drmcg *drmcg = data;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	kfree(drmcg->dev_resources[minor->index]);
+
+	return 0;
+}
+
 static void drmcg_css_free(struct cgroup_subsys_state *css)
 {
 	struct drmcg *drmcg = css_to_drmcg(css);
 
+	drm_minor_for_each(&drmcg_css_free_fn, drmcg);
+
 	kfree(drmcg);
 }
 
+static int init_drmcg_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct drmcg *drmcg = data;
+	int rc;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	mutex_lock(&minor->dev->drmcg_mutex);
+	rc = init_drmcg_single(drmcg, minor->dev);
+	mutex_unlock(&minor->dev->drmcg_mutex);
+
+	return rc;
+}
+
 static struct cgroup_subsys_state *
 drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct drmcg *parent = css_to_drmcg(parent_css);
 	struct drmcg *drmcg;
+	int rc;
 
 	drmcg = kzalloc(sizeof(struct drmcg), GFP_KERNEL);
 	if (!drmcg)
 		return ERR_PTR(-ENOMEM);
 
+	rc = drm_minor_for_each(&init_drmcg_fn, drmcg);
+	if (rc) {
+		drmcg_css_free(&drmcg->css);
+		return ERR_PTR(rc);
+	}
+
 	if (!parent)
 		root_drmcg = drmcg;
 
@@ -171,3 +263,16 @@ struct cgroup_subsys drm_cgrp_subsys = {
 	.legacy_cftypes	= files,
 	.dfl_cftypes	= files,
 };
+
+/**
+ * drmcg_device_early_init - initialize device specific resources for DRM cgroups
+ * @dev: the target DRM device
+ *
+ * Allocate and initialize device specific resources for existing DRM cgroups.
+ * Typically only the root cgroup exists before the initialization of @dev.
+ */
+void drmcg_device_early_init(struct drm_device *dev)
+{
+	drmcg_update_cg_tree(dev);
+}
+EXPORT_SYMBOL(drmcg_device_early_init);
-- 
2.25.0


* [PATCH 04/11] drm, cgroup: Add total GEM buffer allocation stats
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (2 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 03/11] drm, cgroup: Initialize drmcg properties Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 05/11] drm, cgroup: Add peak " Kenny Ho
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

The drm resources being measured here are GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one set of cgroup stats per drm device.  Each
allocation is charged to the owning cgroup as well as all its ancestors.

Similar to the memory cgroup, migrating a process to a different cgroup
does not move the GEM buffer usage that the process accrued while in
the previous cgroup to the new cgroup.

The following is an example to illustrate some of the operations.  Given
the following cgroup hierarchy (The letters are cgroup names with R
being the root cgroup.  The numbers in brackets are processes.  The
processes are placed with cgroup's 'No Internal Process Constraint' in
mind, so no process is placed in cgroup B.)

R (4, 5) ------ A (6)
 \
  B ---- C (7,8)
   \
    D (9)

Here is a list of operations and the associated effect on the sizes
tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)

==  ==  ==  ==  ==  ===================================================
R   A   B   C   D   Ops
==  ==  ==  ==  ==  ===================================================
1   0   0   0   0   4 allocated a buffer
1   0   0   0   0   4 shared a buffer with 5
1   0   0   0   0   4 shared a buffer with 9
2   0   1   0   1   9 allocated a buffer
3   0   2   1   1   7 allocated a buffer
3   0   2   1   1   7 shared a buffer with 8
3   0   2   1   1   7 shared a buffer with 9
3   0   2   1   1   7 released a buffer
3   0   2   1   1   7 migrated to cgroup D
3   0   2   1   1   9 released a buffer from 7
2   0   1   0   1   8 released a buffer from 7 (last ref to shared buf)
==  ==  ==  ==  ==  ===================================================

drm.buffer.total.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total GEM buffer allocation in bytes.
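
Reading the new interface file from userspace might look like the
following (the cgroup path and the values are illustrative only; each line
is keyed by the drm device's major:minor, with DRM_MAJOR being 226):

  $ cat /sys/fs/cgroup/gpu-jobs/drm.buffer.total.stats
  226:0 8388608
  226:1 0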

Change-Id: Ibc1f646ca7dbc588e2d11802b156b524696a23e7
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  50 +++++++++-
 drivers/gpu/drm/drm_gem.c               |   9 ++
 include/drm/drm_cgroup.h                |  16 +++
 include/drm/drm_gem.h                   |  10 ++
 include/linux/cgroup_drm.h              |   6 ++
 kernel/cgroup/drm.c                     | 126 ++++++++++++++++++++++++
 6 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 384db8df0f30..2d8162c109f3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,6 +63,7 @@ v1 is available under Documentation/admin-guide/cgroup-v1/.
        5-7-1. RDMA Interface Files
      5-8. DRM
        5-8-1. DRM Interface Files
+       5-8-2. GEM Buffer Ownership
      5-9. Misc
        5-9-1. perf_event
      5-N. Non-normative information
@@ -2062,7 +2063,54 @@ of DRM (Direct Rendering Manager) and GPU-related resources.
 DRM Interface Files
 ~~~~~~~~~~~~~~~~~~~~
 
-TODO
+  drm.buffer.total.stats
+	A read-only flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Total GEM buffer allocation in bytes.
+
+GEM Buffer Ownership
+~~~~~~~~~~~~~~~~~~~~
+
+For the purpose of cgroup accounting and limiting, ownership of the
+buffer is deemed to be the cgroup to which the allocating process
+belongs.  There is one set of cgroup stats per drm device.  Each
+allocation is charged to the owning cgroup as well as all its ancestors.
+
+Similar to the memory cgroup, migrating a process to a different cgroup
+does not move the GEM buffer usage that the process accrued while in
+the previous cgroup to the new cgroup.
+
+The following is an example to illustrate some of the operations.  Given
+the following cgroup hierarchy (The letters are cgroup names with R
+being the root cgroup.  The numbers in brackets are processes.  The
+processes are placed with cgroup's 'No Internal Process Constraint' in
+mind, so no process is placed in cgroup B.)
+
+R (4, 5) ------ A (6)
+ \
+  B ---- C (7,8)
+   \
+    D (9)
+
+Here is a list of operations and the associated effect on the sizes
+tracked by the cgroups (for simplicity, each buffer is 1 unit in size.)
+
+==  ==  ==  ==  ==  ===================================================
+R   A   B   C   D   Ops
+==  ==  ==  ==  ==  ===================================================
+1   0   0   0   0   4 allocated a buffer
+1   0   0   0   0   4 shared a buffer with 5
+1   0   0   0   0   4 shared a buffer with 9
+2   0   1   0   1   9 allocated a buffer
+3   0   2   1   1   7 allocated a buffer
+3   0   2   1   1   7 shared a buffer with 8
+3   0   2   1   1   7 shared a buffer with 9
+3   0   2   1   1   7 released a buffer
+3   0   2   1   1   7 migrated to cgroup D
+3   0   2   1   1   9 released a buffer from 7
+2   0   1   0   1   8 released a buffer from 7 (last ref to shared buf)
+==  ==  ==  ==  ==  ===================================================
 
 
 Misc
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 2f2b889096b0..d158470edd98 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -38,6 +38,7 @@
 #include <linux/dma-buf.h>
 #include <linux/mem_encrypt.h>
 #include <linux/pagevec.h>
+#include <linux/cgroup_drm.h>
 
 #include <drm/drm.h>
 #include <drm/drm_device.h>
@@ -46,6 +47,7 @@
 #include <drm/drm_gem.h>
 #include <drm/drm_print.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_cgroup.h>
 
 #include "drm_internal.h"
 
@@ -164,6 +166,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 		obj->resv = &obj->_resv;
 
 	drm_vma_node_reset(&obj->vma_node);
+
+	obj->drmcg = drmcg_get(current);
+	drmcg_chg_bo_alloc(obj->drmcg, dev, size);
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
@@ -957,6 +962,10 @@ drm_gem_object_release(struct drm_gem_object *obj)
 		fput(obj->filp);
 
 	dma_resv_fini(&obj->_resv);
+
+	drmcg_unchg_bo_alloc(obj->drmcg, obj->dev, obj->size);
+	drmcg_put(obj->drmcg);
+
 	drm_gem_free_mmap_offset(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index fda426fba035..1eb3012e16a1 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -26,6 +26,12 @@ void drmcg_unregister_dev(struct drm_device *dev);
 
 void drmcg_device_early_init(struct drm_device *device);
 
+void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size);
+
+void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size);
+
 #else
 
 struct drmcg_props {
@@ -53,5 +59,15 @@ static inline void drmcg_device_early_init(struct drm_device *device)
 {
 }
 
+static inline void drmcg_chg_bo_alloc(struct drmcg *drmcg,
+		struct drm_device *dev,	size_t size)
+{
+}
+
+static inline void drmcg_unchg_bo_alloc(struct drmcg *drmcg,
+		struct drm_device *dev,	size_t size)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 97a48165642c..6ac7018923f7 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -312,6 +312,16 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	/**
+	 * @drmcg:
+	 *
+	 * DRM cgroup this GEM object belongs to.
+	 *
+	 * This is used to track and limit the number of GEM objects a user
+	 * can allocate.
+	 */
+	struct drmcg *drmcg;
 };
 
 /**
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 2014097eb75c..174ab50701ef 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -11,6 +11,11 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+enum drmcg_res_type {
+	DRMCG_TYPE_BO_TOTAL,
+	__DRMCG_TYPE_LAST,
+};
+
 #ifdef CONFIG_CGROUP_DRM
 
 /**
@@ -18,6 +23,7 @@
  */
 struct drmcg_device_resource {
 	/* for per device stats */
+	s64			bo_stats_total_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 30fd9aeffbe7..425566753a5c 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -13,6 +13,7 @@
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_device.h>
+#include <drm/drm_ioctl.h>
 #include <drm/drm_cgroup.h>
 
 static struct drmcg *root_drmcg __read_mostly;
@@ -26,6 +27,18 @@ static struct drm_minor (*(*acquire_drm_minor)(unsigned int minor_id));
 
 static void (*put_drm_dev)(struct drm_device *dev);
 
+#define DRMCG_CTF_PRIV_SIZE 3
+#define DRMCG_CTF_PRIV_MASK GENMASK((DRMCG_CTF_PRIV_SIZE - 1), 0)
+#define DRMCG_CTF_PRIV(res_type, f_type)  ((res_type) <<\
+		DRMCG_CTF_PRIV_SIZE | (f_type))
+#define DRMCG_CTF_PRIV2RESTYPE(priv) ((priv) >> DRMCG_CTF_PRIV_SIZE)
+#define DRMCG_CTF_PRIV2FTYPE(priv) ((priv) & DRMCG_CTF_PRIV_MASK)
+
+
+enum drmcg_file_type {
+	DRMCG_FTYPE_STATS,
+};
+
 /**
  * drmcg_bind - Bind DRM subsystem to cgroup subsystem
  * @acq_dm: function pointer to the drm_minor_acquire function
@@ -252,7 +265,66 @@ drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	return &drmcg->css;
 }
 
+static void drmcg_print_stats(struct drmcg_device_resource *ddr,
+		struct seq_file *sf, enum drmcg_res_type type)
+{
+	if (ddr == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCG_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+static int drmcg_seq_show_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct seq_file *sf = data;
+	struct drmcg *drmcg = css_to_drmcg(seq_css(sf));
+	enum drmcg_file_type f_type =
+		DRMCG_CTF_PRIV2FTYPE(seq_cft(sf)->private);
+	enum drmcg_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(seq_cft(sf)->private);
+	struct drmcg_device_resource *ddr;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	ddr = drmcg->dev_resources[minor->index];
+
+	seq_printf(sf, "%d:%d ", DRM_MAJOR, minor->index);
+
+	switch (f_type) {
+	case DRMCG_FTYPE_STATS:
+		drmcg_print_stats(ddr, sf, type);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+
+	return 0;
+}
+
+static int drmcg_seq_show(struct seq_file *sf, void *v)
+{
+	return drm_minor_for_each(&drmcg_seq_show_fn, sf);
+}
+
 struct cftype files[] = {
+	{
+		.name = "buffer.total.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -276,3 +348,57 @@ void drmcg_device_early_init(struct drm_device *dev)
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
+
+/**
+ * drmcg_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
+ * @drmcg: the DRM cgroup to be charged to
+ * @dev: the device the usage should be charged to
+ * @size: size of the GEM buffer to be accounted for
+ *
+ * This function should be called when a new GEM buffer is allocated to account
+ * for the utilization.  This should not be called when the buffer is shared
+ * (i.e. when the GEM buffer's reference count is merely incremented.)
+ */
+void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size)
+{
+	struct drmcg_device_resource *ddr;
+	int devIdx = dev->primary->index;
+
+	if (drmcg == NULL)
+		return;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+
+		ddr->bo_stats_total_allocated += (s64)size;
+	}
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_chg_bo_alloc);
+
+/**
+ * drmcg_unchg_bo_alloc - uncharge GEM buffer usage for a device and cgroup
+ * @drmcg: the DRM cgroup to uncharge from
+ * @dev: the device the usage should be removed from
+ * @size: size of the GEM buffer to be accounted for
+ *
+ * This function should be called when the GEM buffer is about to be freed
+ * (not simply when the GEM buffer's reference count is being decremented.)
+ */
+void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size)
+{
+	int devIdx = dev->primary->index;
+
+	if (drmcg == NULL)
+		return;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
+		drmcg->dev_resources[devIdx]->bo_stats_total_allocated
+			-= (s64)size;
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
-- 
2.25.0


* [PATCH 05/11] drm, cgroup: Add peak GEM buffer allocation stats
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (3 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 04/11] drm, cgroup: Add total GEM buffer allocation stats Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 06/11] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

drm.buffer.peak.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Largest (high water mark) GEM buffer allocated in bytes.
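
Note that the value reported is the size of the largest single GEM buffer
allocated, not a high water mark of the running total; a hypothetical read
might look like this (path and values are illustrative only):

  $ cat /sys/fs/cgroup/gpu-jobs/drm.buffer.peak.stats
  226:0 4194304
  226:1 0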

Change-Id: I40fe4c13c1cea8613b3e04b802f3e1f19eaab4fc
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++++++
 include/linux/cgroup_drm.h              |  3 +++
 kernel/cgroup/drm.c                     | 12 ++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 2d8162c109f3..75b97962b127 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2069,6 +2069,12 @@ DRM Interface Files
 
 	Total GEM buffer allocation in bytes.
 
+  drm.buffer.peak.stats
+	A read-only flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Largest (high water mark) GEM buffer allocated in bytes.
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 174ab50701ef..593ad12602cd 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -13,6 +13,7 @@
 
 enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
+	DRMCG_TYPE_BO_PEAK,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -24,6 +25,8 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
 	/* for per device stats */
 	s64			bo_stats_total_allocated;
+
+	s64			bo_stats_peak_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 425566753a5c..7a0da70c5a25 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -277,6 +277,9 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_TOTAL:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
 		break;
+	case DRMCG_TYPE_BO_PEAK:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -325,6 +328,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.peak.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -373,6 +382,9 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		ddr = drmcg->dev_resources[devIdx];
 
 		ddr->bo_stats_total_allocated += (s64)size;
+
+		if (ddr->bo_stats_peak_allocated < (s64)size)
+			ddr->bo_stats_peak_allocated = (s64)size;
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
-- 
2.25.0


* [PATCH 06/11] drm, cgroup: Add GEM buffer allocation count stats
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (4 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 05/11] drm, cgroup: Add peak " Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 07/11] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

drm.buffer.count.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total number of GEM buffers allocated.

Change-Id: Iad29bdf44390dbcee07b1e72ea0ff811aa3b9dcd
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++++++
 include/linux/cgroup_drm.h              |  3 +++
 kernel/cgroup/drm.c                     | 22 +++++++++++++++++++---
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 75b97962b127..19fcf54ace83 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2075,6 +2075,12 @@ DRM Interface Files
 
 	Largest (high water mark) GEM buffer allocated in bytes.
 
+  drm.buffer.count.stats
+	A read-only flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Total number of GEM buffers allocated.
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 593ad12602cd..51a0cd37da92 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -14,6 +14,7 @@
 enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
 	DRMCG_TYPE_BO_PEAK,
+	DRMCG_TYPE_BO_COUNT,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -27,6 +28,8 @@ struct drmcg_device_resource {
 	s64			bo_stats_total_allocated;
 
 	s64			bo_stats_peak_allocated;
+
+	s64			bo_stats_count_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 7a0da70c5a25..bc162aa9971d 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -280,6 +280,9 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_PEAK:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
 		break;
+	case DRMCG_TYPE_BO_COUNT:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -334,6 +337,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.count.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -385,6 +394,8 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 
 		if (ddr->bo_stats_peak_allocated < (s64)size)
 			ddr->bo_stats_peak_allocated = (s64)size;
+
+		ddr->bo_stats_count_allocated++;
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
@@ -402,15 +413,20 @@ EXPORT_SYMBOL(drmcg_chg_bo_alloc);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size)
 {
+	struct drmcg_device_resource *ddr;
 	int devIdx = dev->primary->index;
 
 	if (drmcg == NULL)
 		return;
 
 	mutex_lock(&dev->drmcg_mutex);
-	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
-		drmcg->dev_resources[devIdx]->bo_stats_total_allocated
-			-= (s64)size;
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+
+		ddr->bo_stats_total_allocated -= (s64)size;
+
+		ddr->bo_stats_count_allocated--;
+	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
-- 
2.25.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/11] drm, cgroup: Add total GEM buffer allocation limit
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (5 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 06/11] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 08/11] drm, cgroup: Add peak " Kenny Ho
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

The drm resource being limited here is the GEM buffer object.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one cgroup limit per drm device.

The limiting functionality is added to the previous stats collection
function.  drm_gem_private_object_init is modified to return a boolean
so that the allocation can fail when a cgroup limit is exceeded.

The try_chg function only fails if the DRM cgroup properties have
limit_enforced set to true for the DRM device.  This allows the DRM
cgroup controller to collect usage stats without enforcing the limits.
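
A minimal sketch of the resulting driver-facing pattern (the foo_
names are placeholders, not part of this series):

	/* driver init: opt into enforcement via drmcg_custom_init */
	static void foo_drmcg_custom_init(struct drm_device *dev,
			struct drmcg_props *props)
	{
		props->limit_enforced = true;
	}

	/* allocation path: the charge can now fail */
	if (!drmcg_try_chg_bo_alloc(obj->drmcg, dev, size))
		return -ENOMEM;	/* over a cgroup limit */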

drm.buffer.total.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default limits on the total GEM buffer allocation in bytes.

drm.buffer.total.max
        A read-write flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Per device limits on the total GEM buffer allocation in bytes.
        This is a hard limit.  Attempts to allocate beyond the cgroup
        limit will result in ENOMEM.  Shorthand understood by memparse
        (such as k, m, g) can be used.

        Set allocation limit for /dev/dri/card1 to 1GB
        echo "226:1 1g" > drm.buffer.total.max

        Set allocation limit for /dev/dri/card0 to 512MB
        echo "226:0 512m" > drm.buffer.total.max

Change-Id: Id3265bbd0fafe84a16b59617df79bd32196160be
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst    |  21 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  19 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   6 +-
 drivers/gpu/drm/drm_gem.c                  |  11 +-
 include/drm/drm_cgroup.h                   |   8 +-
 include/drm/drm_gem.h                      |   2 +-
 include/linux/cgroup_drm.h                 |   1 +
 kernel/cgroup/drm.c                        | 227 ++++++++++++++++++++-
 8 files changed, 278 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 19fcf54ace83..064172df63e2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2081,6 +2081,27 @@ DRM Interface Files
 
 	Total number of GEM buffers allocated.
 
+  drm.buffer.total.default
+	A read-only flat-keyed file which exists on the root cgroup.
+	Each entry is keyed by the drm device's major:minor.
+
+	Default limits on the total GEM buffer allocation in bytes.
+
+  drm.buffer.total.max
+	A read-write flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Per device limits on the total GEM buffer allocation in bytes.
+	This is a hard limit.  Attempts to allocate beyond the cgroup
+	limit will result in ENOMEM.  Shorthand understood by memparse
+	(such as k, m, g) can be used.
+
+	Set allocation limit for /dev/dri/card1 to 1GB
+	echo "226:1 1g" > drm.buffer.total.max
+
+	Set allocation limit for /dev/dri/card0 to 512MB
+	echo "226:0 512m" > drm.buffer.total.max
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f28d040de3ce..3ebef1d62346 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1397,6 +1397,23 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
 						  stime, etime, mode);
 }
 
+#ifdef CONFIG_CGROUP_DRM
+
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+	struct drmcg_props *props)
+{
+	props->limit_enforced = true;
+}
+
+#else
+
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+	struct drmcg_props *props)
+{
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+
 static struct drm_driver kms_driver = {
 	.driver_features =
 	    DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1430,6 +1447,8 @@ static struct drm_driver kms_driver = {
 	.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
 	.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
+	.drmcg_custom_init = amdgpu_drmcg_custom_init,
+
 	.name = DRIVER_NAME,
 	.desc = DRIVER_DESC,
 	.date = DRIVER_DATE,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 46c76e2e1281..b81c608cb2cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_cache.h>
+#include <drm/drm_cgroup.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -551,7 +552,10 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	bo = kzalloc(sizeof(struct amdgpu_bo), GFP_KERNEL);
 	if (bo == NULL)
 		return -ENOMEM;
-	drm_gem_private_object_init(adev->ddev, &bo->tbo.base, size);
+	if (!drm_gem_private_object_init(adev->ddev, &bo->tbo.base, size)) {
+		kfree(bo);
+		return -ENOMEM;
+	}
 	INIT_LIST_HEAD(&bo->shadow_list);
 	bo->vm_bo = NULL;
 	bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d158470edd98..06e7576f1758 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -150,11 +150,17 @@ EXPORT_SYMBOL(drm_gem_object_init);
  * no GEM provided backing store. Instead the caller is responsible for
  * backing the object and handling it.
  */
-void drm_gem_private_object_init(struct drm_device *dev,
+bool drm_gem_private_object_init(struct drm_device *dev,
 				 struct drm_gem_object *obj, size_t size)
 {
 	BUG_ON((size & (PAGE_SIZE - 1)) != 0);
 
+	obj->drmcg = drmcg_get(current);
+	if (!drmcg_try_chg_bo_alloc(obj->drmcg, dev, size)) {
+		drmcg_put(obj->drmcg);
+		obj->drmcg = NULL;
+		return false;
+	}
 	obj->dev = dev;
 	obj->filp = NULL;
 
@@ -167,8 +173,7 @@ void drm_gem_private_object_init(struct drm_device *dev,
 
 	drm_vma_node_reset(&obj->vma_node);
 
-	obj->drmcg = drmcg_get(current);
-	drmcg_chg_bo_alloc(obj->drmcg, dev, size);
+	return true;
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 1eb3012e16a1..2783e56690db 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -13,6 +13,9 @@
  * of storing per device defaults
  */
 struct drmcg_props {
+	bool			limit_enforced;
+
+	s64			bo_limits_total_allocated_default;
 };
 
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
@@ -26,7 +29,7 @@ void drmcg_unregister_dev(struct drm_device *dev);
 
 void drmcg_device_early_init(struct drm_device *device);
 
-void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size);
 
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
@@ -59,9 +62,10 @@ static inline void drmcg_device_early_init(struct drm_device *device)
 {
 }
 
-static inline void drmcg_chg_bo_alloc(struct drmcg *drmcg,
+static inline bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
 		struct drm_device *dev,	size_t size)
 {
+	return true;
 }
 
 static inline void drmcg_unchg_bo_alloc(struct drmcg *drmcg,
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 6ac7018923f7..5748845c45eb 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -354,7 +354,7 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
 			struct drm_gem_object *obj, size_t size);
-void drm_gem_private_object_init(struct drm_device *dev,
+bool drm_gem_private_object_init(struct drm_device *dev,
 				 struct drm_gem_object *obj, size_t size);
 void drm_gem_vm_open(struct vm_area_struct *vma);
 void drm_gem_vm_close(struct vm_area_struct *vma);
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 51a0cd37da92..b03d90623763 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -26,6 +26,7 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
 	/* for per device stats */
 	s64			bo_stats_total_allocated;
+	s64			bo_limits_total_allocated;
 
 	s64			bo_stats_peak_allocated;
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index bc162aa9971d..ee85482edd90 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -37,6 +37,8 @@ static void (*put_drm_dev)(struct drm_device *dev);
 
 enum drmcg_file_type {
 	DRMCG_FTYPE_STATS,
+	DRMCG_FTYPE_LIMIT,
+	DRMCG_FTYPE_DEFAULT,
 };
 
 /**
@@ -90,6 +92,8 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	drmcg->dev_resources[minor] = ddr;
 
 	/* set defaults here */
+	ddr->bo_limits_total_allocated =
+		dev->drmcg_props.bo_limits_total_allocated_default;
 
 	return 0;
 }
@@ -289,6 +293,38 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	}
 }
 
+static void drmcg_print_limits(struct drmcg_device_resource *ddr,
+		struct seq_file *sf, enum drmcg_res_type type)
+{
+	if (ddr == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCG_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+static void drmcg_print_default(struct drmcg_props *props,
+		struct seq_file *sf, enum drmcg_res_type type)
+{
+	switch (type) {
+	case DRMCG_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n",
+			props->bo_limits_total_allocated_default);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
 static int drmcg_seq_show_fn(int id, void *ptr, void *data)
 {
 	struct drm_minor *minor = ptr;
@@ -311,6 +347,12 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
 	case DRMCG_FTYPE_STATS:
 		drmcg_print_stats(ddr, sf, type);
 		break;
+	case DRMCG_FTYPE_LIMIT:
+		drmcg_print_limits(ddr, sf, type);
+		break;
+	case DRMCG_FTYPE_DEFAULT:
+		drmcg_print_default(&minor->dev->drmcg_props, sf, type);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -324,6 +366,130 @@ int drmcg_seq_show(struct seq_file *sf, void *v)
 	return drm_minor_for_each(&drmcg_seq_show_fn, sf);
 }
 
+static void drmcg_pr_cft_err(const struct drmcg *drmcg,
+		int rc, const char *cft_name, int minor)
+{
+	pr_err("drmcg: error parsing %s, minor %d, rc %d ",
+			cft_name, minor, rc);
+	pr_cont_cgroup_name(drmcg->css.cgroup);
+	pr_cont("\n");
+}
+
+static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
+			s64 def_val, s64 max_val, s64 *ret_val)
+{
+	int rc = strcmp("max", sval);
+
+
+	if (!rc)
+		*ret_val = max_val;
+	else {
+		rc = strcmp("default", sval);
+
+		if (!rc)
+			*ret_val = def_val;
+	}
+
+	if (rc) {
+		if (is_mem) {
+			*ret_val = memparse(sval, NULL);
+			rc = 0;
+		} else {
+			rc = kstrtoll(sval, 0, ret_val);
+		}
+	}
+
+	if (*ret_val > max_val)
+		rc = -EINVAL;
+
+	return rc;
+}
+
+/**
+ * drmcg_limit_write - parse cgroup interface files to obtain user config
+ *
+ * Minimal value check to keep track of user intent.  For example, a user
+ * can specify limits greater than the values allowed by the parents.
+ * This way, the user configuration is kept and comes into effect if and
+ * when parents' limits are relaxed.
+ */
+static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
+		size_t nbytes, loff_t off)
+{
+	struct drmcg *drmcg = css_to_drmcg(of_css(of));
+	enum drmcg_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
+	char *cft_name = of_cft(of)->name;
+	char *limits = strstrip(buf);
+	struct drmcg_device_resource *ddr;
+	struct drmcg_props *props;
+	struct drm_minor *dm;
+	char *line;
+	char sattr[256];
+	s64 val;
+	int rc;
+	int minor;
+
+	while (limits != NULL) {
+		line =  strsep(&limits, "\n");
+
+		if (sscanf(line,
+			__stringify(DRM_MAJOR)":%u %255[^\t\n]",
+							&minor, sattr) != 2) {
+			pr_err("drmcg: error parsing %s ", cft_name);
+			pr_cont_cgroup_name(drmcg->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		mutex_lock(&drmcg_mutex);
+		if (acquire_drm_minor)
+			dm = acquire_drm_minor(minor);
+		else
+			dm = NULL;
+		mutex_unlock(&drmcg_mutex);
+
+		if (IS_ERR_OR_NULL(dm)) {
+			pr_err("drmcg: invalid minor %d for %s ",
+					minor, cft_name);
+			pr_cont_cgroup_name(drmcg->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		mutex_lock(&dm->dev->drmcg_mutex);
+		ddr = drmcg->dev_resources[minor];
+		props = &dm->dev->drmcg_props;
+		switch (type) {
+		case DRMCG_TYPE_BO_TOTAL:
+			rc = drmcg_process_limit_s64_val(sattr, true,
+				props->bo_limits_total_allocated_default,
+				S64_MAX,
+				&val);
+
+			if (rc || val < 0) {
+				drmcg_pr_cft_err(drmcg, rc, cft_name, minor);
+				break;
+			}
+
+			ddr->bo_limits_total_allocated = val;
+			break;
+		default:
+			break;
+		}
+		mutex_unlock(&dm->dev->drmcg_mutex);
+
+		mutex_lock(&drmcg_mutex);
+		if (put_drm_dev)
+			put_drm_dev(dm->dev); /* release from acquire */
+		mutex_unlock(&drmcg_mutex);
+	}
+
+	return nbytes;
+}
+
 struct cftype files[] = {
 	{
 		.name = "buffer.total.stats",
@@ -331,6 +497,20 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.total.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "buffer.total.max",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
+						DRMCG_FTYPE_LIMIT),
+	},
 	{
 		.name = "buffer.peak.stats",
 		.seq_show = drmcg_seq_show,
@@ -363,12 +543,16 @@ struct cgroup_subsys drm_cgrp_subsys = {
  */
 void drmcg_device_early_init(struct drm_device *dev)
 {
+	dev->drmcg_props.limit_enforced = false;
+
+	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
+
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
 
 /**
- * drmcg_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
+ * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
  * @drmcg: the DRM cgroup to be charged to
  * @dev: the device the usage should be charged to
  * @size: size of the GEM buffer to be accounted for
@@ -377,29 +561,52 @@ EXPORT_SYMBOL(drmcg_device_early_init);
  * for the utilization.  This should not be called when the buffer is shared (
  * the GEM buffer's reference count being incremented.)
  */
-void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size)
 {
 	struct drmcg_device_resource *ddr;
 	int devIdx = dev->primary->index;
+	struct drmcg_props *props = &dev->drmcg_props;
+	struct drmcg *drmcg_cur = drmcg;
+	bool result = true;
+	s64 delta = 0;
 
 	if (drmcg == NULL)
-		return;
+		return true;
 
 	mutex_lock(&dev->drmcg_mutex);
-	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
-		ddr = drmcg->dev_resources[devIdx];
+	if (props->limit_enforced) {
+		for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+			ddr = drmcg->dev_resources[devIdx];
+			delta = ddr->bo_limits_total_allocated -
+					ddr->bo_stats_total_allocated;
+
+			if (delta <= 0 || size > delta) {
+				result = false;
+				break;
+			}
+		}
+	}
+
+	drmcg = drmcg_cur;
+
+	if (result || !props->limit_enforced) {
+		for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+			ddr = drmcg->dev_resources[devIdx];
 
-		ddr->bo_stats_total_allocated += (s64)size;
+			ddr->bo_stats_total_allocated += (s64)size;
 
-		if (ddr->bo_stats_peak_allocated < (s64)size)
-			ddr->bo_stats_peak_allocated = (s64)size;
+			if (ddr->bo_stats_peak_allocated < (s64)size)
+				ddr->bo_stats_peak_allocated = (s64)size;
 
-		ddr->bo_stats_count_allocated++;
+			ddr->bo_stats_count_allocated++;
+		}
 	}
 	mutex_unlock(&dev->drmcg_mutex);
+
+	return result;
 }
-EXPORT_SYMBOL(drmcg_chg_bo_alloc);
+EXPORT_SYMBOL(drmcg_try_chg_bo_alloc);
 
 /**
  * drmcg_unchg_bo_alloc -
-- 
2.25.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/11] drm, cgroup: Add peak GEM buffer allocation limit
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (6 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 07/11] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

drm.buffer.peak.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default limits on the largest GEM buffer allocation in bytes.

drm.buffer.peak.max
        A read-write flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Per device limits on the largest GEM buffer allocation in bytes.
        This is a hard limit.  Attempts to allocate beyond the cgroup
        limit will result in ENOMEM.  Shorthand understood by memparse
        (such as k, m, g) can be used.

        Set largest allocation for /dev/dri/card1 to 4MB
        echo "226:1 4m" > drm.buffer.peak.max

Change-Id: I5ab3fb4a442b6cbd5db346be595897c90217da69
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 +++++++++++
 include/drm/drm_cgroup.h                |  1 +
 include/linux/cgroup_drm.h              |  1 +
 kernel/cgroup/drm.c                     | 43 +++++++++++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 064172df63e2..ce5dc027366a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2102,6 +2102,24 @@ DRM Interface Files
 	Set allocation limit for /dev/dri/card0 to 512MB
 	echo "226:0 512m" > drm.buffer.total.max
 
+  drm.buffer.peak.default
+	A read-only flat-keyed file which exists on the root cgroup.
+	Each entry is keyed by the drm device's major:minor.
+
+	Default limits on the largest GEM buffer allocation in bytes.
+
+  drm.buffer.peak.max
+	A read-write flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Per device limits on the largest GEM buffer allocation in bytes.
+	This is a hard limit.  Attempts to allocate beyond the cgroup
+	limit will result in ENOMEM.  Shorthand understood by memparse
+	(such as k, m, g) can be used.
+
+	Set largest allocation for /dev/dri/card1 to 4MB
+	echo "226:1 4m" > drm.buffer.peak.max
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 2783e56690db..2b41d4d22e33 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -16,6 +16,7 @@ struct drmcg_props {
 	bool			limit_enforced;
 
 	s64			bo_limits_total_allocated_default;
+	s64			bo_limits_peak_allocated_default;
 };
 
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index b03d90623763..eae400f3d9b4 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -29,6 +29,7 @@ struct drmcg_device_resource {
 	s64			bo_limits_total_allocated;
 
 	s64			bo_stats_peak_allocated;
+	s64			bo_limits_peak_allocated;
 
 	s64			bo_stats_count_allocated;
 };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index ee85482edd90..5fcbbc13fa1c 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -95,6 +95,9 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	ddr->bo_limits_total_allocated =
 		dev->drmcg_props.bo_limits_total_allocated_default;
 
+	ddr->bo_limits_peak_allocated =
+		dev->drmcg_props.bo_limits_peak_allocated_default;
+
 	return 0;
 }
 
@@ -305,6 +308,9 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_TOTAL:
 		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
 		break;
+	case DRMCG_TYPE_BO_PEAK:
+		seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -319,6 +325,10 @@ static void drmcg_print_default(struct drmcg_props *props,
 		seq_printf(sf, "%lld\n",
 			props->bo_limits_total_allocated_default);
 		break;
+	case DRMCG_TYPE_BO_PEAK:
+		seq_printf(sf, "%lld\n",
+			props->bo_limits_peak_allocated_default);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -476,6 +486,19 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 
 			ddr->bo_limits_total_allocated = val;
 			break;
+		case DRMCG_TYPE_BO_PEAK:
+			rc = drmcg_process_limit_s64_val(sattr, true,
+				props->bo_limits_peak_allocated_default,
+				S64_MAX,
+				&val);
+
+			if (rc || val < 0) {
+				drmcg_pr_cft_err(drmcg, rc, cft_name, minor);
+				break;
+			}
+
+			ddr->bo_limits_peak_allocated = val;
+			break;
 		default:
 			break;
 		}
@@ -517,6 +540,20 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.peak.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "buffer.peak.max",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+						DRMCG_FTYPE_LIMIT),
+	},
 	{
 		.name = "buffer.count.stats",
 		.seq_show = drmcg_seq_show,
@@ -546,6 +583,7 @@ void drmcg_device_early_init(struct drm_device *dev)
 	dev->drmcg_props.limit_enforced = false;
 
 	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
+	dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
 
 	drmcg_update_cg_tree(dev);
 }
@@ -585,6 +623,11 @@ bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 				result = false;
 				break;
 			}
+
+			if (ddr->bo_limits_peak_allocated < size) {
+				result = false;
+				break;
+			}
 		}
 	}
 
-- 
2.25.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (7 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 08/11] drm, cgroup: Add peak " Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 16:44   ` Jason Ekstrand
  2020-02-14 15:56 ` [PATCH 10/11] drm, cgroup: add update trigger after limit change Kenny Ho
  2020-02-14 15:56 ` [PATCH 11/11] drm/amdgpu: Integrate with DRM cgroup Kenny Ho
  10 siblings, 1 reply; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

drm.lgpu
      A read-write nested-keyed file which exists on all cgroups.
      Each entry is keyed by the DRM device's major:minor.

      lgpu stands for logical GPU; it is an abstraction used to
      subdivide a physical DRM device for the purpose of resource
      management.  This file stores user configuration while the
      drm.lgpu.effective reflects the actual allocation after
      considering the relationship between the cgroups and their
      configurations.

      The lgpu is a discrete quantity that is device specific (i.e.
      some DRM devices may have 64 lgpus while others may have 100).
      The lgpu is a single quantity that can be allocated
      in three different ways denoted by the following nested keys.

        =====     ==============================================
        weight    Allocate by proportion in relationship with
                  active sibling cgroups
        count     Allocate by amount statically, treat lgpu as
                  anonymous resources
        list      Allocate statically, treat lgpu as named
                  resource
        =====     ==============================================

      For example:
      226:0 weight=100 count=256 list=0-255
      226:1 weight=100 count=4 list=0,2,4,6
      226:2 weight=100 count=32 list=32-63
      226:3 weight=100 count=0 list=
      226:4 weight=500 count=0 list=

      lgpu is represented by a bitmap and uses the bitmap_parselist
      kernel function so the list key input format is a
      comma-separated list of decimal numbers and ranges.

      Consecutively set bits are shown as two hyphen-separated decimal
      numbers, the smallest and largest bit numbers set in the range.
      Optionally each range can be postfixed to denote that only parts
      of it should be set.  The range will divided to groups of
      specific size.
      Syntax: range:used_size/group_size
      Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769

      The count key is the Hamming weight (hweight) of the bitmap.

      Weight, count and list accept the max and default keywords.

      Some DRM devices may only support lgpu as anonymous resources.
      In such cases, the position of the set bits in list is
      ignored.

      The weight quantity is only in effect when static allocation
      is not used (by setting count=0) for this cgroup.  The weight
      quantity distributes lgpus that are not statically allocated by
      the siblings.  For example, given siblings cgroupA, cgroupB and
      cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
      0-63, no lgpu is available to be distributed by weight.
      Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
      cgroupC will be starved if it tries to allocate by weight.

      On the other hand, if cgroupA has weight=100 count=0, cgroupB
      has list=16-47, and cgroupC has weight=100 count=0, then 32
      lgpus are available to be distributed evenly between cgroupA
      and cgroupC.  In drm.lgpu.effective, cgroupA will have
      list=0-15 and cgroupC will have list=48-63.

      This lgpu resource supports the 'allocation' and 'weight'
      resource distribution model.

drm.lgpu.effective
      A read-only nested-keyed file which exists on all cgroups.
      Each entry is keyed by the DRM device's major:minor.

      lgpu stands for logical GPU; it is an abstraction used to
      subdivide a physical DRM device for the purpose of resource
      management.  This file reflects the actual allocation after
      considering the relationship between the cgroups and their
      configurations in drm.lgpu.
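
      To make the cgroupA/cgroupB/cgroupC example above concrete, a
      hypothetical session (paths are illustrative and the device
      226:0 is assumed to expose 64 lgpus):

        $ echo "226:0 weight=100 count=0" > cgroupA/drm.lgpu
        $ echo "226:0 list=16-47" > cgroupB/drm.lgpu
        $ echo "226:0 weight=100 count=0" > cgroupC/drm.lgpu
        $ cat cgroupA/drm.lgpu.effective
        226:0 count=16 list=0-15
        $ cat cgroupC/drm.lgpu.effective
        226:0 count=16 list=48-63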

Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
 include/drm/drm_cgroup.h                |   3 +
 include/linux/cgroup_drm.h              |  22 ++
 kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
 4 files changed, 427 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index ce5dc027366a..d8a41956e5c7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2120,6 +2120,86 @@ DRM Interface Files
 	Set largest allocation for /dev/dri/card1 to 4MB
 	echo "226:1 4m" > drm.buffer.peak.max
 
+  drm.lgpu
+	A read-write nested-keyed file which exists on all cgroups.
+	Each entry is keyed by the DRM device's major:minor.
+
+	lgpu stands for logical GPU; it is an abstraction used to
+	subdivide a physical DRM device for the purpose of resource
+	management.  This file stores user configuration while the
+	drm.lgpu.effective reflects the actual allocation after
+	considering the relationship between the cgroups and their
+	configurations.
+
+	The lgpu is a discrete quantity that is device specific (i.e.
+	some DRM devices may have 64 lgpus while others may have 100).
+	The lgpu is a single quantity that can be allocated
+	in three different ways denoted by the following nested keys.
+
+	  =====     ==============================================
+	  weight    Allocate by proportion in relationship with
+	            active sibling cgroups
+	  count     Allocate by amount statically, treat lgpu as
+	            anonymous resources
+	  list      Allocate statically, treat lgpu as named
+	            resource
+	  =====     ==============================================
+
+	For example:
+	226:0 weight=100 count=256 list=0-255
+	226:1 weight=100 count=4 list=0,2,4,6
+	226:2 weight=100 count=32 list=32-63
+	226:3 weight=100 count=0 list=
+	226:4 weight=500 count=0 list=
+
+	lgpu is represented by a bitmap and uses the bitmap_parselist
+	kernel function so the list key input format is a
+	comma-separated list of decimal numbers and ranges.
+
+	Consecutively set bits are shown as two hyphen-separated decimal
+	numbers, the smallest and largest bit numbers set in the range.
+	Optionally each range can be postfixed to denote that only parts
+	of it should be set.  The range will be divided into groups of
+	a specific size.
+	Syntax: range:used_size/group_size
+	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+
+	The count key is the Hamming weight (hweight) of the bitmap.
+
+	Weight, count and list accept the max and default keywords.
+
+	Some DRM devices may only support lgpu as anonymous resources.
+	In such cases, the position of the set bits in list is
+	ignored.
+
+	The weight quantity is only in effect when static allocation
+	is not used (by setting count=0) for this cgroup.  The weight
+	quantity distributes lgpus that are not statically allocated by
+	the siblings.  For example, given siblings cgroupA, cgroupB and
+	cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
+	0-63, no lgpu is available to be distributed by weight.
+	Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
+	cgroupC will be starved if it tries to allocate by weight.
+
+	On the other hand, if cgroupA has weight=100 count=0, cgroupB
+	has list=16-47, and cgroupC has weight=100 count=0, then 32
+	lgpus are available to be distributed evenly between cgroupA
+	and cgroupC.  In drm.lgpu.effective, cgroupA will have
+	list=0-15 and cgroupC will have list=48-63.
+
+	This lgpu resource supports the 'allocation' and 'weight'
+	resource distribution model.
+
+  drm.lgpu.effective
+	A read-only nested-keyed file which exists on all cgroups.
+	Each entry is keyed by the DRM device's major:minor.
+
+	lgpu stands for logical GPU; it is an abstraction used to
+	subdivide a physical DRM device for the purpose of resource
+	management.  This file reflects the actual allocation after
+	considering the relationship between the cgroups and their
+	configurations in drm.lgpu.
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 2b41d4d22e33..619a110cc748 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -17,6 +17,9 @@ struct drmcg_props {
 
 	s64			bo_limits_total_allocated_default;
 	s64			bo_limits_peak_allocated_default;
+
+	int			lgpu_capacity;
+	DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
 };
 
 void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index eae400f3d9b4..bb09704e7f71 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -11,10 +11,14 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+#define MAX_DRMCG_LGPU_CAPACITY 256
+
 enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
 	DRMCG_TYPE_BO_PEAK,
 	DRMCG_TYPE_BO_COUNT,
+	DRMCG_TYPE_LGPU,
+	DRMCG_TYPE_LGPU_EFF,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -32,6 +36,24 @@ struct drmcg_device_resource {
 	s64			bo_limits_peak_allocated;
 
 	s64			bo_stats_count_allocated;
+
+	/**
+	 * Logical GPU
+	 *
+	 * *_cfg are properties configured by users
+	 * *_eff are the effective properties being applied to the hardware
+	 * *_stg is a staging area used to calculate *_eff after
+	 * considering the entire hierarchy
+	 */
+	DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
+	/* user configurations */
+	s64			lgpu_weight_cfg;
+	DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
+	/* effective lgpu for the cgroup after considering
+	 * relationship with other cgroup
+	 */
+	s64			lgpu_count_eff;
+	DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 5fcbbc13fa1c..a4e88a3704bb 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -9,6 +9,7 @@
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/kernel.h>
+#include <linux/bitmap.h>
 #include <linux/cgroup_drm.h>
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
@@ -41,6 +42,10 @@ enum drmcg_file_type {
 	DRMCG_FTYPE_DEFAULT,
 };
 
+#define LGPU_LIMITS_NAME_LIST "list"
+#define LGPU_LIMITS_NAME_COUNT "count"
+#define LGPU_LIMITS_NAME_WEIGHT "weight"
+
 /**
  * drmcg_bind - Bind DRM subsystem to cgroup subsystem
  * @acq_dm: function pointer to the drm_minor_acquire function
@@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	ddr->bo_limits_peak_allocated =
 		dev->drmcg_props.bo_limits_peak_allocated_default;
 
+	bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
+			MAX_DRMCG_LGPU_CAPACITY);
+	bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
+			MAX_DRMCG_LGPU_CAPACITY);
+
+	ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
+
 	return 0;
 }
 
@@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
 	mutex_unlock(&cgroup_mutex);
 }
 
+static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
+		const unsigned long *free_static,
+		const unsigned long *free_weighted,
+		struct drmcg *parent_drmcg)
+{
+	int capacity = dev->drmcg_props.lgpu_capacity;
+	DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
+	DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
+	struct drmcg_device_resource *parent_ddr;
+	struct drmcg_device_resource *ddr;
+	int minor = dev->primary->index;
+	struct cgroup_subsys_state *pos;
+	struct drmcg *child;
+	s64 weight_sum = 0;
+	s64 unused;
+
+	parent_ddr = parent_drmcg->dev_resources[minor];
+
+	if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
+		/* no static cfg, use weight for calculating the effective */
+		bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
+	else
+		/* lgpu statically configured, use the overlap as effective */
+		bitmap_and(parent_ddr->lgpu_stg, free_static,
+				parent_ddr->lgpu_cfg, capacity);
+
+	/* calculate lgpu available for distribution by weight for children */
+	bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
+	css_for_each_child(pos, &parent_drmcg->css) {
+		child = css_to_drmcg(pos);
+		ddr = child->dev_resources[minor];
+
+		if (bitmap_empty(ddr->lgpu_cfg, capacity))
+			/* no static allocation, participate in weight dist */
+			weight_sum += ddr->lgpu_weight_cfg;
+		else
+			/* take out statically allocated lgpu by siblings */
+			bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
+					capacity);
+	}
+
+	unused = bitmap_weight(lgpu_unused, capacity);
+
+	css_for_each_child(pos, &parent_drmcg->css) {
+		child = css_to_drmcg(pos);
+		ddr = child->dev_resources[minor];
+
+		bitmap_zero(lgpu_by_weight, capacity);
+		/* no static allocation, participate in weight distribution */
+		if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
+			int c;
+			int p = 0;
+
+			for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
+					c > 0; c--) {
+				p = find_next_bit(lgpu_unused, capacity, p);
+				if (p < capacity) {
+					clear_bit(p, lgpu_unused);
+					set_bit(p, lgpu_by_weight);
+				}
+			}
+
+		}
+
+		drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
+				lgpu_by_weight, child);
+	}
+}
+
+static void drmcg_apply_effective_lgpu(struct drm_device *dev)
+{
+	int capacity = dev->drmcg_props.lgpu_capacity;
+	int minor = dev->primary->index;
+	struct drmcg_device_resource *ddr;
+	struct cgroup_subsys_state *pos;
+	struct drmcg *drmcg;
+
+	if (root_drmcg == NULL) {
+		WARN_ON(root_drmcg == NULL);
+		return;
+	}
+
+	rcu_read_lock();
+
+	/* process the entire cgroup tree from root to simplify the algorithm */
+	drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
+			dev->drmcg_props.lgpu_slots, root_drmcg);
+
+	/* apply changes to effective only if there is a change */
+	css_for_each_descendant_pre(pos, &root_drmcg->css) {
+		drmcg = css_to_drmcg(pos);
+		ddr = drmcg->dev_resources[minor];
+
+		if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
+			bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
+			ddr->lgpu_count_eff =
+				bitmap_weight(ddr->lgpu_eff, capacity);
+		}
+	}
+	rcu_read_unlock();
+}
+
+static void drmcg_apply_effective(enum drmcg_res_type type,
+		struct drm_device *dev, struct drmcg *changed_drmcg)
+{
+	switch (type) {
+	case DRMCG_TYPE_LGPU:
+		drmcg_apply_effective_lgpu(dev);
+		break;
+	default:
+		break;
+	}
+}
+
 /**
  * drmcg_register_dev - register a DRM device for usage in drm cgroup
  * @dev: DRM device
@@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
 	{
 		dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
 
+		WARN_ON(dev->drmcg_props.lgpu_capacity !=
+				bitmap_weight(dev->drmcg_props.lgpu_slots,
+					MAX_DRMCG_LGPU_CAPACITY));
+
 		drmcg_update_cg_tree(dev);
+
+		drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
 	}
 	mutex_unlock(&drmcg_mutex);
 }
@@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 }
 
 static void drmcg_print_limits(struct drmcg_device_resource *ddr,
-		struct seq_file *sf, enum drmcg_res_type type)
+		struct seq_file *sf, enum drmcg_res_type type,
+		struct drm_device *dev)
 {
 	if (ddr == NULL) {
 		seq_puts(sf, "\n");
@@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_PEAK:
 		seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
 		break;
+	case DRMCG_TYPE_LGPU:
+		seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
+				LGPU_LIMITS_NAME_WEIGHT,
+				ddr->lgpu_weight_cfg,
+				LGPU_LIMITS_NAME_COUNT,
+				bitmap_weight(ddr->lgpu_cfg,
+					dev->drmcg_props.lgpu_capacity),
+				LGPU_LIMITS_NAME_LIST,
+				dev->drmcg_props.lgpu_capacity,
+				ddr->lgpu_cfg);
+		break;
+	case DRMCG_TYPE_LGPU_EFF:
+		seq_printf(sf, "%s=%lld %s=%*pbl\n",
+				LGPU_LIMITS_NAME_COUNT,
+				ddr->lgpu_count_eff,
+				LGPU_LIMITS_NAME_LIST,
+				dev->drmcg_props.lgpu_capacity,
+				ddr->lgpu_eff);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
 		seq_printf(sf, "%lld\n",
 			props->bo_limits_peak_allocated_default);
 		break;
+	case DRMCG_TYPE_LGPU:
+		seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
+				LGPU_LIMITS_NAME_WEIGHT,
+				CGROUP_WEIGHT_DFL,
+				LGPU_LIMITS_NAME_COUNT,
+				bitmap_weight(props->lgpu_slots,
+					props->lgpu_capacity),
+				LGPU_LIMITS_NAME_LIST,
+				props->lgpu_capacity,
+				props->lgpu_slots);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
 		drmcg_print_stats(ddr, sf, type);
 		break;
 	case DRMCG_FTYPE_LIMIT:
-		drmcg_print_limits(ddr, sf, type);
+		drmcg_print_limits(ddr, sf, type, minor->dev);
 		break;
 	case DRMCG_FTYPE_DEFAULT:
 		drmcg_print_default(&minor->dev->drmcg_props, sf, type);
@@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
 	return rc;
 }
 
+static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
+		struct drm_device *dev, char *attrs)
+{
+	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
+	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
+	enum drmcg_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
+	struct drmcg *drmcg = css_to_drmcg(of_css(of));
+	struct drmcg_props *props = &dev->drmcg_props;
+	char *cft_name = of_cft(of)->name;
+	int minor = dev->primary->index;
+	char *nested = strstrip(attrs);
+	struct drmcg_device_resource *ddr =
+		drmcg->dev_resources[minor];
+	char *attr;
+	char sname[256];
+	char sval[256];
+	s64 val;
+	int rc;
+
+	while (nested != NULL) {
+		attr = strsep(&nested, " ");
+
+		if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
+			continue;
+
+		switch (type) {
+		case DRMCG_TYPE_LGPU:
+			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
+				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
+				strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
+				continue;
+
+			if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
+					(!strcmp("max", sval) ||
+					!strcmp("default", sval))) {
+				bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
+						props->lgpu_capacity);
+
+				continue;
+			}
+
+			if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
+				rc = drmcg_process_limit_s64_val(sval,
+					false, CGROUP_WEIGHT_DFL,
+					CGROUP_WEIGHT_MAX, &val);
+
+				if (rc || val < CGROUP_WEIGHT_MIN ||
+						val > CGROUP_WEIGHT_MAX) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				ddr->lgpu_weight_cfg = val;
+				continue;
+			}
+
+			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
+				rc = drmcg_process_limit_s64_val(sval,
+					false, props->lgpu_capacity,
+					props->lgpu_capacity, &val);
+
+				if (rc || val < 0) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				bitmap_zero(tmp_bitmap,
+						MAX_DRMCG_LGPU_CAPACITY);
+				bitmap_set(tmp_bitmap, 0, val);
+			}
+
+			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
+				rc = bitmap_parselist(sval, tmp_bitmap,
+						MAX_DRMCG_LGPU_CAPACITY);
+
+				if (rc) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				bitmap_andnot(chk_bitmap, tmp_bitmap,
+					props->lgpu_slots,
+					MAX_DRMCG_LGPU_CAPACITY);
+
+				/* user setting does not intersect with
+				 * available lgpu */
+				if (!bitmap_empty(chk_bitmap,
+						MAX_DRMCG_LGPU_CAPACITY)) {
+					drmcg_pr_cft_err(drmcg, 0, cft_name,
+							minor);
+					continue;
+				}
+			}
+
+			bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
+					props->lgpu_capacity);
+
+			break; /* DRMCG_TYPE_LGPU */
+		default:
+			break;
+		} /* switch (type) */
+	}
+}
+
+
 /**
  * drmcg_limit_write - parse cgroup interface files to obtain user config
  *
@@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 
 			ddr->bo_limits_peak_allocated = val;
 			break;
+		case DRMCG_TYPE_LGPU:
+			drmcg_nested_limit_parse(of, dm->dev, sattr);
+			break;
 		default:
 			break;
 		}
+
+		drmcg_apply_effective(type, dm->dev, drmcg);
+
 		mutex_unlock(&dm->dev->drmcg_mutex);
 
 		mutex_lock(&drmcg_mutex);
@@ -560,12 +838,51 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "lgpu",
+		.seq_show = drmcg_seq_show,
+		.write = drmcg_limit_write,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
+						DRMCG_FTYPE_LIMIT),
+	},
+	{
+		.name = "lgpu.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "lgpu.effective",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
+						DRMCG_FTYPE_LIMIT),
+	},
 	{ }	/* terminate */
 };
 
+static int drmcg_online_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct drmcg *drmcg = data;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
+
+	return 0;
+}
+
+static int drmcg_css_online(struct cgroup_subsys_state *css)
+{
+	return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
+}
+
 struct cgroup_subsys drm_cgrp_subsys = {
 	.css_alloc	= drmcg_css_alloc,
 	.css_free	= drmcg_css_free,
+	.css_online	= drmcg_css_online,
 	.early_init	= false,
 	.legacy_cftypes	= files,
 	.dfl_cftypes	= files,
@@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
 	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
 	dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
 
+	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
+	bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
+
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
-- 
2.25.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/11] drm, cgroup: add update trigger after limit change
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (8 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  2020-02-14 15:56 ` [PATCH 11/11] drm/amdgpu: Integrate with DRM cgroup Kenny Ho
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

Before this commit, drmcg limits are updated but enforcement is delayed
until the next time the driver checks against the new limit.  While this
is sufficient for certain resources, a more proactive enforcement may be
needed for others.

This commit introduces an optional drmcg_limit_updated callback for DRM
drivers.  When defined, it will be called in two scenarios:
1) When limits are updated for a particular cgroup, the callback will be
triggered for each task in the updated cgroup.
2) When a task is migrated from one cgroup to another, the callback will
be triggered for each resource type for the migrated task.
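
A minimal sketch of how a driver might wire the callback (the foo_
names are placeholders; patch 11 shows the real amdgpu hook):

	static void foo_drmcg_limit_updated(struct drm_device *dev,
			struct task_struct *task,
			struct drmcg_device_resource *ddr,
			enum drmcg_res_type res_type)
	{
		if (res_type == DRMCG_TYPE_LGPU) {
			/* push the new effective lgpu set into HW state */
			foo_apply_lgpu(dev, task, ddr->lgpu_eff);
		}
	}

	static struct drm_driver foo_driver = {
		/* ... */
		.drmcg_limit_updated = foo_drmcg_limit_updated,
	};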

Change-Id: I0ce7c4e5a04c31bd0f8d9853a383575d4bc9a3fa
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/drm/drm_drv.h | 10 ++++++++
 kernel/cgroup/drm.c   | 59 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 1f65ac4d9bbf..e7333143e722 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -724,6 +724,16 @@ struct drm_driver {
 	void (*drmcg_custom_init)(struct drm_device *dev,
 			struct drmcg_props *props);
 
+	/**
+	 * @drmcg_limit_updated
+	 *
+	 * Optional callback, invoked when a limit changes or a task migrates
+	 */
+	void (*drmcg_limit_updated)(struct drm_device *dev,
+			struct task_struct *task,
+			struct drmcg_device_resource *ddr,
+			enum drmcg_res_type res_type);
+
 	/**
 	 * @gem_vm_ops: Driver private ops for this object
 	 *
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index a4e88a3704bb..d3fa23b71f5f 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -133,6 +133,26 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
 	mutex_unlock(&cgroup_mutex);
 }
 
+static void drmcg_limit_updated(struct drm_device *dev, struct drmcg *drmcg,
+		enum drmcg_res_type res_type)
+{
+	struct drmcg_device_resource *ddr =
+		drmcg->dev_resources[dev->primary->index];
+	struct css_task_iter it;
+	struct task_struct *task;
+
+	if (dev->driver->drmcg_limit_updated == NULL)
+		return;
+
+	css_task_iter_start(&drmcg->css.cgroup->self,
+			CSS_TASK_ITER_PROCS, &it);
+	while ((task = css_task_iter_next(&it))) {
+		dev->driver->drmcg_limit_updated(dev, task,
+				ddr, res_type);
+	}
+	css_task_iter_end(&it);
+}
+
 static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
 		const unsigned long *free_static,
 		const unsigned long *free_weighted,
@@ -230,6 +250,8 @@ static void drmcg_apply_effective_lgpu(struct drm_device *dev)
 			bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
 			ddr->lgpu_count_eff =
 				bitmap_weight(ddr->lgpu_eff, capacity);
+
+			drmcg_limit_updated(dev, drmcg, DRMCG_TYPE_LGPU);
 		}
 	}
 	rcu_read_unlock();
@@ -686,7 +708,6 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
 	}
 }
 
-
 /**
  * drmcg_limit_write - parse cgroup interface files to obtain user config
  *
@@ -879,10 +900,46 @@ static int drmcg_css_online(struct cgroup_subsys_state *css)
 	return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
 }
 
+static int drmcg_attach_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct task_struct *task = data;
+	struct drm_device *dev;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	dev = minor->dev;
+
+	if (dev->driver->drmcg_limit_updated) {
+		struct drmcg *drmcg = drmcg_get(task);
+		struct drmcg_device_resource *ddr =
+			drmcg->dev_resources[minor->index];
+		enum drmcg_res_type type;
+
+		for (type = 0; type < __DRMCG_TYPE_LAST; type++)
+			dev->driver->drmcg_limit_updated(dev, task, ddr, type);
+
+		drmcg_put(drmcg);
+	}
+
+	return 0;
+}
+
+static void drmcg_attach(struct cgroup_taskset *tset)
+{
+	struct task_struct *task;
+	struct cgroup_subsys_state *css;
+
+	cgroup_taskset_for_each(task, css, tset)
+		drm_minor_for_each(&drmcg_attach_fn, task);
+}
+
 struct cgroup_subsys drm_cgrp_subsys = {
 	.css_alloc	= drmcg_css_alloc,
 	.css_free	= drmcg_css_free,
 	.css_online	= drmcg_css_online,
+	.attach		= drmcg_attach,
 	.early_init	= false,
 	.legacy_cftypes	= files,
 	.dfl_cftypes	= files,
-- 
2.25.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/11] drm/amdgpu: Integrate with DRM cgroup
  2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (9 preceding siblings ...)
  2020-02-14 15:56 ` [PATCH 10/11] drm, cgroup: add update trigger after limit change Kenny Ho
@ 2020-02-14 15:56 ` Kenny Ho
  10 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 15:56 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel, nirmoy.das, damon.mcdougall, juan.zuniga-anaya
  Cc: Kenny Ho

The number of logical gpus (lgpu) is defined to be the number of compute
units (CU) for a device.  The lgpu allocation limit only applies to
compute workloads for the moment (enforced via kfd queue creation).  Any
cu_mask update is validated against the compute units made available by
the drmcg that the kfd process belongs to.
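
For example (hypothetical numbers): if a kfd process runs in a drmcg
whose effective lgpu list on device 226:0 is 0-31, a cu_mask submitted
through the set CU mask ioctl is accepted only if it is a subset of
CUs 0-31; a mask touching any CU outside that range is rejected with
EACCES.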

Change-Id: I2930e76ef9ac6d36d0feb81f604c89a4208e6614
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  29 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
 .../amd/amdkfd/kfd_process_queue_manager.c    | 153 ++++++++++++++++++
 5 files changed, 195 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 47b0f2957d1f..a45c7b5d23b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
 		valid;							\
 	})
 
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+		struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
+		unsigned int nbits);
+
 /* GPUVM API */
 int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
 					void **vm, void **process_info,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3ebef1d62346..dc31b9af2c72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1402,9 +1402,31 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
 	struct drmcg_props *props)
 {
+	struct amdgpu_device *adev = dev->dev_private;
+
+	props->lgpu_capacity = adev->gfx.cu_info.number;
+	bitmap_zero(props->lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
+	bitmap_fill(props->lgpu_slots, props->lgpu_capacity);
+
 	props->limit_enforced = true;
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+		struct task_struct *task, struct drmcg_device_resource *ddr,
+		enum drmcg_res_type res_type)
+{
+	struct amdgpu_device *adev = dev->dev_private;
+
+	switch (res_type) {
+	case DRMCG_TYPE_LGPU:
+		amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
+                        ddr->lgpu_eff, dev->drmcg_props.lgpu_capacity);
+		break;
+	default:
+		break;
+	}
+}
+
 #else
 
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
@@ -1412,6 +1434,12 @@ static void amdgpu_drmcg_custom_init(struct drm_device *dev,
 {
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+		struct task_struct *task, struct drmcg_device_resource *ddr,
+		enum drmcg_res_type res_type)
+{
+}
+
 #endif /* CONFIG_CGROUP_DRM */
 
 static struct drm_driver kms_driver = {
@@ -1448,6 +1476,7 @@ static struct drm_driver kms_driver = {
 	.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
 	.drmcg_custom_init = amdgpu_drmcg_custom_init,
+	.drmcg_limit_updated = amdgpu_drmcg_limit_updated,
 
 	.name = DRIVER_NAME,
 	.desc = DRIVER_DESC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 275f79ab0900..f39555c0f1d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -449,6 +449,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
 		return -EFAULT;
 	}
 
+	if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
+		pr_debug("CU mask not permitted by DRM Cgroup\n");
+		kfree(properties.cu_mask);
+		return -EACCES;
+	}
+
 	mutex_lock(&p->mutex);
 
 	retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index c0b0defc8f7a..9053b1b7fb10 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -921,6 +921,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 		       u32 *ctl_stack_used_size,
 		       u32 *save_area_used_size);
 
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+		unsigned int cu_mask_size);
+
 int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
 			      unsigned int fence_value,
 			      unsigned int timeout_ms);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 8fa856e6a03f..ff71b208d320 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -23,9 +23,11 @@
 
 #include <linux/slab.h>
 #include <linux/list.h>
+#include <linux/cgroup_drm.h>
 #include "kfd_device_queue_manager.h"
 #include "kfd_priv.h"
 #include "kfd_kernel_queue.h"
+#include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 
 static inline struct process_queue_node *get_queue_by_qid(
@@ -167,6 +169,7 @@ static int init_user_queue(struct process_queue_manager *pqm,
 				struct queue_properties *q_properties,
 				struct file *f, unsigned int qid)
 {
+	struct drmcg *drmcg;
 	int retval;
 
 	/* Doorbell initialized in user space*/
@@ -180,6 +183,37 @@ static int init_user_queue(struct process_queue_manager *pqm,
 	if (retval != 0)
 		return retval;
 
+#ifdef CONFIG_CGROUP_DRM
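+	/* seed the new queue's CU mask from the cgroup's effective lgpu set */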
+	drmcg = drmcg_get(pqm->process->lead_thread);
+	if (drmcg) {
+		struct amdgpu_device *adev;
+		struct drmcg_device_resource *ddr;
+		int mask_size;
+		u32 *mask;
+
+		adev = (struct amdgpu_device *)dev->kgd;
+
+		mask_size = adev->ddev->drmcg_props.lgpu_capacity;
+		mask = kzalloc(sizeof(u32) * DIV_ROUND_UP(mask_size, 32),
+				GFP_KERNEL);
+
+		if (!mask) {
+			drmcg_put(drmcg);
+			uninit_queue(*q);
+			return -ENOMEM;
+		}
+
+		ddr = drmcg->dev_resources[adev->ddev->primary->index];
+
+		bitmap_to_arr32(mask, ddr->lgpu_eff, mask_size);
+
+		(*q)->properties.cu_mask_count = mask_size;
+		(*q)->properties.cu_mask = mask;
+
+		drmcg_put(drmcg);
+	}
+#endif /* CONFIG_CGROUP_DRM */
+
 	(*q)->device = dev;
 	(*q)->process = pqm->process;
 
@@ -508,6 +542,125 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 						       save_area_used_size);
 }
 
+#ifdef CONFIG_CGROUP_DRM
+
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+		unsigned int cu_mask_size)
+{
+	DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
+	struct drmcg_device_resource *ddr;
+	struct process_queue_node *pqn;
+	struct amdgpu_device *adev;
+	struct drmcg *drmcg;
+	bool result;
+
+	if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
+		return false;
+
+	bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
+
+	pqn = get_queue_by_qid(&p->pqm, qid);
+	if (!pqn)
+		return false;
+
+	adev = (struct amdgpu_device *)pqn->q->device->kgd;
+
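+	/* the requested mask must be a subset of the cgroup's effective lgpu */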
+	drmcg = drmcg_get(p->lead_thread);
+	ddr = drmcg->dev_resources[adev->ddev->primary->index];
+
+	result = bitmap_subset(curr_mask, ddr->lgpu_eff,
+			MAX_DRMCG_LGPU_CAPACITY);
+
+	drmcg_put(drmcg);
+
+	return result;
+}
+
+#else
+
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+		unsigned int cu_mask_size)
+{
+	return true;
+}
+
+#endif /* CONFIG_CGROUP_DRM */
+
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+		struct amdgpu_device *adev, unsigned long *lgpu_bm,
+		unsigned int lgpu_bm_size)
+{
+	struct kfd_dev *kdev = adev->kfd.dev;
+	struct process_queue_node *pqn;
+	struct kfd_process *kfdproc;
+	size_t size_in_bytes;
+	u32 *cu_mask;
+	int rc = 0;
+
+	if ((lgpu_bm_size % 32) != 0) {
+		pr_warn("lgpu_bm_size %u must be a multiple of 32\n",
+				lgpu_bm_size);
+		return -EINVAL;
+	}
+
+	kfdproc = kfd_get_process(task);
+
+	if (IS_ERR(kfdproc))
+		return -ESRCH;
+
+	size_in_bytes = sizeof(u32) * DIV_ROUND_UP(lgpu_bm_size, 32);
+
+	mutex_lock(&kfdproc->mutex);
+	list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
+		if (pqn->q && pqn->q->device == kdev) {
+			/* update cu_mask accordingly */
+			cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
+			if (!cu_mask) {
+				rc = -ENOMEM;
+				break;
+			}
+
+			if (pqn->q->properties.cu_mask) {
+				DECLARE_BITMAP(curr_mask,
+						MAX_DRMCG_LGPU_CAPACITY);
+
+				if (pqn->q->properties.cu_mask_count >
+						lgpu_bm_size) {
+					rc = -EINVAL;
+					kfree(cu_mask);
+					break;
+				}
+
+				bitmap_from_arr32(curr_mask,
+						pqn->q->properties.cu_mask,
+						pqn->q->properties.cu_mask_count);
+
+				bitmap_and(curr_mask, curr_mask, lgpu_bm,
+						lgpu_bm_size);
+
+				bitmap_to_arr32(cu_mask, curr_mask,
+						lgpu_bm_size);
+
+				kfree(pqn->q->properties.cu_mask);
+			} else {
+				bitmap_to_arr32(cu_mask, lgpu_bm,
+						lgpu_bm_size);
+			}
+
+			pqn->q->properties.cu_mask = cu_mask;
+			pqn->q->properties.cu_mask_count = lgpu_bm_size;
+
+			rc = pqn->q->device->dqm->ops.update_queue(
+					pqn->q->device->dqm, pqn->q);
+		}
+	}
+	mutex_unlock(&kfdproc->mutex);
+
+	return rc;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int pqm_debugfs_mqds(struct seq_file *m, void *data)
-- 
2.25.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 15:56 ` [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
@ 2020-02-14 16:44   ` Jason Ekstrand
  2020-02-14 16:59     ` Jason Ekstrand
  2020-02-14 17:08     ` Kenny Ho
  0 siblings, 2 replies; 26+ messages in thread
From: Jason Ekstrand @ 2020-02-14 16:44 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, felix.kuehling, jsparks, amd-gfx mailing list,
	lkaplan, alexander.deucher, nirmoy.das, y2kenny,
	Maling list - DRI developers, joseph.greathouse, tj, cgroups,
	Christian König, damon.mcdougall


[-- Attachment #1.1: Type: text/plain, Size: 30035 bytes --]

Pardon my ignorance but I'm a bit confused by this.  What is a "logical
GPU"?  What are we subdividing?  Are we carving up memory?  Compute power?
Both?

If it's carving up memory, why aren't we just measuring it in megabytes?

If it's carving up compute power, what's actually being carved up?  Time?
Execution units/waves/threads?  Even if that's the case, what advantage
does it give to have it in terms of a fixed set of lgpus where each cgroup
gets to pick a fixed set?  Does affinity matter that much?  Why not just
say how many waves the GPU supports and that they have to be allocated in
chunks of 16 waves (pulling a number out of thin air) and let the cgroup
specify how many waves it wants?

Don't get me wrong here.  I'm all for the notion of being able to use
cgroups to carve up GPU compute resources.  However, this sounds to me like
the most AMD-specific solution possible.  We (Intel) could probably do some
sort of carving up as well but we'd likely want to do it with preemption
and time-slicing rather than handing out specific EUs.

--Jason


On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:

> drm.lgpu
>       A read-write nested-keyed file which exists on all cgroups.
>       Each entry is keyed by the DRM device's major:minor.
>
>       lgpu stands for logical GPU; it is an abstraction used to
>       subdivide a physical DRM device for the purpose of resource
>       management.  This file stores user configuration while
>       drm.lgpu.effective reflects the actual allocation after
>       considering the relationship between the cgroups and their
>       configurations.
>
>       The lgpu is a discrete quantity that is device specific (e.g.
>       some DRM devices may have 64 lgpus while others may have 100
>       lgpus).  The lgpu is a single quantity that can be allocated
>       in three different ways denoted by the following nested keys.
>
>         =====     ==============================================
>         weight    Allocate by proportion in relationship with
>                   active sibling cgroups
>         count     Allocate by amount statically, treat lgpu as
>                   anonymous resources
>         list      Allocate statically, treat lgpu as named
>                   resource
>         =====     ==============================================
>
>       For example:
>       226:0 weight=100 count=256 list=0-255
>       226:1 weight=100 count=4 list=0,2,4,6
>       226:2 weight=100 count=32 list=32-63
>       226:3 weight=100 count=0 list=
>       226:4 weight=500 count=0 list=
>
>       lgpu is represented by a bitmap and uses the bitmap_parselist
>       kernel function so the list key input format is a
>       comma-separated list of decimal numbers and ranges.
>
>       Consecutively set bits are shown as two hyphen-separated decimal
>       numbers, the smallest and largest bit numbers set in the range.
>       Optionally each range can be postfixed to denote that only parts
>       of it should be set.  The range will be divided into groups of
>       a specific size.
>       Syntax: range:used_size/group_size
>       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>
>       The count key is the Hamming weight (hweight) of the bitmap.
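
A minimal sketch (not part of the patch) of how the list key maps
onto the existing kernel bitmap helpers; the wrapper function and
the 1024-bit size are illustrative assumptions:

	#include <linux/bitmap.h>
	#include <linux/printk.h>

	static int lgpu_list_demo(void)
	{
		DECLARE_BITMAP(lgpu, 1024);
		int err;

		/* the group syntax from the example above */
		err = bitmap_parselist("0-1023:2/256", lgpu, 1024);
		if (err)
			return err;

		/* bits 0,1,256,257,512,513,768,769 are now set, so
		 * the count key reads back as the hweight, 8 */
		pr_info("count=%d\n", bitmap_weight(lgpu, 1024));
		return 0;
	}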
>
>       Weight, count and list accept the max and default keywords.
>
>       Some DRM devices may only support lgpu as anonymous resources.
>       In such a case, the positions of the set bits in list are
>       ignored.
>
>       The weight quantity is only in effect when static allocation
>       is not used (by setting count=0) for this cgroup.  The weight
>       quantity distributes lgpus that are not statically allocated by
>       the siblings.  For example, given siblings cgroupA, cgroupB and
>       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
>       0-63, no lgpu is available to be distributed by weight.
>       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
>       cgroupC will be starved if it tries to allocate by weight.
>
>       On the other hand, if cgroupA has weight=100 count=0, cgroupB
>       has list=16-47, and cgroupC has weight=100 count=0, then 32
>       lgpus are available to be distributed evenly between cgroupA
>       and cgroupC.  In drm.lgpu.effective, cgroupA will have
>       list=0-15 and cgroupC will have list=48-63.
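
Worked numbers for the example above (a sketch only; the variable
names are illustrative, but the formula mirrors the per-child weight
calculation in drmcg_calculate_effective_lgpu() later in this patch):

	s64 unused = 32;              /* lgpus 0-15 and 48-63 are free  */
	s64 weight_sum = 100 + 100;   /* cgroupA + cgroupC, both count=0 */
	s64 share = 100 * unused / weight_sum;
	/* share == 16 lgpus each, hence list=0-15 for cgroupA and
	 * list=48-63 for cgroupC in drm.lgpu.effective */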
>
>       This lgpu resource supports the 'allocation' and 'weight'
>       resource distribution model.
>
> drm.lgpu.effective
>       A read-only nested-keyed file which exists on all cgroups.
>       Each entry is keyed by the DRM device's major:minor.
>
>       lgpu stands for logical GPU; it is an abstraction used to
>       subdivide a physical DRM device for the purpose of resource
>       management.  This file reflects the actual allocation after
>       considering the relationship between the cgroups and their
>       configurations in drm.lgpu.
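
A sketch of reading the resulting allocation back from userspace;
the cgroup mount point and group name ("job0") are assumptions for
illustration:

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical cgroup path */
		FILE *f = fopen("/sys/fs/cgroup/job0/drm.lgpu.effective", "r");
		char line[256];

		if (!f)
			return 1;
		/* prints e.g. "226:0 count=16 list=0-15" per DRM device */
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
		return 0;
	}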
>
> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
>  include/drm/drm_cgroup.h                |   3 +
>  include/linux/cgroup_drm.h              |  22 ++
>  kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
>  4 files changed, 427 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst
> b/Documentation/admin-guide/cgroup-v2.rst
> index ce5dc027366a..d8a41956e5c7 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2120,6 +2120,86 @@ DRM Interface Files
>         Set largest allocation for /dev/dri/card1 to 4MB
>         echo "226:1 4m" > drm.buffer.peak.max
>
> +  drm.lgpu
> +       A read-write nested-keyed file which exists on all cgroups.
> +       Each entry is keyed by the DRM device's major:minor.
> +
> +       lgpu stands for logical GPU; it is an abstraction used to
> +       subdivide a physical DRM device for the purpose of resource
> +       management.  This file stores user configuration while
> +       drm.lgpu.effective reflects the actual allocation after
> +       considering the relationship between the cgroups and their
> +       configurations.
> +
> +       The lgpu is a discrete quantity that is device specific (e.g.
> +       some DRM devices may have 64 lgpus while others may have 100
> +       lgpus).  The lgpu is a single quantity that can be allocated
> +       in three different ways denoted by the following nested keys.
> +
> +         =====     ==============================================
> +         weight    Allocate by proportion in relationship with
> +                    active sibling cgroups
> +         count     Allocate by amount statically, treat lgpu as
> +                    anonymous resources
> +         list      Allocate statically, treat lgpu as named
> +                    resource
> +         =====     ==============================================
> +
> +       For example:
> +       226:0 weight=100 count=256 list=0-255
> +       226:1 weight=100 count=4 list=0,2,4,6
> +       226:2 weight=100 count=32 list=32-63
> +       226:3 weight=100 count=0 list=
> +       226:4 weight=500 count=0 list=
> +
> +       lgpu is represented by a bitmap and uses the bitmap_parselist
> +       kernel function so the list key input format is a
> +       comma-separated list of decimal numbers and ranges.
> +
> +       Consecutively set bits are shown as two hyphen-separated decimal
> +       numbers, the smallest and largest bit numbers set in the range.
> +       Optionally each range can be postfixed to denote that only parts
> +       of it should be set.  The range will be divided into groups of
> +       a specific size.
> +       Syntax: range:used_size/group_size
> +       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> +
> +       The count key is the Hamming weight (hweight) of the bitmap.
> +
> +       Weight, count and list accept the max and default keywords.
> +
> +       Some DRM devices may only support lgpu as anonymous resources.
> +       In such a case, the positions of the set bits in list are
> +       ignored.
> +
> +       The weight quantity is only in effect when static allocation
> +       is not used (by setting count=0) for this cgroup.  The weight
> +       quantity distributes lgpus that are not statically allocated by
> +       the siblings.  For example, given siblings cgroupA, cgroupB and
> +       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> +       0-63, no lgpu is available to be distributed by weight.
> +       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> +       cgroupC will be starved if it tries to allocate by weight.
> +
> +       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> +       has list=16-47, and cgroupC has weight=100 count=0, then 32
> +       lgpus are available to be distributed evenly between cgroupA
> +       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> +       list=0-15 and cgroupC will have list=48-63.
> +
> +       This lgpu resource supports the 'allocation' and 'weight'
> +       resource distribution model.
> +
> +  drm.lgpu.effective
> +       A read-only nested-keyed file which exists on all cgroups.
> +       Each entry is keyed by the DRM device's major:minor.
> +
> +       lgpu stands for logical GPU; it is an abstraction used to
> +       subdivide a physical DRM device for the purpose of resource
> +       management.  This file reflects the actual allocation after
> +       considering the relationship between the cgroups and their
> +       configurations in drm.lgpu.
> +
>  GEM Buffer Ownership
>  ~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index 2b41d4d22e33..619a110cc748 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -17,6 +17,9 @@ struct drmcg_props {
>
>         s64                     bo_limits_total_allocated_default;
>         s64                     bo_limits_peak_allocated_default;
> +
> +       int                     lgpu_capacity;
> +       DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>  };
>
>  void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index eae400f3d9b4..bb09704e7f71 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -11,10 +11,14 @@
>  /* limit defined per the way drm_minor_alloc operates */
>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>
> +#define MAX_DRMCG_LGPU_CAPACITY 256
> +
>  enum drmcg_res_type {
>         DRMCG_TYPE_BO_TOTAL,
>         DRMCG_TYPE_BO_PEAK,
>         DRMCG_TYPE_BO_COUNT,
> +       DRMCG_TYPE_LGPU,
> +       DRMCG_TYPE_LGPU_EFF,
>         __DRMCG_TYPE_LAST,
>  };
>
> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
>         s64                     bo_limits_peak_allocated;
>
>         s64                     bo_stats_count_allocated;
> +
> +       /**
> +        * Logical GPU
> +        *
> +        * *_cfg are properties configured by users
> +        * *_eff are the effective properties being applied to the hardware
> +        * *_stg is a staging area used to calculate _eff over the
> +        * entire hierarchy before the result is applied to _eff
> +        */
> +       DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> +       /* user configurations */
> +       s64                     lgpu_weight_cfg;
> +       DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> +       /* effective lgpu for the cgroup after considering
> +        * relationship with other cgroup
> +        */
> +       s64                     lgpu_count_eff;
> +       DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
>  };
>
>  /**
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 5fcbbc13fa1c..a4e88a3704bb 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -9,6 +9,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/mutex.h>
>  #include <linux/kernel.h>
> +#include <linux/bitmap.h>
>  #include <linux/cgroup_drm.h>
>  #include <drm/drm_file.h>
>  #include <drm/drm_drv.h>
> @@ -41,6 +42,10 @@ enum drmcg_file_type {
>         DRMCG_FTYPE_DEFAULT,
>  };
>
> +#define LGPU_LIMITS_NAME_LIST "list"
> +#define LGPU_LIMITS_NAME_COUNT "count"
> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> +
>  /**
>   * drmcg_bind - Bind DRM subsystem to cgroup subsystem
>   * @acq_dm: function pointer to the drm_minor_acquire function
> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg
> *drmcg, struct drm_device *dev)
>         ddr->bo_limits_peak_allocated =
>                 dev->drmcg_props.bo_limits_peak_allocated_default;
>
> +       bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> +                       MAX_DRMCG_LGPU_CAPACITY);
> +       bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> +                       MAX_DRMCG_LGPU_CAPACITY);
> +
> +       ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> +
>         return 0;
>  }
>
> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct
> drm_device *dev)
>         mutex_unlock(&cgroup_mutex);
>  }
>
> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> +               const unsigned long *free_static,
> +               const unsigned long *free_weighted,
> +               struct drmcg *parent_drmcg)
> +{
> +       int capacity = dev->drmcg_props.lgpu_capacity;
> +       DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> +       DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> +       struct drmcg_device_resource *parent_ddr;
> +       struct drmcg_device_resource *ddr;
> +       int minor = dev->primary->index;
> +       struct cgroup_subsys_state *pos;
> +       struct drmcg *child;
> +       s64 weight_sum = 0;
> +       s64 unused;
> +
> +       parent_ddr = parent_drmcg->dev_resources[minor];
> +
> +       if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> +               /* no static cfg, use weight for calculating the effective
> */
> +               bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> +       else
> +               /* lgpu statically configured, use the overlap as
> effective */
> +               bitmap_and(parent_ddr->lgpu_stg, free_static,
> +                               parent_ddr->lgpu_cfg, capacity);
> +
> +       /* calculate lgpu available for distribution by weight for
> children */
> +       bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> +       css_for_each_child(pos, &parent_drmcg->css) {
> +               child = css_to_drmcg(pos);
> +               ddr = child->dev_resources[minor];
> +
> +               if (bitmap_empty(ddr->lgpu_cfg, capacity))
> +                       /* no static allocation, participate in weight
> dist */
> +                       weight_sum += ddr->lgpu_weight_cfg;
> +               else
> +                       /* take out statically allocated lgpu by siblings
> */
> +                       bitmap_andnot(lgpu_unused, lgpu_unused,
> ddr->lgpu_cfg,
> +                                       capacity);
> +       }
> +
> +       unused = bitmap_weight(lgpu_unused, capacity);
> +
> +       css_for_each_child(pos, &parent_drmcg->css) {
> +               child = css_to_drmcg(pos);
> +               ddr = child->dev_resources[minor];
> +
> +               bitmap_zero(lgpu_by_weight, capacity);
> +               /* no static allocation, participate in weight
> distribution */
> +               if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> +                       int c;
> +                       int p = 0;
> +
> +                       for (c = ddr->lgpu_weight_cfg * unused /
> weight_sum;
> +                                       c > 0; c--) {
> +                               p = find_next_bit(lgpu_unused, capacity,
> p);
> +                               if (p < capacity) {
> +                                       clear_bit(p, lgpu_unused);
> +                                       set_bit(p, lgpu_by_weight);
> +                               }
> +                       }
> +
> +               }
> +
> +               drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> +                               lgpu_by_weight, child);
> +       }
> +}
> +
> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> +{
> +       int capacity = dev->drmcg_props.lgpu_capacity;
> +       int minor = dev->primary->index;
> +       struct drmcg_device_resource *ddr;
> +       struct cgroup_subsys_state *pos;
> +       struct drmcg *drmcg;
> +
> +       if (root_drmcg == NULL) {
> +               WARN_ON(root_drmcg == NULL);
> +               return;
> +       }
> +
> +       rcu_read_lock();
> +
> +       /* process the entire cgroup tree from root to simplify the
> algorithm */
> +       drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> +                       dev->drmcg_props.lgpu_slots, root_drmcg);
> +
> +       /* apply changes to effective only if there is a change */
> +       css_for_each_descendant_pre(pos, &root_drmcg->css) {
> +               drmcg = css_to_drmcg(pos);
> +               ddr = drmcg->dev_resources[minor];
> +
> +               if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity))
> {
> +                       bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg,
> capacity);
> +                       ddr->lgpu_count_eff =
> +                               bitmap_weight(ddr->lgpu_eff, capacity);
> +               }
> +       }
> +       rcu_read_unlock();
> +}
> +
> +static void drmcg_apply_effective(enum drmcg_res_type type,
> +               struct drm_device *dev, struct drmcg *changed_drmcg)
> +{
> +       switch (type) {
> +       case DRMCG_TYPE_LGPU:
> +               drmcg_apply_effective_lgpu(dev);
> +               break;
> +       default:
> +               break;
> +       }
> +}
> +
>  /**
>   * drmcg_register_dev - register a DRM device for usage in drm cgroup
>   * @dev: DRM device
> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
>         {
>                 dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
>
> +               WARN_ON(dev->drmcg_props.lgpu_capacity !=
> +                               bitmap_weight(dev->drmcg_props.lgpu_slots,
> +                                       MAX_DRMCG_LGPU_CAPACITY));
> +
>                 drmcg_update_cg_tree(dev);
> +
> +               drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
>         }
>         mutex_unlock(&drmcg_mutex);
>  }
> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct
> drmcg_device_resource *ddr,
>  }
>
>  static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> -               struct seq_file *sf, enum drmcg_res_type type)
> +               struct seq_file *sf, enum drmcg_res_type type,
> +               struct drm_device *dev)
>  {
>         if (ddr == NULL) {
>                 seq_puts(sf, "\n");
> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct
> drmcg_device_resource *ddr,
>         case DRMCG_TYPE_BO_PEAK:
>                 seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
>                 break;
> +       case DRMCG_TYPE_LGPU:
> +               seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> +                               LGPU_LIMITS_NAME_WEIGHT,
> +                               ddr->lgpu_weight_cfg,
> +                               LGPU_LIMITS_NAME_COUNT,
> +                               bitmap_weight(ddr->lgpu_cfg,
> +                                       dev->drmcg_props.lgpu_capacity),
> +                               LGPU_LIMITS_NAME_LIST,
> +                               dev->drmcg_props.lgpu_capacity,
> +                               ddr->lgpu_cfg);
> +               break;
> +       case DRMCG_TYPE_LGPU_EFF:
> +               seq_printf(sf, "%s=%lld %s=%*pbl\n",
> +                               LGPU_LIMITS_NAME_COUNT,
> +                               ddr->lgpu_count_eff,
> +                               LGPU_LIMITS_NAME_LIST,
> +                               dev->drmcg_props.lgpu_capacity,
> +                               ddr->lgpu_eff);
> +               break;
>         default:
>                 seq_puts(sf, "\n");
>                 break;
> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props
> *props,
>                 seq_printf(sf, "%lld\n",
>                         props->bo_limits_peak_allocated_default);
>                 break;
> +       case DRMCG_TYPE_LGPU:
> +               seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> +                               LGPU_LIMITS_NAME_WEIGHT,
> +                               CGROUP_WEIGHT_DFL,
> +                               LGPU_LIMITS_NAME_COUNT,
> +                               bitmap_weight(props->lgpu_slots,
> +                                       props->lgpu_capacity),
> +                               LGPU_LIMITS_NAME_LIST,
> +                               props->lgpu_capacity,
> +                               props->lgpu_slots);
> +               break;
>         default:
>                 seq_puts(sf, "\n");
>                 break;
> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void
> *data)
>                 drmcg_print_stats(ddr, sf, type);
>                 break;
>         case DRMCG_FTYPE_LIMIT:
> -               drmcg_print_limits(ddr, sf, type);
> +               drmcg_print_limits(ddr, sf, type, minor->dev);
>                 break;
>         case DRMCG_FTYPE_DEFAULT:
>                 drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval,
> bool is_mem,
>         return rc;
>  }
>
> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> +               struct drm_device *dev, char *attrs)
> +{
> +       DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> +       DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> +       enum drmcg_res_type type =
> +               DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> +       struct drmcg *drmcg = css_to_drmcg(of_css(of));
> +       struct drmcg_props *props = &dev->drmcg_props;
> +       char *cft_name = of_cft(of)->name;
> +       int minor = dev->primary->index;
> +       char *nested = strstrip(attrs);
> +       struct drmcg_device_resource *ddr =
> +               drmcg->dev_resources[minor];
> +       char *attr;
> +       char sname[256];
> +       char sval[256];
> +       s64 val;
> +       int rc;
> +
> +       while (nested != NULL) {
> +               attr = strsep(&nested, " ");
> +
> +               if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> +                       continue;
> +
> +               switch (type) {
> +               case DRMCG_TYPE_LGPU:
> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> +                               strncmp(sname, LGPU_LIMITS_NAME_COUNT,
> 256) &&
> +                               strncmp(sname, LGPU_LIMITS_NAME_WEIGHT,
> 256))
> +                               continue;
> +
> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> +                                       (!strcmp("max", sval) ||
> +                                       !strcmp("default", sval))) {
> +                               bitmap_copy(ddr->lgpu_cfg,
> props->lgpu_slots,
> +                                               props->lgpu_capacity);
> +
> +                               continue;
> +                       }
> +
> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256)
> == 0) {
> +                               rc = drmcg_process_limit_s64_val(sval,
> +                                       false, CGROUP_WEIGHT_DFL,
> +                                       CGROUP_WEIGHT_MAX, &val);
> +
> +                               if (rc || val < CGROUP_WEIGHT_MIN ||
> +                                               val > CGROUP_WEIGHT_MAX) {
> +                                       drmcg_pr_cft_err(drmcg, rc,
> cft_name,
> +                                                       minor);
> +                                       continue;
> +                               }
> +
> +                               ddr->lgpu_weight_cfg = val;
> +                               continue;
> +                       }
> +
> +                       if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) ==
> 0) {
> +                               rc = drmcg_process_limit_s64_val(sval,
> +                                       false, props->lgpu_capacity,
> +                                       props->lgpu_capacity, &val);
> +
> +                               if (rc || val < 0) {
> +                                       drmcg_pr_cft_err(drmcg, rc,
> cft_name,
> +                                                       minor);
> +                                       continue;
> +                               }
> +
> +                               bitmap_zero(tmp_bitmap,
> +                                               MAX_DRMCG_LGPU_CAPACITY);
> +                               bitmap_set(tmp_bitmap, 0, val);
> +                       }
> +
> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) ==
> 0) {
> +                               rc = bitmap_parselist(sval, tmp_bitmap,
> +                                               MAX_DRMCG_LGPU_CAPACITY);
> +
> +                               if (rc) {
> +                                       drmcg_pr_cft_err(drmcg, rc,
> cft_name,
> +                                                       minor);
> +                                       continue;
> +                               }
> +
> +                               bitmap_andnot(chk_bitmap, tmp_bitmap,
> +                                       props->lgpu_slots,
> +                                       MAX_DRMCG_LGPU_CAPACITY);
> +
> +                               /* user setting does not intersect with
> +                                * available lgpu */
> +                               if (!bitmap_empty(chk_bitmap,
> +                                               MAX_DRMCG_LGPU_CAPACITY)) {
> +                                       drmcg_pr_cft_err(drmcg, 0,
> cft_name,
> +                                                       minor);
> +                                       continue;
> +                               }
> +                       }
> +
> +                       bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> +                                       props->lgpu_capacity);
> +
> +                       break; /* DRMCG_TYPE_LGPU */
> +               default:
> +                       break;
> +               } /* switch (type) */
> +       }
> +}
> +
> +
>  /**
>   * drmcg_limit_write - parse cgroup interface files to obtain user config
>   *
> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct
> kernfs_open_file *of, char *buf,
>
>                         ddr->bo_limits_peak_allocated = val;
>                         break;
> +               case DRMCG_TYPE_LGPU:
> +                       drmcg_nested_limit_parse(of, dm->dev, sattr);
> +                       break;
>                 default:
>                         break;
>                 }
> +
> +               drmcg_apply_effective(type, dm->dev, drmcg);
> +
>                 mutex_unlock(&dm->dev->drmcg_mutex);
>
>                 mutex_lock(&drmcg_mutex);
> @@ -560,12 +838,51 @@ struct cftype files[] = {
>                 .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
>                                                 DRMCG_FTYPE_STATS),
>         },
> +       {
> +               .name = "lgpu",
> +               .seq_show = drmcg_seq_show,
> +               .write = drmcg_limit_write,
> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> +                                               DRMCG_FTYPE_LIMIT),
> +       },
> +       {
> +               .name = "lgpu.default",
> +               .seq_show = drmcg_seq_show,
> +               .flags = CFTYPE_ONLY_ON_ROOT,
> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> +                                               DRMCG_FTYPE_DEFAULT),
> +       },
> +       {
> +               .name = "lgpu.effective",
> +               .seq_show = drmcg_seq_show,
> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> +                                               DRMCG_FTYPE_LIMIT),
> +       },
>         { }     /* terminate */
>  };
>
> +static int drmcg_online_fn(int id, void *ptr, void *data)
> +{
> +       struct drm_minor *minor = ptr;
> +       struct drmcg *drmcg = data;
> +
> +       if (minor->type != DRM_MINOR_PRIMARY)
> +               return 0;
> +
> +       drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> +
> +       return 0;
> +}
> +
> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> +{
> +       return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> +}
> +
>  struct cgroup_subsys drm_cgrp_subsys = {
>         .css_alloc      = drmcg_css_alloc,
>         .css_free       = drmcg_css_free,
> +       .css_online     = drmcg_css_online,
>         .early_init     = false,
>         .legacy_cftypes = files,
>         .dfl_cftypes    = files,
> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
>         dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
>         dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
>
> +       dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> +       bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> +
>         drmcg_update_cg_tree(dev);
>  }
>  EXPORT_SYMBOL(drmcg_device_early_init);
> --
> 2.25.0
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>

[-- Attachment #1.2: Type: text/html, Size: 37241 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 16:44   ` Jason Ekstrand
@ 2020-02-14 16:59     ` Jason Ekstrand
  2020-02-14 17:08     ` Kenny Ho
  1 sibling, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2020-02-14 16:59 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, felix.kuehling, jsparks, amd-gfx mailing list,
	lkaplan, alexander.deucher, nirmoy.das, y2kenny,
	Maling list - DRI developers, joseph.greathouse, tj, cgroups,
	Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 10:44 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> Pardon my ignorance but I'm a bit confused by this.  What is a "logical GPU"?  What are we subdividing?  Are we carving up memory?  Compute power?  Both?
>
> If it's carving up memory, why aren't we just measuring it in megabytes?
>
> If it's carving up compute power, what's actually being carved up?  Time?  Execution units/waves/threads?  Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set?  Does affinity matter that much?  Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants?

One more question:  If I'm a userspace driver, and there are 14 lgpus
allocated to my cgroup, does that mean I have 14 GPUs?  Or does that
mean I have one GPU with 14 units of compute power?

> Don't get me wrong here.  I'm all for the notion of being able to use cgroups to carve up GPU compute resources.  However, this sounds to me like the most AMD-specific solution possible.  We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.

Ok, so "most AMD-specific solution possible" probably wasn't fair.
However, it does seem like an unnecessarily rigid solution to me.
Maybe there's something I'm not getting?

--Jason

> --Jason
>
>
> On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
>>
>> drm.lgpu
>>       A read-write nested-keyed file which exists on all cgroups.
>>       Each entry is keyed by the DRM device's major:minor.
>>
>>       lgpu stands for logical GPU; it is an abstraction used to
>>       subdivide a physical DRM device for the purpose of resource
>>       management.  This file stores user configuration while
>>       drm.lgpu.effective reflects the actual allocation after
>>       considering the relationship between the cgroups and their
>>       configurations.
>>
>>       The lgpu is a discrete quantity that is device specific (e.g.
>>       some DRM devices may have 64 lgpus while others may have 100
>>       lgpus).  The lgpu is a single quantity that can be allocated
>>       in three different ways denoted by the following nested keys.
>>
>>         =====     ==============================================
>>         weight    Allocate by proportion in relationship with
>>                   active sibling cgroups
>>         count     Allocate by amount statically, treat lgpu as
>>                   anonymous resources
>>         list      Allocate statically, treat lgpu as named
>>                   resource
>>         =====     ==============================================
>>
>>       For example:
>>       226:0 weight=100 count=256 list=0-255
>>       226:1 weight=100 count=4 list=0,2,4,6
>>       226:2 weight=100 count=32 list=32-63
>>       226:3 weight=100 count=0 list=
>>       226:4 weight=500 count=0 list=
>>
>>       lgpu is represented by a bitmap and uses the bitmap_parselist
>>       kernel function so the list key input format is a
>>       comma-separated list of decimal numbers and ranges.
>>
>>       Consecutively set bits are shown as two hyphen-separated decimal
>>       numbers, the smallest and largest bit numbers set in the range.
>>       Optionally each range can be postfixed to denote that only parts
>>       of it should be set.  The range will be divided into groups of
>>       a specific size.
>>       Syntax: range:used_size/group_size
>>       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>>
>>       The count key is the Hamming weight (hweight) of the bitmap.
>>
>>       Weight, count and list accept the max and default keywords.
>>
>>       Some DRM devices may only support lgpu as anonymous resources.
>>       In such a case, the positions of the set bits in list are
>>       ignored.
>>
>>       The weight quantity is only in effect when static allocation
>>       is not used (by setting count=0) for this cgroup.  The weight
>>       quantity distributes lgpus that are not statically allocated by
>>       the siblings.  For example, given siblings cgroupA, cgroupB and
>>       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
>>       0-63, no lgpu is available to be distributed by weight.
>>       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
>>       cgroupC will be starved if it tries to allocate by weight.
>>
>>       On the other hand, if cgroupA has weight=100 count=0, cgroupB
>>       has list=16-47, and cgroupC has weight=100 count=0, then 32
>>       lgpus are available to be distributed evenly between cgroupA
>>       and cgroupC.  In drm.lgpu.effective, cgroupA will have
>>       list=0-15 and cgroupC will have list=48-63.
>>
>>       This lgpu resource supports the 'allocation' and 'weight'
>>       resource distribution model.
>>
>> drm.lgpu.effective
>>       A read-only nested-keyed file which exists on all cgroups.
>>       Each entry is keyed by the DRM device's major:minor.
>>
>>       lgpu stands for logical GPU; it is an abstraction used to
>>       subdivide a physical DRM device for the purpose of resource
>>       management.  This file reflects the actual allocation after
>>       considering the relationship between the cgroups and their
>>       configurations in drm.lgpu.
>>
>> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
>> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
>> ---
>>  Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
>>  include/drm/drm_cgroup.h                |   3 +
>>  include/linux/cgroup_drm.h              |  22 ++
>>  kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
>>  4 files changed, 427 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index ce5dc027366a..d8a41956e5c7 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -2120,6 +2120,86 @@ DRM Interface Files
>>         Set largest allocation for /dev/dri/card1 to 4MB
>>         echo "226:1 4m" > drm.buffer.peak.max
>>
>> +  drm.lgpu
>> +       A read-write nested-keyed file which exists on all cgroups.
>> +       Each entry is keyed by the DRM device's major:minor.
>> +
>> +       lgpu stands for logical GPU; it is an abstraction used to
>> +       subdivide a physical DRM device for the purpose of resource
>> +       management.  This file stores user configuration while
>> +       drm.lgpu.effective reflects the actual allocation after
>> +       considering the relationship between the cgroups and their
>> +       configurations.
>> +
>> +       The lgpu is a discrete quantity that is device specific (e.g.
>> +       some DRM devices may have 64 lgpus while others may have 100
>> +       lgpus).  The lgpu is a single quantity that can be allocated
>> +       in three different ways denoted by the following nested keys.
>> +
>> +         =====     ==============================================
>> +         weight    Allocate by proportion in relationship with
>> +                    active sibling cgroups
>> +         count     Allocate by amount statically, treat lgpu as
>> +                    anonymous resources
>> +         list      Allocate statically, treat lgpu as named
>> +                    resource
>> +         =====     ==============================================
>> +
>> +       For example:
>> +       226:0 weight=100 count=256 list=0-255
>> +       226:1 weight=100 count=4 list=0,2,4,6
>> +       226:2 weight=100 count=32 list=32-63
>> +       226:3 weight=100 count=0 list=
>> +       226:4 weight=500 count=0 list=
>> +
>> +       lgpu is represented by a bitmap and uses the bitmap_parselist
>> +       kernel function so the list key input format is a
>> +       comma-separated list of decimal numbers and ranges.
>> +
>> +       Consecutively set bits are shown as two hyphen-separated decimal
>> +       numbers, the smallest and largest bit numbers set in the range.
>> +       Optionally each range can be postfixed to denote that only parts
>> +       of it should be set.  The range will be divided into groups of
>> +       a specific size.
>> +       Syntax: range:used_size/group_size
>> +       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>> +
>> +       The count key is the Hamming weight (hweight) of the bitmap.
>> +
>> +       Weight, count and list accept the max and default keywords.
>> +
>> +       Some DRM devices may only support lgpu as anonymous resources.
>> +       In such a case, the positions of the set bits in list are
>> +       ignored.
>> +
>> +       The weight quantity is only in effect when static allocation
>> +       is not used (by setting count=0) for this cgroup.  The weight
>> +       quantity distributes lgpus that are not statically allocated by
>> +       the siblings.  For example, given siblings cgroupA, cgroupB and
>> +       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
>> +       0-63, no lgpu is available to be distributed by weight.
>> +       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
>> +       cgroupC will be starved if it tries to allocate by weight.
>> +
>> +       On the other hand, if cgroupA has weight=100 count=0, cgroupB
>> +       has list=16-47, and cgroupC has weight=100 count=0, then 32
>> +       lgpus are available to be distributed evenly between cgroupA
>> +       and cgroupC.  In drm.lgpu.effective, cgroupA will have
>> +       list=0-15 and cgroupC will have list=48-63.
>> +
>> +       This lgpu resource supports the 'allocation' and 'weight'
>> +       resource distribution model.
>> +
>> +  drm.lgpu.effective
>> +       A read-only nested-keyed file which exists on all cgroups.
>> +       Each entry is keyed by the DRM device's major:minor.
>> +
>> +       lgpu stands for logical GPU; it is an abstraction used to
>> +       subdivide a physical DRM device for the purpose of resource
>> +       management.  This file reflects the actual allocation after
>> +       considering the relationship between the cgroups and their
>> +       configurations in drm.lgpu.
>> +
>>  GEM Buffer Ownership
>>  ~~~~~~~~~~~~~~~~~~~~
>>
>> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
>> index 2b41d4d22e33..619a110cc748 100644
>> --- a/include/drm/drm_cgroup.h
>> +++ b/include/drm/drm_cgroup.h
>> @@ -17,6 +17,9 @@ struct drmcg_props {
>>
>>         s64                     bo_limits_total_allocated_default;
>>         s64                     bo_limits_peak_allocated_default;
>> +
>> +       int                     lgpu_capacity;
>> +       DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>>  };
>>
>>  void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
>> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
>> index eae400f3d9b4..bb09704e7f71 100644
>> --- a/include/linux/cgroup_drm.h
>> +++ b/include/linux/cgroup_drm.h
>> @@ -11,10 +11,14 @@
>>  /* limit defined per the way drm_minor_alloc operates */
>>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>>
>> +#define MAX_DRMCG_LGPU_CAPACITY 256
>> +
>>  enum drmcg_res_type {
>>         DRMCG_TYPE_BO_TOTAL,
>>         DRMCG_TYPE_BO_PEAK,
>>         DRMCG_TYPE_BO_COUNT,
>> +       DRMCG_TYPE_LGPU,
>> +       DRMCG_TYPE_LGPU_EFF,
>>         __DRMCG_TYPE_LAST,
>>  };
>>
>> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
>>         s64                     bo_limits_peak_allocated;
>>
>>         s64                     bo_stats_count_allocated;
>> +
>> +       /**
>> +        * Logical GPU
>> +        *
>> +        * *_cfg are properties configured by users
>> +        * *_eff are the effective properties being applied to the hardware
>> +        * *_stg is a staging area used to calculate _eff over the
>> +        * entire hierarchy before the result is applied to _eff
>> +        */
>> +       DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
>> +       /* user configurations */
>> +       s64                     lgpu_weight_cfg;
>> +       DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
>> +       /* effective lgpu for the cgroup after considering
>> +        * relationship with other cgroup
>> +        */
>> +       s64                     lgpu_count_eff;
>> +       DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
>>  };
>>
>>  /**
>> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
>> index 5fcbbc13fa1c..a4e88a3704bb 100644
>> --- a/kernel/cgroup/drm.c
>> +++ b/kernel/cgroup/drm.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/seq_file.h>
>>  #include <linux/mutex.h>
>>  #include <linux/kernel.h>
>> +#include <linux/bitmap.h>
>>  #include <linux/cgroup_drm.h>
>>  #include <drm/drm_file.h>
>>  #include <drm/drm_drv.h>
>> @@ -41,6 +42,10 @@ enum drmcg_file_type {
>>         DRMCG_FTYPE_DEFAULT,
>>  };
>>
>> +#define LGPU_LIMITS_NAME_LIST "list"
>> +#define LGPU_LIMITS_NAME_COUNT "count"
>> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
>> +
>>  /**
>>   * drmcg_bind - Bind DRM subsystem to cgroup subsystem
>>   * @acq_dm: function pointer to the drm_minor_acquire function
>> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
>>         ddr->bo_limits_peak_allocated =
>>                 dev->drmcg_props.bo_limits_peak_allocated_default;
>>
>> +       bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
>> +                       MAX_DRMCG_LGPU_CAPACITY);
>> +       bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
>> +                       MAX_DRMCG_LGPU_CAPACITY);
>> +
>> +       ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
>> +
>>         return 0;
>>  }
>>
>> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
>>         mutex_unlock(&cgroup_mutex);
>>  }
>>
>> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
>> +               const unsigned long *free_static,
>> +               const unsigned long *free_weighted,
>> +               struct drmcg *parent_drmcg)
>> +{
>> +       int capacity = dev->drmcg_props.lgpu_capacity;
>> +       DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
>> +       DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
>> +       struct drmcg_device_resource *parent_ddr;
>> +       struct drmcg_device_resource *ddr;
>> +       int minor = dev->primary->index;
>> +       struct cgroup_subsys_state *pos;
>> +       struct drmcg *child;
>> +       s64 weight_sum = 0;
>> +       s64 unused;
>> +
>> +       parent_ddr = parent_drmcg->dev_resources[minor];
>> +
>> +       if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
>> +               /* no static cfg, use weight for calculating the effective */
>> +               bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
>> +       else
>> +               /* lgpu statically configured, use the overlap as effective */
>> +               bitmap_and(parent_ddr->lgpu_stg, free_static,
>> +                               parent_ddr->lgpu_cfg, capacity);
>> +
>> +       /* calculate lgpu available for distribution by weight for children */
>> +       bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
>> +       css_for_each_child(pos, &parent_drmcg->css) {
>> +               child = css_to_drmcg(pos);
>> +               ddr = child->dev_resources[minor];
>> +
>> +               if (bitmap_empty(ddr->lgpu_cfg, capacity))
>> +                       /* no static allocation, participate in weight dist */
>> +                       weight_sum += ddr->lgpu_weight_cfg;
>> +               else
>> +                       /* take out statically allocated lgpu by siblings */
>> +                       bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
>> +                                       capacity);
>> +       }
>> +
>> +       unused = bitmap_weight(lgpu_unused, capacity);
>> +
>> +       css_for_each_child(pos, &parent_drmcg->css) {
>> +               child = css_to_drmcg(pos);
>> +               ddr = child->dev_resources[minor];
>> +
>> +               bitmap_zero(lgpu_by_weight, capacity);
>> [snip]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 16:44   ` Jason Ekstrand
  2020-02-14 16:59     ` Jason Ekstrand
@ 2020-02-14 17:08     ` Kenny Ho
  2020-02-14 17:48       ` Jason Ekstrand
  2020-02-14 18:34       ` Daniel Vetter
  1 sibling, 2 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 17:08 UTC (permalink / raw)
  To: Jason Ekstrand
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	amd-gfx mailing list, lkaplan, Alex Deucher, nirmoy.das,
	Maling list - DRI developers, Greathouse, Joseph, Tejun Heo,
	cgroups, Christian König, damon.mcdougall

Hi Jason,

Thanks for the review.

On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> Pardon my ignorance but I'm a bit confused by this.  What is a "logical GPU"?  What are we subdividing?  Are we carving up memory?  Compute power?  Both?

The intention is compute, but it is up to the individual drm driver to decide.

> If it's carving up compute power, what's actually being carved up?  Time?  Execution units/waves/threads?  Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set.  Does affinity matter that much?  Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants.
>
> Don't get me wrong here.  I'm all for the notion of being able to use cgroups to carve up GPU compute resources.  However, this sounds to me like the most AMD-specific solution possible.  We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.

This has been discussed in the RFC before
(https://www.spinics.net/lists/cgroups/msg23469.html).  As mentioned
before, the idea of a compute unit is hardly an AMD-specific thing, as
it is in the OpenCL standard and part of the architecture of many
different vendors.  In addition, the interface presented here supports
Intel's use case.  What you described is what I consider the
"anonymous resources" view of the lgpu.  What you/Intel can do is
register your device with drmcg as having 100 lgpus, and users can
specify their share simply by count.  So if they want to allocate 5%
for a cgroup, they would set count=5.  Per the documentation in this
patch: "Some DRM devices may only support lgpu as anonymous resources.
In such a case, the significance of the position of the set bits in
list will be ignored."  What Intel does with the user-expressed
configuration of "5 out of 100" is entirely up to Intel (time-slice if
you like, change to specific EUs later if you like, or make it
driver-configurable to support both if you like).
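
For illustration only (the function name and the lgpu count here are
made up), a driver taking the anonymous view could fill in its
drmcg_props in the drmcg_custom_init hook roughly like this:

    static void xgpu_drmcg_custom_init(struct drm_device *dev,
                                       struct drmcg_props *props)
    {
            /* expose 100 anonymous lgpus; the bit positions carry
             * no hardware meaning in this mode */
            props->lgpu_capacity = 100;
            bitmap_zero(props->lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
            bitmap_set(props->lgpu_slots, 0, 100);
    }

A user wanting 5% of that device would then write something like
"226:0 count=5" to drm.lgpu, and the driver decides what those 5
lgpus map to (time slices, EUs, etc.)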

Regards,
Kenny

>
> On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
>>
>> drm.lgpu
>>       A read-write nested-keyed file which exists on all cgroups.
>>       Each entry is keyed by the DRM device's major:minor.
>>
>>       lgpu stands for logical GPU; it is an abstraction used to
>>       subdivide a physical DRM device for the purpose of resource
>>       management.  This file stores the user configuration, while
>>       drm.lgpu.effective reflects the actual allocation after
>>       considering the relationships between the cgroups and their
>>       configurations.
>>
>>       The lgpu is a discrete quantity that is device-specific (e.g.
>>       some DRM devices may have 64 lgpus while others may have
>>       100.)  The lgpu is a single quantity that can be allocated
>>       in three different ways, denoted by the following nested keys.
>>
>>         =====     ==============================================
>>         weight    Allocate by proportion in relationship with
>>                   active sibling cgroups
>>         count     Allocate by amount statically, treat lgpu as
>>                   anonymous resources
>>         list      Allocate statically, treat lgpu as named
>>                   resource
>>         =====     ==============================================
>>
>>       For example:
>>       226:0 weight=100 count=256 list=0-255
>>       226:1 weight=100 count=4 list=0,2,4,6
>>       226:2 weight=100 count=32 list=32-63
>>       226:3 weight=100 count=0 list=
>>       226:4 weight=500 count=0 list=
>>
>>       lgpu is represented by a bitmap and uses the bitmap_parselist
>>       kernel function, so the list key input format is a
>>       comma-separated list of decimal numbers and ranges.
>>
>>       Consecutively set bits are shown as two hyphen-separated decimal
>>       numbers, the smallest and largest bit numbers set in the range.
>>       Optionally, each range can be postfixed to denote that only
>>       parts of it should be set.  The range will be divided into
>>       groups of a specific size.
>>       Syntax: range:used_size/group_size
>>       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>>
>>       The count key is the Hamming weight (hweight) of the bitmap.
>>
>>       Weight, count and list accept the max and default keywords.
>>
>>       Some DRM devices may only support lgpu as anonymous resources.
>>       In such a case, the significance of the position of the set
>>       bits in list will be ignored.
>>
>>       The weight quantity is only in effect when static allocation
>>       is not used (by setting count=0) for this cgroup.  The weight
>>       quantity distributes lgpus that are not statically allocated by
>>       the siblings.  For example, given siblings cgroupA, cgroupB and
>>       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
>>       0-63, no lgpu is available to be distributed by weight.
>>       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
>>       cgroupC will be starved if it tries to allocate by weight.
>>
>>       On the other hand, if cgroupA has weight=100 count=0, cgroupB
>>       has list=16-47, and cgroupC has weight=100 count=0, then 32
>>       lgpus are available to be distributed evenly between cgroupA
>>       and cgroupC.  In drm.lgpu.effective, cgroupA will have
>>       list=0-15 and cgroupC will have list=48-63.
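>>
>>       (In effect, each weighted sibling receives
>>       weight * unused / total_weight lgpus, rounded down; in this
>>       example each of cgroupA and cgroupC gets 100 * 32 / 200 = 16.)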
>>
>>       This lgpu resource supports the 'allocation' and 'weight'
>>       resource distribution models.
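>>
>>       As a hypothetical usage example (device numbers assumed), a
>>       static allocation can be set, then returned to weight-based
>>       distribution, with:
>>
>>       echo "226:1 list=0-15" > drm.lgpu
>>       echo "226:1 count=0 weight=200" > drm.lgpu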
>>
>> drm.lgpu.effective
>>       A read-only nested-keyed file which exists on all cgroups.
>>       Each entry is keyed by the DRM device's major:minor.
>>
>>       lgpu stands for logical GPU; it is an abstraction used to
>>       subdivide a physical DRM device for the purpose of resource
>>       management.  This file reflects the actual allocation after
>>       considering the relationships between the cgroups and their
>>       configurations in drm.lgpu.
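>>
>>       Reading this file produces one line per device in the same
>>       nested-key format, e.g. (values assumed for illustration):
>>
>>       226:0 count=16 list=0-15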
>>
>> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
>> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
>> ---
>>  Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
>>  include/drm/drm_cgroup.h                |   3 +
>>  include/linux/cgroup_drm.h              |  22 ++
>>  kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
>>  4 files changed, 427 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index ce5dc027366a..d8a41956e5c7 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -2120,6 +2120,86 @@ DRM Interface Files
>>         Set largest allocation for /dev/dri/card1 to 4MB
>>         echo "226:1 4m" > drm.buffer.peak.max
>>
>> +  drm.lgpu
>> +       A read-write nested-keyed file which exists on all cgroups.
>> +       Each entry is keyed by the DRM device's major:minor.
>> +
>> +       lgpu stands for logical GPU; it is an abstraction used to
>> +       subdivide a physical DRM device for the purpose of resource
>> +       management.  This file stores the user configuration, while
>> +       drm.lgpu.effective reflects the actual allocation after
>> +       considering the relationships between the cgroups and their
>> +       configurations.
>> +
>> +       The lgpu is a discrete quantity that is device-specific (e.g.
>> +       some DRM devices may have 64 lgpus while others may have
>> +       100.)  The lgpu is a single quantity that can be allocated
>> +       in three different ways, denoted by the following nested keys.
>> +
>> +         =====     ==============================================
>> +         weight    Allocate by proportion in relationship with
>> +                    active sibling cgroups
>> +         count     Allocate by amount statically, treat lgpu as
>> +                    anonymous resources
>> +         list      Allocate statically, treat lgpu as named
>> +                    resource
>> +         =====     ==============================================
>> +
>> +       For example:
>> +       226:0 weight=100 count=256 list=0-255
>> +       226:1 weight=100 count=4 list=0,2,4,6
>> +       226:2 weight=100 count=32 list=32-63
>> +       226:3 weight=100 count=0 list=
>> +       226:4 weight=500 count=0 list=
>> +
>> +       lgpu is represented by a bitmap and uses the bitmap_parselist
>> +       kernel function, so the list key input format is a
>> +       comma-separated list of decimal numbers and ranges.
>> +
>> +       Consecutively set bits are shown as two hyphen-separated decimal
>> +       numbers, the smallest and largest bit numbers set in the range.
>> +       Optionally, each range can be postfixed to denote that only
>> +       parts of it should be set.  The range will be divided into
>> +       groups of a specific size.
>> +       Syntax: range:used_size/group_size
>> +       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>> +
>> +       The count key is the Hamming weight (hweight) of the bitmap.
>> +
>> +       Weight, count and list accept the max and default keywords.
>> +
>> +       Some DRM devices may only support lgpu as anonymous resources.
>> +       In such a case, the significance of the position of the set
>> +       bits in list will be ignored.
>> +
>> +       The weight quantity is only in effect when static allocation
>> +       is not used (by setting count=0) for this cgroup.  The weight
>> +       quantity distributes lgpus that are not statically allocated by
>> +       the siblings.  For example, given siblings cgroupA, cgroupB and
>> +       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
>> +       0-63, no lgpu is available to be distributed by weight.
>> +       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
>> +       cgroupC will be starved if it tries to allocate by weight.
>> +
>> +       On the other hand, if cgroupA has weight=100 count=0, cgroupB
>> +       has list=16-47, and cgroupC has weight=100 count=0, then 32
>> +       lgpus are available to be distributed evenly between cgroupA
>> +       and cgroupC.  In drm.lgpu.effective, cgroupA will have
>> +       list=0-15 and cgroupC will have list=48-63.
>> +
>> +       This lgpu resource supports the 'allocation' and 'weight'
>> +       resource distribution models.
>> +
>> +  drm.lgpu.effective
>> +       A read-only nested-keyed file which exists on all cgroups.
>> +       Each entry is keyed by the DRM device's major:minor.
>> +
>> +       lgpu stands for logical GPU; it is an abstraction used to
>> +       subdivide a physical DRM device for the purpose of resource
>> +       management.  This file reflects the actual allocation after
>> +       considering the relationships between the cgroups and their
>> +       configurations in drm.lgpu.
>> +
>>  GEM Buffer Ownership
>>  ~~~~~~~~~~~~~~~~~~~~
>>
>> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
>> index 2b41d4d22e33..619a110cc748 100644
>> --- a/include/drm/drm_cgroup.h
>> +++ b/include/drm/drm_cgroup.h
>> @@ -17,6 +17,9 @@ struct drmcg_props {
>>
>>         s64                     bo_limits_total_allocated_default;
>>         s64                     bo_limits_peak_allocated_default;
>> +
>> +       int                     lgpu_capacity;
>> +       DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>>  };
>>
>>  void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
>> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
>> index eae400f3d9b4..bb09704e7f71 100644
>> --- a/include/linux/cgroup_drm.h
>> +++ b/include/linux/cgroup_drm.h
>> @@ -11,10 +11,14 @@
>>  /* limit defined per the way drm_minor_alloc operates */
>>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>>
>> +#define MAX_DRMCG_LGPU_CAPACITY 256
>> +
>>  enum drmcg_res_type {
>>         DRMCG_TYPE_BO_TOTAL,
>>         DRMCG_TYPE_BO_PEAK,
>>         DRMCG_TYPE_BO_COUNT,
>> +       DRMCG_TYPE_LGPU,
>> +       DRMCG_TYPE_LGPU_EFF,
>>         __DRMCG_TYPE_LAST,
>>  };
>>
>> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
>>         s64                     bo_limits_peak_allocated;
>>
>>         s64                     bo_stats_count_allocated;
>> +
>> +       /**
>> +        * Logical GPU
>> +        *
>> +        * *_cfg are properties configured by users
>> +        * *_eff are the effective properties being applied to the hardware
>> +        * *_stg is a staging area used to calculate _eff across the
>> +        * entire hierarchy before it is applied to _eff
>> +        */
>> +       DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
>> +       /* user configurations */
>> +       s64                     lgpu_weight_cfg;
>> +       DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
>> +       /* effective lgpu for the cgroup after considering
>> +        * relationships with other cgroups
>> +        */
>> +       s64                     lgpu_count_eff;
>> +       DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
>>  };
>>
>>  /**
>> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
>> index 5fcbbc13fa1c..a4e88a3704bb 100644
>> --- a/kernel/cgroup/drm.c
>> +++ b/kernel/cgroup/drm.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/seq_file.h>
>>  #include <linux/mutex.h>
>>  #include <linux/kernel.h>
>> +#include <linux/bitmap.h>
>>  #include <linux/cgroup_drm.h>
>>  #include <drm/drm_file.h>
>>  #include <drm/drm_drv.h>
>> @@ -41,6 +42,10 @@ enum drmcg_file_type {
>>         DRMCG_FTYPE_DEFAULT,
>>  };
>>
>> +#define LGPU_LIMITS_NAME_LIST "list"
>> +#define LGPU_LIMITS_NAME_COUNT "count"
>> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
>> +
>>  /**
>>   * drmcg_bind - Bind DRM subsystem to cgroup subsystem
>>   * @acq_dm: function pointer to the drm_minor_acquire function
>> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
>>         ddr->bo_limits_peak_allocated =
>>                 dev->drmcg_props.bo_limits_peak_allocated_default;
>>
>> +       bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
>> +                       MAX_DRMCG_LGPU_CAPACITY);
>> +       bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
>> +                       MAX_DRMCG_LGPU_CAPACITY);
>> +
>> +       ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
>> +
>>         return 0;
>>  }
>>
>> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
>>         mutex_unlock(&cgroup_mutex);
>>  }
>>
>> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
>> +               const unsigned long *free_static,
>> +               const unsigned long *free_weighted,
>> +               struct drmcg *parent_drmcg)
>> +{
>> +       int capacity = dev->drmcg_props.lgpu_capacity;
>> +       DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
>> +       DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
>> +       struct drmcg_device_resource *parent_ddr;
>> +       struct drmcg_device_resource *ddr;
>> +       int minor = dev->primary->index;
>> +       struct cgroup_subsys_state *pos;
>> +       struct drmcg *child;
>> +       s64 weight_sum = 0;
>> +       s64 unused;
>> +
>> +       parent_ddr = parent_drmcg->dev_resources[minor];
>> +
>> +       if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
>> +               /* no static cfg, use weight for calculating the effective */
>> +               bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
>> +       else
>> +               /* lgpu statically configured, use the overlap as effective */
>> +               bitmap_and(parent_ddr->lgpu_stg, free_static,
>> +                               parent_ddr->lgpu_cfg, capacity);
>> +
>> +       /* calculate lgpu available for distribution by weight for children */
>> +       bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
>> +       css_for_each_child(pos, &parent_drmcg->css) {
>> +               child = css_to_drmcg(pos);
>> +               ddr = child->dev_resources[minor];
>> +
>> +               if (bitmap_empty(ddr->lgpu_cfg, capacity))
>> +                       /* no static allocation, participate in weight dist */
>> +                       weight_sum += ddr->lgpu_weight_cfg;
>> +               else
>> +                       /* take out statically allocated lgpu by siblings */
>> +                       bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
>> +                                       capacity);
>> +       }
>> +
>> +       unused = bitmap_weight(lgpu_unused, capacity);
>> +
>> +       css_for_each_child(pos, &parent_drmcg->css) {
>> +               child = css_to_drmcg(pos);
>> +               ddr = child->dev_resources[minor];
>> +
>> +               bitmap_zero(lgpu_by_weight, capacity);
>> +               /* no static allocation, participate in weight distribution */
>> +               if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
>> +                       int c;
>> +                       int p = 0;
>> +
>> +                       for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
>> +                                       c > 0; c--) {
>> +                               p = find_next_bit(lgpu_unused, capacity, p);
>> +                               if (p < capacity) {
>> +                                       clear_bit(p, lgpu_unused);
>> +                                       set_bit(p, lgpu_by_weight);
>> +                               }
>> +                       }
>> +
>> +               }
>> +
>> +               drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
>> +                               lgpu_by_weight, child);
>> +       }
>> +}
>> +
>> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
>> +{
>> +       int capacity = dev->drmcg_props.lgpu_capacity;
>> +       int minor = dev->primary->index;
>> +       struct drmcg_device_resource *ddr;
>> +       struct cgroup_subsys_state *pos;
>> +       struct drmcg *drmcg;
>> +
>> +       if (root_drmcg == NULL) {
>> +               WARN_ON(root_drmcg == NULL);
>> +               return;
>> +       }
>> +
>> +       rcu_read_lock();
>> +
>> +       /* process the entire cgroup tree from root to simplify the algorithm */
>> +       drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
>> +                       dev->drmcg_props.lgpu_slots, root_drmcg);
>> +
>> +       /* apply changes to effective only if there is a change */
>> +       css_for_each_descendant_pre(pos, &root_drmcg->css) {
>> +               drmcg = css_to_drmcg(pos);
>> +               ddr = drmcg->dev_resources[minor];
>> +
>> +               if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
>> +                       bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
>> +                       ddr->lgpu_count_eff =
>> +                               bitmap_weight(ddr->lgpu_eff, capacity);
>> +               }
>> +       }
>> +       rcu_read_unlock();
>> +}
>> +
>> +static void drmcg_apply_effective(enum drmcg_res_type type,
>> +               struct drm_device *dev, struct drmcg *changed_drmcg)
>> +{
>> +       switch (type) {
>> +       case DRMCG_TYPE_LGPU:
>> +               drmcg_apply_effective_lgpu(dev);
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +}
>> +
>>  /**
>>   * drmcg_register_dev - register a DRM device for usage in drm cgroup
>>   * @dev: DRM device
>> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
>>         {
>>                 dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
>>
>> +               WARN_ON(dev->drmcg_props.lgpu_capacity !=
>> +                               bitmap_weight(dev->drmcg_props.lgpu_slots,
>> +                                       MAX_DRMCG_LGPU_CAPACITY));
>> +
>>                 drmcg_update_cg_tree(dev);
>> +
>> +               drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
>>         }
>>         mutex_unlock(&drmcg_mutex);
>>  }
>> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
>>  }
>>
>>  static void drmcg_print_limits(struct drmcg_device_resource *ddr,
>> -               struct seq_file *sf, enum drmcg_res_type type)
>> +               struct seq_file *sf, enum drmcg_res_type type,
>> +               struct drm_device *dev)
>>  {
>>         if (ddr == NULL) {
>>                 seq_puts(sf, "\n");
>> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
>>         case DRMCG_TYPE_BO_PEAK:
>>                 seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
>>                 break;
>> +       case DRMCG_TYPE_LGPU:
>> +               seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
>> +                               LGPU_LIMITS_NAME_WEIGHT,
>> +                               ddr->lgpu_weight_cfg,
>> +                               LGPU_LIMITS_NAME_COUNT,
>> +                               bitmap_weight(ddr->lgpu_cfg,
>> +                                       dev->drmcg_props.lgpu_capacity),
>> +                               LGPU_LIMITS_NAME_LIST,
>> +                               dev->drmcg_props.lgpu_capacity,
>> +                               ddr->lgpu_cfg);
>> +               break;
>> +       case DRMCG_TYPE_LGPU_EFF:
>> +               seq_printf(sf, "%s=%lld %s=%*pbl\n",
>> +                               LGPU_LIMITS_NAME_COUNT,
>> +                               ddr->lgpu_count_eff,
>> +                               LGPU_LIMITS_NAME_LIST,
>> +                               dev->drmcg_props.lgpu_capacity,
>> +                               ddr->lgpu_eff);
>> +               break;
>>         default:
>>                 seq_puts(sf, "\n");
>>                 break;
>> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
>>                 seq_printf(sf, "%lld\n",
>>                         props->bo_limits_peak_allocated_default);
>>                 break;
>> +       case DRMCG_TYPE_LGPU:
>> +               seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
>> +                               LGPU_LIMITS_NAME_WEIGHT,
>> +                               CGROUP_WEIGHT_DFL,
>> +                               LGPU_LIMITS_NAME_COUNT,
>> +                               bitmap_weight(props->lgpu_slots,
>> +                                       props->lgpu_capacity),
>> +                               LGPU_LIMITS_NAME_LIST,
>> +                               props->lgpu_capacity,
>> +                               props->lgpu_slots);
>> +               break;
>>         default:
>>                 seq_puts(sf, "\n");
>>                 break;
>> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
>>                 drmcg_print_stats(ddr, sf, type);
>>                 break;
>>         case DRMCG_FTYPE_LIMIT:
>> -               drmcg_print_limits(ddr, sf, type);
>> +               drmcg_print_limits(ddr, sf, type, minor->dev);
>>                 break;
>>         case DRMCG_FTYPE_DEFAULT:
>>                 drmcg_print_default(&minor->dev->drmcg_props, sf, type);
>> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
>>         return rc;
>>  }
>>
>> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>> +               struct drm_device *dev, char *attrs)
>> +{
>> +       DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>> +       DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>> +       enum drmcg_res_type type =
>> +               DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
>> +       struct drmcg *drmcg = css_to_drmcg(of_css(of));
>> +       struct drmcg_props *props = &dev->drmcg_props;
>> +       char *cft_name = of_cft(of)->name;
>> +       int minor = dev->primary->index;
>> +       char *nested = strstrip(attrs);
>> +       struct drmcg_device_resource *ddr =
>> +               drmcg->dev_resources[minor];
>> +       char *attr;
>> +       char sname[256];
>> +       char sval[256];
>> +       s64 val;
>> +       int rc;
>> +
>> +       while (nested != NULL) {
>> +               attr = strsep(&nested, " ");
>> +
>> +               if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
>> +                       continue;
>> +
>> +               switch (type) {
>> +               case DRMCG_TYPE_LGPU:
>> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
>> +                               strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
>> +                               strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
>> +                               continue;
>> +
>> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
>> +                                       (!strcmp("max", sval) ||
>> +                                       !strcmp("default", sval))) {
>> +                               bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
>> +                                               props->lgpu_capacity);
>> +
>> +                               continue;
>> +                       }
>> +
>> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
>> +                               rc = drmcg_process_limit_s64_val(sval,
>> +                                       false, CGROUP_WEIGHT_DFL,
>> +                                       CGROUP_WEIGHT_MAX, &val);
>> +
>> +                               if (rc || val < CGROUP_WEIGHT_MIN ||
>> +                                               val > CGROUP_WEIGHT_MAX) {
>> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
>> +                                                       minor);
>> +                                       continue;
>> +                               }
>> +
>> +                               ddr->lgpu_weight_cfg = val;
>> +                               continue;
>> +                       }
>> +
>> +                       if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
>> +                               rc = drmcg_process_limit_s64_val(sval,
>> +                                       false, props->lgpu_capacity,
>> +                                       props->lgpu_capacity, &val);
>> +
>> +                               if (rc || val < 0) {
>> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
>> +                                                       minor);
>> +                                       continue;
>> +                               }
>> +
>> +                               bitmap_zero(tmp_bitmap,
>> +                                               MAX_DRMCG_LGPU_CAPACITY);
>> +                               bitmap_set(tmp_bitmap, 0, val);
>> +                       }
>> +
>> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
>> +                               rc = bitmap_parselist(sval, tmp_bitmap,
>> +                                               MAX_DRMCG_LGPU_CAPACITY);
>> +
>> +                               if (rc) {
>> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
>> +                                                       minor);
>> +                                       continue;
>> +                               }
>> +
>> +                               bitmap_andnot(chk_bitmap, tmp_bitmap,
>> +                                       props->lgpu_slots,
>> +                                       MAX_DRMCG_LGPU_CAPACITY);
>> +
>> +                               /* user setting does not intersect with
>> +                                * available lgpu */
>> +                               if (!bitmap_empty(chk_bitmap,
>> +                                               MAX_DRMCG_LGPU_CAPACITY)) {
>> +                                       drmcg_pr_cft_err(drmcg, 0, cft_name,
>> +                                                       minor);
>> +                                       continue;
>> +                               }
>> +                       }
>> +
>> +                       bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
>> +                                       props->lgpu_capacity);
>> +
>> +                       break; /* DRMCG_TYPE_LGPU */
>> +               default:
>> +                       break;
>> +               } /* switch (type) */
>> +       }
>> +}
>> +
>> +
>>  /**
>>   * drmcg_limit_write - parse cgroup interface files to obtain user config
>>   *
>> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
>>
>>                         ddr->bo_limits_peak_allocated = val;
>>                         break;
>> +               case DRMCG_TYPE_LGPU:
>> +                       drmcg_nested_limit_parse(of, dm->dev, sattr);
>> +                       break;
>>                 default:
>>                         break;
>>                 }
>> +
>> +               drmcg_apply_effective(type, dm->dev, drmcg);
>> +
>>                 mutex_unlock(&dm->dev->drmcg_mutex);
>>
>>                 mutex_lock(&drmcg_mutex);
>> @@ -560,12 +838,51 @@ struct cftype files[] = {
>>                 .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
>>                                                 DRMCG_FTYPE_STATS),
>>         },
>> +       {
>> +               .name = "lgpu",
>> +               .seq_show = drmcg_seq_show,
>> +               .write = drmcg_limit_write,
>> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
>> +                                               DRMCG_FTYPE_LIMIT),
>> +       },
>> +       {
>> +               .name = "lgpu.default",
>> +               .seq_show = drmcg_seq_show,
>> +               .flags = CFTYPE_ONLY_ON_ROOT,
>> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
>> +                                               DRMCG_FTYPE_DEFAULT),
>> +       },
>> +       {
>> +               .name = "lgpu.effective",
>> +               .seq_show = drmcg_seq_show,
>> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
>> +                                               DRMCG_FTYPE_LIMIT),
>> +       },
>>         { }     /* terminate */
>>  };
>>
>> +static int drmcg_online_fn(int id, void *ptr, void *data)
>> +{
>> +       struct drm_minor *minor = ptr;
>> +       struct drmcg *drmcg = data;
>> +
>> +       if (minor->type != DRM_MINOR_PRIMARY)
>> +               return 0;
>> +
>> +       drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
>> +
>> +       return 0;
>> +}
>> +
>> +static int drmcg_css_online(struct cgroup_subsys_state *css)
>> +{
>> +       return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
>> +}
>> +
>>  struct cgroup_subsys drm_cgrp_subsys = {
>>         .css_alloc      = drmcg_css_alloc,
>>         .css_free       = drmcg_css_free,
>> +       .css_online     = drmcg_css_online,
>>         .early_init     = false,
>>         .legacy_cftypes = files,
>>         .dfl_cftypes    = files,
>> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
>>         dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
>>         dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
>>
>> +       dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
>> +       bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>> +
>>         drmcg_update_cg_tree(dev);
>>  }
>>  EXPORT_SYMBOL(drmcg_device_early_init);
>> --
>> 2.25.0
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 17:08     ` Kenny Ho
@ 2020-02-14 17:48       ` Jason Ekstrand
  2020-02-14 18:34       ` Daniel Vetter
  1 sibling, 0 replies; 26+ messages in thread
From: Jason Ekstrand @ 2020-02-14 17:48 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	amd-gfx mailing list, lkaplan, Alex Deucher, nirmoy.das,
	Maling list - DRI developers, Greathouse, Joseph, Tejun Heo,
	cgroups, Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 11:08 AM Kenny Ho <y2kenny@gmail.com> wrote:
>
> Hi Jason,
>
> Thanks for the review.
>
> On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > Pardon my ignorance but I'm a bit confused by this.  What is a "logical GPU"?  What are we subdividing?  Are we carving up memory?  Compute power?  Both?
>
> The intention is compute but it is up to the individual drm driver to decide.
>
> > If it's carving up compute power, what's actually being carved up?  Time?  Execution units/waves/threads?  Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set.  Does affinity matter that much?  Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants.
> >
> > Don't get me wrong here.  I'm all for the notion of being able to use cgroups to carve up GPU compute resources.  However, this sounds to me like the most AMD-specific solution possible.  We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.
>
> This has been discussed in the RFC before
> (https://www.spinics.net/lists/cgroups/msg23469.html.)  As mentioned
> before, the idea of a compute unit is hardly an AMD specific thing as
> it is in the OpenCL standard and part of the architecture of many
> different vendors.  In addition, the interface presented here supports
> Intel's use case.  What you described is what I considered as the
> "anonymous resources" view of the lgpu.  What you/Intel can do, is to
> register your device to drmcg to have 100 lgpu and users can specify
> simply by count.  So if they want to allocate 5% for a cgroup, they
> would set count=5.  Per the documentation in this patch: "Some DRM
> devices may only support lgpu as anonymous resources.  In such case,
> the significance of the position of the set bits in list will be
> ignored."  What Intel does with the user expressed configuration of "5
> out of 100" is entirely up to Intel (time slice if you like, change to
> specific EUs later if you like, or make it driver configurable to
> support both if you like.)

Sure, there's an OpenCL thing.  However, just because there's an
OpenCL thing doesn't mean that it's as standardized as it looks. :-(
In particular,

 1. The OpenCL thing has a query first to ask the driver what kind of
carving up of the GPU is allowed
 2. When clCreateSubDevices is called, the type of partitioning is
specified so they can specifically ask for devices grouped by shared
L2 cache, for instance.
 3. Just because the API exists and everyone implements it doesn't
mean that everyone implements it usefully.  From my reading of the
spec, it looks like the API is very much designed towards a CPU
implementation of OpenCL.  The Intel OpenCL GPU compute drivers, for
instance, implement it as a total no-op and no real sub-dividing is
allowed.

That said, that doesn't necessarily mean that carving up units of
compute power is a bad plan.  It's just unclear (as Daniel said on the
above referenced chain) what those units mean.  Maybe it's ok if they
mean nothing or if their meaning is HW-specific?
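
For reference, the query-then-partition flow looks roughly like this
against the CL 1.2 headers (untested sketch; "dev" is an already
acquired cl_device_id, error handling omitted):

    /* 1. ask the driver which partition types it supports */
    cl_device_partition_property supported[8] = { 0 };
    clGetDeviceInfo(dev, CL_DEVICE_PARTITION_PROPERTIES,
                    sizeof(supported), supported, NULL);

    /* 2. request sub-devices grouped by shared L2 cache, assuming
     * CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN was reported above */
    cl_device_partition_property by_l2[] = {
            CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
            CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE,
            0
    };
    cl_device_id sub[4];
    cl_uint n = 0;
    clCreateSubDevices(dev, by_l2, 4, sub, &n);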

> Regards,
> Kenny
>
> >
> > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
> >>
> >> drm.lgpu
> >>       A read-write nested-keyed file which exists on all cgroups.
> >>       Each entry is keyed by the DRM device's major:minor.
> >>
> >>       lgpu stands for logical GPU, it is an abstraction used to
> >>       subdivide a physical DRM device for the purpose of resource
> >>       management.  This file stores user configuration while the
> >>       drm.lgpu.effective reflects the actual allocation after
> >>       considering the relationship between the cgroups and their
> >>       configurations.
> >>
> >>       The lgpu is a discrete quantity that is device specific (i.e.
> >>       some DRM devices may have 64 lgpus while others may have 100
> >>       lgpus.)  The lgpu is a single quantity that can be allocated
> >>       in three different ways denoted by the following nested keys.
> >>
> >>         =====     ==============================================
> >>         weight    Allocate by proportion in relationship with
> >>                   active sibling cgroups
> >>         count     Allocate by amount statically, treat lgpu as
> >>                   anonymous resources
> >>         list      Allocate statically, treat lgpu as named
> >>                   resource
> >>         =====     ==============================================
> >>
> >>       For example:
> >>       226:0 weight=100 count=256 list=0-255
> >>       226:1 weight=100 count=4 list=0,2,4,6
> >>       226:2 weight=100 count=32 list=32-63
> >>       226:3 weight=100 count=0 list=
> >>       226:4 weight=500 count=0 list=
> >>
> >>       lgpu is represented by a bitmap and uses the bitmap_parselist
> >>       kernel function so the list key input format is a
> >>       comma-separated list of decimal numbers and ranges.
> >>
> >>       Consecutively set bits are shown as two hyphen-separated decimal
> >>       numbers, the smallest and largest bit numbers set in the range.
> >>       Optionally each range can be postfixed to denote that only parts
> >>       of it should be set.  The range will divided to groups of
> >>       specific size.
> >>       Syntax: range:used_size/group_size
> >>       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >>
> >>       The count key is the hamming weight / hweight of the bitmap.
> >>
> >>       Weight, count and list accept the max and default keywords.
> >>
> >>       Some DRM devices may only support lgpu as anonymous resources.
> >>       In such case, the significance of the position of the set bits
> >>       in list will be ignored.
> >>
> >>       The weight quantity is only in effect when static allocation
> >>       is not used (by setting count=0) for this cgroup.  The weight
> >>       quantity distributes lgpus that are not statically allocated by
> >>       the siblings.  For example, given siblings cgroupA, cgroupB and
> >>       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >>       0-63, no lgpu is available to be distributed by weight.
> >>       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >>       cgroupC will be starved if it tries to allocate by weight.
> >>
> >>       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >>       has list=16-47, and cgroupC has weight=100 count=0, then 32
> >>       lgpus are available to be distributed evenly between cgroupA
> >>       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> >>       list=0-15 and cgroupC will have list=48-63.
> >>
> >>       This lgpu resource supports the 'allocation' and 'weight'
> >>       resource distribution model.
> >>
> >> drm.lgpu.effective
> >>       A read-only nested-keyed file which exists on all cgroups.
> >>       Each entry is keyed by the DRM device's major:minor.
> >>
> >>       lgpu stands for logical GPU, it is an abstraction used to
> >>       subdivide a physical DRM device for the purpose of resource
> >>       management.  This file reflects the actual allocation after
> >>       considering the relationship between the cgroups and their
> >>       configurations in drm.lgpu.
> >>
> >> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> >> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> >> ---
> >>  Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
> >>  include/drm/drm_cgroup.h                |   3 +
> >>  include/linux/cgroup_drm.h              |  22 ++
> >>  kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
> >>  4 files changed, 427 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >> index ce5dc027366a..d8a41956e5c7 100644
> >> --- a/Documentation/admin-guide/cgroup-v2.rst
> >> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >> @@ -2120,6 +2120,86 @@ DRM Interface Files
> >>         Set largest allocation for /dev/dri/card1 to 4MB
> >>         echo "226:1 4m" > drm.buffer.peak.max
> >>
> >> +  drm.lgpu
> >> +       A read-write nested-keyed file which exists on all cgroups.
> >> +       Each entry is keyed by the DRM device's major:minor.
> >> +
> >> +       lgpu stands for logical GPU, it is an abstraction used to
> >> +       subdivide a physical DRM device for the purpose of resource
> >> +       management.  This file stores user configuration while the
> >> +        drm.lgpu.effective reflects the actual allocation after
> >> +        considering the relationship between the cgroups and their
> >> +        configurations.
> >> +
> >> +       The lgpu is a discrete quantity that is device specific (i.e.
> >> +       some DRM devices may have 64 lgpus while others may have 100
> >> +       lgpus.)  The lgpu is a single quantity that can be allocated
> >> +        in three different ways denoted by the following nested keys.
> >> +
> >> +         =====     ==============================================
> >> +         weight    Allocate by proportion in relationship with
> >> +                    active sibling cgroups
> >> +         count     Allocate by amount statically, treat lgpu as
> >> +                    anonymous resources
> >> +         list      Allocate statically, treat lgpu as named
> >> +                    resource
> >> +         =====     ==============================================
> >> +
> >> +       For example:
> >> +       226:0 weight=100 count=256 list=0-255
> >> +       226:1 weight=100 count=4 list=0,2,4,6
> >> +       226:2 weight=100 count=32 list=32-63
> >> +       226:3 weight=100 count=0 list=
> >> +       226:4 weight=500 count=0 list=
> >> +
> >> +       lgpu is represented by a bitmap and uses the bitmap_parselist
> >> +       kernel function so the list key input format is a
> >> +       comma-separated list of decimal numbers and ranges.
> >> +
> >> +       Consecutively set bits are shown as two hyphen-separated decimal
> >> +       numbers, the smallest and largest bit numbers set in the range.
> >> +       Optionally each range can be postfixed to denote that only parts
> >> +       of it should be set.  The range will divided to groups of
> >> +       specific size.
> >> +       Syntax: range:used_size/group_size
> >> +       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >> +
> >> +       The count key is the hamming weight / hweight of the bitmap.
> >> +
> >> +       Weight, count and list accept the max and default keywords.
> >> +
> >> +       Some DRM devices may only support lgpu as anonymous resources.
> >> +       In such case, the significance of the position of the set bits
> >> +       in list will be ignored.
> >> +
> >> +       The weight quantity is only in effect when static allocation
> >> +       is not used (by setting count=0) for this cgroup.  The weight
> >> +       quantity distributes lgpus that are not statically allocated by
> >> +       the siblings.  For example, given siblings cgroupA, cgroupB and
> >> +       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> +       0-63, no lgpu is available to be distributed by weight.
> >> +       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> +       cgroupC will be starved if it tries to allocate by weight.
> >> +
> >> +       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> +       has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> +       lgpus are available to be distributed evenly between cgroupA
> >> +       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> >> +       list=0-15 and cgroupC will have list=48-63.
> >> +
> >> +       This lgpu resource supports the 'allocation' and 'weight'
> >> +       resource distribution model.
> >> +
> >> +  drm.lgpu.effective
> >> +       A read-only nested-keyed file which exists on all cgroups.
> >> +       Each entry is keyed by the DRM device's major:minor.
> >> +
> >> +       lgpu stands for logical GPU; it is an abstraction used to
> >> +       subdivide a physical DRM device for the purpose of resource
> >> +       management.  This file reflects the actual allocation after
> >> +       considering the relationship between the cgroups and their
> >> +       configurations in drm.lgpu.
> >> +
> >>  GEM Buffer Ownership
> >>  ~~~~~~~~~~~~~~~~~~~~
> >>
> >> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >> index 2b41d4d22e33..619a110cc748 100644
> >> --- a/include/drm/drm_cgroup.h
> >> +++ b/include/drm/drm_cgroup.h
> >> @@ -17,6 +17,9 @@ struct drmcg_props {
> >>
> >>         s64                     bo_limits_total_allocated_default;
> >>         s64                     bo_limits_peak_allocated_default;
> >> +
> >> +       int                     lgpu_capacity;
> >> +       DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >>  };
> >>
> >>  void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> >> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >> index eae400f3d9b4..bb09704e7f71 100644
> >> --- a/include/linux/cgroup_drm.h
> >> +++ b/include/linux/cgroup_drm.h
> >> @@ -11,10 +11,14 @@
> >>  /* limit defined per the way drm_minor_alloc operates */
> >>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>
> >> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >> +
> >>  enum drmcg_res_type {
> >>         DRMCG_TYPE_BO_TOTAL,
> >>         DRMCG_TYPE_BO_PEAK,
> >>         DRMCG_TYPE_BO_COUNT,
> >> +       DRMCG_TYPE_LGPU,
> >> +       DRMCG_TYPE_LGPU_EFF,
> >>         __DRMCG_TYPE_LAST,
> >>  };
> >>
> >> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
> >>         s64                     bo_limits_peak_allocated;
> >>
> >>         s64                     bo_stats_count_allocated;
> >> +
> >> +       /**
> >> +        * Logical GPU
> >> +        *
> >> +        * *_cfg are properties configured by users
> >> +        * *_eff are the effective properties being applied to the hardware
> >> +        * *_stg is a staging area used to calculate _eff, taking the
> >> +        * entire hierarchy into account, before it is applied
> >> +        */
> >> +       DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> >> +       /* user configurations */
> >> +       s64                     lgpu_weight_cfg;
> >> +       DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> >> +       /* effective lgpu for the cgroup after considering
> >> +        * relationship with other cgroup
> >> +        */
> >> +       s64                     lgpu_count_eff;
> >> +       DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
> >>  };
> >>
> >>  /**
> >> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >> index 5fcbbc13fa1c..a4e88a3704bb 100644
> >> --- a/kernel/cgroup/drm.c
> >> +++ b/kernel/cgroup/drm.c
> >> @@ -9,6 +9,7 @@
> >>  #include <linux/seq_file.h>
> >>  #include <linux/mutex.h>
> >>  #include <linux/kernel.h>
> >> +#include <linux/bitmap.h>
> >>  #include <linux/cgroup_drm.h>
> >>  #include <drm/drm_file.h>
> >>  #include <drm/drm_drv.h>
> >> @@ -41,6 +42,10 @@ enum drmcg_file_type {
> >>         DRMCG_FTYPE_DEFAULT,
> >>  };
> >>
> >> +#define LGPU_LIMITS_NAME_LIST "list"
> >> +#define LGPU_LIMITS_NAME_COUNT "count"
> >> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> >> +
> >>  /**
> >>   * drmcg_bind - Bind DRM subsystem to cgroup subsystem
> >>   * @acq_dm: function pointer to the drm_minor_acquire function
> >> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >>         ddr->bo_limits_peak_allocated =
> >>                 dev->drmcg_props.bo_limits_peak_allocated_default;
> >>
> >> +       bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> >> +                       MAX_DRMCG_LGPU_CAPACITY);
> >> +       bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> >> +                       MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> +       ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> >> +
> >>         return 0;
> >>  }
> >>
> >> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >>         mutex_unlock(&cgroup_mutex);
> >>  }
> >>
> >> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> >> +               const unsigned long *free_static,
> >> +               const unsigned long *free_weighted,
> >> +               struct drmcg *parent_drmcg)
> >> +{
> >> +       int capacity = dev->drmcg_props.lgpu_capacity;
> >> +       DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> >> +       DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> >> +       struct drmcg_device_resource *parent_ddr;
> >> +       struct drmcg_device_resource *ddr;
> >> +       int minor = dev->primary->index;
> >> +       struct cgroup_subsys_state *pos;
> >> +       struct drmcg *child;
> >> +       s64 weight_sum = 0;
> >> +       s64 unused;
> >> +
> >> +       parent_ddr = parent_drmcg->dev_resources[minor];
> >> +
> >> +       if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> >> +               /* no static cfg, use weight for calculating the effective */
> >> +               bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> >> +       else
> >> +               /* lgpu statically configured, use the overlap as effective */
> >> +               bitmap_and(parent_ddr->lgpu_stg, free_static,
> >> +                               parent_ddr->lgpu_cfg, capacity);
> >> +
> >> +       /* calculate lgpu available for distribution by weight for children */
> >> +       bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> >> +       css_for_each_child(pos, &parent_drmcg->css) {
> >> +               child = css_to_drmcg(pos);
> >> +               ddr = child->dev_resources[minor];
> >> +
> >> +               if (bitmap_empty(ddr->lgpu_cfg, capacity))
> >> +                       /* no static allocation, participate in weight dist */
> >> +                       weight_sum += ddr->lgpu_weight_cfg;
> >> +               else
> >> +                       /* take out statically allocated lgpu by siblings */
> >> +                       bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
> >> +                                       capacity);
> >> +       }
> >> +
> >> +       unused = bitmap_weight(lgpu_unused, capacity);
> >> +
> >> +       css_for_each_child(pos, &parent_drmcg->css) {
> >> +               child = css_to_drmcg(pos);
> >> +               ddr = child->dev_resources[minor];
> >> +
> >> +               bitmap_zero(lgpu_by_weight, capacity);
> >> +               /* no static allocation, participate in weight distribution */
> >> +               if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> >> +                       int c;
> >> +                       int p = 0;
> >> +
> >> +                       for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
> >> +                                       c > 0; c--) {
> >> +                               p = find_next_bit(lgpu_unused, capacity, p);
> >> +                               if (p < capacity) {
> >> +                                       clear_bit(p, lgpu_unused);
> >> +                                       set_bit(p, lgpu_by_weight);
> >> +                               }
> >> +                       }
> >> +
> >> +               }
> >> +
> >> +               drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> >> +                               lgpu_by_weight, child);
> >> +       }
> >> +}
> >> +
> >> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> >> +{
> >> +       int capacity = dev->drmcg_props.lgpu_capacity;
> >> +       int minor = dev->primary->index;
> >> +       struct drmcg_device_resource *ddr;
> >> +       struct cgroup_subsys_state *pos;
> >> +       struct drmcg *drmcg;
> >> +
> >> +       if (root_drmcg == NULL) {
> >> +               WARN_ON(root_drmcg == NULL);
> >> +               return;
> >> +       }
> >> +
> >> +       rcu_read_lock();
> >> +
> >> +       /* process the entire cgroup tree from root to simplify the algorithm */
> >> +       drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> >> +                       dev->drmcg_props.lgpu_slots, root_drmcg);
> >> +
> >> +       /* apply changes to effective only if there is a change */
> >> +       css_for_each_descendant_pre(pos, &root_drmcg->css) {
> >> +               drmcg = css_to_drmcg(pos);
> >> +               ddr = drmcg->dev_resources[minor];
> >> +
> >> +               if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
> >> +                       bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
> >> +                       ddr->lgpu_count_eff =
> >> +                               bitmap_weight(ddr->lgpu_eff, capacity);
> >> +               }
> >> +       }
> >> +       rcu_read_unlock();
> >> +}
> >> +
> >> +static void drmcg_apply_effective(enum drmcg_res_type type,
> >> +               struct drm_device *dev, struct drmcg *changed_drmcg)
> >> +{
> >> +       switch (type) {
> >> +       case DRMCG_TYPE_LGPU:
> >> +               drmcg_apply_effective_lgpu(dev);
> >> +               break;
> >> +       default:
> >> +               break;
> >> +       }
> >> +}
> >> +
> >>  /**
> >>   * drmcg_register_dev - register a DRM device for usage in drm cgroup
> >>   * @dev: DRM device
> >> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
> >>         {
> >>                 dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
> >>
> >> +               WARN_ON(dev->drmcg_props.lgpu_capacity !=
> >> +                               bitmap_weight(dev->drmcg_props.lgpu_slots,
> >> +                                       MAX_DRMCG_LGPU_CAPACITY));
> >> +
> >>                 drmcg_update_cg_tree(dev);
> >> +
> >> +               drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
> >>         }
> >>         mutex_unlock(&drmcg_mutex);
> >>  }
> >> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
> >>  }
> >>
> >>  static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> -               struct seq_file *sf, enum drmcg_res_type type)
> >> +               struct seq_file *sf, enum drmcg_res_type type,
> >> +               struct drm_device *dev)
> >>  {
> >>         if (ddr == NULL) {
> >>                 seq_puts(sf, "\n");
> >> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >>         case DRMCG_TYPE_BO_PEAK:
> >>                 seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
> >>                 break;
> >> +       case DRMCG_TYPE_LGPU:
> >> +               seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> >> +                               LGPU_LIMITS_NAME_WEIGHT,
> >> +                               ddr->lgpu_weight_cfg,
> >> +                               LGPU_LIMITS_NAME_COUNT,
> >> +                               bitmap_weight(ddr->lgpu_cfg,
> >> +                                       dev->drmcg_props.lgpu_capacity),
> >> +                               LGPU_LIMITS_NAME_LIST,
> >> +                               dev->drmcg_props.lgpu_capacity,
> >> +                               ddr->lgpu_cfg);
> >> +               break;
> >> +       case DRMCG_TYPE_LGPU_EFF:
> >> +               seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >> +                               LGPU_LIMITS_NAME_COUNT,
> >> +                               ddr->lgpu_count_eff,
> >> +                               LGPU_LIMITS_NAME_LIST,
> >> +                               dev->drmcg_props.lgpu_capacity,
> >> +                               ddr->lgpu_eff);
> >> +               break;
> >>         default:
> >>                 seq_puts(sf, "\n");
> >>                 break;
> >> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
> >>                 seq_printf(sf, "%lld\n",
> >>                         props->bo_limits_peak_allocated_default);
> >>                 break;
> >> +       case DRMCG_TYPE_LGPU:
> >> +               seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> >> +                               LGPU_LIMITS_NAME_WEIGHT,
> >> +                               CGROUP_WEIGHT_DFL,
> >> +                               LGPU_LIMITS_NAME_COUNT,
> >> +                               bitmap_weight(props->lgpu_slots,
> >> +                                       props->lgpu_capacity),
> >> +                               LGPU_LIMITS_NAME_LIST,
> >> +                               props->lgpu_capacity,
> >> +                               props->lgpu_slots);
> >> +               break;
> >>         default:
> >>                 seq_puts(sf, "\n");
> >>                 break;
> >> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
> >>                 drmcg_print_stats(ddr, sf, type);
> >>                 break;
> >>         case DRMCG_FTYPE_LIMIT:
> >> -               drmcg_print_limits(ddr, sf, type);
> >> +               drmcg_print_limits(ddr, sf, type, minor->dev);
> >>                 break;
> >>         case DRMCG_FTYPE_DEFAULT:
> >>                 drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> >> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
> >>         return rc;
> >>  }
> >>
> >> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >> +               struct drm_device *dev, char *attrs)
> >> +{
> >> +       DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> +       DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> +       enum drmcg_res_type type =
> >> +               DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >> +       struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >> +       struct drmcg_props *props = &dev->drmcg_props;
> >> +       char *cft_name = of_cft(of)->name;
> >> +       int minor = dev->primary->index;
> >> +       char *nested = strstrip(attrs);
> >> +       struct drmcg_device_resource *ddr =
> >> +               drmcg->dev_resources[minor];
> >> +       char *attr;
> >> +       char sname[256];
> >> +       char sval[256];
> >> +       s64 val;
> >> +       int rc;
> >> +
> >> +       while (nested != NULL) {
> >> +               attr = strsep(&nested, " ");
> >> +
> >> +               if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> >> +                       continue;
> >> +
> >> +               switch (type) {
> >> +               case DRMCG_TYPE_LGPU:
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >> +                               strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
> >> +                               strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
> >> +                               continue;
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> >> +                                       (!strcmp("max", sval) ||
> >> +                                       !strcmp("default", sval))) {
> >> +                               bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
> >> +                                               props->lgpu_capacity);
> >> +
> >> +                               continue;
> >> +                       }
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
> >> +                               rc = drmcg_process_limit_s64_val(sval,
> >> +                                       false, CGROUP_WEIGHT_DFL,
> >> +                                       CGROUP_WEIGHT_MAX, &val);
> >> +
> >> +                               if (rc || val < CGROUP_WEIGHT_MIN ||
> >> +                                               val > CGROUP_WEIGHT_MAX) {
> >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +
> >> +                               ddr->lgpu_weight_cfg = val;
> >> +                               continue;
> >> +                       }
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >> +                               rc = drmcg_process_limit_s64_val(sval,
> >> +                                       false, props->lgpu_capacity,
> >> +                                       props->lgpu_capacity, &val);
> >> +
> >> +                               if (rc || val < 0) {
> >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +
> >> +                               bitmap_zero(tmp_bitmap,
> >> +                                               MAX_DRMCG_LGPU_CAPACITY);
> >> +                               bitmap_set(tmp_bitmap, 0, val);
> >> +                       }
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >> +                               rc = bitmap_parselist(sval, tmp_bitmap,
> >> +                                               MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> +                               if (rc) {
> >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +
> >> +                               bitmap_andnot(chk_bitmap, tmp_bitmap,
> >> +                                       props->lgpu_slots,
> >> +                                       MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> +                               /* reject user settings that include
> >> +                                * bits outside the available lgpu */
> >> +                               if (!bitmap_empty(chk_bitmap,
> >> +                                               MAX_DRMCG_LGPU_CAPACITY)) {
> >> +                                       drmcg_pr_cft_err(drmcg, 0, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +                       }
> >> +
> >> +                       bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> >> +                                       props->lgpu_capacity);
> >> +
> >> +                       break; /* DRMCG_TYPE_LGPU */
> >> +               default:
> >> +                       break;
> >> +               } /* switch (type) */
> >> +       }
> >> +}
> >> +
> >> +
> >>  /**
> >>   * drmcg_limit_write - parse cgroup interface files to obtain user config
> >>   *
> >> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>
> >>                         ddr->bo_limits_peak_allocated = val;
> >>                         break;
> >> +               case DRMCG_TYPE_LGPU:
> >> +                       drmcg_nested_limit_parse(of, dm->dev, sattr);
> >> +                       break;
> >>                 default:
> >>                         break;
> >>                 }
> >> +
> >> +               drmcg_apply_effective(type, dm->dev, drmcg);
> >> +
> >>                 mutex_unlock(&dm->dev->drmcg_mutex);
> >>
> >>                 mutex_lock(&drmcg_mutex);
> >> @@ -560,12 +838,51 @@ struct cftype files[] = {
> >>                 .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
> >>                                                 DRMCG_FTYPE_STATS),
> >>         },
> >> +       {
> >> +               .name = "lgpu",
> >> +               .seq_show = drmcg_seq_show,
> >> +               .write = drmcg_limit_write,
> >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> +                                               DRMCG_FTYPE_LIMIT),
> >> +       },
> >> +       {
> >> +               .name = "lgpu.default",
> >> +               .seq_show = drmcg_seq_show,
> >> +               .flags = CFTYPE_ONLY_ON_ROOT,
> >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> +                                               DRMCG_FTYPE_DEFAULT),
> >> +       },
> >> +       {
> >> +               .name = "lgpu.effective",
> >> +               .seq_show = drmcg_seq_show,
> >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> >> +                                               DRMCG_FTYPE_LIMIT),
> >> +       },
> >>         { }     /* terminate */
> >>  };
> >>
> >> +static int drmcg_online_fn(int id, void *ptr, void *data)
> >> +{
> >> +       struct drm_minor *minor = ptr;
> >> +       struct drmcg *drmcg = data;
> >> +
> >> +       if (minor->type != DRM_MINOR_PRIMARY)
> >> +               return 0;
> >> +
> >> +       drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> >> +{
> >> +       return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> >> +}
> >> +
> >>  struct cgroup_subsys drm_cgrp_subsys = {
> >>         .css_alloc      = drmcg_css_alloc,
> >>         .css_free       = drmcg_css_free,
> >> +       .css_online     = drmcg_css_online,
> >>         .early_init     = false,
> >>         .legacy_cftypes = files,
> >>         .dfl_cftypes    = files,
> >> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
> >>         dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
> >>         dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
> >>
> >> +       dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >> +       bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >>         drmcg_update_cg_tree(dev);
> >>  }
> >>  EXPORT_SYMBOL(drmcg_device_early_init);
> >> --
> >> 2.25.0
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 17:08     ` Kenny Ho
  2020-02-14 17:48       ` Jason Ekstrand
@ 2020-02-14 18:34       ` Daniel Vetter
  2020-02-14 18:51         ` Kenny Ho
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2020-02-14 18:34 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix, jsparks,
	amd-gfx mailing list, lkaplan, Alex Deucher, nirmoy.das,
	Maling list - DRI developers, Jason Ekstrand, Tejun Heo, cgroups,
	juan.zuniga-anaya, Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 12:08:35PM -0500, Kenny Ho wrote:
> Hi Jason,
> 
> Thanks for the review.
> 
> On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > Pardon my ignorance but I'm a bit confused by this.  What is a "logical GPU"?  What are we subdividing?  Are we carving up memory?  Compute power?  Both?
> 
> The intention is compute but it is up to the individual drm driver to decide.

I think guidance from Tejun in previous discussions was pretty clear that
he expects cgroups to be both a) standardized and b) of sufficiently clear
meaning that end-users have a clear understanding of what happens when
they change the resource allocation.

I'm not sure lgpu here, at least as specified, passes either. But I also
don't have much clue, so I pulled Jason in - he understands how this all
gets reflected in userspace APIs a lot better than I do.
-Daniel


> 
> > > If it's carving up compute power, what's actually being carved up?  Time?  Execution units/waves/threads?  Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set?  Does affinity matter that much?  Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants?
> >
> > Don't get me wrong here.  I'm all for the notion of being able to use cgroups to carve up GPU compute resources.  However, this sounds to me like the most AMD-specific solution possible.  We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.
> 
> This has been discussed in the RFC before
> (https://www.spinics.net/lists/cgroups/msg23469.html.)  As mentioned
> before, the idea of a compute unit is hardly an AMD-specific thing, as
> it is in the OpenCL standard and part of the architecture of many
> different vendors.  In addition, the interface presented here supports
> Intel's use case.  What you described is what I consider the
> "anonymous resources" view of the lgpu.  What you/Intel can do is
> register your device with drmcg as having 100 lgpus, and users can
> specify allocations simply by count.  So if they want to allocate 5%
> for a cgroup, they would set count=5.  Per the documentation in this
> patch: "Some DRM devices may only support lgpu as anonymous resources.
> In such a case, the significance of the positions of the set bits in
> list will be ignored."  What Intel does with the user-expressed
> configuration of "5 out of 100" is entirely up to Intel (time-slice if
> you like, switch to specific EUs later if you like, or make it
> driver-configurable to support both if you like.)
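> 
> To make that concrete, here is a minimal sketch (the device number
> 226:0 and the cgroup path are illustrative, not from the patch):
> 
>     # device registered with drmcg as having 100 anonymous lgpus;
>     # give this cgroup 5% of them by count
>     echo "226:0 count=5" > /sys/fs/cgroup/mygroup/drm.lgpu
>     # read back the effective allocation
>     cat /sys/fs/cgroup/mygroup/drm.lgpu.effective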
> 
> Regards,
> Kenny
> 
> >
> > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
> >>
> >> drm.lgpu
> >>       A read-write nested-keyed file which exists on all cgroups.
> >>       Each entry is keyed by the DRM device's major:minor.
> >>
> >>       lgpu stands for logical GPU; it is an abstraction used to
> >>       subdivide a physical DRM device for the purpose of resource
> >>       management.  This file stores user configuration while the
> >>       drm.lgpu.effective reflects the actual allocation after
> >>       considering the relationship between the cgroups and their
> >>       configurations.
> >>
> >>       The lgpu is a discrete quantity that is device-specific (e.g.
> >>       some DRM devices may have 64 lgpus while others may have 100
> >>       lgpus.)  The lgpu is a single quantity that can be allocated
> >>       in three different ways denoted by the following nested keys.
> >>
> >>         =====     ==============================================
> >>         weight    Allocate by proportion in relationship with
> >>                   active sibling cgroups
> >>         count     Allocate by amount statically, treat lgpu as
> >>                   anonymous resources
> >>         list      Allocate statically, treat lgpu as named
> >>                   resource
> >>         =====     ==============================================
> >>
> >>       For example:
> >>       226:0 weight=100 count=256 list=0-255
> >>       226:1 weight=100 count=4 list=0,2,4,6
> >>       226:2 weight=100 count=32 list=32-63
> >>       226:3 weight=100 count=0 list=
> >>       226:4 weight=500 count=0 list=
> >>
> >>       lgpu is represented by a bitmap and uses the bitmap_parselist
> >>       kernel function so the list key input format is a
> >>       comma-separated list of decimal numbers and ranges.
> >>
> >>       Consecutively set bits are shown as two hyphen-separated decimal
> >>       numbers, the smallest and largest bit numbers set in the range.
> >>       Optionally, each range can be postfixed to denote that only
> >>       parts of it should be set.  The range will be divided into
> >>       groups of a specific size.
> >>       Syntax: range:used_size/group_size
> >>       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
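> >>
> >>       For illustration only -- a tiny userspace sketch of that group
> >>       expansion (not code from this patch; the kernel side uses
> >>       bitmap_parselist()):
> >>
> >>         #include <stdio.h>
> >>
> >>         /* Within each group-sized chunk of first..last, set only
> >>          * the first "used" bits, as described above. */
> >>         static void expand(int first, int last, int used, int group)
> >>         {
> >>                 int b;
> >>
> >>                 for (b = first; b <= last; b++)
> >>                         if ((b - first) % group < used)
> >>                                 printf("%d ", b);
> >>                 printf("\n");
> >>         }
> >>
> >>         int main(void)
> >>         {
> >>                 expand(0, 1023, 2, 256);
> >>                 /* prints: 0 1 256 257 512 513 768 769 */
> >>                 return 0;
> >>         }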
> >>
> >>       The count key is the Hamming weight (hweight) of the bitmap.
> >>
> >>       Weight, count and list accept the max and default keywords.
> >>
> >>       Some DRM devices may only support lgpu as anonymous resources.
> >>       In such a case, the significance of the positions of the set
> >>       bits in list will be ignored.
> >>
> >>       The weight quantity is only in effect when static allocation
> >>       is not used (by setting count=0) for this cgroup.  The weight
> >>       quantity distributes lgpus that are not statically allocated by
> >>       the siblings.  For example, given siblings cgroupA, cgroupB and
> >>       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >>       0-63, no lgpu is available to be distributed by weight.
> >>       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >>       cgroupC will be starved if it tries to allocate by weight.
> >>
> >>       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >>       has list=16-47, and cgroupC has weight=100 count=0, then 32
> >>       lgpus are available to be distributed evenly between cgroupA
> >>       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> >>       list=0-15 and cgroupC will have list=48-63.
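> >>
> >>       As a sketch of that last example (the cgroup paths are
> >>       illustrative, assuming a 64-lgpu device at 226:0):
> >>
> >>       echo "226:0 weight=100 count=0" > cgroupA/drm.lgpu
> >>       echo "226:0 list=16-47" > cgroupB/drm.lgpu
> >>       echo "226:0 weight=100 count=0" > cgroupC/drm.lgpu
> >>       cat cgroupA/drm.lgpu.effective
> >>       226:0 count=16 list=0-15
> >>       cat cgroupC/drm.lgpu.effective
> >>       226:0 count=16 list=48-63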
> >>
> >>       This lgpu resource supports the 'allocation' and 'weight'
> >>       resource distribution model.
> >>
> >> drm.lgpu.effective
> >>       A read-only nested-keyed file which exists on all cgroups.
> >>       Each entry is keyed by the DRM device's major:minor.
> >>
> >>       lgpu stands for logical GPU; it is an abstraction used to
> >>       subdivide a physical DRM device for the purpose of resource
> >>       management.  This file reflects the actual allocation after
> >>       considering the relationship between the cgroups and their
> >>       configurations in drm.lgpu.
> >>
> >> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> >> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> >> ---
> >>  Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
> >>  include/drm/drm_cgroup.h                |   3 +
> >>  include/linux/cgroup_drm.h              |  22 ++
> >>  kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
> >>  4 files changed, 427 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >> index ce5dc027366a..d8a41956e5c7 100644
> >> --- a/Documentation/admin-guide/cgroup-v2.rst
> >> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >> @@ -2120,6 +2120,86 @@ DRM Interface Files
> >>         Set largest allocation for /dev/dri/card1 to 4MB
> >>         echo "226:1 4m" > drm.buffer.peak.max
> >>
> >> +  drm.lgpu
> >> +       A read-write nested-keyed file which exists on all cgroups.
> >> +       Each entry is keyed by the DRM device's major:minor.
> >> +
> >> +       lgpu stands for logical GPU; it is an abstraction used to
> >> +       subdivide a physical DRM device for the purpose of resource
> >> +       management.  This file stores user configuration while the
> >> +       drm.lgpu.effective reflects the actual allocation after
> >> +       considering the relationship between the cgroups and their
> >> +       configurations.
> >> +
> >> +       The lgpu is a discrete quantity that is device-specific (e.g.
> >> +       some DRM devices may have 64 lgpus while others may have 100
> >> +       lgpus.)  The lgpu is a single quantity that can be allocated
> >> +       in three different ways denoted by the following nested keys.
> >> +
> >> +         =====     ==============================================
> >> +         weight    Allocate by proportion in relationship with
> >> +                    active sibling cgroups
> >> +         count     Allocate by amount statically, treat lgpu as
> >> +                    anonymous resources
> >> +         list      Allocate statically, treat lgpu as named
> >> +                    resource
> >> +         =====     ==============================================
> >> +
> >> +       For example:
> >> +       226:0 weight=100 count=256 list=0-255
> >> +       226:1 weight=100 count=4 list=0,2,4,6
> >> +       226:2 weight=100 count=32 list=32-63
> >> +       226:3 weight=100 count=0 list=
> >> +       226:4 weight=500 count=0 list=
> >> +
> >> +       lgpu is represented by a bitmap and uses the bitmap_parselist
> >> +       kernel function so the list key input format is a
> >> +       comma-separated list of decimal numbers and ranges.
> >> +
> >> +       Consecutively set bits are shown as two hyphen-separated decimal
> >> +       numbers, the smallest and largest bit numbers set in the range.
> >> +       Optionally, each range can be postfixed to denote that only
> >> +       parts of it should be set.  The range will be divided into
> >> +       groups of a specific size.
> >> +       Syntax: range:used_size/group_size
> >> +       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >> +
> >> +       The count key is the Hamming weight (hweight) of the bitmap.
> >> +
> >> +       Weight, count and list accept the max and default keywords.
> >> +
> >> +       Some DRM devices may only support lgpu as anonymous resources.
> >> +       In such a case, the significance of the positions of the set
> >> +       bits in list will be ignored.
> >> +
> >> +       The weight quantity is only in effect when static allocation
> >> +       is not used (by setting count=0) for this cgroup.  The weight
> >> +       quantity distributes lgpus that are not statically allocated by
> >> +       the siblings.  For example, given siblings cgroupA, cgroupB and
> >> +       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> +       0-63, no lgpu is available to be distributed by weight.
> >> +       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> +       cgroupC will be starved if it tries to allocate by weight.
> >> +
> >> +       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> +       has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> +       lgpus are available to be distributed evenly between cgroupA
> >> +       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> >> +       list=0-15 and cgroupC will have list=48-63.
> >> +
> >> +       This lgpu resource supports the 'allocation' and 'weight'
> >> +       resource distribution model.
> >> +
> >> +  drm.lgpu.effective
> >> +       A read-only nested-keyed file which exists on all cgroups.
> >> +       Each entry is keyed by the DRM device's major:minor.
> >> +
> >> +       lgpu stands for logical GPU; it is an abstraction used to
> >> +       subdivide a physical DRM device for the purpose of resource
> >> +       management.  This file reflects the actual allocation after
> >> +       considering the relationship between the cgroups and their
> >> +       configurations in drm.lgpu.
> >> +
> >>  GEM Buffer Ownership
> >>  ~~~~~~~~~~~~~~~~~~~~
> >>
> >> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >> index 2b41d4d22e33..619a110cc748 100644
> >> --- a/include/drm/drm_cgroup.h
> >> +++ b/include/drm/drm_cgroup.h
> >> @@ -17,6 +17,9 @@ struct drmcg_props {
> >>
> >>         s64                     bo_limits_total_allocated_default;
> >>         s64                     bo_limits_peak_allocated_default;
> >> +
> >> +       int                     lgpu_capacity;
> >> +       DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >>  };
> >>
> >>  void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> >> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >> index eae400f3d9b4..bb09704e7f71 100644
> >> --- a/include/linux/cgroup_drm.h
> >> +++ b/include/linux/cgroup_drm.h
> >> @@ -11,10 +11,14 @@
> >>  /* limit defined per the way drm_minor_alloc operates */
> >>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>
> >> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >> +
> >>  enum drmcg_res_type {
> >>         DRMCG_TYPE_BO_TOTAL,
> >>         DRMCG_TYPE_BO_PEAK,
> >>         DRMCG_TYPE_BO_COUNT,
> >> +       DRMCG_TYPE_LGPU,
> >> +       DRMCG_TYPE_LGPU_EFF,
> >>         __DRMCG_TYPE_LAST,
> >>  };
> >>
> >> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
> >>         s64                     bo_limits_peak_allocated;
> >>
> >>         s64                     bo_stats_count_allocated;
> >> +
> >> +       /**
> >> +        * Logical GPU
> >> +        *
> >> +        * *_cfg are properties configured by users
> >> +        * *_eff are the effective properties being applied to the hardware
> >> +        * *_stg is a staging area used to calculate _eff, taking the
> >> +        * entire hierarchy into account, before it is applied
> >> +        */
> >> +       DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> >> +       /* user configurations */
> >> +       s64                     lgpu_weight_cfg;
> >> +       DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> >> +       /* effective lgpu for the cgroup after considering
> >> +        * relationship with other cgroup
> >> +        */
> >> +       s64                     lgpu_count_eff;
> >> +       DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
> >>  };
> >>
> >>  /**
> >> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >> index 5fcbbc13fa1c..a4e88a3704bb 100644
> >> --- a/kernel/cgroup/drm.c
> >> +++ b/kernel/cgroup/drm.c
> >> @@ -9,6 +9,7 @@
> >>  #include <linux/seq_file.h>
> >>  #include <linux/mutex.h>
> >>  #include <linux/kernel.h>
> >> +#include <linux/bitmap.h>
> >>  #include <linux/cgroup_drm.h>
> >>  #include <drm/drm_file.h>
> >>  #include <drm/drm_drv.h>
> >> @@ -41,6 +42,10 @@ enum drmcg_file_type {
> >>         DRMCG_FTYPE_DEFAULT,
> >>  };
> >>
> >> +#define LGPU_LIMITS_NAME_LIST "list"
> >> +#define LGPU_LIMITS_NAME_COUNT "count"
> >> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> >> +
> >>  /**
> >>   * drmcg_bind - Bind DRM subsystem to cgroup subsystem
> >>   * @acq_dm: function pointer to the drm_minor_acquire function
> >> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >>         ddr->bo_limits_peak_allocated =
> >>                 dev->drmcg_props.bo_limits_peak_allocated_default;
> >>
> >> +       bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> >> +                       MAX_DRMCG_LGPU_CAPACITY);
> >> +       bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> >> +                       MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> +       ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> >> +
> >>         return 0;
> >>  }
> >>
> >> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >>         mutex_unlock(&cgroup_mutex);
> >>  }
> >>
> >> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> >> +               const unsigned long *free_static,
> >> +               const unsigned long *free_weighted,
> >> +               struct drmcg *parent_drmcg)
> >> +{
> >> +       int capacity = dev->drmcg_props.lgpu_capacity;
> >> +       DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> >> +       DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> >> +       struct drmcg_device_resource *parent_ddr;
> >> +       struct drmcg_device_resource *ddr;
> >> +       int minor = dev->primary->index;
> >> +       struct cgroup_subsys_state *pos;
> >> +       struct drmcg *child;
> >> +       s64 weight_sum = 0;
> >> +       s64 unused;
> >> +
> >> +       parent_ddr = parent_drmcg->dev_resources[minor];
> >> +
> >> +       if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> >> +               /* no static cfg, use weight for calculating the effective */
> >> +               bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> >> +       else
> >> +               /* lgpu statically configured, use the overlap as effective */
> >> +               bitmap_and(parent_ddr->lgpu_stg, free_static,
> >> +                               parent_ddr->lgpu_cfg, capacity);
> >> +
> >> +       /* calculate lgpu available for distribution by weight for children */
> >> +       bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> >> +       css_for_each_child(pos, &parent_drmcg->css) {
> >> +               child = css_to_drmcg(pos);
> >> +               ddr = child->dev_resources[minor];
> >> +
> >> +               if (bitmap_empty(ddr->lgpu_cfg, capacity))
> >> +                       /* no static allocation, participate in weight dist */
> >> +                       weight_sum += ddr->lgpu_weight_cfg;
> >> +               else
> >> +                       /* take out statically allocated lgpu by siblings */
> >> +                       bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
> >> +                                       capacity);
> >> +       }
> >> +
> >> +       unused = bitmap_weight(lgpu_unused, capacity);
> >> +
> >> +       css_for_each_child(pos, &parent_drmcg->css) {
> >> +               child = css_to_drmcg(pos);
> >> +               ddr = child->dev_resources[minor];
> >> +
> >> +               bitmap_zero(lgpu_by_weight, capacity);
> >> +               /* no static allocation, participate in weight distribution */
> >> +               if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> >> +                       int c;
> >> +                       int p = 0;
> >> +
> >> +                       for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
> >> +                                       c > 0; c--) {
> >> +                               p = find_next_bit(lgpu_unused, capacity, p);
> >> +                               if (p < capacity) {
> >> +                                       clear_bit(p, lgpu_unused);
> >> +                                       set_bit(p, lgpu_by_weight);
> >> +                               }
> >> +                       }
> >> +
> >> +               }
> >> +
> >> +               drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> >> +                               lgpu_by_weight, child);
> >> +       }
> >> +}
> >> +
> >> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> >> +{
> >> +       int capacity = dev->drmcg_props.lgpu_capacity;
> >> +       int minor = dev->primary->index;
> >> +       struct drmcg_device_resource *ddr;
> >> +       struct cgroup_subsys_state *pos;
> >> +       struct drmcg *drmcg;
> >> +
> >> +       if (root_drmcg == NULL) {
> >> +               WARN_ON(root_drmcg == NULL);
> >> +               return;
> >> +       }
> >> +
> >> +       rcu_read_lock();
> >> +
> >> +       /* process the entire cgroup tree from root to simplify the algorithm */
> >> +       drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> >> +                       dev->drmcg_props.lgpu_slots, root_drmcg);
> >> +
> >> +       /* apply changes to effective only if there is a change */
> >> +       css_for_each_descendant_pre(pos, &root_drmcg->css) {
> >> +               drmcg = css_to_drmcg(pos);
> >> +               ddr = drmcg->dev_resources[minor];
> >> +
> >> +               if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
> >> +                       bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
> >> +                       ddr->lgpu_count_eff =
> >> +                               bitmap_weight(ddr->lgpu_eff, capacity);
> >> +               }
> >> +       }
> >> +       rcu_read_unlock();
> >> +}
> >> +
> >> +static void drmcg_apply_effective(enum drmcg_res_type type,
> >> +               struct drm_device *dev, struct drmcg *changed_drmcg)
> >> +{
> >> +       switch (type) {
> >> +       case DRMCG_TYPE_LGPU:
> >> +               drmcg_apply_effective_lgpu(dev);
> >> +               break;
> >> +       default:
> >> +               break;
> >> +       }
> >> +}
> >> +
> >>  /**
> >>   * drmcg_register_dev - register a DRM device for usage in drm cgroup
> >>   * @dev: DRM device
> >> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
> >>         {
> >>                 dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
> >>
> >> +               WARN_ON(dev->drmcg_props.lgpu_capacity !=
> >> +                               bitmap_weight(dev->drmcg_props.lgpu_slots,
> >> +                                       MAX_DRMCG_LGPU_CAPACITY));
> >> +
> >>                 drmcg_update_cg_tree(dev);
> >> +
> >> +               drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
> >>         }
> >>         mutex_unlock(&drmcg_mutex);
> >>  }
> >> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
> >>  }
> >>
> >>  static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> -               struct seq_file *sf, enum drmcg_res_type type)
> >> +               struct seq_file *sf, enum drmcg_res_type type,
> >> +               struct drm_device *dev)
> >>  {
> >>         if (ddr == NULL) {
> >>                 seq_puts(sf, "\n");
> >> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >>         case DRMCG_TYPE_BO_PEAK:
> >>                 seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
> >>                 break;
> >> +       case DRMCG_TYPE_LGPU:
> >> +               seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> >> +                               LGPU_LIMITS_NAME_WEIGHT,
> >> +                               ddr->lgpu_weight_cfg,
> >> +                               LGPU_LIMITS_NAME_COUNT,
> >> +                               bitmap_weight(ddr->lgpu_cfg,
> >> +                                       dev->drmcg_props.lgpu_capacity),
> >> +                               LGPU_LIMITS_NAME_LIST,
> >> +                               dev->drmcg_props.lgpu_capacity,
> >> +                               ddr->lgpu_cfg);
> >> +               break;
> >> +       case DRMCG_TYPE_LGPU_EFF:
> >> +               seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >> +                               LGPU_LIMITS_NAME_COUNT,
> >> +                               ddr->lgpu_count_eff,
> >> +                               LGPU_LIMITS_NAME_LIST,
> >> +                               dev->drmcg_props.lgpu_capacity,
> >> +                               ddr->lgpu_eff);
> >> +               break;
> >>         default:
> >>                 seq_puts(sf, "\n");
> >>                 break;
> >> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
> >>                 seq_printf(sf, "%lld\n",
> >>                         props->bo_limits_peak_allocated_default);
> >>                 break;
> >> +       case DRMCG_TYPE_LGPU:
> >> +               seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> >> +                               LGPU_LIMITS_NAME_WEIGHT,
> >> +                               CGROUP_WEIGHT_DFL,
> >> +                               LGPU_LIMITS_NAME_COUNT,
> >> +                               bitmap_weight(props->lgpu_slots,
> >> +                                       props->lgpu_capacity),
> >> +                               LGPU_LIMITS_NAME_LIST,
> >> +                               props->lgpu_capacity,
> >> +                               props->lgpu_slots);
> >> +               break;
> >>         default:
> >>                 seq_puts(sf, "\n");
> >>                 break;
> >> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
> >>                 drmcg_print_stats(ddr, sf, type);
> >>                 break;
> >>         case DRMCG_FTYPE_LIMIT:
> >> -               drmcg_print_limits(ddr, sf, type);
> >> +               drmcg_print_limits(ddr, sf, type, minor->dev);
> >>                 break;
> >>         case DRMCG_FTYPE_DEFAULT:
> >>                 drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> >> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
> >>         return rc;
> >>  }
> >>
> >> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >> +               struct drm_device *dev, char *attrs)
> >> +{
> >> +       DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> +       DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> +       enum drmcg_res_type type =
> >> +               DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >> +       struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >> +       struct drmcg_props *props = &dev->drmcg_props;
> >> +       char *cft_name = of_cft(of)->name;
> >> +       int minor = dev->primary->index;
> >> +       char *nested = strstrip(attrs);
> >> +       struct drmcg_device_resource *ddr =
> >> +               drmcg->dev_resources[minor];
> >> +       char *attr;
> >> +       char sname[256];
> >> +       char sval[256];
> >> +       s64 val;
> >> +       int rc;
> >> +
> >> +       while (nested != NULL) {
> >> +               attr = strsep(&nested, " ");
> >> +
> >> +               if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> >> +                       continue;
> >> +
> >> +               switch (type) {
> >> +               case DRMCG_TYPE_LGPU:
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >> +                               strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
> >> +                               strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
> >> +                               continue;
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> >> +                                       (!strcmp("max", sval) ||
> >> +                                       !strcmp("default", sval))) {
> >> +                               bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
> >> +                                               props->lgpu_capacity);
> >> +
> >> +                               continue;
> >> +                       }
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
> >> +                               rc = drmcg_process_limit_s64_val(sval,
> >> +                                       false, CGROUP_WEIGHT_DFL,
> >> +                                       CGROUP_WEIGHT_MAX, &val);
> >> +
> >> +                               if (rc || val < CGROUP_WEIGHT_MIN ||
> >> +                                               val > CGROUP_WEIGHT_MAX) {
> >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +
> >> +                               ddr->lgpu_weight_cfg = val;
> >> +                               continue;
> >> +                       }
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >> +                               rc = drmcg_process_limit_s64_val(sval,
> >> +                                       false, props->lgpu_capacity,
> >> +                                       props->lgpu_capacity, &val);
> >> +
> >> +                               if (rc || val < 0) {
> >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +
> >> +                               bitmap_zero(tmp_bitmap,
> >> +                                               MAX_DRMCG_LGPU_CAPACITY);
> >> +                               bitmap_set(tmp_bitmap, 0, val);
> >> +                       }
> >> +
> >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >> +                               rc = bitmap_parselist(sval, tmp_bitmap,
> >> +                                               MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> +                               if (rc) {
> >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +
> >> +                               bitmap_andnot(chk_bitmap, tmp_bitmap,
> >> +                                       props->lgpu_slots,
> >> +                                       MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> +                               /* reject user settings that include
> >> +                                * bits outside the available lgpu */
> >> +                               if (!bitmap_empty(chk_bitmap,
> >> +                                               MAX_DRMCG_LGPU_CAPACITY)) {
> >> +                                       drmcg_pr_cft_err(drmcg, 0, cft_name,
> >> +                                                       minor);
> >> +                                       continue;
> >> +                               }
> >> +                       }
> >> +
> >> +                       bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> >> +                                       props->lgpu_capacity);
> >> +
> >> +                       break; /* DRMCG_TYPE_LGPU */
> >> +               default:
> >> +                       break;
> >> +               } /* switch (type) */
> >> +       }
> >> +}
> >> +
> >> +
> >>  /**
> >>   * drmcg_limit_write - parse cgroup interface files to obtain user config
> >>   *
> >> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>
> >>                         ddr->bo_limits_peak_allocated = val;
> >>                         break;
> >> +               case DRMCG_TYPE_LGPU:
> >> +                       drmcg_nested_limit_parse(of, dm->dev, sattr);
> >> +                       break;
> >>                 default:
> >>                         break;
> >>                 }
> >> +
> >> +               drmcg_apply_effective(type, dm->dev, drmcg);
> >> +
> >>                 mutex_unlock(&dm->dev->drmcg_mutex);
> >>
> >>                 mutex_lock(&drmcg_mutex);
> >> @@ -560,12 +838,51 @@ struct cftype files[] = {
> >>                 .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
> >>                                                 DRMCG_FTYPE_STATS),
> >>         },
> >> +       {
> >> +               .name = "lgpu",
> >> +               .seq_show = drmcg_seq_show,
> >> +               .write = drmcg_limit_write,
> >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> +                                               DRMCG_FTYPE_LIMIT),
> >> +       },
> >> +       {
> >> +               .name = "lgpu.default",
> >> +               .seq_show = drmcg_seq_show,
> >> +               .flags = CFTYPE_ONLY_ON_ROOT,
> >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> +                                               DRMCG_FTYPE_DEFAULT),
> >> +       },
> >> +       {
> >> +               .name = "lgpu.effective",
> >> +               .seq_show = drmcg_seq_show,
> >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> >> +                                               DRMCG_FTYPE_LIMIT),
> >> +       },
> >>         { }     /* terminate */
> >>  };
> >>
> >> +static int drmcg_online_fn(int id, void *ptr, void *data)
> >> +{
> >> +       struct drm_minor *minor = ptr;
> >> +       struct drmcg *drmcg = data;
> >> +
> >> +       if (minor->type != DRM_MINOR_PRIMARY)
> >> +               return 0;
> >> +
> >> +       drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> >> +{
> >> +       return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> >> +}
> >> +
> >>  struct cgroup_subsys drm_cgrp_subsys = {
> >>         .css_alloc      = drmcg_css_alloc,
> >>         .css_free       = drmcg_css_free,
> >> +       .css_online     = drmcg_css_online,
> >>         .early_init     = false,
> >>         .legacy_cftypes = files,
> >>         .dfl_cftypes    = files,
> >> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
> >>         dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
> >>         dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
> >>
> >> +       dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >> +       bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >>         drmcg_update_cg_tree(dev);
> >>  }
> >>  EXPORT_SYMBOL(drmcg_device_early_init);
> >> --
> >> 2.25.0
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 18:34       ` Daniel Vetter
@ 2020-02-14 18:51         ` Kenny Ho
  2020-02-14 19:17           ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 18:51 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: juan.zuniga-anaya, Greathouse, Joseph, Kenny Ho, Kuehling, Felix,
	jsparks, amd-gfx mailing list, lkaplan, Alex Deucher, nirmoy.das,
	Maling list - DRI developers, Jason Ekstrand, Tejun Heo, cgroups,
	Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 1:34 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> I think guidance from Tejun in previous discussions was pretty clear that
> he expects cgroups to be both a) standardized and b) of sufficiently clear
> meaning that end-users have a clear understanding of what happens when
> they change the resource allocation.
>
> I'm not sure lgpu here, at least as specified, passes either.

I disagree (at least on the characterization of the feedback
provided.)  I believe this series satisfied the spirit of Tejun's
guidance so far (the weight knob for lgpu, for example, was
specifically implemented based on his input.)  But I will let Tejun
speak for himself after he has considered the implementation in detail.

Regards,
Kenny


> But I also
> don't have much clue, so pulled Jason in - he understands how this all
> gets reflected to userspace apis a lot better than me.
> -Daniel
>
>
> >
> > > If it's carving up compute power, what's actually being carved up?  Time?  Execution units/waves/threads?  Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set?  Does affinity matter that much?  Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants.
> > >
> > > Don't get me wrong here.  I'm all for the notion of being able to use cgroups to carve up GPU compute resources.  However, this sounds to me like the most AMD-specific solution possible.  We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.
> >
> > This has been discussed in the RFC before
> > (https://www.spinics.net/lists/cgroups/msg23469.html.)  As mentioned
> > before, the idea of a compute unit is hardly an AMD specific thing as
> > it is in the OpenCL standard and part of the architecture of many
> > different vendors.  In addition, the interface presented here supports
> > Intel's use case.  What you described is what I consider the
> > "anonymous resources" view of the lgpu.  What you/Intel can do is
> > register your device with drmcg as having 100 lgpus, and users can
> > specify simply by count.  So if they want to allocate 5% for a cgroup,
> > they would set count=5.  Per the documentation in this patch: "Some DRM
> > devices may only support lgpu as anonymous resources.  In such cases,
> > the significance of the positions of the set bits in list will be
> > ignored."  What Intel does with the user-expressed configuration of "5
> > out of 100" is entirely up to Intel (time slice if you like, change to
> > specific EUs later if you like, or make it driver configurable to
> > support both if you like.)
> >
> > Regards,
> > Kenny
> >
> > >
> > > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
> > >>
> > >> drm.lgpu
> > >>       A read-write nested-keyed file which exists on all cgroups.
> > >>       Each entry is keyed by the DRM device's major:minor.
> > >>
> > >>       lgpu stands for logical GPU; it is an abstraction used to
> > >>       subdivide a physical DRM device for the purpose of resource
> > >>       management.  This file stores the user configuration, while
> > >>       drm.lgpu.effective reflects the actual allocation after
> > >>       considering the relationship between the cgroups and their
> > >>       configurations.
> > >>
> > >>       The lgpu is a discrete quantity that is device specific (e.g.
> > >>       some DRM devices may have 64 lgpus while others may have 100
> > >>       lgpus.)  The lgpu is a single quantity that can be allocated
> > >>       in three different ways, denoted by the following nested keys.
> > >>
> > >>         =====     ==============================================
> > >>         weight    Allocate by proportion in relationship with
> > >>                   active sibling cgroups
> > >>         count     Allocate by amount statically, treat lgpu as
> > >>                   anonymous resources
> > >>         list      Allocate statically, treat lgpu as named
> > >>                   resource
> > >>         =====     ==============================================
> > >>
> > >>       For example:
> > >>       226:0 weight=100 count=256 list=0-255
> > >>       226:1 weight=100 count=4 list=0,2,4,6
> > >>       226:2 weight=100 count=32 list=32-63
> > >>       226:3 weight=100 count=0 list=
> > >>       226:4 weight=500 count=0 list=
> > >>
> > >>       lgpu is represented by a bitmap and uses the bitmap_parselist
> > >>       kernel function so the list key input format is a
> > >>       comma-separated list of decimal numbers and ranges.
> > >>
> > >>       Consecutively set bits are shown as two hyphen-separated decimal
> > >>       numbers, the smallest and largest bit numbers set in the range.
> > >>       Optionally, each range can be postfixed to denote that only parts
> > >>       of it should be set.  The range will be divided into groups of a
> > >>       specific size.
> > >>       Syntax: range:used_size/group_size
> > >>       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > >>
> > >>       The count key is the Hamming weight (hweight) of the bitmap.
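
(As an aside, a minimal standalone C sketch, not from the patch and with
a hypothetical helper name, of how the "first-last:used/group" postfix
syntax expands, matching the 0-1023:2/256 example above:

    /* set the first `used` bits of every `group`-sized chunk of the range */
    static void expand_range(int first, int last, int used, int group,
                             unsigned char *bits)
    {
        for (int base = first; base <= last; base += group)
            for (int i = 0; i < used && base + i <= last; i++)
                bits[(base + i) / 8] |= 1 << ((base + i) % 8);
    }

With first=0, last=1023, used=2 and group=256, this sets exactly bits
0,1,256,257,512,513,768,769.)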
> > >>
> > >>       Weight, count and list accept the max and default keywords.
> > >>
> > >>       Some DRM devices may only support lgpu as anonymous resources.
> > >>       In such cases, the significance of the positions of the set bits
> > >>       in list will be ignored.
> > >>
> > >>       The weight quantity is only in effect when static allocation
> > >>       is not used (by setting count=0) for this cgroup.  The weight
> > >>       quantity distributes lgpus that are not statically allocated by
> > >>       the siblings.  For example, given siblings cgroupA, cgroupB and
> > >>       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> > >>       0-63, no lgpu is available to be distributed by weight.
> > >>       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> > >>       cgroupC will be starved if it tries to allocate by weight.
> > >>
> > >>       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> > >>       has list=16-47, and cgroupC has weight=100 count=0, then 32
> > >>       lgpus are available to be distributed evenly between cgroupA
> > >>       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> > >>       list=0-15 and cgroupC will have list=48-63.
> > >>
> > >>       This lgpu resource supports the 'allocation' and 'weight'
> > >>       resource distribution models.
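
(To make the weight arithmetic concrete, here is a small standalone
sketch, not from the patch, using the example above where cgroupB's
static list=16-47 leaves 32 lgpus unclaimed:

    #include <stdio.h>

    int main(void)
    {
        long unused = 32;               /* lgpus left after static siblings */
        long weights[] = { 100, 100 };  /* cgroupA and cgroupC */
        long weight_sum = 100 + 100;

        for (int i = 0; i < 2; i++)     /* integer division, as in the patch */
            printf("cgroup%c: %ld lgpus\n", 'A' + 2 * i,
                   weights[i] * unused / weight_sum);
        return 0;
    }

Each weighted sibling gets weight * unused / weight_sum lgpus, 16 apiece
here, which is how cgroupA ends up with list=0-15 and cgroupC with
list=48-63 in drm.lgpu.effective.)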
> > >>
> > >> drm.lgpu.effective
> > >>       A read-only nested-keyed file which exists on all cgroups.
> > >>       Each entry is keyed by the DRM device's major:minor.
> > >>
> > >>       lgpu stands for logical GPU; it is an abstraction used to
> > >>       subdivide a physical DRM device for the purpose of resource
> > >>       management.  This file reflects the actual allocation after
> > >>       considering the relationship between the cgroups and their
> > >>       configurations in drm.lgpu.
> > >>
> > >> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> > >> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > >> ---
> > >>  Documentation/admin-guide/cgroup-v2.rst |  80 ++++++
> > >>  include/drm/drm_cgroup.h                |   3 +
> > >>  include/linux/cgroup_drm.h              |  22 ++
> > >>  kernel/cgroup/drm.c                     | 324 +++++++++++++++++++++++-
> > >>  4 files changed, 427 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > >> index ce5dc027366a..d8a41956e5c7 100644
> > >> --- a/Documentation/admin-guide/cgroup-v2.rst
> > >> +++ b/Documentation/admin-guide/cgroup-v2.rst
> > >> @@ -2120,6 +2120,86 @@ DRM Interface Files
> > >>         Set largest allocation for /dev/dri/card1 to 4MB
> > >>         echo "226:1 4m" > drm.buffer.peak.max
> > >>
> > >> +  drm.lgpu
> > >> +       A read-write nested-keyed file which exists on all cgroups.
> > >> +       Each entry is keyed by the DRM device's major:minor.
> > >> +
> > >> +       lgpu stands for logical GPU; it is an abstraction used to
> > >> +       subdivide a physical DRM device for the purpose of resource
> > >> +       management.  This file stores the user configuration, while
> > >> +       drm.lgpu.effective reflects the actual allocation after
> > >> +       considering the relationship between the cgroups and their
> > >> +       configurations.
> > >> +
> > >> +       The lgpu is a discrete quantity that is device specific (e.g.
> > >> +       some DRM devices may have 64 lgpus while others may have 100
> > >> +       lgpus.)  The lgpu is a single quantity that can be allocated
> > >> +       in three different ways, denoted by the following nested keys.
> > >> +
> > >> +         =====     ==============================================
> > >> +         weight    Allocate by proportion in relationship with
> > >> +                    active sibling cgroups
> > >> +         count     Allocate by amount statically, treat lgpu as
> > >> +                    anonymous resources
> > >> +         list      Allocate statically, treat lgpu as named
> > >> +                    resource
> > >> +         =====     ==============================================
> > >> +
> > >> +       For example:
> > >> +       226:0 weight=100 count=256 list=0-255
> > >> +       226:1 weight=100 count=4 list=0,2,4,6
> > >> +       226:2 weight=100 count=32 list=32-63
> > >> +       226:3 weight=100 count=0 list=
> > >> +       226:4 weight=500 count=0 list=
> > >> +
> > >> +       lgpu is represented by a bitmap and uses the bitmap_parselist
> > >> +       kernel function so the list key input format is a
> > >> +       comma-separated list of decimal numbers and ranges.
> > >> +
> > >> +       Consecutively set bits are shown as two hyphen-separated decimal
> > >> +       numbers, the smallest and largest bit numbers set in the range.
> > >> +       Optionally, each range can be postfixed to denote that only parts
> > >> +       of it should be set.  The range will be divided into groups of a
> > >> +       specific size.
> > >> +       Syntax: range:used_size/group_size
> > >> +       Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > >> +
> > >> +       The count key is the Hamming weight (hweight) of the bitmap.
> > >> +
> > >> +       Weight, count and list accept the max and default keywords.
> > >> +
> > >> +       Some DRM devices may only support lgpu as anonymous resources.
> > >> +       In such cases, the significance of the positions of the set bits
> > >> +       in list will be ignored.
> > >> +
> > >> +       The weight quantity is only in effect when static allocation
> > >> +       is not used (by setting count=0) for this cgroup.  The weight
> > >> +       quantity distributes lgpus that are not statically allocated by
> > >> +       the siblings.  For example, given siblings cgroupA, cgroupB and
> > >> +       cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> > >> +       0-63, no lgpu is available to be distributed by weight.
> > >> +       Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> > >> +       cgroupC will be starved if it tries to allocate by weight.
> > >> +
> > >> +       On the other hand, if cgroupA has weight=100 count=0, cgroupB
> > >> +       has list=16-47, and cgroupC has weight=100 count=0, then 32
> > >> +       lgpus are available to be distributed evenly between cgroupA
> > >> +       and cgroupC.  In drm.lgpu.effective, cgroupA will have
> > >> +       list=0-15 and cgroupC will have list=48-63.
> > >> +
> > >> +       This lgpu resource supports the 'allocation' and 'weight'
> > >> +       resource distribution models.
> > >> +
> > >> +  drm.lgpu.effective
> > >> +       A read-only nested-keyed file which exists on all cgroups.
> > >> +       Each entry is keyed by the DRM device's major:minor.
> > >> +
> > >> +       lgpu stands for logical GPU; it is an abstraction used to
> > >> +       subdivide a physical DRM device for the purpose of resource
> > >> +       management.  This file reflects the actual allocation after
> > >> +       considering the relationship between the cgroups and their
> > >> +       configurations in drm.lgpu.
> > >> +
> > >>  GEM Buffer Ownership
> > >>  ~~~~~~~~~~~~~~~~~~~~
> > >>
> > >> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > >> index 2b41d4d22e33..619a110cc748 100644
> > >> --- a/include/drm/drm_cgroup.h
> > >> +++ b/include/drm/drm_cgroup.h
> > >> @@ -17,6 +17,9 @@ struct drmcg_props {
> > >>
> > >>         s64                     bo_limits_total_allocated_default;
> > >>         s64                     bo_limits_peak_allocated_default;
> > >> +
> > >> +       int                     lgpu_capacity;
> > >> +       DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > >>  };
> > >>
> > >>  void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> > >> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> > >> index eae400f3d9b4..bb09704e7f71 100644
> > >> --- a/include/linux/cgroup_drm.h
> > >> +++ b/include/linux/cgroup_drm.h
> > >> @@ -11,10 +11,14 @@
> > >>  /* limit defined per the way drm_minor_alloc operates */
> > >>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> > >>
> > >> +#define MAX_DRMCG_LGPU_CAPACITY 256
> > >> +
> > >>  enum drmcg_res_type {
> > >>         DRMCG_TYPE_BO_TOTAL,
> > >>         DRMCG_TYPE_BO_PEAK,
> > >>         DRMCG_TYPE_BO_COUNT,
> > >> +       DRMCG_TYPE_LGPU,
> > >> +       DRMCG_TYPE_LGPU_EFF,
> > >>         __DRMCG_TYPE_LAST,
> > >>  };
> > >>
> > >> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
> > >>         s64                     bo_limits_peak_allocated;
> > >>
> > >>         s64                     bo_stats_count_allocated;
> > >> +
> > >> +       /**
> > >> +        * Logical GPU
> > >> +        *
> > >> +        * *_cfg are properties configured by users
> > >> +        * *_eff are the effective properties being applied to the hardware
> > >> +        * *_stg is the staging value used to calculate *_eff for the
> > >> +        * entire hierarchy before the result is committed to *_eff
> > >> +        */
> > >> +       DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> > >> +       /* user configurations */
> > >> +       s64                     lgpu_weight_cfg;
> > >> +       DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> > >> +       /* effective lgpu for the cgroup after considering
> > >> +        * relationships with other cgroups
> > >> +        */
> > >> +       s64                     lgpu_count_eff;
> > >> +       DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
> > >>  };
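
(The cfg/stg/eff triple is a compute-then-commit pattern: new effective
values for the whole tree are computed into the staging bitmap first,
then copied into *_eff, the value the hardware actually sees, only when
they differ.  A rough userspace-flavoured sketch of the commit step,
with hypothetical names and a plain byte-array bitmap in place of the
kernel API:

    #include <stdbool.h>
    #include <string.h>

    #define LGPU_BYTES (256 / 8)

    struct lgpu_res {
        unsigned char cfg[LGPU_BYTES]; /* what the user configured */
        unsigned char stg[LGPU_BYTES]; /* scratch for the next effective */
        unsigned char eff[LGPU_BYTES]; /* what is applied to the device */
    };

    /* returns true if the effective allocation actually moved */
    static bool lgpu_commit(struct lgpu_res *r)
    {
        if (memcmp(r->stg, r->eff, LGPU_BYTES) == 0)
            return false;
        memcpy(r->eff, r->stg, LGPU_BYTES);
        return true;
    }

This mirrors the bitmap_equal/bitmap_copy step in
drmcg_apply_effective_lgpu() further down in the patch.)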
> > >>
> > >>  /**
> > >> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > >> index 5fcbbc13fa1c..a4e88a3704bb 100644
> > >> --- a/kernel/cgroup/drm.c
> > >> +++ b/kernel/cgroup/drm.c
> > >> @@ -9,6 +9,7 @@
> > >>  #include <linux/seq_file.h>
> > >>  #include <linux/mutex.h>
> > >>  #include <linux/kernel.h>
> > >> +#include <linux/bitmap.h>
> > >>  #include <linux/cgroup_drm.h>
> > >>  #include <drm/drm_file.h>
> > >>  #include <drm/drm_drv.h>
> > >> @@ -41,6 +42,10 @@ enum drmcg_file_type {
> > >>         DRMCG_FTYPE_DEFAULT,
> > >>  };
> > >>
> > >> +#define LGPU_LIMITS_NAME_LIST "list"
> > >> +#define LGPU_LIMITS_NAME_COUNT "count"
> > >> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> > >> +
> > >>  /**
> > >>   * drmcg_bind - Bind DRM subsystem to cgroup subsystem
> > >>   * @acq_dm: function pointer to the drm_minor_acquire function
> > >> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> > >>         ddr->bo_limits_peak_allocated =
> > >>                 dev->drmcg_props.bo_limits_peak_allocated_default;
> > >>
> > >> +       bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> > >> +                       MAX_DRMCG_LGPU_CAPACITY);
> > >> +       bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> > >> +                       MAX_DRMCG_LGPU_CAPACITY);
> > >> +
> > >> +       ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> > >> +
> > >>         return 0;
> > >>  }
> > >>
> > >> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
> > >>         mutex_unlock(&cgroup_mutex);
> > >>  }
> > >>
> > >> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> > >> +               const unsigned long *free_static,
> > >> +               const unsigned long *free_weighted,
> > >> +               struct drmcg *parent_drmcg)
> > >> +{
> > >> +       int capacity = dev->drmcg_props.lgpu_capacity;
> > >> +       DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> > >> +       DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> > >> +       struct drmcg_device_resource *parent_ddr;
> > >> +       struct drmcg_device_resource *ddr;
> > >> +       int minor = dev->primary->index;
> > >> +       struct cgroup_subsys_state *pos;
> > >> +       struct drmcg *child;
> > >> +       s64 weight_sum = 0;
> > >> +       s64 unused;
> > >> +
> > >> +       parent_ddr = parent_drmcg->dev_resources[minor];
> > >> +
> > >> +       if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> > >> +               /* no static cfg, use weight for calculating the effective */
> > >> +               bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> > >> +       else
> > >> +               /* lgpu statically configured, use the overlap as effective */
> > >> +               bitmap_and(parent_ddr->lgpu_stg, free_static,
> > >> +                               parent_ddr->lgpu_cfg, capacity);
> > >> +
> > >> +       /* calculate lgpu available for distribution by weight for children */
> > >> +       bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> > >> +       css_for_each_child(pos, &parent_drmcg->css) {
> > >> +               child = css_to_drmcg(pos);
> > >> +               ddr = child->dev_resources[minor];
> > >> +
> > >> +               if (bitmap_empty(ddr->lgpu_cfg, capacity))
> > >> +                       /* no static allocation, participate in weight dist */
> > >> +                       weight_sum += ddr->lgpu_weight_cfg;
> > >> +               else
> > >> +                       /* take out statically allocated lgpu by siblings */
> > >> +                       bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
> > >> +                                       capacity);
> > >> +       }
> > >> +
> > >> +       unused = bitmap_weight(lgpu_unused, capacity);
> > >> +
> > >> +       css_for_each_child(pos, &parent_drmcg->css) {
> > >> +               child = css_to_drmcg(pos);
> > >> +               ddr = child->dev_resources[minor];
> > >> +
> > >> +               bitmap_zero(lgpu_by_weight, capacity);
> > >> +               /* no static allocation, participate in weight distribution */
> > >> +               if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> > >> +                       int c;
> > >> +                       int p = 0;
> > >> +
> > >> +                       for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
> > >> +                                       c > 0; c--) {
> > >> +                               p = find_next_bit(lgpu_unused, capacity, p);
> > >> +                               if (p < capacity) {
> > >> +                                       clear_bit(p, lgpu_unused);
> > >> +                                       set_bit(p, lgpu_by_weight);
> > >> +                               }
> > >> +                       }
> > >> +
> > >> +               }
> > >> +
> > >> +               drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> > >> +                               lgpu_by_weight, child);
> > >> +       }
> > >> +}
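
(The hand-out loop above takes the lowest free bits one at a time.  A
simplified 64-bit userspace analogue of the find_next_bit/clear_bit/
set_bit sequence, for illustration only:

    /* move `n` of the lowest set bits from *unused into *out */
    static void take_bits(unsigned long *unused, unsigned long *out, int n)
    {
        int p = 0;

        while (n-- > 0 && *unused) {
            while (!(*unused & (1UL << p)))
                p++;                    /* find the next set bit */
            *unused &= ~(1UL << p);     /* clear_bit(p, unused) */
            *out |= 1UL << p;           /* set_bit(p, out) */
        }
    }

Note that because each child's share is computed with integer division,
a few bits can be left unassigned when the weights do not divide the
unused count evenly.)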
> > >> +
> > >> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> > >> +{
> > >> +       int capacity = dev->drmcg_props.lgpu_capacity;
> > >> +       int minor = dev->primary->index;
> > >> +       struct drmcg_device_resource *ddr;
> > >> +       struct cgroup_subsys_state *pos;
> > >> +       struct drmcg *drmcg;
> > >> +
> > >> +       if (WARN_ON(root_drmcg == NULL))
> > >> +               return;
> > >> +
> > >> +       rcu_read_lock();
> > >> +
> > >> +       /* process the entire cgroup tree from root to simplify the algorithm */
> > >> +       drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> > >> +                       dev->drmcg_props.lgpu_slots, root_drmcg);
> > >> +
> > >> +       /* apply changes to effective only if there is a change */
> > >> +       css_for_each_descendant_pre(pos, &root_drmcg->css) {
> > >> +               drmcg = css_to_drmcg(pos);
> > >> +               ddr = drmcg->dev_resources[minor];
> > >> +
> > >> +               if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
> > >> +                       bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
> > >> +                       ddr->lgpu_count_eff =
> > >> +                               bitmap_weight(ddr->lgpu_eff, capacity);
> > >> +               }
> > >> +       }
> > >> +       rcu_read_unlock();
> > >> +}
> > >> +
> > >> +static void drmcg_apply_effective(enum drmcg_res_type type,
> > >> +               struct drm_device *dev, struct drmcg *changed_drmcg)
> > >> +{
> > >> +       switch (type) {
> > >> +       case DRMCG_TYPE_LGPU:
> > >> +               drmcg_apply_effective_lgpu(dev);
> > >> +               break;
> > >> +       default:
> > >> +               break;
> > >> +       }
> > >> +}
> > >> +
> > >>  /**
> > >>   * drmcg_register_dev - register a DRM device for usage in drm cgroup
> > >>   * @dev: DRM device
> > >> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
> > >>         {
> > >>                 dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
> > >>
> > >> +               WARN_ON(dev->drmcg_props.lgpu_capacity !=
> > >> +                               bitmap_weight(dev->drmcg_props.lgpu_slots,
> > >> +                                       MAX_DRMCG_LGPU_CAPACITY));
> > >> +
> > >>                 drmcg_update_cg_tree(dev);
> > >> +
> > >> +               drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
> > >>         }
> > >>         mutex_unlock(&drmcg_mutex);
> > >>  }
> > >> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
> > >>  }
> > >>
> > >>  static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> > >> -               struct seq_file *sf, enum drmcg_res_type type)
> > >> +               struct seq_file *sf, enum drmcg_res_type type,
> > >> +               struct drm_device *dev)
> > >>  {
> > >>         if (ddr == NULL) {
> > >>                 seq_puts(sf, "\n");
> > >> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> > >>         case DRMCG_TYPE_BO_PEAK:
> > >>                 seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
> > >>                 break;
> > >> +       case DRMCG_TYPE_LGPU:
> > >> +               seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> > >> +                               LGPU_LIMITS_NAME_WEIGHT,
> > >> +                               ddr->lgpu_weight_cfg,
> > >> +                               LGPU_LIMITS_NAME_COUNT,
> > >> +                               bitmap_weight(ddr->lgpu_cfg,
> > >> +                                       dev->drmcg_props.lgpu_capacity),
> > >> +                               LGPU_LIMITS_NAME_LIST,
> > >> +                               dev->drmcg_props.lgpu_capacity,
> > >> +                               ddr->lgpu_cfg);
> > >> +               break;
> > >> +       case DRMCG_TYPE_LGPU_EFF:
> > >> +               seq_printf(sf, "%s=%lld %s=%*pbl\n",
> > >> +                               LGPU_LIMITS_NAME_COUNT,
> > >> +                               ddr->lgpu_count_eff,
> > >> +                               LGPU_LIMITS_NAME_LIST,
> > >> +                               dev->drmcg_props.lgpu_capacity,
> > >> +                               ddr->lgpu_eff);
> > >> +               break;
> > >>         default:
> > >>                 seq_puts(sf, "\n");
> > >>                 break;
> > >> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
> > >>                 seq_printf(sf, "%lld\n",
> > >>                         props->bo_limits_peak_allocated_default);
> > >>                 break;
> > >> +       case DRMCG_TYPE_LGPU:
> > >> +               seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> > >> +                               LGPU_LIMITS_NAME_WEIGHT,
> > >> +                               CGROUP_WEIGHT_DFL,
> > >> +                               LGPU_LIMITS_NAME_COUNT,
> > >> +                               bitmap_weight(props->lgpu_slots,
> > >> +                                       props->lgpu_capacity),
> > >> +                               LGPU_LIMITS_NAME_LIST,
> > >> +                               props->lgpu_capacity,
> > >> +                               props->lgpu_slots);
> > >> +               break;
> > >>         default:
> > >>                 seq_puts(sf, "\n");
> > >>                 break;
> > >> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
> > >>                 drmcg_print_stats(ddr, sf, type);
> > >>                 break;
> > >>         case DRMCG_FTYPE_LIMIT:
> > >> -               drmcg_print_limits(ddr, sf, type);
> > >> +               drmcg_print_limits(ddr, sf, type, minor->dev);
> > >>                 break;
> > >>         case DRMCG_FTYPE_DEFAULT:
> > >>                 drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> > >> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
> > >>         return rc;
> > >>  }
> > >>
> > >> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > >> +               struct drm_device *dev, char *attrs)
> > >> +{
> > >> +       DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > >> +       DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > >> +       enum drmcg_res_type type =
> > >> +               DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> > >> +       struct drmcg *drmcg = css_to_drmcg(of_css(of));
> > >> +       struct drmcg_props *props = &dev->drmcg_props;
> > >> +       char *cft_name = of_cft(of)->name;
> > >> +       int minor = dev->primary->index;
> > >> +       char *nested = strstrip(attrs);
> > >> +       struct drmcg_device_resource *ddr =
> > >> +               drmcg->dev_resources[minor];
> > >> +       char *attr;
> > >> +       char sname[256];
> > >> +       char sval[256];
> > >> +       s64 val;
> > >> +       int rc;
> > >> +
> > >> +       while (nested != NULL) {
> > >> +               attr = strsep(&nested, " ");
> > >> +
> > >> +               if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> > >> +                       continue;
> > >> +
> > >> +               switch (type) {
> > >> +               case DRMCG_TYPE_LGPU:
> > >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> > >> +                               strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
> > >> +                               strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
> > >> +                               continue;
> > >> +
> > >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> > >> +                                       (!strcmp("max", sval) ||
> > >> +                                       !strcmp("default", sval))) {
> > >> +                               bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
> > >> +                                               props->lgpu_capacity);
> > >> +
> > >> +                               continue;
> > >> +                       }
> > >> +
> > >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
> > >> +                               rc = drmcg_process_limit_s64_val(sval,
> > >> +                                       false, CGROUP_WEIGHT_DFL,
> > >> +                                       CGROUP_WEIGHT_MAX, &val);
> > >> +
> > >> +                               if (rc || val < CGROUP_WEIGHT_MIN ||
> > >> +                                               val > CGROUP_WEIGHT_MAX) {
> > >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> > >> +                                                       minor);
> > >> +                                       continue;
> > >> +                               }
> > >> +
> > >> +                               ddr->lgpu_weight_cfg = val;
> > >> +                               continue;
> > >> +                       }
> > >> +
> > >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> > >> +                               rc = drmcg_process_limit_s64_val(sval,
> > >> +                                       false, props->lgpu_capacity,
> > >> +                                       props->lgpu_capacity, &val);
> > >> +
> > >> +                               if (rc || val < 0) {
> > >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> > >> +                                                       minor);
> > >> +                                       continue;
> > >> +                               }
> > >> +
> > >> +                               bitmap_zero(tmp_bitmap,
> > >> +                                               MAX_DRMCG_LGPU_CAPACITY);
> > >> +                               bitmap_set(tmp_bitmap, 0, val);
> > >> +                       }
> > >> +
> > >> +                       if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> > >> +                               rc = bitmap_parselist(sval, tmp_bitmap,
> > >> +                                               MAX_DRMCG_LGPU_CAPACITY);
> > >> +
> > >> +                               if (rc) {
> > >> +                                       drmcg_pr_cft_err(drmcg, rc, cft_name,
> > >> +                                                       minor);
> > >> +                                       continue;
> > >> +                               }
> > >> +
> > >> +                               bitmap_andnot(chk_bitmap, tmp_bitmap,
> > >> +                                       props->lgpu_slots,
> > >> +                                       MAX_DRMCG_LGPU_CAPACITY);
> > >> +
> > >> +                               /* user setting does not intersect with
> > >> +                                * available lgpu */
> > >> +                               if (!bitmap_empty(chk_bitmap,
> > >> +                                               MAX_DRMCG_LGPU_CAPACITY)) {
> > >> +                                       drmcg_pr_cft_err(drmcg, 0, cft_name,
> > >> +                                                       minor);
> > >> +                                       continue;
> > >> +                               }
> > >> +                       }
> > >> +
> > >> +                       bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> > >> +                                       props->lgpu_capacity);
> > >> +
> > >> +                       break; /* DRMCG_TYPE_LGPU */
> > >> +               default:
> > >> +                       break;
> > >> +               } /* switch (type) */
> > >> +       }
> > >> +}
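
(The tokenizing above follows the usual strsep/sscanf shape.  A
standalone illustration, not the kernel code, of how one written line
is split into nested key/value pairs:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[] = "weight=200 count=4 list=0-3";
        char *nested = buf, *attr;
        char sname[256], sval[256];

        while ((attr = strsep(&nested, " ")) != NULL) {
            if (sscanf(attr, "%255[^=]=%255s", sname, sval) != 2)
                continue;       /* malformed tokens are skipped */
            printf("%s -> %s\n", sname, sval);
        }
        return 0;
    }

As in the kernel version, a token that fails to scan is silently
skipped rather than rejected, so a typo in one key still leaves the
other keys applied.)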
> > >> +
> > >> +
> > >>  /**
> > >>   * drmcg_limit_write - parse cgroup interface files to obtain user config
> > >>   *
> > >> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > >>
> > >>                         ddr->bo_limits_peak_allocated = val;
> > >>                         break;
> > >> +               case DRMCG_TYPE_LGPU:
> > >> +                       drmcg_nested_limit_parse(of, dm->dev, sattr);
> > >> +                       break;
> > >>                 default:
> > >>                         break;
> > >>                 }
> > >> +
> > >> +               drmcg_apply_effective(type, dm->dev, drmcg);
> > >> +
> > >>                 mutex_unlock(&dm->dev->drmcg_mutex);
> > >>
> > >>                 mutex_lock(&drmcg_mutex);
> > >> @@ -560,12 +838,51 @@ struct cftype files[] = {
> > >>                 .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
> > >>                                                 DRMCG_FTYPE_STATS),
> > >>         },
> > >> +       {
> > >> +               .name = "lgpu",
> > >> +               .seq_show = drmcg_seq_show,
> > >> +               .write = drmcg_limit_write,
> > >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > >> +                                               DRMCG_FTYPE_LIMIT),
> > >> +       },
> > >> +       {
> > >> +               .name = "lgpu.default",
> > >> +               .seq_show = drmcg_seq_show,
> > >> +               .flags = CFTYPE_ONLY_ON_ROOT,
> > >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > >> +                                               DRMCG_FTYPE_DEFAULT),
> > >> +       },
> > >> +       {
> > >> +               .name = "lgpu.effective",
> > >> +               .seq_show = drmcg_seq_show,
> > >> +               .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> > >> +                                               DRMCG_FTYPE_LIMIT),
> > >> +       },
> > >>         { }     /* terminate */
> > >>  };
> > >>
> > >> +static int drmcg_online_fn(int id, void *ptr, void *data)
> > >> +{
> > >> +       struct drm_minor *minor = ptr;
> > >> +       struct drmcg *drmcg = data;
> > >> +
> > >> +       if (minor->type != DRM_MINOR_PRIMARY)
> > >> +               return 0;
> > >> +
> > >> +       drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> > >> +
> > >> +       return 0;
> > >> +}
> > >> +
> > >> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> > >> +{
> > >> +       return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> > >> +}
> > >> +
> > >>  struct cgroup_subsys drm_cgrp_subsys = {
> > >>         .css_alloc      = drmcg_css_alloc,
> > >>         .css_free       = drmcg_css_free,
> > >> +       .css_online     = drmcg_css_online,
> > >>         .early_init     = false,
> > >>         .legacy_cftypes = files,
> > >>         .dfl_cftypes    = files,
> > >> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
> > >>         dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
> > >>         dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
> > >>
> > >> +       dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> > >> +       bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > >> +
> > >>         drmcg_update_cg_tree(dev);
> > >>  }
> > >>  EXPORT_SYMBOL(drmcg_device_early_init);
> > >> --
> > >> 2.25.0
> > >>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 18:51         ` Kenny Ho
@ 2020-02-14 19:17           ` Tejun Heo
  2020-02-14 20:28             ` Kenny Ho
  2020-02-19 16:18             ` Johannes Weiner
  0 siblings, 2 replies; 26+ messages in thread
From: Tejun Heo @ 2020-02-14 19:17 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Greathouse,
	Joseph, amd-gfx mailing list, Jason Ekstrand, Johannes Weiner,
	Alex Deucher, cgroups, Christian König, damon.mcdougall

Hello, Kenny, Daniel.

(cc'ing Johannes)

On Fri, Feb 14, 2020 at 01:51:32PM -0500, Kenny Ho wrote:
> On Fri, Feb 14, 2020 at 1:34 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > I think guidance from Tejun in previous discussions was pretty clear that
> > > he expects cgroups to be both a) standardized and b) of sufficiently clear
> > > meaning that end-users have a clear understanding of what happens when
> > > they change the resource allocation.
> >
> > I'm not sure lgpu here, at least as specified, passes either.
> 
> > I disagree (at least on the characterization of the feedback
> > provided.)  I believe this series satisfied the spirit of Tejun's
> > guidance so far (the weight knob for lgpu, for example, was
> > specifically implemented based on his input.)  But I will let Tejun
> > speak for himself after he has considered the implementation in detail.

I have to agree with Daniel here. My apologies if I wasn't clear
enough. Here's one interface I can think of:

 * compute weight: The same format as io.weight. Proportional control
   of gpu compute.

 * memory low: Please see how the system memory.low behaves. For gpus,
   it'll need per-device entries.

Note that for both, there is one number to configure, and conceptually
it's pretty clear to everybody what that number means.  That is not to
say it's easy to implement, but it's much better to deal with that on
this side of the interface than the other.

cc'ing Johannes. Do you have anything in mind regarding how gpu memory
configuration should look? e.g. should it go w/ weights rather
than absolute units? (I don't think so, given that it'll most likely
need limits at some point too, and there are benefits from
staying consistent with system memory.)
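
(For concreteness, a compute-weight file in the io.weight mold would
presumably carry a default plus per-device overrides, along the lines
of this purely hypothetical content:

    default 100
    226:0 200
    226:1 50

i.e. a single number per device, with the driver left to decide how to
realize the proportions.)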

Also, a rather trivial high level question. Is drm a good controller
name given that other controller names are like cpu, memory, io?

Thanks.

-- 
tejun

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 19:17           ` Tejun Heo
@ 2020-02-14 20:28             ` Kenny Ho
  2020-02-14 21:15               ` Tejun Heo
  2020-02-19 16:21               ` Johannes Weiner
  2020-02-19 16:18             ` Johannes Weiner
  1 sibling, 2 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-14 20:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Greathouse,
	Joseph, amd-gfx mailing list, Jason Ekstrand, Johannes Weiner,
	Alex Deucher, cgroups, Christian König, damon.mcdougall

Hi Tejun,

On Fri, Feb 14, 2020 at 2:17 PM Tejun Heo <tj@kernel.org> wrote:
>
> I have to agree with Daniel here. My apologies if I wasn't clear
> enough. Here's one interface I can think of:
>
>  * compute weight: The same format as io.weight. Proportional control
>    of gpu compute.
>
>  * memory low: Please see how the system memory.low behaves. For gpus,
>    it'll need per-device entries.
>
> Note that for both, there is one number to configure, and conceptually
> it's pretty clear to everybody what that number means.  That is not to
> say it's easy to implement, but it's much better to deal with that on
> this side of the interface than the other.

Can you elaborate, per your understanding, how the lgpu weight
attribute differs from the io.weight you suggested?  Is it merely a
formatting/naming issue, or is it the implementation details that you
find troubling?  From my perspective, the weight attribute is implemented
as you suggested back in RFCv4 (proportional control on top of a unit
- either a physical or a time unit.)

Perhaps more explicit questions would help me understand what you
mean. If I remove the 'list' and 'count' attributes, leaving just
weight, is that satisfactory?  Are you saying the idea of affinity or
named resources is banned from cgroup entirely (even though it exists
in the form of cpuset already, and users [e.g. in userspace OpenCL] are
interested in having such options when needed)?

To be clear, I am not saying no proportional control.  I am saying
give the user the options, which is what has been implemented.

> cc'ing Johannes. Do you have anything in mind regarding how gpu memory
> configuration should look? e.g. should it go w/ weights rather
> than absolute units? (I don't think so, given that it'll most likely
> need limits at some point too, and there are benefits from
> staying consistent with system memory.)
>
> Also, a rather trivial high level question. Is drm a good controller
> name given that other controller names are like cpu, memory, io?

There was a discussion about naming early in the RFC (I believe
RFCv2); the consensus then was to use drmcg to align with the drm
subsystem.  I have no problem renaming it to gpucg or something
similar if that is the last thing that's blocking acceptance.  For
now, I would like to get some clarity on the implementation before
having more code churn.

Regards,
Kenny


> Thanks.
>
> --
> tejun

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 20:28             ` Kenny Ho
@ 2020-02-14 21:15               ` Tejun Heo
  2020-02-19 16:21               ` Johannes Weiner
  1 sibling, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2020-02-14 21:15 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Greathouse,
	Joseph, amd-gfx mailing list, Jason Ekstrand, Johannes Weiner,
	Alex Deucher, cgroups, Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 03:28:40PM -0500, Kenny Ho wrote:
> Can you elaborate, per your understanding, how the lgpu weight
> attribute differs from the io.weight you suggested?  Is it merely a

Oh, it's the non-weight part which is problematic.

> formatting/naming issue, or is it the implementation details that you
> find troubling?  From my perspective, the weight attribute is implemented
> as you suggested back in RFCv4 (proportional control on top of a unit
> - either a physical or a time unit.)
> 
> Perhaps more explicit questions would help me understand what you
> mean. If I remove the 'list' and 'count' attributes, leaving just
> weight, is that satisfactory?  Are you saying the idea of affinity or

At least from an interface pov, yes, although I think it should be clear
what the weight controls.

> named resources is banned from cgroup entirely (even though it exists
> in the form of cpuset already, and users [e.g. in userspace OpenCL] are
> interested in having such options when needed)?
> 
> To be clear, I am not saying no proportional control.  I am saying
> give the user the options, which is what has been implemented.

We can get there if we *really* have to, but not from the get-go;
I'd rather avoid affinities if at all possible.

Thanks.

-- 
tejun

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 19:17           ` Tejun Heo
  2020-02-14 20:28             ` Kenny Ho
@ 2020-02-19 16:18             ` Johannes Weiner
  2020-02-19 16:28               ` Kenny Ho
  1 sibling, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2020-02-19 16:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Greathouse,
	Joseph, Kenny Ho, amd-gfx mailing list, Jason Ekstrand,
	Alex Deucher, cgroups, Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 02:17:54PM -0500, Tejun Heo wrote:
> Hello, Kenny, Daniel.
> 
> (cc'ing Johannes)
> 
> On Fri, Feb 14, 2020 at 01:51:32PM -0500, Kenny Ho wrote:
> > On Fri, Feb 14, 2020 at 1:34 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > I think guidance from Tejun in previous discussions was pretty clear that
> > > he expects cgroups to be both a) standardized and b) of sufficiently clear
> > > meaning that end-users have a clear understanding of what happens when
> > > they change the resource allocation.
> > >
> > > I'm not sure lgpu here, at least as specified, passes either.
> > 
> > I disagree (at least on the characterization of the feedback
> > provided.)  I believe this series satisfied the spirit of Tejun's
> > guidance so far (the weight knob for lgpu, for example, was
> > specifically implemented based on his input.)  But I will let Tejun
> > speak for himself after he has considered the implementation in detail.
> 
> I have to agree with Daniel here. My apologies if I wasn't clear
> enough. Here's one interface I can think of:
> 
>  * compute weight: The same format as io.weight. Proportional control
>    of gpu compute.
> 
>  * memory low: Please see how the system memory.low behaves. For gpus,
>    it'll need per-device entries.
> 
> Note that for both, there is one number to configure, and conceptually
> it's pretty clear to everybody what that number means.  That is not to
> say it's easy to implement, but it's much better to deal with that on
> this side of the interface than the other.
> 
> cc'ing Johannes. Do you have anything in mind regarding how gpu memory
> configuration should look? e.g. should it go w/ weights rather
> than absolute units? (I don't think so, given that it'll most likely
> need limits at some point too, and there are benefits from
> staying consistent with system memory.)

Yes, I'd go with absolute units when it comes to memory, because it's
not a renewable resource like CPU and IO, and so we do have cliff
behavior around the edge where you transition from ok to not-enough.

memory.low is a bit in flux right now, so if anything is unclear
around its semantics, please feel free to reach out.

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-14 20:28             ` Kenny Ho
  2020-02-14 21:15               ` Tejun Heo
@ 2020-02-19 16:21               ` Johannes Weiner
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2020-02-19 16:21 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Alex Deucher,
	Greathouse, Joseph, amd-gfx mailing list, Jason Ekstrand,
	Tejun Heo, cgroups, Christian König, damon.mcdougall

On Fri, Feb 14, 2020 at 03:28:40PM -0500, Kenny Ho wrote:
> On Fri, Feb 14, 2020 at 2:17 PM Tejun Heo <tj@kernel.org> wrote:
> > Also, a rather trivial high level question. Is drm a good controller
> > name given that other controller names are like cpu, memory, io?
> 
> There was a discussion about naming early in the RFC (I believe
> RFCv2); the consensus then was to use drmcg to align with the drm
> subsystem.  I have no problem renaming it to gpucg or something
> similar if that is the last thing that's blocking acceptance.  For
> now, I would like to get some clarity on the implementation before
> having more code churn.

As far as precedence goes, we named the other controllers after the
resources they control rather than the subsystem: cpu instead of
scheduler, memory instead of mm, io instead of block layer etc.

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-19 16:18             ` Johannes Weiner
@ 2020-02-19 16:28               ` Kenny Ho
  2020-02-19 18:38                 ` Johannes Weiner
  0 siblings, 1 reply; 26+ messages in thread
From: Kenny Ho @ 2020-02-19 16:28 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Alex Deucher,
	Greathouse, Joseph, amd-gfx mailing list, Jason Ekstrand,
	Tejun Heo, cgroups, Christian König, damon.mcdougall

On Wed, Feb 19, 2020 at 11:18 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Yes, I'd go with absolute units when it comes to memory, because it's
> not a renewable resource like CPU and IO, and so we do have cliff
> behavior around the edge where you transition from ok to not-enough.
>
> memory.low is a bit in flux right now, so if anything is unclear
> around its semantics, please feel free to reach out.

I am not familiar with the discussion, would you point me to a
relevant thread please?  In addition, is there some kind of order of
preference for implementing low vs high vs max?

Regards,
Kenny

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-19 16:28               ` Kenny Ho
@ 2020-02-19 18:38                 ` Johannes Weiner
  2020-02-21  5:59                   ` Kenny Ho
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Weiner @ 2020-02-19 18:38 UTC (permalink / raw)
  To: Kenny Ho
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Alex Deucher,
	Greathouse, Joseph, amd-gfx mailing list, Jason Ekstrand,
	Tejun Heo, cgroups, Christian König, damon.mcdougall

On Wed, Feb 19, 2020 at 11:28:48AM -0500, Kenny Ho wrote:
> On Wed, Feb 19, 2020 at 11:18 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > Yes, I'd go with absolute units when it comes to memory, because it's
> > not a renewable resource like CPU and IO, and so we do have cliff
> > behavior around the edge where you transition from ok to not-enough.
> >
> > memory.low is a bit in flux right now, so if anything is unclear
> > around its semantics, please feel free to reach out.
> 
> I am not familiar with the discussion, would you point me to a
> relevant thread please?

Here is a cleanup patch, not yet merged, that documents the exact
semantics and behavioral considerations:

https://lore.kernel.org/linux-mm/20191213192158.188939-3-hannes@cmpxchg.org/

But the high-level idea is this: you assign each cgroup or cgroup
subtree a chunk of the resource that it's guaranteed to be able to
consume. It *can* consume beyond that threshold if available, but that
overage may get reclaimed again if somebody else needs it instead.

This allows you to do a ballpark distribution of the resource between
different workloads, while the kernel retains the ability to optimize
allocation of spare resources - because in practice, workload demand
varies over time, workloads disappear and new ones start up etc.
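
(As a concrete reading of that: a cgroup with memory.low=4G that is
using 3G is protected from reclaim, while one that has grown to 6G can
have the 2G overage reclaimed whenever a sibling needs the memory.)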

> In addition, is there some kind of order of preference for
> implementing low vs high vs max?

If you implement only one allocation model, the preference would be on
memory.low. Limits are rigid and by definition waste resources, so in
practice we're moving away from them.

* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2020-02-19 18:38                 ` Johannes Weiner
@ 2020-02-21  5:59                   ` Kenny Ho
  0 siblings, 0 replies; 26+ messages in thread
From: Kenny Ho @ 2020-02-21  5:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: juan.zuniga-anaya, Kenny Ho, Kuehling, Felix, jsparks,
	nirmoy.das, Maling list - DRI developers, lkaplan, Alex Deucher,
	Greathouse, Joseph, amd-gfx mailing list, Jason Ekstrand,
	Tejun Heo, cgroups, Christian König, damon.mcdougall

Thanks, I will take a look.

Regards,
Kenny

On Wed, Feb 19, 2020 at 1:38 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Feb 19, 2020 at 11:28:48AM -0500, Kenny Ho wrote:
> > On Wed, Feb 19, 2020 at 11:18 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > Yes, I'd go with absolute units when it comes to memory, because it's
> > > not a renewable resource like CPU and IO, and so we do have cliff
> > > behavior around the edge where you transition from ok to not-enough.
> > >
> > > memory.low is a bit in flux right now, so if anything is unclear
> > > around its semantics, please feel free to reach out.
> >
> > I am not familiar with the discussion, would you point me to a
> > relevant thread please?
>
> Here is a cleanup patch, not yet merged, that documents the exact
> semantics and behavioral considerations:
>
> https://lore.kernel.org/linux-mm/20191213192158.188939-3-hannes@cmpxchg.org/
>
> But the high-level idea is this: you assign each cgroup or cgroup
> subtree a chunk of the resource that it's guaranteed to be able to
> consume. It *can* consume beyond that threshold if available, but that
> overage may get reclaimed again if somebody else needs it instead.
>
> This allows you to do a ballpark distribution of the resource between
> different workloads, while the kernel retains the ability to optimize
> allocation of spare resources - because in practice, workload demand
> varies over time, workloads disappear and new ones start up etc.
>
> > In addition, is there some kind of order of preference for
> > implementing low vs high vs max?
>
> If you implement only one allocation model, the preference would be on
> memory.low. Limits are rigid and by definition waste resources, so in
> practice we're moving away from them.

Thread overview: 26+ messages
2020-02-14 15:56 [PATCH 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
2020-02-14 15:56 ` [PATCH 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
2020-02-14 15:56 ` [PATCH 02/11] drm, cgroup: Bind drm and cgroup subsystem Kenny Ho
2020-02-14 15:56 ` [PATCH 03/11] drm, cgroup: Initialize drmcg properties Kenny Ho
2020-02-14 15:56 ` [PATCH 04/11] drm, cgroup: Add total GEM buffer allocation stats Kenny Ho
2020-02-14 15:56 ` [PATCH 05/11] drm, cgroup: Add peak " Kenny Ho
2020-02-14 15:56 ` [PATCH 06/11] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
2020-02-14 15:56 ` [PATCH 07/11] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
2020-02-14 15:56 ` [PATCH 08/11] drm, cgroup: Add peak " Kenny Ho
2020-02-14 15:56 ` [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
2020-02-14 16:44   ` Jason Ekstrand
2020-02-14 16:59     ` Jason Ekstrand
2020-02-14 17:08     ` Kenny Ho
2020-02-14 17:48       ` Jason Ekstrand
2020-02-14 18:34       ` Daniel Vetter
2020-02-14 18:51         ` Kenny Ho
2020-02-14 19:17           ` Tejun Heo
2020-02-14 20:28             ` Kenny Ho
2020-02-14 21:15               ` Tejun Heo
2020-02-19 16:21               ` Johannes Weiner
2020-02-19 16:18             ` Johannes Weiner
2020-02-19 16:28               ` Kenny Ho
2020-02-19 18:38                 ` Johannes Weiner
2020-02-21  5:59                   ` Kenny Ho
2020-02-14 15:56 ` [PATCH 10/11] drm, cgroup: add update trigger after limit change Kenny Ho
2020-02-14 15:56 ` [PATCH 11/11] drm/amdgpu: Integrate with DRM cgroup Kenny Ho
