* [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
@ 2019-08-29  6:05 Kenny Ho
  2019-08-29  6:05 ` [PATCH RFC v4 01/16] drm: Add drm_minor_for_each Kenny Ho
                   ` (8 more replies)
  0 siblings, 9 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

This is a follow-up to the RFC I made previously to introduce a cgroup
controller for the GPU/DRM subsystem [v1,v2,v3].  The goal is to provide
resource management for GPU resources in environments such as containers.

With this RFC v4, I am hoping to reach some consensus on a merge plan.  I
believe the GEM related resources (drm.buffer.*) introduced in the previous
RFCs and, hopefully, the logical GPU concept (drm.lgpu.*) introduced in this
RFC are uncontroversial and ready to move out of RFC and into a more formal
review.  I will continue to work on the memory backend resources (drm.memory.*).

The cover letter from v1 is copied below for reference.

[v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
[v2]: https://www.spinics.net/lists/cgroups/msg22074.html
[v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html

v4:
Unchanged (no review needed)
* drm.memory.*/ttm resources (patches 9-13; I am still working on memory
bandwidth and the shrinker)
Based on feedback on v3:
* updated nomenclature to drmcg
* embedded per device drmcg properties into drm_device
* split GEM buffer related commits into stats and limits
* renamed functions to align with convention
* combined buffer accounting and checking into a try_charge function
* support buffer stats without limit enforcement
* removed GEM buffer sharing limitation
* updated documentation
New features:
* introducing logical GPU concept
* example implementation with AMD KFD

v3:
Based on feedback on v2:
* removed .help type file from v2
* conform to cgroup convention for default and max handling
* conform to cgroup convention for addressing device specific limits (with major:minor)
New functionality:
* adopted memparse for memory size related attributes
* added macro to marshal drmcgrp cftype private data (DRMCG_CTF_PRIV, etc.)
* added ttm buffer usage stats (per cgroup, for system, tt, vram.)
* added ttm buffer usage limit (per cgroup, for vram.)
* added per cgroup bandwidth stats and limiting (burst and average bandwidth)
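For reference, memparse() is the small kernel helper adopted above for the
memory size attributes: it parses a size string with an optional binary
suffix.  Below is a userspace sketch of the idea only — `memparse_sketch` is
an illustrative name, not the kernel implementation, and only the K/M/G
suffixes are shown:

```c
#include <stdlib.h>

/* Userspace sketch of the kernel's memparse(): parse a number with an
 * optional K/M/G binary suffix.  The real kernel helper also accepts
 * T, P and E; this is an illustration, not the kernel code. */
static unsigned long long memparse_sketch(const char *s, char **retptr)
{
	char *end;
	unsigned long long ret = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		ret <<= 10;
		/* fall through */
	case 'M': case 'm':
		ret <<= 10;
		/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		end++;
		break;
	}
	if (retptr)
		*retptr = end;
	return ret;
}
```

So writing "256M" to a drm.memory limit file would be parsed to 268435456
bytes before being stored.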

v2:
* Removed the vendoring concepts
* Add limit to total buffer allocation
* Add limit to the maximum size of a buffer allocation

v1: cover letter

The purpose of this patch series is to start a discussion for a generic cgroup
controller for the drm subsystem.  The design proposed here is a very early one.
We are hoping to engage the community as we develop the idea.


Backgrounds
==========
Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with specialized
behaviour, such as accounting for and limiting the resources which processes in
a cgroup can access [1].  Weights, limits, protections and allocations are the
main resource distribution models.  Existing cgroup controllers include cpu,
memory, io, rdma, and more.  cgroup is one of the foundational technologies
that enables the popular container application deployment and management
method.

Direct Rendering Manager/drm contains code intended to support the needs of
complex graphics devices. Graphics drivers in the kernel may make use of DRM
functions to make tasks like memory management, interrupt handling and DMA
easier, and provide a uniform interface to applications.  The DRM has also
developed beyond traditional graphics applications to support compute/GPGPU
applications.


Motivations
=========
As GPUs grow beyond the realm of desktop/workstation graphics into areas like
data center clusters and IoT, there is an increasing need to monitor and
regulate GPUs as a resource, like cpu, memory and io.

Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
purpose of managing GPU priority using the cgroup hierarchy.  While that
particular use case may not warrant a standalone drm cgroup controller, there
are other use cases where having one can be useful [3].  Monitoring GPU
resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
(execution unit [Intel's nomenclature]) usage, and GPU job scheduling [4] can
help sysadmins get a better understanding of an application's usage profile.
Further regulation of the aforementioned resources can also help sysadmins
optimize workload deployment on limited GPU resources.

With the increased importance of machine learning, data science and other
cloud-based applications, GPUs are already in production use in data centers
today [5,6,7].  Existing GPU resource management is very coarse-grained,
however, as sysadmins are only able to distribute workloads on a per-GPU
basis [8].  An alternative is to use GPU virtualization (with or without
SR-IOV), but it generally acts on the entire GPU instead of the specific
resources in a GPU.  With a drm cgroup controller, we can enable alternate,
fine-grained, sub-GPU resource management (in addition to what may be
available via GPU virtualization).

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a means to artificially limit the
DRM resources available to the applications.


Challenges
========
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4], for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism used by the cgroup
core to handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific, and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from the RDMA cgroup controller.

Approach
=======
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (16):
  drm: Add drm_minor_for_each
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Initialize drmcg properties
  drm, cgroup: Add total GEM buffer allocation stats
  drm, cgroup: Add peak GEM buffer allocation stats
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Add TTM buffer allocation stats
  drm, cgroup: Add TTM buffer peak usage stats
  drm, cgroup: Add per cgroup bw measure and control
  drm, cgroup: Add soft VRAM limit
  drm, cgroup: Allow more aggressive memory reclaim
  drm, cgroup: Introduce lgpu as DRM cgroup resource
  drm, cgroup: add update trigger after limit change
  drm/amdgpu: Integrate with DRM cgroup

 Documentation/admin-guide/cgroup-v2.rst       |  163 +-
 Documentation/cgroup-v1/drm.rst               |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   29 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |    6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |    3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |    6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |    3 +
 .../amd/amdkfd/kfd_process_queue_manager.c    |  140 ++
 drivers/gpu/drm/drm_drv.c                     |   26 +
 drivers/gpu/drm/drm_gem.c                     |   16 +-
 drivers/gpu/drm/drm_internal.h                |    4 -
 drivers/gpu/drm/ttm/ttm_bo.c                  |   93 ++
 drivers/gpu/drm/ttm/ttm_bo_util.c             |    4 +
 include/drm/drm_cgroup.h                      |  122 ++
 include/drm/drm_device.h                      |    7 +
 include/drm/drm_drv.h                         |   23 +
 include/drm/drm_gem.h                         |   13 +-
 include/drm/ttm/ttm_bo_api.h                  |    2 +
 include/drm/ttm/ttm_bo_driver.h               |   10 +
 include/linux/cgroup_drm.h                    |  151 ++
 include/linux/cgroup_subsys.h                 |    4 +
 init/Kconfig                                  |    5 +
 kernel/cgroup/Makefile                        |    1 +
 kernel/cgroup/drm.c                           | 1367 +++++++++++++++++
 25 files changed, 2193 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.22.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
@ 2019-08-29  6:05 ` Kenny Ho
       [not found]   ` <20190829060533.32315-2-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-08-29  6:05 ` [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem Kenny Ho
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

Add drm_minor_for_each to allow other subsystems to iterate through all
stored DRM minors and act upon them.

Also expose drm_minor_acquire and drm_minor_release so that other
subsystems can handle drm_minor.  The DRM cgroup controller is the
initial consumer of this new feature.
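The iteration contract here mirrors idr_for_each(): the callback runs once
per stored entry, and a nonzero return value stops the walk and is propagated
to the caller.  A minimal userspace sketch of how a consumer might use such
an iterator — `struct fake_minor`, `for_each_minor` and `count_primary` are
illustrative stand-ins, not DRM symbols:

```c
#include <stddef.h>

/* Stand-in for struct drm_minor: just an index and a type. */
struct fake_minor { int index; int type; };

static struct fake_minor minors[] = {
	{ 0, 0 }, { 128, 1 }, { 1, 0 },
};

/* Sketch of the drm_minor_for_each()/idr_for_each() contract: call fn
 * for every entry, stop and propagate the first nonzero return. */
static int for_each_minor(int (*fn)(int id, void *p, void *data), void *data)
{
	for (size_t i = 0; i < sizeof(minors) / sizeof(minors[0]); i++) {
		int ret = fn(minors[i].index, &minors[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count minors of type 0 (think DRM_MINOR_PRIMARY). */
static int count_primary(int id, void *p, void *data)
{
	struct fake_minor *m = p;

	if (m->type == 0)
		(*(int *)data)++;
	return 0;
}
```

This is the shape the later drmcg patches rely on when walking minors to
allocate or free per-device state.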

Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b8144ca5
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
 drivers/gpu/drm/drm_internal.h |  4 ----
 include/drm/drm_drv.h          |  4 ++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 862621494a93..000cddabd970 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
 
 	return minor;
 }
+EXPORT_SYMBOL(drm_minor_acquire);
 
 void drm_minor_release(struct drm_minor *minor)
 {
 	drm_dev_put(minor->dev);
 }
+EXPORT_SYMBOL(drm_minor_release);
 
 /**
  * DOC: driver instance overview
@@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
 }
 EXPORT_SYMBOL(drm_dev_set_unique);
 
+/**
+ * drm_minor_for_each - Iterate through all stored DRM minors
+ * @fn: Function to be called for each pointer.
+ * @data: Data passed to callback function.
+ *
+ * The callback function will be called for each @drm_minor entry, passing
+ * the minor id, the entry pointer and @data.
+ *
+ * If @fn returns anything other than %0, the iteration stops and that
+ * value is returned from this function.
+ */
+int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
+{
+	return idr_for_each(&drm_minors_idr, fn, data);
+}
+EXPORT_SYMBOL(drm_minor_for_each);
+
 /*
  * DRM Core
  * The DRM core module initializes all global DRM objects and makes them
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index e19ac7ca602d..6bfad76f8e78 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
 void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
 					struct dma_buf *dma_buf);
 
-/* drm_drv.c */
-struct drm_minor *drm_minor_acquire(unsigned int minor_id);
-void drm_minor_release(struct drm_minor *minor);
-
 /* drm_vblank.c */
 void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
 void drm_vblank_cleanup(struct drm_device *dev);
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 68ca736c548d..24f8d054c570 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
 
 int drm_dev_set_unique(struct drm_device *dev, const char *name);
 
+int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
+
+struct drm_minor *drm_minor_acquire(unsigned int minor_id);
+void drm_minor_release(struct drm_minor *minor);
 
 #endif
-- 
2.22.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
  2019-08-29  6:05 ` [PATCH RFC v4 01/16] drm: Add drm_minor_for_each Kenny Ho
@ 2019-08-29  6:05 ` Kenny Ho
       [not found]   ` <20190829060533.32315-3-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-08-29  6:05 ` [PATCH RFC v4 03/16] drm, cgroup: Initialize drmcg properties Kenny Ho
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

With the increased importance of machine learning, data science and
other cloud-based applications, GPUs are already in production use in
data centers today.  Existing GPU resource management is very
coarse-grained, however, as sysadmins are only able to distribute
workloads on a per-GPU basis.  An alternative is to use GPU
virtualization (with or without SR-IOV), but it generally acts on the
entire GPU instead of the specific resources in a GPU.  With a drm
cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization).
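The css_to_drmcg() helper introduced in this patch is the usual kernel
container_of() pattern: the controller state embeds the generic struct
cgroup_subsys_state, and the containing structure is recovered by subtracting
the member offset.  A self-contained userspace sketch with simplified
stand-in types (`fake_css`/`fake_drmcg` are not the kernel definitions):

```c
#include <stddef.h>

/* Stand-in for struct cgroup_subsys_state. */
struct fake_css { struct fake_css *parent; };

/* Stand-in for struct drmcg: controller state embedding the css. */
struct fake_drmcg {
	int some_stat;
	struct fake_css css;	/* embedded generic state */
};

/* The classic container_of(): walk back from a member pointer to the
 * structure that embeds it by subtracting the member's offset. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct fake_drmcg *css_to_drmcg(struct fake_css *css)
{
	return css ? container_of(css, struct fake_drmcg, css) : NULL;
}
```

The NULL check matters because cgroup core can hand back a NULL css (e.g.
for the root's parent), which is why drmcg_parent() in this patch can simply
pass `cg->css.parent` through.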

Change-Id: I6830d3990f63f0c13abeba29b1d330cf28882831
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++-
 Documentation/cgroup-v1/drm.rst         |  1 +
 include/linux/cgroup_drm.h              | 92 +++++++++++++++++++++++++
 include/linux/cgroup_subsys.h           |  4 ++
 init/Kconfig                            |  5 ++
 kernel/cgroup/Makefile                  |  1 +
 kernel/cgroup/drm.c                     | 42 +++++++++++
 7 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 88e746074252..2936423a3fd5 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -61,8 +61,10 @@ v1 is available under Documentation/cgroup-v1/.
      5-6. Device
      5-7. RDMA
        5-7-1. RDMA Interface Files
-     5-8. Misc
-       5-8-1. perf_event
+     5-8. DRM
+       5-8-1. DRM Interface Files
+     5-9. Misc
+       5-9-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1889,6 +1891,18 @@ RDMA Interface Files
 	  ocrdma1 hca_handle=1 hca_object=23
 
 
+DRM
+---
+
+The "drm" controller regulates the distribution and accounting of
+DRM (Direct Rendering Manager) and GPU-related resources.
+
+DRM Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+TODO
+
+
 Misc
 ----
 
diff --git a/Documentation/cgroup-v1/drm.rst b/Documentation/cgroup-v1/drm.rst
new file mode 100644
index 000000000000..5f5658e1f5ed
--- /dev/null
+++ b/Documentation/cgroup-v1/drm.rst
@@ -0,0 +1 @@
+Please see ../cgroup-v2.rst for details
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index 000000000000..971166f9dd78
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#ifdef CONFIG_CGROUP_DRM
+
+#include <linux/cgroup.h>
+
+/**
+ * The DRM cgroup controller data structure.
+ */
+struct drmcg {
+	struct cgroup_subsys_state	css;
+};
+
+/**
+ * css_to_drmcg - get the corresponding drmcg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Return: DRM cgroup that contains the @css
+ */
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct drmcg, css) : NULL;
+}
+
+/**
+ * drmcg_get - get the drmcg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to
+ *
+ * Return: reference to the DRM cgroup the task belongs to
+ */
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+	return css_to_drmcg(task_get_css(task, drm_cgrp_id));
+}
+
+/**
+ * drmcg_put - put a drmcg reference
+ * @drmcg: the target drmcg
+ *
+ * Put a reference obtained via drmcg_get
+ */
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+	if (drmcg)
+		css_put(&drmcg->css);
+}
+
+/**
+ * drmcg_parent - find the parent of a drm cgroup
+ * @cg: the target drmcg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Return: parent DRM cgroup of @cg
+ */
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+	return css_to_drmcg(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcg {
+};
+
+static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct drmcg *drmcg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void drmcg_put(struct drmcg *drmcg)
+{
+}
+
+static inline struct drmcg *drmcg_parent(struct drmcg *cg)
+{
+	return NULL;
+}
+
+#endif	/* CONFIG_CGROUP_DRM */
+#endif	/* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..ddedad809e8b 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(drm)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index 8b9ffe236e4f..01d3453f6e04 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -876,6 +876,11 @@ config CGROUP_RDMA
 	  Attaching processes with active RDMA resources to the cgroup
 	  hierarchy is allowed even if can cross the hierarchy's limit.
 
+config CGROUP_DRM
+	bool "DRM controller (EXPERIMENTAL)"
+	help
+	  Provides accounting and enforcement of resources in the DRM subsystem.
+
 config CGROUP_FREEZER
 	bool "Freezer controller"
 	help
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 5d7a76bfbbb7..31f186f58121 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,5 +4,6 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o freezer.o
 obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
+obj-$(CONFIG_CGROUP_DRM) += drm.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
new file mode 100644
index 000000000000..e97861b3cb30
--- /dev/null
+++ b/kernel/cgroup/drm.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/slab.h>
+#include <linux/cgroup.h>
+#include <linux/cgroup_drm.h>
+
+static struct drmcg *root_drmcg __read_mostly;
+
+static void drmcg_css_free(struct cgroup_subsys_state *css)
+{
+	struct drmcg *drmcg = css_to_drmcg(css);
+
+	kfree(drmcg);
+}
+
+static struct cgroup_subsys_state *
+drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct drmcg *parent = css_to_drmcg(parent_css);
+	struct drmcg *drmcg;
+
+	drmcg = kzalloc(sizeof(struct drmcg), GFP_KERNEL);
+	if (!drmcg)
+		return ERR_PTR(-ENOMEM);
+
+	if (!parent)
+		root_drmcg = drmcg;
+
+	return &drmcg->css;
+}
+
+struct cftype files[] = {
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys drm_cgrp_subsys = {
+	.css_alloc	= drmcg_css_alloc,
+	.css_free	= drmcg_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 03/16] drm, cgroup: Initialize drmcg properties
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
  2019-08-29  6:05 ` [PATCH RFC v4 01/16] drm: Add drm_minor_for_each Kenny Ho
  2019-08-29  6:05 ` [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem Kenny Ho
@ 2019-08-29  6:05 ` Kenny Ho
  2019-08-29  6:05 ` [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

drmcg initialization involves allocating a per cgroup, per device data
structure and setting the defaults.  There are two entry points for
drmcg init:

1) When struct drmcg is created via css_alloc, initialization is done
for each device

2) When DRM devices are created after drmcgs are created
  a) Per device drmcg data structure is allocated at the beginning of
  DRM device creation such that drmcg can begin tracking usage
  statistics
  b) At the end of DRM device creation, drmcg_device_update is called in
  case device specific defaults need to be applied.

Entry point #2 usually applies to the root cgroup since it can be
created before DRM devices are available.  The drmcg controller will go
through all existing drm cgroups and initialize them with the new device
accordingly.
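The two entry points above can be modelled as filling a cgroup-by-device
matrix from either side: creating a cgroup allocates per-device state for
every known device, and registering a device allocates state for every
existing cgroup, so either ordering converges to the same result.  A toy
sketch (all names here are illustrative, not kernel symbols):

```c
/* Toy model of the two drmcg init entry points: has_res[c][d] marks
 * whether cgroup c has per-device state for device d. */
#define MAX_CG	4
#define MAX_DEV	4

static int cg_alive[MAX_CG], dev_alive[MAX_DEV];
static int has_res[MAX_CG][MAX_DEV];

/* Entry point 1: cgroup created (css_alloc) — init state for every
 * device that already exists. */
static void toy_css_alloc(int cg)
{
	cg_alive[cg] = 1;
	for (int d = 0; d < MAX_DEV; d++)
		if (dev_alive[d])
			has_res[cg][d] = 1;
}

/* Entry point 2: device registered after some cgroups already exist —
 * walk the existing cgroups and init state for the new device. */
static void toy_device_register(int dev)
{
	dev_alive[dev] = 1;
	for (int c = 0; c < MAX_CG; c++)
		if (cg_alive[c])
			has_res[c][dev] = 1;
}
```

In the kernel this corresponds to init_drmcg() walking minors from
css_alloc, and drmcg_update_cg_tree() walking descendant cgroups from
device registration; the root cgroup typically takes the second path since
it exists before any DRM device.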

Change-Id: I908ee6975ea0585e4c30eafde4599f87094d8c65
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/drm_drv.c  |   7 +++
 include/drm/drm_cgroup.h   |  27 ++++++++
 include/drm/drm_device.h   |   7 +++
 include/drm/drm_drv.h      |   9 +++
 include/linux/cgroup_drm.h |  13 ++++
 kernel/cgroup/drm.c        | 123 +++++++++++++++++++++++++++++++++++++
 6 files changed, 186 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 000cddabd970..94265eba68ca 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -37,6 +37,7 @@
 #include <drm/drm_client.h>
 #include <drm/drm_drv.h>
 #include <drm/drmP.h>
+#include <drm/drm_cgroup.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_legacy.h"
@@ -672,6 +673,7 @@ int drm_dev_init(struct drm_device *dev,
 	mutex_init(&dev->filelist_mutex);
 	mutex_init(&dev->clientlist_mutex);
 	mutex_init(&dev->master_mutex);
+	mutex_init(&dev->drmcg_mutex);
 
 	dev->anon_inode = drm_fs_inode_new();
 	if (IS_ERR(dev->anon_inode)) {
@@ -708,6 +710,7 @@ int drm_dev_init(struct drm_device *dev,
 	if (ret)
 		goto err_setunique;
 
+	drmcg_device_early_init(dev);
 	return 0;
 
 err_setunique:
@@ -722,6 +725,7 @@ int drm_dev_init(struct drm_device *dev,
 	drm_fs_inode_free(dev->anon_inode);
 err_free:
 	put_device(dev->dev);
+	mutex_destroy(&dev->drmcg_mutex);
 	mutex_destroy(&dev->master_mutex);
 	mutex_destroy(&dev->clientlist_mutex);
 	mutex_destroy(&dev->filelist_mutex);
@@ -798,6 +802,7 @@ void drm_dev_fini(struct drm_device *dev)
 
 	put_device(dev->dev);
 
+	mutex_destroy(&dev->drmcg_mutex);
 	mutex_destroy(&dev->master_mutex);
 	mutex_destroy(&dev->clientlist_mutex);
 	mutex_destroy(&dev->filelist_mutex);
@@ -1008,6 +1013,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
 		 dev->dev ? dev_name(dev->dev) : "virtual device",
 		 dev->primary->index);
 
+	drmcg_device_update(dev);
+
 	goto out_unlock;
 
 err_minors:
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index 000000000000..bef9f9245924
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+/**
+ * Per DRM device properties for DRM cgroup controller for the purpose
+ * of storing per device defaults
+ */
+struct drmcg_props {
+};
+
+#ifdef CONFIG_CGROUP_DRM
+
+void drmcg_device_update(struct drm_device *device);
+void drmcg_device_early_init(struct drm_device *device);
+#else
+static inline void drmcg_device_update(struct drm_device *device)
+{
+}
+
+static inline void drmcg_device_early_init(struct drm_device *device)
+{
+}
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 7f9ef709b2b6..5d7d779a5083 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -8,6 +8,7 @@
 
 #include <drm/drm_hashtab.h>
 #include <drm/drm_mode_config.h>
+#include <drm/drm_cgroup.h>
 
 struct drm_driver;
 struct drm_minor;
@@ -304,6 +305,12 @@ struct drm_device {
 	 */
 	struct drm_fb_helper *fb_helper;
 
+        /** \name DRM Cgroup */
+	/*@{ */
+	struct mutex drmcg_mutex;
+	struct drmcg_props drmcg_props;
+	/*@} */
+
 	/* Everything below here is for legacy driver, never use! */
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 24f8d054c570..c8a37a08d98d 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -660,6 +660,15 @@ struct drm_driver {
 			    struct drm_device *dev,
 			    uint32_t handle);
 
+	/**
+	 * @drmcg_custom_init
+	 *
+	 * Optional callback used to initialize drm cgroup per device properties
+	 * such as resource limit defaults.
+	 */
+	void (*drmcg_custom_init)(struct drm_device *dev,
+			struct drmcg_props *props);
+
 	/**
 	 * @gem_vm_ops: Driver private ops for this object
 	 */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 971166f9dd78..4ecd44f2ac27 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -6,13 +6,26 @@
 
 #ifdef CONFIG_CGROUP_DRM
 
+#include <linux/mutex.h>
 #include <linux/cgroup.h>
+#include <drm/drm_file.h>
+
+/* limit defined per the way drm_minor_alloc operates */
+#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
+
+/**
+ * Per DRM cgroup, per device resources (such as statistics and limits)
+ */
+struct drmcg_device_resource {
+	/* for per device stats */
+};
 
 /**
  * The DRM cgroup controller data structure.
  */
 struct drmcg {
 	struct cgroup_subsys_state	css;
+	struct drmcg_device_resource	*dev_resources[MAX_DRM_DEV];
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index e97861b3cb30..135fdcdc4b51 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,28 +1,103 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/export.h>
 #include <linux/slab.h>
 #include <linux/cgroup.h>
+#include <linux/fs.h>
+#include <linux/seq_file.h>
+#include <linux/mutex.h>
 #include <linux/cgroup_drm.h>
+#include <linux/kernel.h>
+#include <drm/drm_file.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_device.h>
+#include <drm/drm_cgroup.h>
+
+/* global mutex for drmcg across all devices */
+static DEFINE_MUTEX(drmcg_mutex);
 
 static struct drmcg *root_drmcg __read_mostly;
 
+static int drmcg_css_free_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct drmcg *drmcg = data;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	kfree(drmcg->dev_resources[minor->index]);
+
+	return 0;
+}
+
 static void drmcg_css_free(struct cgroup_subsys_state *css)
 {
 	struct drmcg *drmcg = css_to_drmcg(css);
 
+	drm_minor_for_each(&drmcg_css_free_fn, drmcg);
+
 	kfree(drmcg);
 }
 
+static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
+{
+	int minor = dev->primary->index;
+	struct drmcg_device_resource *ddr = drmcg->dev_resources[minor];
+
+	if (ddr == NULL) {
+		ddr = kzalloc(sizeof(struct drmcg_device_resource),
+			GFP_KERNEL);
+
+		if (!ddr)
+			return -ENOMEM;
+	}
+
+	mutex_lock(&dev->drmcg_mutex);
+	drmcg->dev_resources[minor] = ddr;
+
+	/* set defaults here */
+
+	mutex_unlock(&dev->drmcg_mutex);
+	return 0;
+}
+
+static int init_drmcg_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct drmcg *drmcg = data;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	return init_drmcg_single(drmcg, minor->dev);
+}
+
+static inline int init_drmcg(struct drmcg *drmcg, struct drm_device *dev)
+{
+	if (dev != NULL)
+		return init_drmcg_single(drmcg, dev);
+
+	return drm_minor_for_each(&init_drmcg_fn, drmcg);
+}
+
 static struct cgroup_subsys_state *
 drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct drmcg *parent = css_to_drmcg(parent_css);
 	struct drmcg *drmcg;
+	int rc;
 
 	drmcg = kzalloc(sizeof(struct drmcg), GFP_KERNEL);
 	if (!drmcg)
 		return ERR_PTR(-ENOMEM);
 
+	rc = init_drmcg(drmcg, NULL);
+	if (rc) {
+		drmcg_css_free(&drmcg->css);
+		return ERR_PTR(rc);
+	}
+
 	if (!parent)
 		root_drmcg = drmcg;
 
@@ -40,3 +115,51 @@ struct cgroup_subsys drm_cgrp_subsys = {
 	.legacy_cftypes	= files,
 	.dfl_cftypes	= files,
 };
+
+static inline void drmcg_update_cg_tree(struct drm_device *dev)
+{
+	/* init cgroups created before registration (i.e. root cgroup) */
+	if (root_drmcg != NULL) {
+		struct cgroup_subsys_state *pos;
+		struct drmcg *child;
+
+		rcu_read_lock();
+		css_for_each_descendant_pre(pos, &root_drmcg->css) {
+			child = css_to_drmcg(pos);
+			init_drmcg(child, dev);
+		}
+		rcu_read_unlock();
+	}
+}
+
+/**
+ * drmcg_device_update - update DRM cgroups defaults
+ * @dev: the target DRM device
+ *
+ * If @dev has a drmcg_custom_init callback, it will be called to set
+ * device specific defaults and the initial values for all existing
+ * cgroups created prior to @dev becoming available.
+ */
+void drmcg_device_update(struct drm_device *dev)
+{
+	if (dev->driver->drmcg_custom_init)
+	{
+		dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
+
+		drmcg_update_cg_tree(dev);
+	}
+}
+EXPORT_SYMBOL(drmcg_device_update);
+
+/**
+ * drmcg_device_early_init - initialize device specific resources for DRM cgroups
+ * @dev: the target DRM device
+ *
+ * Allocate and initialize device specific resources for existing DRM cgroups.
+ * Typically only the root cgroup exists before the initialization of @dev.
+ */
+void drmcg_device_early_init(struct drm_device *dev)
+{
+	drmcg_update_cg_tree(dev);
+}
+EXPORT_SYMBOL(drmcg_device_early_init);
-- 
2.22.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 04/16] drm, cgroup: Add total GEM buffer allocation stats
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 05/16] drm, cgroup: Add peak " Kenny Ho
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

The drm resource being measured here is the GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of a
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one set of cgroup stats per drm device.  Each
allocation is charged to the owning cgroup as well as all its ancestors.

Similar to the memory cgroup, migrating a process to a different cgroup
does not move the GEM buffer usage it accumulated while in the previous
cgroup over to the new cgroup.

The following is an example to illustrate some of the operations.  Given
the following cgroup hierarchy (The letters are cgroup names with R
being the root cgroup.  The numbers in brackets are processes.  The
processes are placed with cgroup's 'No Internal Process Constraint' in
mind, so no process is placed in cgroup B.)

R (4, 5) ------ A (6)
 \
  B ---- C (7,8)
   \
    D (9)

Here is a list of operations and their effect on the sizes tracked
by the cgroups (for simplicity, each buffer is 1 unit in size.)

==  ==  ==  ==  ==  ===================================================
R   A   B   C   D   Ops
==  ==  ==  ==  ==  ===================================================
1   0   0   0   0   4 allocated a buffer
1   0   0   0   0   4 shared a buffer with 5
1   0   0   0   0   4 shared a buffer with 9
2   0   1   0   1   9 allocated a buffer
3   0   2   1   1   7 allocated a buffer
3   0   2   1   1   7 shared a buffer with 8
3   0   2   1   1   7 shared a buffer with 9
3   0   2   1   1   7 released a buffer
3   0   2   1   1   7 migrated to cgroup D
3   0   2   1   1   9 released a buffer from 7
2   0   1   0   1   8 released a buffer from 7 (last ref to shared buf)
==  ==  ==  ==  ==  ===================================================

drm.buffer.total.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total GEM buffer allocation in bytes.
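
The per-ancestor charging described above can be sketched as a small
user-space model (the toy_* names are hypothetical stand-ins; the real
code walks drmcg_parent() under dev->drmcg_mutex):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a DRM cgroup; only the per-device total is shown. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	long long total;	/* stands in for bo_stats_total_allocated */
};

/* Charge an allocation to the owning cgroup and all of its ancestors,
 * mirroring the parent walk the patch does under dev->drmcg_mutex. */
void toy_charge(struct toy_cgroup *cg, long long size)
{
	for (; cg != NULL; cg = cg->parent)
		cg->total += size;
}

/* Uncharge when the buffer is freed (not merely unreferenced). */
void toy_uncharge(struct toy_cgroup *cg, long long size)
{
	for (; cg != NULL; cg = cg->parent)
		cg->total -= size;
}
```

Replaying the example hierarchy above through this model reproduces the
per-cgroup totals in the table row by row.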

Change-Id: I9d662ec50d64bb40a37dbf47f018b2f3a1c033ad
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  50 +++++++++-
 drivers/gpu/drm/drm_gem.c               |   9 ++
 include/drm/drm_cgroup.h                |  16 +++
 include/drm/drm_gem.h                   |  11 +++
 include/linux/cgroup_drm.h              |   6 ++
 kernel/cgroup/drm.c                     | 126 ++++++++++++++++++++++++
 6 files changed, 217 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 2936423a3fd5..0e29d136e2f9 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,6 +63,7 @@ v1 is available under Documentation/cgroup-v1/.
        5-7-1. RDMA Interface Files
      5-8. DRM
        5-8-1. DRM Interface Files
+       5-8-2. GEM Buffer Ownership
      5-9. Misc
        5-9-1. perf_event
      5-N. Non-normative information
@@ -1900,7 +1901,54 @@ of DRM (Direct Rendering Manager) and GPU-related resources.
 DRM Interface Files
 ~~~~~~~~~~~~~~~~~~~~
 
-TODO
+  drm.buffer.total.stats
+	A read-only flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Total GEM buffer allocation in bytes.
+
+GEM Buffer Ownership
+~~~~~~~~~~~~~~~~~~~~
+
+For the purpose of cgroup accounting and limiting, ownership of a
+buffer is deemed to be the cgroup to which the allocating process
+belongs.  There is one set of cgroup stats per drm device.  Each
+allocation is charged to the owning cgroup as well as all its ancestors.
+
+Similar to the memory cgroup, migrating a process to a different cgroup
+does not move the GEM buffer usage it accumulated while in the previous
+cgroup over to the new cgroup.
+
+The following is an example to illustrate some of the operations.  Given
+the following cgroup hierarchy (The letters are cgroup names with R
+being the root cgroup.  The numbers in brackets are processes.  The
+processes are placed with cgroup's 'No Internal Process Constraint' in
+mind, so no process is placed in cgroup B.)
+
+R (4, 5) ------ A (6)
+ \
+  B ---- C (7,8)
+   \
+    D (9)
+
+Here is a list of operations and their effect on the sizes tracked
+by the cgroups (for simplicity, each buffer is 1 unit in size.)
+
+==  ==  ==  ==  ==  ===================================================
+R   A   B   C   D   Ops
+==  ==  ==  ==  ==  ===================================================
+1   0   0   0   0   4 allocated a buffer
+1   0   0   0   0   4 shared a buffer with 5
+1   0   0   0   0   4 shared a buffer with 9
+2   0   1   0   1   9 allocated a buffer
+3   0   2   1   1   7 allocated a buffer
+3   0   2   1   1   7 shared a buffer with 8
+3   0   2   1   1   7 shared a buffer with 9
+3   0   2   1   1   7 released a buffer
+3   0   2   1   1   7 migrated to cgroup D
+3   0   2   1   1   9 released a buffer from 7
+2   0   1   0   1   8 released a buffer from 7 (last ref to shared buf)
+==  ==  ==  ==  ==  ===================================================
 
 
 Misc
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 50de138c89e0..517b71a6f4d4 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -38,10 +38,12 @@
 #include <linux/dma-buf.h>
 #include <linux/mem_encrypt.h>
 #include <linux/pagevec.h>
+#include <linux/cgroup_drm.h>
 #include <drm/drmP.h>
 #include <drm/drm_vma_manager.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_print.h>
+#include <drm/drm_cgroup.h>
 #include "drm_internal.h"
 
 /** @file drm_gem.c
@@ -159,6 +161,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 		obj->resv = &obj->_resv;
 
 	drm_vma_node_reset(&obj->vma_node);
+
+	obj->drmcg = drmcg_get(current);
+	drmcg_chg_bo_alloc(obj->drmcg, dev, size);
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
@@ -950,6 +955,10 @@ drm_gem_object_release(struct drm_gem_object *obj)
 		fput(obj->filp);
 
 	reservation_object_fini(&obj->_resv);
+
+	drmcg_unchg_bo_alloc(obj->drmcg, obj->dev, obj->size);
+	drmcg_put(obj->drmcg);
+
 	drm_gem_free_mmap_offset(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index bef9f9245924..1fa37d1ad44c 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -4,6 +4,8 @@
 #ifndef __DRM_CGROUP_H__
 #define __DRM_CGROUP_H__
 
+#include <linux/cgroup_drm.h>
+
 /**
  * Per DRM device properties for DRM cgroup controller for the purpose
  * of storing per device defaults
@@ -15,6 +17,10 @@ struct drmcg_props {
 
 void drmcg_device_update(struct drm_device *device);
 void drmcg_device_early_init(struct drm_device *device);
+void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size);
+void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size);
 #else
 static inline void drmcg_device_update(struct drm_device *device)
 {
@@ -23,5 +29,15 @@ static inline void drmcg_device_update(struct drm_device *device)
 static inline void drmcg_device_early_init(struct drm_device *device)
 {
 }
+
+static inline void drmcg_chg_bo_alloc(struct drmcg *drmcg,
+		struct drm_device *dev,	size_t size)
+{
+}
+
+static inline void drmcg_unchg_bo_alloc(struct drmcg *drmcg,
+		struct drm_device *dev,	size_t size)
+{
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 5047c7ee25f5..6047968bdd17 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -291,6 +291,17 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	/**
+	 * @drmcg:
+	 *
+	 * DRM cgroup this GEM object belongs to.
+	 *
+	 * This is used to track and limit the amount of GEM objects a user
+	 * can allocate.  Since GEM objects can be shared, this is also used
+	 * to ensure GEM objects are only shared within the same cgroup.
+	 */
+	struct drmcg *drmcg;
 };
 
 /**
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 4ecd44f2ac27..1d8a7f2cdb4e 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -13,11 +13,17 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+enum drmcg_res_type {
+	DRMCG_TYPE_BO_TOTAL,
+	__DRMCG_TYPE_LAST,
+};
+
 /**
  * Per DRM cgroup, per device resources (such as statistics and limits)
  */
 struct drmcg_device_resource {
 	/* for per device stats */
+	s64			bo_stats_total_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 135fdcdc4b51..87ae9164d8d8 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -11,11 +11,24 @@
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_device.h>
+#include <drm/drm_ioctl.h>
 #include <drm/drm_cgroup.h>
 
 /* global mutex for drmcg across all devices */
 static DEFINE_MUTEX(drmcg_mutex);
 
+#define DRMCG_CTF_PRIV_SIZE 3
+#define DRMCG_CTF_PRIV_MASK GENMASK((DRMCG_CTF_PRIV_SIZE - 1), 0)
+#define DRMCG_CTF_PRIV(res_type, f_type)  ((res_type) <<\
+		DRMCG_CTF_PRIV_SIZE | (f_type))
+#define DRMCG_CTF_PRIV2RESTYPE(priv) ((priv) >> DRMCG_CTF_PRIV_SIZE)
+#define DRMCG_CTF_PRIV2FTYPE(priv) ((priv) & DRMCG_CTF_PRIV_MASK)
+
+
+enum drmcg_file_type {
+	DRMCG_FTYPE_STATS,
+};
+
 static struct drmcg *root_drmcg __read_mostly;
 
 static int drmcg_css_free_fn(int id, void *ptr, void *data)
@@ -104,7 +117,66 @@ drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	return &drmcg->css;
 }
 
+static void drmcg_print_stats(struct drmcg_device_resource *ddr,
+		struct seq_file *sf, enum drmcg_res_type type)
+{
+	if (ddr == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCG_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+static int drmcg_seq_show_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct seq_file *sf = data;
+	struct drmcg *drmcg = css_to_drmcg(seq_css(sf));
+	enum drmcg_file_type f_type =
+		DRMCG_CTF_PRIV2FTYPE(seq_cft(sf)->private);
+	enum drmcg_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(seq_cft(sf)->private);
+	struct drmcg_device_resource *ddr;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	ddr = drmcg->dev_resources[minor->index];
+
+	seq_printf(sf, "%d:%d ", DRM_MAJOR, minor->index);
+
+	switch (f_type) {
+	case DRMCG_FTYPE_STATS:
+		drmcg_print_stats(ddr, sf, type);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+
+	return 0;
+}
+
+int drmcg_seq_show(struct seq_file *sf, void *v)
+{
+	return drm_minor_for_each(&drmcg_seq_show_fn, sf);
+}
+
 struct cftype files[] = {
+	{
+		.name = "buffer.total.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -163,3 +235,57 @@ void drmcg_device_early_init(struct drm_device *dev)
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
+
+/**
+ * drmcg_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
+ * @drmcg: the DRM cgroup to be charged to
+ * @dev: the device the usage should be charged to
+ * @size: size of the GEM buffer to be accounted for
+ *
+ * This function should be called when a new GEM buffer is allocated to
+ * account for the utilization.  It should not be called when the buffer is
+ * merely shared (i.e. when the GEM buffer's reference count is incremented.)
+ */
+void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size)
+{
+	struct drmcg_device_resource *ddr;
+	int devIdx = dev->primary->index;
+
+	if (drmcg == NULL)
+		return;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+
+		ddr->bo_stats_total_allocated += (s64)size;
+	}
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_chg_bo_alloc);
+
+/**
+ * drmcg_unchg_bo_alloc - uncharge GEM buffer usage for a device and cgroup
+ * @drmcg: the DRM cgroup to uncharge from
+ * @dev: the device the usage should be removed from
+ * @size: size of the GEM buffer to be accounted for
+ *
+ * This function should be called when the GEM buffer is about to be freed,
+ * not simply when the GEM buffer's reference count is being decremented.
+ */
+void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+		size_t size)
+{
+	int devIdx = dev->primary->index;
+
+	if (drmcg == NULL)
+		return;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
+		drmcg->dev_resources[devIdx]->bo_stats_total_allocated
+			-= (s64)size;
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
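
The DRMCG_CTF_PRIV* helpers added in this patch pack a resource type and
a file type into the single cftype->private integer.  The same encoding
can be sketched stand-alone (the CTF_* names here are hypothetical
stand-ins for the patch's macros):

```c
#include <assert.h>

/* Same encoding as the patch's DRMCG_CTF_PRIV* macros: the low bits of
 * cftype->private carry the file type, the remaining bits the resource
 * type. */
#define CTF_PRIV_SIZE	3
#define CTF_PRIV_MASK	((1 << CTF_PRIV_SIZE) - 1)	/* GENMASK(2, 0) */
#define CTF_PRIV(res_type, f_type) \
	(((res_type) << CTF_PRIV_SIZE) | (f_type))
#define CTF_PRIV2RESTYPE(priv)	((priv) >> CTF_PRIV_SIZE)
#define CTF_PRIV2FTYPE(priv)	((priv) & CTF_PRIV_MASK)
```

With 3 low bits the scheme leaves room for up to 8 file types per
resource type, which covers the stats/limit/default variants used here.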
-- 
2.22.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 05/16] drm, cgroup: Add peak GEM buffer allocation stats
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-08-29  6:05   ` [PATCH RFC v4 04/16] drm, cgroup: Add total GEM buffer allocation stats Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 06/16] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

drm.buffer.peak.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Largest (high water mark) GEM buffer allocated in bytes.
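
Note the peak tracked by this patch is the largest single buffer ever
allocated on the device, not the high water mark of the running total.
A toy user-space model of that behaviour (toy_* names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

struct toy_ddr {		/* stand-in for drmcg_device_resource */
	long long total;	/* bo_stats_total_allocated */
	long long peak;		/* bo_stats_peak_allocated */
};

/* Charge one allocation.  The peak only moves when a single buffer
 * larger than any seen before is allocated. */
void toy_charge(struct toy_ddr *ddr, long long size)
{
	ddr->total += size;
	if (ddr->peak < size)
		ddr->peak = size;
}

/* Freeing lowers the total but never the recorded peak. */
void toy_uncharge(struct toy_ddr *ddr, long long size)
{
	ddr->total -= size;
}
```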

Change-Id: I79e56222151a3d33a76a61ba0097fe93ebb3449f
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++++++
 include/linux/cgroup_drm.h              |  3 +++
 kernel/cgroup/drm.c                     | 12 ++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 0e29d136e2f9..8588a0ffc69d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1907,6 +1907,12 @@ DRM Interface Files
 
 	Total GEM buffer allocation in bytes.
 
+  drm.buffer.peak.stats
+	A read-only flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Largest (high water mark) GEM buffer allocated in bytes.
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 1d8a7f2cdb4e..974d390cfa4f 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -15,6 +15,7 @@
 
 enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
+	DRMCG_TYPE_BO_PEAK,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -24,6 +25,8 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
 	/* for per device stats */
 	s64			bo_stats_total_allocated;
+
+	s64			bo_stats_peak_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 87ae9164d8d8..0bf5b95668c4 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -129,6 +129,9 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_TOTAL:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
 		break;
+	case DRMCG_TYPE_BO_PEAK:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -177,6 +180,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.peak.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -260,6 +269,9 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		ddr = drmcg->dev_resources[devIdx];
 
 		ddr->bo_stats_total_allocated += (s64)size;
+
+		if (ddr->bo_stats_peak_allocated < (s64)size)
+			ddr->bo_stats_peak_allocated = (s64)size;
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 06/16] drm, cgroup: Add GEM buffer allocation count stats
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-08-29  6:05   ` [PATCH RFC v4 04/16] drm, cgroup: Add total GEM buffer allocation stats Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 05/16] drm, cgroup: Add peak " Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 08/16] drm, cgroup: Add peak GEM buffer allocation limit Kenny Ho
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

drm.buffer.count.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total number of GEM buffers allocated.

Change-Id: Id3e1809d5fee8562e47a7d2b961688956d844ec6
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++++++
 include/linux/cgroup_drm.h              |  3 +++
 kernel/cgroup/drm.c                     | 22 +++++++++++++++++++---
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8588a0ffc69d..4dc72339a9b6 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1913,6 +1913,12 @@ DRM Interface Files
 
 	Largest (high water mark) GEM buffer allocated in bytes.
 
+  drm.buffer.count.stats
+	A read-only flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Total number of GEM buffers allocated.
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 974d390cfa4f..972f7aa975b5 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -16,6 +16,7 @@
 enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
 	DRMCG_TYPE_BO_PEAK,
+	DRMCG_TYPE_BO_COUNT,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -27,6 +28,8 @@ struct drmcg_device_resource {
 	s64			bo_stats_total_allocated;
 
 	s64			bo_stats_peak_allocated;
+
+	s64			bo_stats_count_allocated;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 0bf5b95668c4..85e46ece4a82 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -132,6 +132,9 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_PEAK:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_peak_allocated);
 		break;
+	case DRMCG_TYPE_BO_COUNT:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -186,6 +189,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.count.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -272,6 +281,8 @@ void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 
 		if (ddr->bo_stats_peak_allocated < (s64)size)
 			ddr->bo_stats_peak_allocated = (s64)size;
+
+		ddr->bo_stats_count_allocated++;
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
@@ -289,15 +300,20 @@ EXPORT_SYMBOL(drmcg_chg_bo_alloc);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size)
 {
+	struct drmcg_device_resource *ddr;
 	int devIdx = dev->primary->index;
 
 	if (drmcg == NULL)
 		return;
 
 	mutex_lock(&dev->drmcg_mutex);
-	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg))
-		drmcg->dev_resources[devIdx]->bo_stats_total_allocated
-			-= (s64)size;
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+
+		ddr->bo_stats_total_allocated -= (s64)size;
+
+		ddr->bo_stats_count_allocated--;
+	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (2 preceding siblings ...)
  2019-08-29  6:05 ` [PATCH RFC v4 03/16] drm, cgroup: Initialize drmcg properties Kenny Ho
@ 2019-08-29  6:05 ` Kenny Ho
       [not found]   ` <20190829060533.32315-8-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

The drm resource being limited here is the GEM buffer objects.  User
applications allocate and free these buffers.  In addition, a process
can allocate a buffer and share it with another process.  The consumer
of a shared buffer can also outlive the allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of a
buffer is deemed to be the cgroup to which the allocating process
belongs.  There is one cgroup limit per drm device.

The limiting functionality is added to the previous stats collection
function.  drm_gem_private_object_init() is modified to return a value
so that it can fail when the cgroup limit would be exceeded.

The try_chg function only fails if the DRM cgroup properties of the
DRM device have limit_enforced set to true.  This allows the DRM cgroup
controller to collect usage stats without enforcing the limits.

drm.buffer.total.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default limits on the total GEM buffer allocation in bytes.

drm.buffer.total.max
        A read-write flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Per device limits on the total GEM buffer allocation in bytes.
        This is a hard limit.  Attempts to allocate beyond the cgroup
        limit will fail with ENOMEM.  Shorthand understood by memparse
        (such as k, m, g) can be used.

        Set allocation limit for /dev/dri/card1 to 1GB
        echo "226:1 1g" > drm.buffer.total.max

        Set allocation limit for /dev/dri/card0 to 512MB
        echo "226:0 512m" > drm.buffer.total.max
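
The check-then-commit try-charge behaviour described above can be
sketched in user space (toy_* names are hypothetical; the actual patch
works per device under dev->drmcg_mutex, and its exact failure handling
is in the diff below):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cgroup {
	struct toy_cgroup *parent;
	long long total;	/* bo_stats_total_allocated analogue */
	long long limit;	/* bo_limits_total_allocated analogue */
};

bool limit_enforced;		/* drmcg_props.limit_enforced analogue */

/* Try to charge size to cg and every ancestor.  When enforcement is
 * on, fail with no side effects if any level would exceed its limit;
 * the caller then reports ENOMEM.  With enforcement off, only the
 * stats are updated. */
bool toy_try_charge(struct toy_cgroup *cg, long long size)
{
	struct toy_cgroup *i;

	if (limit_enforced)
		for (i = cg; i != NULL; i = i->parent)
			if (i->total + size > i->limit)
				return false;

	for (i = cg; i != NULL; i = i->parent)
		i->total += size;
	return true;
}
```

A failed charge leaves every level untouched, so a child can never push
an ancestor over its limit part-way.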

Change-Id: I96e0b7add4d331ed8bb267b3c9243d360c6e9903
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst    |  21 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   6 +-
 drivers/gpu/drm/drm_gem.c                  |  11 +-
 include/drm/drm_cgroup.h                   |   7 +-
 include/drm/drm_gem.h                      |   2 +-
 include/linux/cgroup_drm.h                 |   1 +
 kernel/cgroup/drm.c                        | 221 ++++++++++++++++++++-
 8 files changed, 260 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 4dc72339a9b6..e8fac2684179 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1919,6 +1919,27 @@ DRM Interface Files
 
 	Total number of GEM buffers allocated.
 
+  drm.buffer.total.default
+	A read-only flat-keyed file which exists on the root cgroup.
+	Each entry is keyed by the drm device's major:minor.
+
+	Default limits on the total GEM buffer allocation in bytes.
+
+  drm.buffer.total.max
+	A read-write flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Per device limits on the total GEM buffer allocation in bytes.
+	This is a hard limit.  Attempts to allocate beyond the cgroup
+	limit will fail with ENOMEM.  Shorthand understood by memparse
+	(such as k, m, g) can be used.
+
+	Set allocation limit for /dev/dri/card1 to 1GB
+	echo "226:1 1g" > drm.buffer.total.max
+
+	Set allocation limit for /dev/dri/card0 to 512MB
+	echo "226:0 512m" > drm.buffer.total.max
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c0bbd3aa0558..163a4fbf0611 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1395,6 +1395,12 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
 						  stime, etime, mode);
 }
 
+static void amdgpu_drmcg_custom_init(struct drm_device *dev,
+	struct drmcg_props *props)
+{
+	props->limit_enforced = true;
+}
+
 static struct drm_driver kms_driver = {
 	.driver_features =
 	    DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1431,6 +1437,8 @@ static struct drm_driver kms_driver = {
 	.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
 	.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
+	.drmcg_custom_init = amdgpu_drmcg_custom_init,
+
 	.name = DRIVER_NAME,
 	.desc = DRIVER_DESC,
 	.date = DRIVER_DATE,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 989b7b55cb2e..b1bd66be3e1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 #include <drm/drmP.h>
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_cache.h>
+#include <drm/drm_cgroup.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -454,7 +455,10 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	bo = kzalloc(sizeof(struct amdgpu_bo), GFP_KERNEL);
 	if (bo == NULL)
 		return -ENOMEM;
-	drm_gem_private_object_init(adev->ddev, &bo->gem_base, size);
+	if (!drm_gem_private_object_init(adev->ddev, &bo->gem_base, size)) {
+		kfree(bo);
+		return -ENOMEM;
+	}
 	INIT_LIST_HEAD(&bo->shadow_list);
 	bo->vm_bo = NULL;
 	bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 517b71a6f4d4..7887f153ab83 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -145,11 +145,17 @@ EXPORT_SYMBOL(drm_gem_object_init);
  * no GEM provided backing store. Instead the caller is responsible for
  * backing the object and handling it.
  */
-void drm_gem_private_object_init(struct drm_device *dev,
+bool drm_gem_private_object_init(struct drm_device *dev,
 				 struct drm_gem_object *obj, size_t size)
 {
 	BUG_ON((size & (PAGE_SIZE - 1)) != 0);
 
+	obj->drmcg = drmcg_get(current);
+	if (!drmcg_try_chg_bo_alloc(obj->drmcg, dev, size)) {
+		drmcg_put(obj->drmcg);
+		obj->drmcg = NULL;
+		return false;
+	}
 	obj->dev = dev;
 	obj->filp = NULL;
 
@@ -162,8 +168,7 @@ void drm_gem_private_object_init(struct drm_device *dev,
 
 	drm_vma_node_reset(&obj->vma_node);
 
-	obj->drmcg = drmcg_get(current);
-	drmcg_chg_bo_alloc(obj->drmcg, dev, size);
+	return true;
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 1fa37d1ad44c..49c5d35ff6e1 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -11,13 +11,16 @@
  * of storing per device defaults
  */
 struct drmcg_props {
+	bool			limit_enforced;
+
+	s64			bo_limits_total_allocated_default;
 };
 
 #ifdef CONFIG_CGROUP_DRM
 
 void drmcg_device_update(struct drm_device *device);
 void drmcg_device_early_init(struct drm_device *device);
-void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size);
@@ -30,7 +33,8 @@ static inline void drmcg_device_early_init(struct drm_device *device)
 {
 }
 
-static inline void drmcg_chg_bo_alloc(struct drmcg *drmcg,
+static inline bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
 		struct drm_device *dev,	size_t size)
 {
+	return true;
 }
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 6047968bdd17..2bf0c0962ddf 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -334,7 +334,7 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
 			struct drm_gem_object *obj, size_t size);
-void drm_gem_private_object_init(struct drm_device *dev,
+bool drm_gem_private_object_init(struct drm_device *dev,
 				 struct drm_gem_object *obj, size_t size);
 void drm_gem_vm_open(struct vm_area_struct *vma);
 void drm_gem_vm_close(struct vm_area_struct *vma);
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 972f7aa975b5..eb54e56f20ae 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -26,6 +26,7 @@ enum drmcg_res_type {
 struct drmcg_device_resource {
 	/* for per device stats */
 	s64			bo_stats_total_allocated;
+	s64			bo_limits_total_allocated;
 
 	s64			bo_stats_peak_allocated;
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 85e46ece4a82..7161fa40e156 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -27,6 +27,8 @@ static DEFINE_MUTEX(drmcg_mutex);
 
 enum drmcg_file_type {
 	DRMCG_FTYPE_STATS,
+	DRMCG_FTYPE_LIMIT,
+	DRMCG_FTYPE_DEFAULT,
 };
 
 static struct drmcg *root_drmcg __read_mostly;
@@ -70,6 +72,8 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	drmcg->dev_resources[minor] = ddr;
 
 	/* set defaults here */
+	ddr->bo_limits_total_allocated =
+		dev->drmcg_props.bo_limits_total_allocated_default;
 
 	mutex_unlock(&dev->drmcg_mutex);
 	return 0;
@@ -141,6 +145,38 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	}
 }
 
+static void drmcg_print_limits(struct drmcg_device_resource *ddr,
+		struct seq_file *sf, enum drmcg_res_type type)
+{
+	if (ddr == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCG_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+static void drmcg_print_default(struct drmcg_props *props,
+		struct seq_file *sf, enum drmcg_res_type type)
+{
+	switch (type) {
+	case DRMCG_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n",
+			props->bo_limits_total_allocated_default);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
 static int drmcg_seq_show_fn(int id, void *ptr, void *data)
 {
 	struct drm_minor *minor = ptr;
@@ -163,6 +199,12 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
 	case DRMCG_FTYPE_STATS:
 		drmcg_print_stats(ddr, sf, type);
 		break;
+	case DRMCG_FTYPE_LIMIT:
+		drmcg_print_limits(ddr, sf, type);
+		break;
+	case DRMCG_FTYPE_DEFAULT:
+		drmcg_print_default(&minor->dev->drmcg_props, sf, type);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -176,6 +218,124 @@ int drmcg_seq_show(struct seq_file *sf, void *v)
 	return drm_minor_for_each(&drmcg_seq_show_fn, sf);
 }
 
+static void drmcg_pr_cft_err(const struct drmcg *drmcg,
+		int rc, const char *cft_name, int minor)
+{
+	pr_err("drmcg: error parsing %s, minor %d, rc %d ",
+			cft_name, minor, rc);
+	pr_cont_cgroup_name(drmcg->css.cgroup);
+	pr_cont("\n");
+}
+
+static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
+			s64 def_val, s64 max_val, s64 *ret_val)
+{
+	int rc = strcmp("max", sval);
+
+
+	if (!rc)
+		*ret_val = max_val;
+	else {
+		rc = strcmp("default", sval);
+
+		if (!rc)
+			*ret_val = def_val;
+	}
+
+	if (rc) {
+		if (is_mem) {
+			*ret_val = memparse(sval, NULL);
+			rc = 0;
+		} else {
+			rc = kstrtoll(sval, 0, ret_val);
+		}
+	}
+
+	if (*ret_val > max_val)
+		rc = -EINVAL;
+
+	return rc;
+}
+
+static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
+{
+	mutex_lock(&dev->drmcg_mutex);
+	*dst = val;
+	mutex_unlock(&dev->drmcg_mutex);
+}
+
+static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
+		size_t nbytes, loff_t off)
+{
+	struct drmcg *drmcg = css_to_drmcg(of_css(of));
+	struct drmcg *parent = drmcg_parent(drmcg);
+	enum drmcg_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
+	char *cft_name = of_cft(of)->name;
+	char *limits = strstrip(buf);
+	struct drmcg_device_resource *ddr;
+	struct drmcg_props *props;
+	struct drm_minor *dm;
+	char *line;
+	char sattr[256];
+	s64 val;
+	s64 p_max;
+	int rc;
+	int minor;
+
+	while (limits != NULL) {
+		line = strsep(&limits, "\n");
+
+		if (sscanf(line,
+			__stringify(DRM_MAJOR)":%u %255[^\t\n]",
+							&minor, sattr) != 2) {
+			pr_err("drmcg: error parsing %s ", cft_name);
+			pr_cont_cgroup_name(drmcg->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		dm = drm_minor_acquire(minor);
+		if (IS_ERR(dm)) {
+			pr_err("drmcg: invalid minor %d for %s ",
+					minor, cft_name);
+			pr_cont_cgroup_name(drmcg->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		ddr = drmcg->dev_resources[minor];
+		props = &dm->dev->drmcg_props;
+		switch (type) {
+		case DRMCG_TYPE_BO_TOTAL:
+			p_max = parent == NULL ? S64_MAX :
+				parent->dev_resources[minor]->
+				bo_limits_total_allocated;
+
+			rc = drmcg_process_limit_s64_val(sattr, true,
+				props->bo_limits_total_allocated_default,
+				p_max,
+				&val);
+
+			if (rc || val < 0) {
+				drmcg_pr_cft_err(drmcg, rc, cft_name, minor);
+				break;
+			}
+
+			drmcg_value_apply(dm->dev,
+					&ddr->bo_limits_total_allocated, val);
+			break;
+		default:
+			break;
+		}
+		drm_dev_put(dm->dev); /* release from drm_minor_acquire */
+	}
+
+	return nbytes;
+}
+
 struct cftype files[] = {
 	{
 		.name = "buffer.total.stats",
@@ -183,6 +343,20 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.total.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "buffer.total.max",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_TOTAL,
+						DRMCG_FTYPE_LIMIT),
+	},
 	{
 		.name = "buffer.peak.stats",
 		.seq_show = drmcg_seq_show,
@@ -250,12 +424,16 @@ EXPORT_SYMBOL(drmcg_device_update);
  */
 void drmcg_device_early_init(struct drm_device *dev)
 {
+	dev->drmcg_props.limit_enforced = false;
+
+	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
+
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
 
 /**
- * drmcg_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
+ * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
  * @drmcg: the DRM cgroup to be charged to
  * @dev: the device the usage should be charged to
  * @size: size of the GEM buffer to be accounted for
@@ -264,29 +442,52 @@ EXPORT_SYMBOL(drmcg_device_early_init);
  * for the utilization.  This should not be called when the buffer is shared (
  * the GEM buffer's reference count being incremented.)
  */
-void drmcg_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
+bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size)
 {
 	struct drmcg_device_resource *ddr;
 	int devIdx = dev->primary->index;
+	struct drmcg_props *props = &dev->drmcg_props;
+	struct drmcg *drmcg_cur = drmcg;
+	bool result = true;
+	s64 delta = 0;
 
 	if (drmcg == NULL)
-		return;
+		return true;
 
 	mutex_lock(&dev->drmcg_mutex);
-	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
-		ddr = drmcg->dev_resources[devIdx];
+	if (props->limit_enforced) {
+		for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+			ddr = drmcg->dev_resources[devIdx];
+			delta = ddr->bo_limits_total_allocated -
+					ddr->bo_stats_total_allocated;
+
+			if (delta <= 0 || size > delta) {
+				result = false;
+				break;
+			}
+		}
+	}
+
+	drmcg = drmcg_cur;
+
+	if (result || !props->limit_enforced) {
+		for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+			ddr = drmcg->dev_resources[devIdx];
 
-		ddr->bo_stats_total_allocated += (s64)size;
+			ddr->bo_stats_total_allocated += (s64)size;
 
-		if (ddr->bo_stats_peak_allocated < (s64)size)
-			ddr->bo_stats_peak_allocated = (s64)size;
+			if (ddr->bo_stats_peak_allocated < (s64)size)
+				ddr->bo_stats_peak_allocated = (s64)size;
 
-		ddr->bo_stats_count_allocated++;
+			ddr->bo_stats_count_allocated++;
+		}
 	}
 	mutex_unlock(&dev->drmcg_mutex);
+
+	return result;
 }
-EXPORT_SYMBOL(drmcg_chg_bo_alloc);
+EXPORT_SYMBOL(drmcg_try_chg_bo_alloc);
 
 /**
  * drmcg_unchg_bo_alloc -
-- 
2.22.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 08/16] drm, cgroup: Add peak GEM buffer allocation limit
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 06/16] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 09/16] drm, cgroup: Add TTM buffer allocation stats Kenny Ho
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

drm.buffer.peak.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default limits on the largest GEM buffer allocation in bytes.

drm.buffer.peak.max
        A read-write flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Per device limits on the largest GEM buffer allocation in bytes.
        This is a hard limit.  Attempts to allocate beyond the cgroup
        limit will result in ENOMEM.  Shorthand understood by memparse
        (such as k, m, g) can be used.

        Set largest allocation for /dev/dri/card1 to 4MB
        echo "226:1 4m" > drm.buffer.peak.max
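
For reference, the k/m/g shorthand accepted here is the one handled by the
kernel's memparse().  A minimal userspace model of that behavior (parse_size()
is a hypothetical helper for illustration, not part of this patch) looks like:

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of the k/m/g shorthand handled by the kernel's
 * memparse(); parse_size() is a hypothetical name for illustration.
 */
static int64_t parse_size(const char *s)
{
	char *end;
	int64_t v = strtoll(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
	}
	return v;
}
```

With this model, "4m" from the example above comes out as 4194304 bytes.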

Change-Id: I0830d56775568e1cf215b56cc892d5e7945e9f25
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++++
 include/drm/drm_cgroup.h                |  1 +
 include/linux/cgroup_drm.h              |  1 +
 kernel/cgroup/drm.c                     | 48 +++++++++++++++++++++++++
 4 files changed, 68 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index e8fac2684179..87a195133eaa 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1940,6 +1940,24 @@ DRM Interface Files
 	Set allocation limit for /dev/dri/card0 to 512MB
 	echo "226:0 512m" > drm.buffer.total.max
 
+  drm.buffer.peak.default
+	A read-only flat-keyed file which exists on the root cgroup.
+	Each entry is keyed by the drm device's major:minor.
+
+	Default limits on the largest GEM buffer allocation in bytes.
+
+  drm.buffer.peak.max
+	A read-write flat-keyed file which exists on all cgroups.  Each
+	entry is keyed by the drm device's major:minor.
+
+	Per device limits on the largest GEM buffer allocation in bytes.
+	This is a hard limit.  Attempts to allocate beyond the cgroup
+	limit will result in ENOMEM.  Shorthand understood by memparse
+	(such as k, m, g) can be used.
+
+	Set largest allocation for /dev/dri/card1 to 4MB
+	echo "226:1 4m" > drm.buffer.peak.max
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 49c5d35ff6e1..d61b90beded5 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -14,6 +14,7 @@ struct drmcg_props {
 	bool			limit_enforced;
 
 	s64			bo_limits_total_allocated_default;
+	s64			bo_limits_peak_allocated_default;
 };
 
 #ifdef CONFIG_CGROUP_DRM
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index eb54e56f20ae..87a2566c9fdd 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -29,6 +29,7 @@ struct drmcg_device_resource {
 	s64			bo_limits_total_allocated;
 
 	s64			bo_stats_peak_allocated;
+	s64			bo_limits_peak_allocated;
 
 	s64			bo_stats_count_allocated;
 };
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 7161fa40e156..2f54bff291e5 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -75,6 +75,9 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	ddr->bo_limits_total_allocated =
 		dev->drmcg_props.bo_limits_total_allocated_default;
 
+	ddr->bo_limits_peak_allocated =
+		dev->drmcg_props.bo_limits_peak_allocated_default;
+
 	mutex_unlock(&dev->drmcg_mutex);
 	return 0;
 }
@@ -157,6 +160,9 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_TOTAL:
 		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
 		break;
+	case DRMCG_TYPE_BO_PEAK:
+		seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -171,6 +177,10 @@ static void drmcg_print_default(struct drmcg_props *props,
 		seq_printf(sf, "%lld\n",
 			props->bo_limits_total_allocated_default);
 		break;
+	case DRMCG_TYPE_BO_PEAK:
+		seq_printf(sf, "%lld\n",
+			props->bo_limits_peak_allocated_default);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -327,6 +337,24 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 			drmcg_value_apply(dm->dev,
 					&ddr->bo_limits_total_allocated, val);
 			break;
+		case DRMCG_TYPE_BO_PEAK:
+			p_max = parent == NULL ? S64_MAX :
+				parent->dev_resources[minor]->
+				bo_limits_peak_allocated;
+
+			rc = drmcg_process_limit_s64_val(sattr, true,
+				props->bo_limits_peak_allocated_default,
+				p_max,
+				&val);
+
+			if (rc || val < 0) {
+				drmcg_pr_cft_err(drmcg, rc, cft_name, minor);
+				break;
+			}
+
+			drmcg_value_apply(dm->dev,
+					&ddr->bo_limits_peak_allocated, val);
+			break;
 		default:
 			break;
 		}
@@ -363,6 +391,20 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "buffer.peak.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "buffer.peak.max",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_PEAK,
+						DRMCG_FTYPE_LIMIT),
+	},
 	{
 		.name = "buffer.count.stats",
 		.seq_show = drmcg_seq_show,
@@ -427,6 +469,7 @@ void drmcg_device_early_init(struct drm_device *dev)
 	dev->drmcg_props.limit_enforced = false;
 
 	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
+	dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
 
 	drmcg_update_cg_tree(dev);
 }
@@ -466,6 +509,11 @@ bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 				result = false;
 				break;
 			}
+
+			if (ddr->bo_limits_peak_allocated < size) {
+				result = false;
+				break;
+			}
 		}
 	}
 
-- 
2.22.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH RFC v4 09/16] drm, cgroup: Add TTM buffer allocation stats
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 08/16] drm, cgroup: Add peak GEM buffer allocation limit Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 10/16] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

The drm resources being measured are the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amdgpu, ast, bochs, cirrus, hisilicon, mgag200,
nouveau, qxl, virtio and vmwgfx).

drm.memory.stats
        A read-only nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ======         =============================================
          system         Host/system memory
          tt             Host memory used by the drm device (GTT/GART)
          vram           Video RAM used by the drm device
          priv           Other drm device, vendor specific memory
          ======         =============================================

        Reading returns the following::

          226:0 system=0 tt=0 vram=0 priv=0
          226:1 system=0 tt=9035776 vram=17768448 priv=16809984
          226:2 system=0 tt=9035776 vram=17768448 priv=16809984

drm.memory.evict.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total number of evictions.
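
A consumer of drm.memory.stats can pull the nested keys out of each line; a
minimal sketch (the struct and function names below are illustrative, not
part of the patch) assuming the field order shown above:

```c
#include <stdio.h>

/* Sketch of parsing one line of drm.memory.stats; names are illustrative. */
struct drm_mem_line {
	int major, minor;
	long long system, tt, vram, priv;
};

static int parse_mem_line(const char *line, struct drm_mem_line *st)
{
	return sscanf(line, "%d:%d system=%lld tt=%lld vram=%lld priv=%lld",
		      &st->major, &st->minor, &st->system, &st->tt,
		      &st->vram, &st->priv) == 6;
}
```

parse_mem_line() returns 1 only when all six fields are present, so a
truncated or reordered line is rejected rather than half-parsed.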

Change-Id: Ice2c4cc845051229549bebeb6aa2d7d6153bdf6a
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c            |  30 +++++++
 drivers/gpu/drm/ttm/ttm_bo_util.c       |   4 +
 include/drm/drm_cgroup.h                |  19 +++++
 include/drm/ttm/ttm_bo_api.h            |   2 +
 include/drm/ttm/ttm_bo_driver.h         |   8 ++
 include/linux/cgroup_drm.h              |   6 ++
 kernel/cgroup/drm.c                     | 108 ++++++++++++++++++++++++
 8 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index cfcbbdc39656..463e015e8694 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1720,8 +1720,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 	mutex_init(&adev->mman.gtt_window_lock);
 
 	/* No others user of address space so set it to 0 */
-	r = ttm_bo_device_init(&adev->mman.bdev,
+	r = ttm_bo_device_init_tmp(&adev->mman.bdev,
 			       &amdgpu_bo_driver,
+			       adev->ddev,
 			       adev->ddev->anon_inode->i_mapping,
 			       adev->need_dma32);
 	if (r) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 58c403eda04e..a0e9ce46baf3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -34,6 +34,7 @@
 #include <drm/ttm/ttm_module.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
+#include <drm/drm_cgroup.h>
 #include <linux/jiffies.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
@@ -42,6 +43,7 @@
 #include <linux/module.h>
 #include <linux/atomic.h>
 #include <linux/reservation.h>
+#include <linux/cgroup_drm.h>
 
 static void ttm_bo_global_kobj_release(struct kobject *kobj);
 
@@ -151,6 +153,10 @@ static void ttm_bo_release_list(struct kref *list_kref)
 	struct ttm_bo_device *bdev = bo->bdev;
 	size_t acc_size = bo->acc_size;
 
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+		drmcg_unchg_mem(bo);
+	drmcg_put(bo->drmcg);
+
 	BUG_ON(kref_read(&bo->list_kref));
 	BUG_ON(kref_read(&bo->kref));
 	BUG_ON(atomic_read(&bo->cpu_writers));
@@ -360,6 +366,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 		if (bo->mem.mem_type == TTM_PL_SYSTEM) {
 			if (bdev->driver->move_notify)
 				bdev->driver->move_notify(bo, evict, mem);
+			if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+				drmcg_mem_track_move(bo, evict, mem);
 			bo->mem = *mem;
 			mem->mm_node = NULL;
 			goto moved;
@@ -368,6 +376,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 
 	if (bdev->driver->move_notify)
 		bdev->driver->move_notify(bo, evict, mem);
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+		drmcg_mem_track_move(bo, evict, mem);
 
 	if (!(old_man->flags & TTM_MEMTYPE_FLAG_FIXED) &&
 	    !(new_man->flags & TTM_MEMTYPE_FLAG_FIXED))
@@ -381,6 +391,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 		if (bdev->driver->move_notify) {
 			swap(*mem, bo->mem);
 			bdev->driver->move_notify(bo, false, mem);
+			if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+				drmcg_mem_track_move(bo, evict, mem);
 			swap(*mem, bo->mem);
 		}
 
@@ -1355,6 +1367,10 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
 		WARN_ON(!locked);
 	}
 
+	bo->drmcg = drmcg_get(current);
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+		drmcg_chg_mem(bo);
+
 	if (likely(!ret))
 		ret = ttm_bo_validate(bo, placement, ctx);
 
@@ -1747,6 +1763,20 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 }
 EXPORT_SYMBOL(ttm_bo_device_init);
 
+/* TODO merge with official function when implementation finalized*/
+int ttm_bo_device_init_tmp(struct ttm_bo_device *bdev,
+		struct ttm_bo_driver *driver,
+		struct drm_device *ddev,
+		struct address_space *mapping,
+		bool need_dma32)
+{
+	int ret = ttm_bo_device_init(bdev, driver, mapping, need_dma32);
+
+	bdev->ddev = ddev;
+	return ret;
+}
+EXPORT_SYMBOL(ttm_bo_device_init_tmp);
+
 /*
  * buffer object vm functions.
  */
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 895d77d799e4..15acd2c0720e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -32,6 +32,7 @@
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_cgroup.h>
 #include <linux/io.h>
 #include <linux/highmem.h>
 #include <linux/wait.h>
@@ -522,6 +523,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ret = reservation_object_trylock(fbo->base.resv);
 	WARN_ON(!ret);
 
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initialized for all
+		drmcg_chg_mem(bo);
+
 	*new_obj = &fbo->base;
 	return 0;
 }
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index d61b90beded5..7d63f73a5375 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -5,6 +5,7 @@
 #define __DRM_CGROUP_H__
 
 #include <linux/cgroup_drm.h>
+#include <drm/ttm/ttm_bo_api.h>
 
 /**
  * Per DRM device properties for DRM cgroup controller for the purpose
@@ -25,6 +26,11 @@ bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size);
+void drmcg_chg_mem(struct ttm_buffer_object *tbo);
+void drmcg_unchg_mem(struct ttm_buffer_object *tbo);
+void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
+		struct ttm_mem_reg *new_mem);
+
 #else
 static inline void drmcg_device_update(struct drm_device *device)
 {
@@ -43,5 +49,18 @@ static inline void drmcg_unchg_bo_alloc(struct drmcg *drmcg,
 		struct drm_device *dev,	size_t size)
 {
 }
+
+static inline void drmcg_chg_mem(struct ttm_buffer_object *tbo)
+{
+}
+
+static inline void drmcg_unchg_mem(struct ttm_buffer_object *tbo)
+{
+}
+
+static inline void drmcg_mem_track_move(struct ttm_buffer_object *old_bo,
+		bool evict, struct ttm_mem_reg *new_mem)
+{
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 49d9cdfc58f2..839936ab358c 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -128,6 +128,7 @@ struct ttm_tt;
  * struct ttm_buffer_object
  *
  * @bdev: Pointer to the buffer object device structure.
+ * @drmcg: DRM cgroup this object belongs to.
  * @type: The bo type.
  * @destroy: Destruction function. If NULL, kfree is used.
  * @num_pages: Actual number of pages.
@@ -174,6 +175,7 @@ struct ttm_buffer_object {
 	 */
 
 	struct ttm_bo_device *bdev;
+	struct drmcg *drmcg;
 	enum ttm_bo_type type;
 	void (*destroy) (struct ttm_buffer_object *);
 	unsigned long num_pages;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index c9b8ba492f24..e1a805d65b83 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -30,6 +30,7 @@
 #ifndef _TTM_BO_DRIVER_H_
 #define _TTM_BO_DRIVER_H_
 
+#include <drm/drm_device.h>
 #include <drm/drm_mm.h>
 #include <drm/drm_vma_manager.h>
 #include <linux/workqueue.h>
@@ -442,6 +443,7 @@ extern struct ttm_bo_global {
  * @driver: Pointer to a struct ttm_bo_driver struct setup by the driver.
  * @man: An array of mem_type_managers.
  * @vma_manager: Address space manager
+ * @ddev: Pointer to struct drm_device that this ttm_bo_device belongs to
  * lru_lock: Spinlock that protects the buffer+device lru lists and
  * ddestroy lists.
  * @dev_mapping: A pointer to the struct address_space representing the
@@ -460,6 +462,7 @@ struct ttm_bo_device {
 	struct ttm_bo_global *glob;
 	struct ttm_bo_driver *driver;
 	struct ttm_mem_type_manager man[TTM_NUM_MEM_TYPES];
+	struct drm_device *ddev;
 
 	/*
 	 * Protected by internal locks.
@@ -598,6 +601,11 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 		       struct address_space *mapping,
 		       bool need_dma32);
 
+int ttm_bo_device_init_tmp(struct ttm_bo_device *bdev,
+		       struct ttm_bo_driver *driver,
+		       struct drm_device *ddev,
+		       struct address_space *mapping,
+		       bool need_dma32);
 /**
  * ttm_bo_unmap_virtual
  *
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 87a2566c9fdd..4c2794c9333d 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -9,6 +9,7 @@
 #include <linux/mutex.h>
 #include <linux/cgroup.h>
 #include <drm/drm_file.h>
+#include <drm/ttm/ttm_placement.h>
 
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
@@ -17,6 +18,8 @@ enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
 	DRMCG_TYPE_BO_PEAK,
 	DRMCG_TYPE_BO_COUNT,
+	DRMCG_TYPE_MEM,
+	DRMCG_TYPE_MEM_EVICT,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -32,6 +35,9 @@ struct drmcg_device_resource {
 	s64			bo_limits_peak_allocated;
 
 	s64			bo_stats_count_allocated;
+
+	s64			mem_stats[TTM_PL_PRIV+1];
+	s64			mem_stats_evict;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 2f54bff291e5..4960a8d1e8f4 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -10,6 +10,8 @@
 #include <linux/kernel.h>
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
+#include <drm/ttm/ttm_bo_api.h>
+#include <drm/ttm/ttm_bo_driver.h>
 #include <drm/drm_device.h>
 #include <drm/drm_ioctl.h>
 #include <drm/drm_cgroup.h>
@@ -31,6 +33,13 @@ enum drmcg_file_type {
 	DRMCG_FTYPE_DEFAULT,
 };
 
+static char const *ttm_placement_names[] = {
+	[TTM_PL_SYSTEM] = "system",
+	[TTM_PL_TT]     = "tt",
+	[TTM_PL_VRAM]   = "vram",
+	[TTM_PL_PRIV]   = "priv",
+};
+
 static struct drmcg *root_drmcg __read_mostly;
 
 static int drmcg_css_free_fn(int id, void *ptr, void *data)
@@ -127,6 +136,7 @@ drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
 static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 		struct seq_file *sf, enum drmcg_res_type type)
 {
+	int i;
 	if (ddr == NULL) {
 		seq_puts(sf, "\n");
 		return;
@@ -142,6 +152,16 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_COUNT:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
 		break;
+	case DRMCG_TYPE_MEM:
+		for (i = 0; i <= TTM_PL_PRIV; i++) {
+			seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
+					ddr->mem_stats[i]);
+		}
+		seq_puts(sf, "\n");
+		break;
+	case DRMCG_TYPE_MEM_EVICT:
+		seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -411,6 +431,18 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "memory.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM,
+						DRMCG_FTYPE_STATS),
+	},
+	{
+		.name = "memory.evict.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_EVICT,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -566,3 +598,79 @@ void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 	mutex_unlock(&dev->drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_unchg_bo_alloc);
+
+void drmcg_chg_mem(struct ttm_buffer_object *tbo)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	struct drmcg *drmcg = tbo->drmcg;
+	int devIdx = dev->primary->index;
+	s64 size = (s64)(tbo->mem.size);
+	int mem_type = tbo->mem.mem_type;
+	struct drmcg_device_resource *ddr;
+
+	if (drmcg == NULL)
+		return;
+
+	mem_type = mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : mem_type;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+		ddr->mem_stats[mem_type] += size;
+	}
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_chg_mem);
+
+void drmcg_unchg_mem(struct ttm_buffer_object *tbo)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	struct drmcg *drmcg = tbo->drmcg;
+	int devIdx = dev->primary->index;
+	s64 size = (s64)(tbo->mem.size);
+	int mem_type = tbo->mem.mem_type;
+	struct drmcg_device_resource *ddr;
+
+	if (drmcg == NULL)
+		return;
+
+	mem_type = mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : mem_type;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+		ddr->mem_stats[mem_type] -= size;
+	}
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unchg_mem);
+
+void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
+		struct ttm_mem_reg *new_mem)
+{
+	struct drm_device *dev = old_bo->bdev->ddev;
+	struct drmcg *drmcg = old_bo->drmcg;
+	s64 move_in_bytes = (s64)(old_bo->mem.size);
+	int devIdx = dev->primary->index;
+	int old_mem_type = old_bo->mem.mem_type;
+	int new_mem_type = new_mem->mem_type;
+	struct drmcg_device_resource *ddr;
+
+	if (drmcg == NULL)
+		return;
+
+	old_mem_type = old_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : old_mem_type;
+	new_mem_type = new_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : new_mem_type;
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+		ddr->mem_stats[old_mem_type] -= move_in_bytes;
+		ddr->mem_stats[new_mem_type] += move_in_bytes;
+
+		if (evict)
+			ddr->mem_stats_evict++;
+	}
+	mutex_unlock(&dev->drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_mem_track_move);
-- 
2.22.0


* [PATCH RFC v4 10/16] drm, cgroup: Add TTM buffer peak usage stats
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 09/16] drm, cgroup: Add TTM buffer allocation stats Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  6:05   ` [PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control Kenny Ho
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

drm.memory.peaks.stats
        A read-only nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ======         ==============================================
          system         Peak host memory used
          tt             Peak host memory used by the device (GTT/GART)
          vram           Peak Video RAM used by the drm device
          priv           Other drm device specific memory peak usage
          ======         ==============================================

        Reading returns the following::

          226:0 system=0 tt=0 vram=0 priv=0
          226:1 system=0 tt=9035776 vram=17768448 priv=16809984
          226:2 system=0 tt=9035776 vram=17768448 priv=16809984
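
The peak values relate to the current values by the usual high-water-mark
pattern (the peak is raised on charge, never lowered on uncharge), as in
drmcg_chg_mem() in this series.  A userspace model of that accounting, with
illustrative names:

```c
#include <stdint.h>

/* High-water-mark accounting as used for mem_stats[]/mem_peaks[];
 * userspace model for illustration only.
 */
struct placement_stat {
	int64_t cur;	/* mirrors mem_stats[type] */
	int64_t peak;	/* mirrors mem_peaks[type] */
};

static void stat_charge(struct placement_stat *s, int64_t size)
{
	s->cur += size;
	if (s->peak < s->cur)
		s->peak = s->cur;
}

static void stat_uncharge(struct placement_stat *s, int64_t size)
{
	s->cur -= size;	/* the peak is intentionally left untouched */
}
```

After a charge/uncharge cycle, cur can drop back down while peak keeps the
largest value ever reached, which is exactly what this file reports.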

Change-Id: I986e44533848f66411465bdd52105e78105a709a
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/linux/cgroup_drm.h |  2 ++
 kernel/cgroup/drm.c        | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 4c2794c9333d..9579e2a0b71d 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -20,6 +20,7 @@ enum drmcg_res_type {
 	DRMCG_TYPE_BO_COUNT,
 	DRMCG_TYPE_MEM,
 	DRMCG_TYPE_MEM_EVICT,
+	DRMCG_TYPE_MEM_PEAK,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -37,6 +38,7 @@ struct drmcg_device_resource {
 	s64			bo_stats_count_allocated;
 
 	s64			mem_stats[TTM_PL_PRIV+1];
+	s64			mem_peaks[TTM_PL_PRIV+1];
 	s64			mem_stats_evict;
 };
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 4960a8d1e8f4..899dc44722c3 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -162,6 +162,13 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_MEM_EVICT:
 		seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
 		break;
+	case DRMCG_TYPE_MEM_PEAK:
+		for (i = 0; i <= TTM_PL_PRIV; i++) {
+			seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
+					ddr->mem_peaks[i]);
+		}
+		seq_puts(sf, "\n");
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -443,6 +450,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_EVICT,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "memory.peaks.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_PEAK,
+						DRMCG_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -617,6 +630,8 @@ void drmcg_chg_mem(struct ttm_buffer_object *tbo)
 	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
 		ddr = drmcg->dev_resources[devIdx];
 		ddr->mem_stats[mem_type] += size;
+		ddr->mem_peaks[mem_type] = max(ddr->mem_peaks[mem_type],
+				ddr->mem_stats[mem_type]);
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
@@ -668,6 +683,10 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		ddr->mem_stats[old_mem_type] -= move_in_bytes;
 		ddr->mem_stats[new_mem_type] += move_in_bytes;
 
+		ddr->mem_peaks[new_mem_type] = max(
+				ddr->mem_peaks[new_mem_type],
+				ddr->mem_stats[new_mem_type]);
+
 		if (evict)
 			ddr->mem_stats_evict++;
 	}
-- 
2.22.0


* [PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 10/16] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
       [not found]     ` <20190829060533.32315-12-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-08-29  6:05   ` [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

The bandwidth is measured by tracking the number of bytes moved by TTM
within a time period.  We define two types of bandwidth: burst and
average.  Average bandwidth is calculated by dividing the total number
of bytes moved within a cgroup by the lifetime of the cgroup.  Burst
bandwidth is similar, except that the byte and time measurements are
reset after a user-configurable period.

The bandwidth control is best effort since it is enforced per move
rather than per byte.  Bandwidth is limited by delaying the move of a
buffer, so the limit can be exceeded when the next move is larger than
the remaining allowance.

drm.burst_bw_period_in_us
        A read-write flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Length of the period used to measure burst bandwidth, in us.
        One period per device.

drm.burst_bw_period_in_us.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default length of a period, in us (one per device).

drm.bandwidth.stats
        A read-only nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          =================     ======================================
          burst_byte_per_us     Burst bandwidth
          avg_bytes_per_us      Average bandwidth
          moved_byte            Bytes moved within the current period
          accum_us              Time accumulated in this period
          total_moved_byte      Bytes moved over the cgroup lifetime
          total_accum_us        Cgroup lifetime in us
          byte_credit           Available byte credit to limit avg bw
          =================     ======================================

        Reading returns the following::

        226:1 burst_byte_per_us=23 avg_bytes_per_us=0 moved_byte=2244608
        accum_us=95575 total_moved_byte=45899776 total_accum_us=201634590
        byte_credit=13214278590464
        226:2 burst_byte_per_us=10 avg_bytes_per_us=219 moved_byte=430080
        accum_us=39350 total_moved_byte=65518026752 total_accum_us=298337721
        byte_credit=9223372036854644735

drm.bandwidth.high
        A read-write nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ================  =======================================
          bytes_in_period   Burst limit per period in bytes
          avg_bytes_per_us  Average bandwidth limit in bytes per us
          ================  =======================================

        Reading returns the following::

        226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
        226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

drm.bandwidth.default
        A read-only nested-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ================  ========================================
          bytes_in_period   Default burst limit per period in bytes
          avg_bytes_per_us  Default average bw limit in bytes per us
          ================  ========================================

        Reading returns the following::

        226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
        226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

Change-Id: Ie573491325ccc16535bb943e7857f43bd0962add
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 +
 include/drm/drm_cgroup.h     |  19 +++
 include/linux/cgroup_drm.h   |  16 ++
 kernel/cgroup/drm.c          | 319 ++++++++++++++++++++++++++++++++++-
 4 files changed, 359 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a0e9ce46baf3..32eee85f3641 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -36,6 +36,7 @@
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_cgroup.h>
 #include <linux/jiffies.h>
+#include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
@@ -1256,6 +1257,12 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 	 * Check whether we need to move buffer.
 	 */
 	if (!ttm_bo_mem_compat(placement, &bo->mem, &new_flags)) {
+		unsigned int move_delay = drmcg_get_mem_bw_period_in_us(bo);
+
+		move_delay /= 2000; /* check every half period, in ms */
+		while (bo->bdev->ddev != NULL && !drmcg_mem_can_move(bo))
+			msleep(move_delay);
+
 		ret = ttm_bo_move_buffer(bo, placement, ctx);
 		if (ret)
 			return ret;
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 7d63f73a5375..9ce0d54e6bd8 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -16,6 +16,12 @@ struct drmcg_props {
 
 	s64			bo_limits_total_allocated_default;
 	s64			bo_limits_peak_allocated_default;
+
+	s64			mem_bw_limits_period_in_us;
+	s64			mem_bw_limits_period_in_us_default;
+
+	s64			mem_bw_bytes_in_period_default;
+	s64			mem_bw_avg_bytes_per_us_default;
 };
 
 #ifdef CONFIG_CGROUP_DRM
@@ -30,6 +36,8 @@ void drmcg_chg_mem(struct ttm_buffer_object *tbo);
 void drmcg_unchg_mem(struct ttm_buffer_object *tbo);
 void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		struct ttm_mem_reg *new_mem);
+unsigned int drmcg_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
+bool drmcg_mem_can_move(struct ttm_buffer_object *tbo);
 
 #else
 static inline void drmcg_device_update(struct drm_device *device)
@@ -62,5 +70,16 @@ static inline void drmcg_mem_track_move(struct ttm_buffer_object *old_bo,
 		bool evict, struct ttm_mem_reg *new_mem)
 {
 }
+
+static inline unsigned int drmcg_get_mem_bw_period_in_us(
+		struct ttm_buffer_object *tbo)
+{
+	return 0;
+}
+
+static inline bool drmcg_mem_can_move(struct ttm_buffer_object *tbo)
+{
+	return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 9579e2a0b71d..27809a583bf2 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -14,6 +14,15 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+enum drmcg_mem_bw_attr {
+	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calculating 'instantaneous' bw */
+	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calculating 'instantaneous' bw */
+	DRMCG_MEM_BW_ATTR_TOTAL_BYTE_MOVED,
+	DRMCG_MEM_BW_ATTR_TOTAL_ACCUM_US,
+	DRMCG_MEM_BW_ATTR_BYTE_CREDIT,
+	__DRMCG_MEM_BW_ATTR_LAST,
+};
+
 enum drmcg_res_type {
 	DRMCG_TYPE_BO_TOTAL,
 	DRMCG_TYPE_BO_PEAK,
@@ -21,6 +30,8 @@ enum drmcg_res_type {
 	DRMCG_TYPE_MEM,
 	DRMCG_TYPE_MEM_EVICT,
 	DRMCG_TYPE_MEM_PEAK,
+	DRMCG_TYPE_BANDWIDTH,
+	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -40,6 +51,11 @@ struct drmcg_device_resource {
 	s64			mem_stats[TTM_PL_PRIV+1];
 	s64			mem_peaks[TTM_PL_PRIV+1];
 	s64			mem_stats_evict;
+
+	s64			mem_bw_stats_last_update_us;
+	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
+	s64			mem_bw_limits_bytes_in_period;
+	s64			mem_bw_limits_avg_bytes_per_us;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 899dc44722c3..ab962a277e58 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -7,6 +7,7 @@
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/cgroup_drm.h>
+#include <linux/ktime.h>
 #include <linux/kernel.h>
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
@@ -40,6 +41,17 @@ static char const *ttm_placement_names[] = {
 	[TTM_PL_PRIV]   = "priv",
 };
 
+static char const *mem_bw_attr_names[] = {
+	[DRMCG_MEM_BW_ATTR_BYTE_MOVED] = "moved_byte",
+	[DRMCG_MEM_BW_ATTR_ACCUM_US] = "accum_us",
+	[DRMCG_MEM_BW_ATTR_TOTAL_BYTE_MOVED] = "total_moved_byte",
+	[DRMCG_MEM_BW_ATTR_TOTAL_ACCUM_US] = "total_accum_us",
+	[DRMCG_MEM_BW_ATTR_BYTE_CREDIT] = "byte_credit",
+};
+
+#define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
+#define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
+
 static struct drmcg *root_drmcg __read_mostly;
 
 static int drmcg_css_free_fn(int id, void *ptr, void *data)
@@ -75,6 +87,9 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 
 		if (!ddr)
 			return -ENOMEM;
+
+		ddr->mem_bw_stats_last_update_us = ktime_to_us(ktime_get());
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_ACCUM_US] = 1;
 	}
 
 	mutex_lock(&dev->drmcg_mutex);
@@ -87,6 +102,12 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	ddr->bo_limits_peak_allocated =
 		dev->drmcg_props.bo_limits_peak_allocated_default;
 
+	ddr->mem_bw_limits_bytes_in_period =
+		dev->drmcg_props.mem_bw_bytes_in_period_default;
+
+	ddr->mem_bw_limits_avg_bytes_per_us =
+		dev->drmcg_props.mem_bw_avg_bytes_per_us_default;
+
 	mutex_unlock(&dev->drmcg_mutex);
 	return 0;
 }
@@ -133,6 +154,26 @@ drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	return &drmcg->css;
 }
 
+static inline void drmcg_mem_burst_bw_stats_reset(struct drm_device *dev)
+{
+	struct cgroup_subsys_state *pos;
+	struct drmcg *node;
+	struct drmcg_device_resource *ddr;
+	int devIdx;
+
+	devIdx = dev->primary->index;
+
+	rcu_read_lock();
+	css_for_each_descendant_pre(pos, &root_drmcg->css) {
+		node = css_to_drmcg(pos);
+		ddr = node->dev_resources[devIdx];
+
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_ACCUM_US] = 1;
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_MOVED] = 0;
+	}
+	rcu_read_unlock();
+}
+
 static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 		struct seq_file *sf, enum drmcg_res_type type)
 {
@@ -169,6 +210,31 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 		}
 		seq_puts(sf, "\n");
 		break;
+	case DRMCG_TYPE_BANDWIDTH:
+		if (ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_ACCUM_US] == 0)
+			seq_puts(sf, "burst_byte_per_us=NaN ");
+		else
+			seq_printf(sf, "burst_byte_per_us=%lld ",
+				ddr->mem_bw_stats[
+				DRMCG_MEM_BW_ATTR_BYTE_MOVED]/
+				ddr->mem_bw_stats[
+				DRMCG_MEM_BW_ATTR_ACCUM_US]);
+
+		if (ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_TOTAL_ACCUM_US] == 0)
+			seq_puts(sf, "avg_bytes_per_us=NaN ");
+		else
+			seq_printf(sf, "avg_bytes_per_us=%lld ",
+				ddr->mem_bw_stats[
+				DRMCG_MEM_BW_ATTR_TOTAL_BYTE_MOVED]/
+				ddr->mem_bw_stats[
+				DRMCG_MEM_BW_ATTR_TOTAL_ACCUM_US]);
+
+		for (i = 0; i < __DRMCG_MEM_BW_ATTR_LAST; i++) {
+			seq_printf(sf, "%s=%lld ", mem_bw_attr_names[i],
+					ddr->mem_bw_stats[i]);
+		}
+		seq_puts(sf, "\n");
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -176,7 +242,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
 }
 
 static void drmcg_print_limits(struct drmcg_device_resource *ddr,
-		struct seq_file *sf, enum drmcg_res_type type)
+		struct seq_file *sf, enum drmcg_res_type type,
+		struct drm_device *dev)
 {
 	if (ddr == NULL) {
 		seq_puts(sf, "\n");
@@ -190,6 +257,17 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_PEAK:
 		seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
 		break;
+	case DRMCG_TYPE_BANDWIDTH_PERIOD_BURST:
+		seq_printf(sf, "%lld\n",
+			dev->drmcg_props.mem_bw_limits_period_in_us);
+		break;
+	case DRMCG_TYPE_BANDWIDTH:
+		seq_printf(sf, "%s=%lld %s=%lld\n",
+				MEM_BW_LIMITS_NAME_BURST,
+				ddr->mem_bw_limits_bytes_in_period,
+				MEM_BW_LIMITS_NAME_AVG,
+				ddr->mem_bw_limits_avg_bytes_per_us);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -208,6 +286,17 @@ static void drmcg_print_default(struct drmcg_props *props,
 		seq_printf(sf, "%lld\n",
 			props->bo_limits_peak_allocated_default);
 		break;
+	case DRMCG_TYPE_BANDWIDTH_PERIOD_BURST:
+		seq_printf(sf, "%lld\n",
+			props->mem_bw_limits_period_in_us_default);
+		break;
+	case DRMCG_TYPE_BANDWIDTH:
+		seq_printf(sf, "%s=%lld %s=%lld\n",
+				MEM_BW_LIMITS_NAME_BURST,
+				props->mem_bw_bytes_in_period_default,
+				MEM_BW_LIMITS_NAME_AVG,
+				props->mem_bw_avg_bytes_per_us_default);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -237,7 +326,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
 		drmcg_print_stats(ddr, sf, type);
 		break;
 	case DRMCG_FTYPE_LIMIT:
-		drmcg_print_limits(ddr, sf, type);
+		drmcg_print_limits(ddr, sf, type, minor->dev);
 		break;
 	case DRMCG_FTYPE_DEFAULT:
 		drmcg_print_default(&minor->dev->drmcg_props, sf, type);
@@ -301,6 +390,83 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
 	mutex_unlock(&dev->drmcg_mutex);
 }
 
+static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
+		struct drm_device *dev, char *attrs)
+{
+	enum drmcg_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
+	struct drmcg *drmcg = css_to_drmcg(of_css(of));
+	struct drmcg *parent = drmcg_parent(drmcg);
+	struct drmcg_props *props = &dev->drmcg_props;
+	char *cft_name = of_cft(of)->name;
+	int minor = dev->primary->index;
+	char *nested = strstrip(attrs);
+	struct drmcg_device_resource *ddr =
+		drmcg->dev_resources[minor];
+	char *attr;
+	char sname[256];
+	char sval[256];
+	s64 val;
+	s64 p_max;
+	int rc;
+
+	while (nested != NULL) {
+		attr = strsep(&nested, " ");
+
+		if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
+			continue;
+
+		switch (type) {
+		case DRMCG_TYPE_BANDWIDTH:
+			if (strncmp(sname, MEM_BW_LIMITS_NAME_BURST, 256)
+					== 0) {
+				p_max = parent == NULL ? S64_MAX :
+					parent->dev_resources[minor]->
+					mem_bw_limits_bytes_in_period;
+
+				rc = drmcg_process_limit_s64_val(sval, true,
+					props->mem_bw_bytes_in_period_default,
+					p_max, &val);
+
+				if (rc || val < 0) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				drmcg_value_apply(dev,
+					&ddr->mem_bw_limits_bytes_in_period,
+					val);
+				continue;
+			}
+
+			if (strncmp(sname, MEM_BW_LIMITS_NAME_AVG, 256) == 0) {
+				p_max = parent == NULL ? S64_MAX :
+					parent->dev_resources[minor]->
+					mem_bw_limits_avg_bytes_per_us;
+
+				rc = drmcg_process_limit_s64_val(sval, true,
+					props->mem_bw_avg_bytes_per_us_default,
+					p_max, &val);
+
+				if (rc || val < 0) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				drmcg_value_apply(dev,
+					&ddr->mem_bw_limits_avg_bytes_per_us,
+					val);
+				continue;
+			}
+			break; /* DRMCG_TYPE_BANDWIDTH */
+		default:
+			break;
+		} /* switch (type) */
+	}
+}
+
 static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 		size_t nbytes, loff_t off)
 {
@@ -382,6 +548,25 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 			drmcg_value_apply(dm->dev,
 					&ddr->bo_limits_peak_allocated, val);
 			break;
+		case DRMCG_TYPE_BANDWIDTH_PERIOD_BURST:
+			rc = drmcg_process_limit_s64_val(sattr, false,
+				props->mem_bw_limits_period_in_us_default,
+				S64_MAX,
+				&val);
+
+			if (rc || val < 2000) {
+				drmcg_pr_cft_err(drmcg, rc, cft_name, minor);
+				break;
+			}
+
+			drmcg_value_apply(dm->dev,
+					&props->mem_bw_limits_period_in_us,
+					val);
+			drmcg_mem_burst_bw_stats_reset(dm->dev);
+			break;
+		case DRMCG_TYPE_BANDWIDTH:
+			drmcg_nested_limit_parse(of, dm->dev, sattr);
+			break;
 		default:
 			break;
 		}
@@ -456,6 +641,41 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_PEAK,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "burst_bw_period_in_us",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
+						DRMCG_FTYPE_LIMIT),
+	},
+	{
+		.name = "burst_bw_period_in_us.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "bandwidth.stats",
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
+						DRMCG_FTYPE_STATS),
+	},
+	{
+		.name = "bandwidth.high",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
+						DRMCG_FTYPE_LIMIT),
+	},
+	{
+		.name = "bandwidth.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
+						DRMCG_FTYPE_DEFAULT),
+	},
 	{ }	/* terminate */
 };
 
@@ -515,6 +735,10 @@ void drmcg_device_early_init(struct drm_device *dev)
 
 	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
 	dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
+	dev->drmcg_props.mem_bw_limits_period_in_us_default = 200000;
+	dev->drmcg_props.mem_bw_limits_period_in_us = 200000;
+	dev->drmcg_props.mem_bw_bytes_in_period_default = S64_MAX;
+	dev->drmcg_props.mem_bw_avg_bytes_per_us_default = 65536;
 
 	drmcg_update_cg_tree(dev);
 }
@@ -660,6 +884,27 @@ void drmcg_unchg_mem(struct ttm_buffer_object *tbo)
 }
 EXPORT_SYMBOL(drmcg_unchg_mem);
 
+static inline void drmcg_mem_bw_accum(s64 time_us,
+		struct drmcg_device_resource *ddr)
+{
+	s64 increment_us = time_us - ddr->mem_bw_stats_last_update_us;
+	s64 new_credit = ddr->mem_bw_limits_avg_bytes_per_us * increment_us;
+
+	ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_ACCUM_US]
+		+= increment_us;
+	ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_TOTAL_ACCUM_US]
+		+= increment_us;
+
+	if ((S64_MAX - new_credit) >
+			ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT])
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
+			+= new_credit;
+	else
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT] = S64_MAX;
+
+	ddr->mem_bw_stats_last_update_us = time_us;
+}
+
 void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		struct ttm_mem_reg *new_mem)
 {
@@ -669,6 +914,7 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 	int devIdx = dev->primary->index;
 	int old_mem_type = old_bo->mem.mem_type;
 	int new_mem_type = new_mem->mem_type;
+	s64 time_us;
 	struct drmcg_device_resource *ddr;
 
 	if (drmcg == NULL)
@@ -677,6 +923,14 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 	old_mem_type = old_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : old_mem_type;
 	new_mem_type = new_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : new_mem_type;
 
+	if (root_drmcg->dev_resources[devIdx] != NULL &&
+			root_drmcg->dev_resources[devIdx]->
+			mem_bw_stats[DRMCG_MEM_BW_ATTR_ACCUM_US] >=
+			dev->drmcg_props.mem_bw_limits_period_in_us)
+		drmcg_mem_burst_bw_stats_reset(dev);
+
+	time_us = ktime_to_us(ktime_get());
+
 	mutex_lock(&dev->drmcg_mutex);
 	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
 		ddr = drmcg->dev_resources[devIdx];
@@ -689,7 +943,68 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 
 		if (evict)
 			ddr->mem_stats_evict++;
+
+		drmcg_mem_bw_accum(time_us, ddr);
+
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_MOVED]
+			+= move_in_bytes;
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_TOTAL_BYTE_MOVED]
+			+= move_in_bytes;
+
+		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
+			-= move_in_bytes;
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
 EXPORT_SYMBOL(drmcg_mem_track_move);
+
+unsigned int drmcg_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo)
+{
+	struct drmcg_props *props;
+
+	/* TODO: replace with BUG_ON */
+	if (tbo->bdev->ddev == NULL)
+		return 0;
+
+	props = &tbo->bdev->ddev->drmcg_props;
+
+	return (unsigned int) props->mem_bw_limits_period_in_us;
+}
+EXPORT_SYMBOL(drmcg_get_mem_bw_period_in_us);
+
+bool drmcg_mem_can_move(struct ttm_buffer_object *tbo)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	struct drmcg *drmcg = tbo->drmcg;
+	int devIdx = dev->primary->index;
+	s64 time_us;
+	struct drmcg_device_resource *ddr;
+	bool result = true;
+
+	if (root_drmcg->dev_resources[devIdx] != NULL &&
+			root_drmcg->dev_resources[devIdx]->
+			mem_bw_stats[DRMCG_MEM_BW_ATTR_ACCUM_US] >=
+			dev->drmcg_props.mem_bw_limits_period_in_us)
+		drmcg_mem_burst_bw_stats_reset(dev);
+
+	time_us = ktime_to_us(ktime_get());
+
+	mutex_lock(&dev->drmcg_mutex);
+	for ( ; drmcg != NULL; drmcg = drmcg_parent(drmcg)) {
+		ddr = drmcg->dev_resources[devIdx];
+
+		drmcg_mem_bw_accum(time_us, ddr);
+
+		if (result &&
+			(ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_MOVED]
+			 >= ddr->mem_bw_limits_bytes_in_period ||
+			ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
+			 <= 0)) {
+			result = false;
+		}
+	}
+	mutex_unlock(&dev->drmcg_mutex);
+
+	return result;
+}
+EXPORT_SYMBOL(drmcg_mem_can_move);
-- 
2.22.0


* [PATCH RFC v4 12/16] drm, cgroup: Add soft VRAM limit
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (4 preceding siblings ...)
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-08-29  6:05 ` Kenny Ho
  2019-08-29  6:05 ` [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

The drm resources being limited are the TTM (Translation Table Manager)
buffers.  TTM manages the different types of memory that a GPU might
access.  These memory types include dedicated Video RAM (VRAM) and
host/system memory accessible through IOMMU (GART/GTT).  TTM is
currently used by multiple drm drivers (amd, ast, bochs, cirrus,
hisilicon, mgag200, nouveau, qxl, virtio, vmwgfx).

Under memory pressure, TTM buffers belonging to drm cgroups that exceed
their soft limit are selected for eviction first.

drm.memory.high
        A read-write nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ====         =============================================
          vram         Video RAM soft limit for a drm device in byte
          ====         =============================================

        Reading returns the following::

        226:0 vram=0
        226:1 vram=17768448
        226:2 vram=17768448

drm.memory.default
        A read-only nested-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ====         ===============================
          vram         Video RAM default limit in byte
          ====         ===============================

        Reading returns the following::

        226:0 vram=0
        226:1 vram=17768448
        226:2 vram=17768448

Change-Id: I7988e28a453b53140b40a28c176239acbc81d491
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 ++
 include/drm/drm_cgroup.h     |  17 +++++
 include/linux/cgroup_drm.h   |   2 +
 kernel/cgroup/drm.c          | 135 +++++++++++++++++++++++++++++++++++
 4 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 32eee85f3641..d7e3d3128ebb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -853,14 +853,21 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 	struct ttm_bo_global *glob = bdev->glob;
 	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
 	bool locked = false;
+	bool check_drmcg;
 	unsigned i;
 	int ret;
 
+	check_drmcg = drmcg_mem_pressure_scan(bdev, mem_type);
+
 	spin_lock(&glob->lru_lock);
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
 		list_for_each_entry(bo, &man->lru[i], lru) {
 			bool busy;
 
+			if (check_drmcg &&
+				!drmcg_mem_should_evict(bo, mem_type))
+				continue;
+
 			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
 							    &busy)) {
 				if (busy && !busy_bo &&
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 9ce0d54e6bd8..c11df388fdf2 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -6,6 +6,7 @@
 
 #include <linux/cgroup_drm.h>
 #include <drm/ttm/ttm_bo_api.h>
+#include <drm/ttm/ttm_bo_driver.h>
 
 /**
  * Per DRM device properties for DRM cgroup controller for the purpose
@@ -22,6 +23,8 @@ struct drmcg_props {
 
 	s64			mem_bw_bytes_in_period_default;
 	s64			mem_bw_avg_bytes_per_us_default;
+
+	s64			mem_highs_default[TTM_PL_PRIV+1];
 };
 
 #ifdef CONFIG_CGROUP_DRM
@@ -38,6 +41,8 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		struct ttm_mem_reg *new_mem);
 unsigned int drmcg_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
 bool drmcg_mem_can_move(struct ttm_buffer_object *tbo);
+bool drmcg_mem_pressure_scan(struct ttm_bo_device *bdev, unsigned int type);
+bool drmcg_mem_should_evict(struct ttm_buffer_object *tbo, unsigned int type);
 
 #else
 static inline void drmcg_device_update(struct drm_device *device)
@@ -81,5 +86,17 @@ static inline bool drmcg_mem_can_move(struct ttm_buffer_object *tbo)
 {
 	return true;
 }
+
+static inline bool drmcg_mem_pressure_scan(struct ttm_bo_device *bdev,
+		unsigned int type)
+{
+	return false;
+}
+
+static inline bool drmcg_mem_should_evict(struct ttm_buffer_object *tbo,
+		unsigned int type)
+{
+	return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 27809a583bf2..c56cfe74d1a6 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -50,6 +50,8 @@ struct drmcg_device_resource {
 
 	s64			mem_stats[TTM_PL_PRIV+1];
 	s64			mem_peaks[TTM_PL_PRIV+1];
+	s64			mem_highs[TTM_PL_PRIV+1];
+	bool			mem_pressure[TTM_PL_PRIV+1];
 	s64			mem_stats_evict;
 
 	s64			mem_bw_stats_last_update_us;
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index ab962a277e58..04fb9a398740 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -80,6 +80,7 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 {
 	int minor = dev->primary->index;
 	struct drmcg_device_resource *ddr = drmcg->dev_resources[minor];
+	int i;
 
 	if (ddr == NULL) {
 		ddr = kzalloc(sizeof(struct drmcg_device_resource),
@@ -108,6 +109,12 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	ddr->mem_bw_limits_avg_bytes_per_us =
 		dev->drmcg_props.mem_bw_avg_bytes_per_us_default;
 
+	ddr->mem_bw_limits_avg_bytes_per_us =
+		dev->drmcg_props.mem_bw_avg_bytes_per_us_default;
+
+	for (i = 0; i <= TTM_PL_PRIV; i++)
+		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
+
 	mutex_unlock(&dev->drmcg_mutex);
 	return 0;
 }
@@ -257,6 +264,11 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
 	case DRMCG_TYPE_BO_PEAK:
 		seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
 		break;
+	case DRMCG_TYPE_MEM:
+		seq_printf(sf, "%s=%lld\n",
+				ttm_placement_names[TTM_PL_VRAM],
+				ddr->mem_highs[TTM_PL_VRAM]);
+		break;
 	case DRMCG_TYPE_BANDWIDTH_PERIOD_BURST:
 		seq_printf(sf, "%lld\n",
 			dev->drmcg_props.mem_bw_limits_period_in_us);
@@ -286,6 +298,11 @@ static void drmcg_print_default(struct drmcg_props *props,
 		seq_printf(sf, "%lld\n",
 			props->bo_limits_peak_allocated_default);
 		break;
+	case DRMCG_TYPE_MEM:
+		seq_printf(sf, "%s=%lld\n",
+				ttm_placement_names[TTM_PL_VRAM],
+				props->mem_highs_default[TTM_PL_VRAM]);
+		break;
 	case DRMCG_TYPE_BANDWIDTH_PERIOD_BURST:
 		seq_printf(sf, "%lld\n",
 			props->mem_bw_limits_period_in_us_default);
@@ -461,6 +478,29 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
 				continue;
 			}
 			break; /* DRMCG_TYPE_BANDWIDTH */
+		case DRMCG_TYPE_MEM:
+			if (strncmp(sname, ttm_placement_names[TTM_PL_VRAM],
+						256) == 0) {
+				p_max = parent == NULL ? S64_MAX :
+					parent->dev_resources[minor]->
+					mem_highs[TTM_PL_VRAM];
+
+				rc = drmcg_process_limit_s64_val(sval, true,
+					props->mem_highs_default[TTM_PL_VRAM],
+					p_max, &val);
+
+				if (rc || val < 0) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				drmcg_value_apply(dev,
+						&ddr->mem_highs[TTM_PL_VRAM],
+						val);
+				continue;
+			}
+			break; /* DRMCG_TYPE_MEM */
 		default:
 			break;
 		} /* switch (type) */
@@ -565,6 +605,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 			drmcg_mem_burst_bw_stats_reset(dm->dev);
 			break;
 		case DRMCG_TYPE_BANDWIDTH:
+		case DRMCG_TYPE_MEM:
 			drmcg_nested_limit_parse(of, dm->dev, sattr);
 			break;
 		default:
@@ -641,6 +682,20 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM_PEAK,
 						DRMCG_FTYPE_STATS),
 	},
+	{
+		.name = "memory.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM,
+						DRMCG_FTYPE_DEFAULT),
+	},
+	{
+		.name = "memory.high",
+		.write = drmcg_limit_write,
+		.seq_show = drmcg_seq_show,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_MEM,
+						DRMCG_FTYPE_LIMIT),
+	},
 	{
 		.name = "burst_bw_period_in_us",
 		.write = drmcg_limit_write,
@@ -731,6 +786,8 @@ EXPORT_SYMBOL(drmcg_device_update);
  */
 void drmcg_device_early_init(struct drm_device *dev)
 {
+	int i;
+
 	dev->drmcg_props.limit_enforced = false;
 
 	dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
@@ -740,6 +797,9 @@ void drmcg_device_early_init(struct drm_device *dev)
 	dev->drmcg_props.mem_bw_bytes_in_period_default = S64_MAX;
 	dev->drmcg_props.mem_bw_avg_bytes_per_us_default = 65536;
 
+	for (i = 0; i <= TTM_PL_PRIV; i++)
+		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
+
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
@@ -1008,3 +1068,78 @@ bool drmcg_mem_can_move(struct ttm_buffer_object *tbo)
 	return result;
 }
 EXPORT_SYMBOL(drmcg_mem_can_move);
+
+static inline void drmcg_mem_set_pressure(struct drmcg *drmcg,
+		int devIdx, unsigned int mem_type, bool pressure_val)
+{
+	struct drmcg_device_resource *ddr;
+	struct cgroup_subsys_state *pos;
+	struct drmcg *node;
+
+	css_for_each_descendant_pre(pos, &drmcg->css) {
+		node = css_to_drmcg(pos);
+		ddr = node->dev_resources[devIdx];
+		ddr->mem_pressure[mem_type] = pressure_val;
+	}
+}
+
+static inline bool drmcg_mem_check(struct drmcg *drmcg, int devIdx,
+		unsigned int mem_type)
+{
+	struct drmcg_device_resource *ddr = drmcg->dev_resources[devIdx];
+
+	/* already under pressure, no need to check and set */
+	if (ddr->mem_pressure[mem_type])
+		return true;
+
+	if (ddr->mem_stats[mem_type] >= ddr->mem_highs[mem_type]) {
+		drmcg_mem_set_pressure(drmcg, devIdx, mem_type, true);
+		return true;
+	}
+
+	return false;
+}
+
+bool drmcg_mem_pressure_scan(struct ttm_bo_device *bdev, unsigned int type)
+{
+	struct drm_device *dev = bdev->ddev;
+	struct cgroup_subsys_state *pos;
+	struct drmcg *node;
+	int devIdx;
+	bool result = false;
+
+	/* TODO: replace with BUG_ON */
+	if (dev == NULL || type != TTM_PL_VRAM) /* only vram limit for now */
+		return false;
+
+	devIdx = dev->primary->index;
+
+	type = type > TTM_PL_PRIV ? TTM_PL_PRIV : type;
+
+	rcu_read_lock();
+	drmcg_mem_set_pressure(root_drmcg, devIdx, type, false);
+
+	css_for_each_descendant_pre(pos, &root_drmcg->css) {
+		node = css_to_drmcg(pos);
+		result |= drmcg_mem_check(node, devIdx, type);
+	}
+	rcu_read_unlock();
+
+	return result;
+}
+EXPORT_SYMBOL(drmcg_mem_pressure_scan);
+
+bool drmcg_mem_should_evict(struct ttm_buffer_object *tbo, unsigned int type)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	int devIdx;
+
+	/* TODO: replace with BUG_ON */
+	if (dev == NULL)
+		return true;
+
+	devIdx = dev->primary->index;
+
+	return tbo->drmcg->dev_resources[devIdx]->mem_pressure[type];
+}
+EXPORT_SYMBOL(drmcg_mem_should_evict);
-- 
2.22.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-29  7:08     ` Koenig, Christian
  2019-08-29  6:05   ` [PATCH RFC v4 15/16] drm, cgroup: add update trigger after limit change Kenny Ho
  2019-08-31  4:28   ` [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Tejun Heo
  9 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

Allow the DRM TTM memory manager to register a work_struct so that,
when a drmcg is under memory pressure, memory reclaim can be triggered
immediately.
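
The pressure check driving this reclaim path can be illustrated with a
toy model (plain C, not the kernel code, which walks the cgroup tree
under RCU): a cgroup whose tracked usage reaches its high limit is
flagged as under pressure, and the scan result tells the work item
whether to start evicting.

```c
#include <assert.h>

/*
 * Toy model, NOT kernel code: a flat array stands in for the cgroup
 * tree.  A cgroup is flagged "under pressure" when its tracked usage
 * reaches its high limit; the reclaim work checks the scan result
 * before evicting.
 */
struct toy_cg {
	long long stat;		/* bytes currently charged to this cgroup */
	long long high;		/* drm.memory.high for this placement */
	int under_pressure;
};

static int toy_mem_check(struct toy_cg *cg)
{
	if (cg->under_pressure)
		return 1;	/* already flagged, nothing to recompute */
	if (cg->stat >= cg->high) {
		cg->under_pressure = 1;
		return 1;
	}
	return 0;
}

/* Returns nonzero when any cgroup is at or over its high limit. */
static int toy_pressure_scan(struct toy_cg *cgs, int n)
{
	int i, result = 0;

	for (i = 0; i < n; i++)
		result |= toy_mem_check(&cgs[i]);
	return result;
}
```

In this sketch a scan returning nonzero corresponds to the point where
the work item would call ttm_mem_evict_first().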

Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c    | 49 +++++++++++++++++++++++++++++++++
 include/drm/drm_cgroup.h        | 16 +++++++++++
 include/drm/ttm/ttm_bo_driver.h |  2 ++
 kernel/cgroup/drm.c             | 30 ++++++++++++++++++++
 4 files changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index d7e3d3128ebb..72efae694b7e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_reclaim_wq(struct work_struct *work)
+{
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = false,
+		.flags = TTM_OPT_FLAG_FORCE_ALLOC
+	};
+	struct ttm_mem_type_manager *man =
+	    container_of(work, struct ttm_mem_type_manager, reclaim_wq);
+	struct ttm_bo_device *bdev = man->bdev;
+	struct dma_fence *fence;
+	int mem_type;
+	int ret;
+
+	for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
+		if (&bdev->man[mem_type] == man)
+			break;
+
+	WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
+	if (mem_type >= TTM_NUM_MEM_TYPES)
+		return;
+
+	if (!drmcg_mem_pressure_scan(bdev, mem_type))
+		return;
+
+	ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
+	if (ret)
+		return;
+
+	spin_lock(&man->move_lock);
+	fence = dma_fence_get(man->move);
+	spin_unlock(&man->move_lock);
+
+	if (fence) {
+		ret = dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+	}
+
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
 			unsigned long p_size)
 {
@@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
 		INIT_LIST_HEAD(&man->lru[i]);
 	man->move = NULL;
 
+	pr_debug("drmcg %p type %d\n", bdev->ddev, type);
+
+	if (type <= TTM_PL_VRAM) {
+		INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
+		drmcg_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ttm_bo_init_mm);
@@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
 		man = &bdev->man[i];
 		if (man->has_type) {
 			man->use_type = false;
+			drmcg_unregister_device_mm(bdev->ddev, i);
+			cancel_work_sync(&man->reclaim_wq);
 			if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
 				ret = -EBUSY;
 				pr_err("DRM memory manager type %d is not clean\n",
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index c11df388fdf2..6d9707e1eb72 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -5,6 +5,7 @@
 #define __DRM_CGROUP_H__
 
 #include <linux/cgroup_drm.h>
+#include <linux/workqueue.h>
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
 
@@ -25,12 +26,17 @@ struct drmcg_props {
 	s64			mem_bw_avg_bytes_per_us_default;
 
 	s64			mem_highs_default[TTM_PL_PRIV+1];
+
+	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
 };
 
 #ifdef CONFIG_CGROUP_DRM
 
 void drmcg_device_update(struct drm_device *device);
 void drmcg_device_early_init(struct drm_device *device);
+void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
+		struct work_struct *wq);
+void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type);
 bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
 		size_t size);
 void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
@@ -53,6 +59,16 @@ static inline void drmcg_device_early_init(struct drm_device *device)
 {
 }
 
+static inline void drmcg_register_device_mm(struct drm_device *dev,
+		unsigned int type, struct work_struct *wq)
+{
+}
+
+static inline void drmcg_unregister_device_mm(struct drm_device *dev,
+		unsigned int type)
+{
+}
+
 static inline void drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
 		struct drm_device *dev,	size_t size)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index e1a805d65b83..529cef92bcf6 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
 	 * Protected by @move_lock.
 	 */
 	struct dma_fence *move;
+
+	struct work_struct reclaim_wq;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 04fb9a398740..0ea7f0619e25 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -804,6 +804,29 @@ void drmcg_device_early_init(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
 
+void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
+		struct work_struct *wq)
+{
+	if (dev == NULL || type >= TTM_PL_PRIV)
+		return;
+
+	mutex_lock(&drmcg_mutex);
+	dev->drmcg_props.mem_reclaim_wq[type] = wq;
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_register_device_mm);
+
+void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type)
+{
+	if (dev == NULL || type >= TTM_PL_PRIV)
+		return;
+
+	mutex_lock(&drmcg_mutex);
+	dev->drmcg_props.mem_reclaim_wq[type] = NULL;
+	mutex_unlock(&drmcg_mutex);
+}
+EXPORT_SYMBOL(drmcg_unregister_device_mm);
+
 /**
  * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
  * @drmcg: the DRM cgroup to be charged to
@@ -1013,6 +1036,13 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 
 		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
 			-= move_in_bytes;
+
+		if (dev->drmcg_props.mem_reclaim_wq[new_mem_type]
+			!= NULL &&
+			ddr->mem_stats[new_mem_type] >
+				ddr->mem_highs[new_mem_type])
+			schedule_work(dev->
+				drmcg_props.mem_reclaim_wq[new_mem_type]);
 	}
 	mutex_unlock(&dev->drmcg_mutex);
 }
-- 
2.22.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (5 preceding siblings ...)
  2019-08-29  6:05 ` [PATCH RFC v4 12/16] drm, cgroup: Add soft VRAM limit Kenny Ho
@ 2019-08-29  6:05 ` Kenny Ho
       [not found]   ` <20190829060533.32315-15-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-08-29  6:05 ` [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup Kenny Ho
  2019-09-03  8:02 ` [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Daniel Vetter
  8 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

drm.lgpu
        A read-write nested-keyed file which exists on all cgroups.
        Each entry is keyed by the DRM device's major:minor.

        lgpu stands for logical GPU; it is an abstraction used to
        subdivide a physical DRM device for the purpose of resource
        management.

        The lgpu is a discrete quantity that is device specific (i.e.
        some DRM devices may have 64 lgpus while others may have 100
        lgpus.)  The lgpu is a single quantity with two representations
        denoted by the following nested keys.

          =====     ========================================
          count     Representing lgpu as anonymous resource
          list      Representing lgpu as named resource
          =====     ========================================

        For example:
        226:0 count=256 list=0-255
        226:1 count=4 list=0,2,4,6
        226:2 count=32 list=32-63

        lgpu is represented by a bitmap and uses the bitmap_parselist
        kernel function so the list key input format is a
        comma-separated list of decimal numbers and ranges.

        Consecutively set bits are shown as two hyphen-separated decimal
        numbers, the smallest and largest bit numbers set in the range.
        Optionally each range can be postfixed to denote that only parts
        of it should be set.  The range will be divided into groups of
        the specified size.
        Syntax: range:used_size/group_size
        Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769

        The count key is the Hamming weight (hweight) of the bitmap.

        Both count and list accept the max and default keywords.

        Some DRM devices may only support lgpu as anonymous resources.
        In such cases, the significance of the position of the set bits
        in list will be ignored.

        This lgpu resource supports the 'allocation' resource
        distribution model.
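
The grouped-range syntax can be sketched in plain C with a hypothetical
expand_group_range() helper (NOT the kernel's bitmap_parselist),
assuming the semantics described above: within each group-sized window
of the range, only the first used_size bits are set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the kernel's bitmap_parselist(): expands one
 * "start-end:used/group" region into a byte-per-bit array.  Only the
 * first `used` bits of each `group`-sized window within the range are
 * set.
 */
static int expand_group_range(const char *spec, unsigned char *bits, int nbits)
{
	int start, end, used, group, i, j;

	memset(bits, 0, nbits);
	if (sscanf(spec, "%d-%d:%d/%d", &start, &end, &used, &group) != 4)
		return -1;
	for (i = start; i <= end && i < nbits; i += group)
		for (j = i; j < i + used && j <= end && j < nbits; j++)
			bits[j] = 1;
	return 0;
}
```

With "0-1023:2/256" this yields bits 0,1,256,257,512,513,768,769,
matching the example above.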

Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
 include/drm/drm_cgroup.h                |   4 +
 include/linux/cgroup_drm.h              |   6 ++
 kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
 4 files changed, 191 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 87a195133eaa..57f18469bd76 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1958,6 +1958,52 @@ DRM Interface Files
 	Set largest allocation for /dev/dri/card1 to 4MB
 	echo "226:1 4m" > drm.buffer.peak.max
 
+  drm.lgpu
+	A read-write nested-keyed file which exists on all cgroups.
+	Each entry is keyed by the DRM device's major:minor.
+
+	lgpu stands for logical GPU; it is an abstraction used to
+	subdivide a physical DRM device for the purpose of resource
+	management.
+
+	The lgpu is a discrete quantity that is device specific (i.e.
+	some DRM devices may have 64 lgpus while others may have 100
+	lgpus.)  The lgpu is a single quantity with two representations
+	denoted by the following nested keys.
+
+	  =====     ========================================
+	  count     Representing lgpu as anonymous resource
+	  list      Representing lgpu as named resource
+	  =====     ========================================
+
+	For example:
+	226:0 count=256 list=0-255
+	226:1 count=4 list=0,2,4,6
+	226:2 count=32 list=32-63
+
+	lgpu is represented by a bitmap and uses the bitmap_parselist
+	kernel function so the list key input format is a
+	comma-separated list of decimal numbers and ranges.
+
+	Consecutively set bits are shown as two hyphen-separated decimal
+	numbers, the smallest and largest bit numbers set in the range.
+	Optionally each range can be postfixed to denote that only parts
+	of it should be set.  The range will be divided into groups of
+	the specified size.
+	Syntax: range:used_size/group_size
+	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
+
+	The count key is the Hamming weight (hweight) of the bitmap.
+
+	Both count and list accept the max and default keywords.
+
+	Some DRM devices may only support lgpu as anonymous resources.
+	In such cases, the significance of the position of the set bits
+	in list will be ignored.
+
+	This lgpu resource supports the 'allocation' resource
+	distribution model.
+
 GEM Buffer Ownership
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 6d9707e1eb72..a8d6be0b075b 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -6,6 +6,7 @@
 
 #include <linux/cgroup_drm.h>
 #include <linux/workqueue.h>
+#include <linux/types.h>
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
 
@@ -28,6 +29,9 @@ struct drmcg_props {
 	s64			mem_highs_default[TTM_PL_PRIV+1];
 
 	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
+
+	int			lgpu_capacity;
+	DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
 };
 
 #ifdef CONFIG_CGROUP_DRM
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index c56cfe74d1a6..7b1cfc4ce4c3 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -14,6 +14,8 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+#define MAX_DRMCG_LGPU_CAPACITY 256
+
 enum drmcg_mem_bw_attr {
 	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
 	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
@@ -32,6 +34,7 @@ enum drmcg_res_type {
 	DRMCG_TYPE_MEM_PEAK,
 	DRMCG_TYPE_BANDWIDTH,
 	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
+	DRMCG_TYPE_LGPU,
 	__DRMCG_TYPE_LAST,
 };
 
@@ -58,6 +61,9 @@ struct drmcg_device_resource {
 	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
 	s64			mem_bw_limits_bytes_in_period;
 	s64			mem_bw_limits_avg_bytes_per_us;
+
+	s64			lgpu_used;
+	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 0ea7f0619e25..18c4368e2c29 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -9,6 +9,7 @@
 #include <linux/cgroup_drm.h>
 #include <linux/ktime.h>
 #include <linux/kernel.h>
+#include <linux/bitmap.h>
 #include <drm/drm_file.h>
 #include <drm/drm_drv.h>
 #include <drm/ttm/ttm_bo_api.h>
@@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
 #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
 #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
 
+#define LGPU_LIMITS_NAME_LIST "list"
+#define LGPU_LIMITS_NAME_COUNT "count"
+
 static struct drmcg *root_drmcg __read_mostly;
 
 static int drmcg_css_free_fn(int id, void *ptr, void *data)
@@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
 	for (i = 0; i <= TTM_PL_PRIV; i++)
 		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
 
+	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
+			MAX_DRMCG_LGPU_CAPACITY);
+	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
+
 	mutex_unlock(&dev->drmcg_mutex);
 	return 0;
 }
@@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
 				MEM_BW_LIMITS_NAME_AVG,
 				ddr->mem_bw_limits_avg_bytes_per_us);
 		break;
+	case DRMCG_TYPE_LGPU:
+		seq_printf(sf, "%s=%lld %s=%*pbl\n",
+				LGPU_LIMITS_NAME_COUNT,
+				ddr->lgpu_used,
+				LGPU_LIMITS_NAME_LIST,
+				dev->drmcg_props.lgpu_capacity,
+				ddr->lgpu_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
 				MEM_BW_LIMITS_NAME_AVG,
 				props->mem_bw_avg_bytes_per_us_default);
 		break;
+	case DRMCG_TYPE_LGPU:
+		seq_printf(sf, "%s=%d %s=%*pbl\n",
+				LGPU_LIMITS_NAME_COUNT,
+				bitmap_weight(props->lgpu_slots,
+					props->lgpu_capacity),
+				LGPU_LIMITS_NAME_LIST,
+				props->lgpu_capacity,
+				props->lgpu_slots);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
 	mutex_unlock(&dev->drmcg_mutex);
 }
 
+static void drmcg_lgpu_values_apply(struct drm_device *dev,
+		struct drmcg_device_resource *ddr, unsigned long *val)
+{
+
+	mutex_lock(&dev->drmcg_mutex);
+	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
+	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
+	mutex_unlock(&dev->drmcg_mutex);
+}
+
 static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
 		struct drm_device *dev, char *attrs)
 {
+	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
+	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
 	enum drmcg_res_type type =
 		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
 	struct drmcg *drmcg = css_to_drmcg(of_css(of));
@@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
 				continue;
 			}
 			break; /* DRMCG_TYPE_MEM */
+		case DRMCG_TYPE_LGPU:
+			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
+				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256))
+				continue;
+
+			if (!strcmp("max", sval) ||
+					!strcmp("default", sval)) {
+				if (parent != NULL)
+					drmcg_lgpu_values_apply(dev, ddr,
+						parent->dev_resources[minor]->
+						lgpu_allocated);
+				else
+					drmcg_lgpu_values_apply(dev, ddr,
+						props->lgpu_slots);
+
+				continue;
+			}
+
+			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
+				p_max = parent == NULL ? props->lgpu_capacity :
+					bitmap_weight(
+					parent->dev_resources[minor]->
+					lgpu_allocated, props->lgpu_capacity);
+
+				rc = drmcg_process_limit_s64_val(sval,
+					false, p_max, p_max, &val);
+
+				if (rc || val < 0) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				bitmap_zero(tmp_bitmap,
+						MAX_DRMCG_LGPU_CAPACITY);
+				bitmap_set(tmp_bitmap, 0, val);
+			}
+
+			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
+				rc = bitmap_parselist(sval, tmp_bitmap,
+						MAX_DRMCG_LGPU_CAPACITY);
+
+				if (rc) {
+					drmcg_pr_cft_err(drmcg, rc, cft_name,
+							minor);
+					continue;
+				}
+
+				bitmap_andnot(chk_bitmap, tmp_bitmap,
+					props->lgpu_slots,
+					MAX_DRMCG_LGPU_CAPACITY);
+
+				if (!bitmap_empty(chk_bitmap,
+						MAX_DRMCG_LGPU_CAPACITY)) {
+					drmcg_pr_cft_err(drmcg, 0, cft_name,
+							minor);
+					continue;
+				}
+			}
+
+
+			if (parent != NULL) {
+				bitmap_and(chk_bitmap, tmp_bitmap,
+				parent->dev_resources[minor]->lgpu_allocated,
+				props->lgpu_capacity);
+
+				if (bitmap_empty(chk_bitmap,
+						props->lgpu_capacity)) {
+					drmcg_pr_cft_err(drmcg, 0,
+							cft_name, minor);
+					continue;
+				}
+			}
+
+			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
+
+			break; /* DRMCG_TYPE_LGPU */
 		default:
 			break;
 		} /* switch (type) */
@@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 			break;
 		case DRMCG_TYPE_BANDWIDTH:
 		case DRMCG_TYPE_MEM:
+		case DRMCG_TYPE_LGPU:
 			drmcg_nested_limit_parse(of, dm->dev, sattr);
 			break;
 		default:
@@ -731,6 +846,20 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
 						DRMCG_FTYPE_DEFAULT),
 	},
+	{
+		.name = "lgpu",
+		.seq_show = drmcg_seq_show,
+		.write = drmcg_limit_write,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
+						DRMCG_FTYPE_LIMIT),
+	},
+	{
+		.name = "lgpu.default",
+		.seq_show = drmcg_seq_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
+						DRMCG_FTYPE_DEFAULT),
+	},
 	{ }	/* terminate */
 };
 
@@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
 
 static inline void drmcg_update_cg_tree(struct drm_device *dev)
 {
+	bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
+	bitmap_fill(dev->drmcg_props.lgpu_slots,
+			dev->drmcg_props.lgpu_capacity);
+
 	/* init cgroups created before registration (i.e. root cgroup) */
 	if (root_drmcg != NULL) {
 		struct cgroup_subsys_state *pos;
@@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
 	for (i = 0; i <= TTM_PL_PRIV; i++)
 		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
 
+	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
+
 	drmcg_update_cg_tree(dev);
 }
 EXPORT_SYMBOL(drmcg_device_early_init);
-- 
2.22.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 15/16] drm, cgroup: add update trigger after limit change
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
@ 2019-08-29  6:05   ` Kenny Ho
  2019-08-31  4:28   ` [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Tejun Heo
  9 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, felix.kuehling-5C7GfCeVMHo,
	joseph.greathouse-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	lkaplan-WVYJKLFxKCc, daniel-/w4YWyX8dFk
  Cc: Kenny Ho

Before this commit, drmcg limits are updated but enforcement is delayed
until the next time the driver checks against the new limit.  While this
is sufficient for certain resources, more proactive enforcement may be
needed for other resources.

Introduce an optional drmcg_limit_updated callback for DRM
drivers.  When defined, it will be called in two scenarios:
1) When limits are updated for a particular cgroup, the callback will be
triggered for each task in the updated cgroup.
2) When a task is migrated from one cgroup to another, the callback will
be triggered for each resource type for the migrated task.
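The notification scheme can be sketched with a toy model (plain C, not
the kernel API; the names below are illustrative only): on a limit
write, the core invokes the driver-supplied hook once per task in the
affected cgroup, and a NULL hook (driver did not opt in) is a no-op.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, NOT the kernel API: the core calls the optional
 * driver hook once per task when a limit changes.
 */
typedef void (*limit_updated_cb)(int task_id, int res_type);

static int notified[8];	/* records which toy tasks saw which resource */

static void toy_driver_cb(int task_id, int res_type)
{
	notified[task_id] = res_type + 1;
}

static void toy_limit_write(limit_updated_cb cb, const int *tasks,
			    int ntasks, int res_type)
{
	int i;

	if (cb == NULL)	/* the callback is optional */
		return;
	for (i = 0; i < ntasks; i++)
		cb(tasks[i], res_type);
}
```

The migration case works the same way, except the loop runs over
resource types for a single migrated task instead of over tasks.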

Change-Id: I68187a72818b855b5f295aefcb241cda8ab63b00
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/drm/drm_drv.h | 10 ++++++++
 kernel/cgroup/drm.c   | 57 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index c8a37a08d98d..7e588b874a27 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -669,6 +669,16 @@ struct drm_driver {
 	void (*drmcg_custom_init)(struct drm_device *dev,
 			struct drmcg_props *props);
 
+	/**
+	 * @drmcg_limit_updated
+	 *
+	 * Optional callback
+	 */
+	void (*drmcg_limit_updated)(struct drm_device *dev,
+			struct task_struct *task,
+			struct drmcg_device_resource *ddr,
+			enum drmcg_res_type res_type);
+
 	/**
 	 * @gem_vm_ops: Driver private ops for this object
 	 */
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 18c4368e2c29..99772e5d9ccc 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -621,6 +621,23 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
 	}
 }
 
+static void drmcg_limit_updated(struct drm_device *dev, struct drmcg *drmcg,
+		enum drmcg_res_type res_type)
+{
+	struct drmcg_device_resource *ddr =
+		drmcg->dev_resources[dev->primary->index];
+	struct css_task_iter it;
+	struct task_struct *task;
+
+	css_task_iter_start(&drmcg->css.cgroup->self,
+			CSS_TASK_ITER_PROCS, &it);
+	while ((task = css_task_iter_next(&it))) {
+		dev->driver->drmcg_limit_updated(dev, task,
+				ddr, res_type);
+	}
+	css_task_iter_end(&it);
+}
+
 static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 		size_t nbytes, loff_t off)
 {
@@ -726,6 +743,10 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
 		default:
 			break;
 		}
+
+		if (dm->dev->driver->drmcg_limit_updated)
+			drmcg_limit_updated(dm->dev, drmcg, type);
+
 		drm_dev_put(dm->dev); /* release from drm_minor_acquire */
 	}
 
@@ -863,9 +884,45 @@ struct cftype files[] = {
 	{ }	/* terminate */
 };
 
+static int drmcg_attach_fn(int id, void *ptr, void *data)
+{
+	struct drm_minor *minor = ptr;
+	struct task_struct *task = data;
+	struct drm_device *dev;
+
+	if (minor->type != DRM_MINOR_PRIMARY)
+		return 0;
+
+	dev = minor->dev;
+
+	if (dev->driver->drmcg_limit_updated) {
+		struct drmcg *drmcg = drmcg_get(task);
+		struct drmcg_device_resource *ddr =
+			drmcg->dev_resources[minor->index];
+		enum drmcg_res_type type;
+
+		for (type = 0; type < __DRMCG_TYPE_LAST; type++)
+			dev->driver->drmcg_limit_updated(dev, task, ddr, type);
+
+		drmcg_put(drmcg);
+	}
+
+	return 0;
+}
+
+static void drmcg_attach(struct cgroup_taskset *tset)
+{
+	struct task_struct *task;
+	struct cgroup_subsys_state *css;
+
+	cgroup_taskset_for_each(task, css, tset)
+		drm_minor_for_each(&drmcg_attach_fn, task);
+}
+
 struct cgroup_subsys drm_cgrp_subsys = {
 	.css_alloc	= drmcg_css_alloc,
 	.css_free	= drmcg_css_free,
+	.attach		= drmcg_attach,
 	.early_init	= false,
 	.legacy_cftypes	= files,
 	.dfl_cftypes	= files,
-- 
2.22.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (6 preceding siblings ...)
  2019-08-29  6:05 ` [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
@ 2019-08-29  6:05 ` Kenny Ho
       [not found]   ` <20190829060533.32315-17-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-09-03  8:02 ` [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Daniel Vetter
  8 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29  6:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, felix.kuehling, joseph.greathouse, jsparks,
	lkaplan, daniel
  Cc: Kenny Ho

The number of logical gpus (lgpus) is defined to be the number of
compute units (CUs) on a device.  The lgpu allocation limit only
applies to compute workloads for the moment (enforced via kfd queue
creation).  Any cu_mask update is validated against the availability
of the compute units as defined by the drmcg the kfd process belongs
to.
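
The validation reduces to a subset check, sketched here in plain C over
a single word (the real code uses kernel bitmaps over up to 256 lgpus;
the helper name is illustrative only): a requested cu_mask is permitted
only if every set bit is also set in the cgroup's lgpu allocation.

```c
#include <assert.h>

/*
 * Toy subset check, NOT the kernel implementation: any bit requested
 * outside the cgroup's lgpu allocation makes the cu_mask invalid.
 */
static int toy_cu_mask_valid(unsigned int cu_mask,
			     unsigned int lgpu_allocated)
{
	return (cu_mask & ~lgpu_allocated) == 0;
}
```

For example, a cgroup allocated lgpus 0-3 (mask 0xF) permits a cu_mask
of 0x3 but rejects 0x10.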

Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
 .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
 5 files changed, 174 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 55cb1b2094fd..369915337213 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
 		valid;							\
 	})
 
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+		struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
+		unsigned int nbits);
+
 /* GPUVM API */
 int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
 					void **vm, void **process_info,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 163a4fbf0611..8abeffdd2e5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
 static void amdgpu_drmcg_custom_init(struct drm_device *dev,
 	struct drmcg_props *props)
 {
+	struct amdgpu_device *adev = dev->dev_private;
+
+	props->lgpu_capacity = adev->gfx.cu_info.number;
+
 	props->limit_enforced = true;
 }
 
+static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
+		struct task_struct *task, struct drmcg_device_resource *ddr,
+		enum drmcg_res_type res_type)
+{
+	struct amdgpu_device *adev = dev->dev_private;
+
+	switch (res_type) {
+	case DRMCG_TYPE_LGPU:
+		amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
+				ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
+		break;
+	default:
+		break;
+	}
+}
+
 static struct drm_driver kms_driver = {
 	.driver_features =
 	    DRIVER_USE_AGP | DRIVER_ATOMIC |
@@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
 	.gem_prime_mmap = amdgpu_gem_prime_mmap,
 
 	.drmcg_custom_init = amdgpu_drmcg_custom_init,
+	.drmcg_limit_updated = amdgpu_drmcg_limit_updated,
 
 	.name = DRIVER_NAME,
 	.desc = DRIVER_DESC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 138c70454e2b..fa765b803f97 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
 		return -EFAULT;
 	}
 
+	if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
+		pr_debug("CU mask not permitted by DRM cgroup\n");
+		kfree(properties.cu_mask);
+		return -EACCES;
+	}
+
 	mutex_lock(&p->mutex);
 
 	retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 8b0eee5b3521..88881bec7550 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 		       u32 *ctl_stack_used_size,
 		       u32 *save_area_used_size);
 
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+		unsigned int cu_mask_size);
+
 int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
 				unsigned int fence_value,
 				unsigned int timeout_ms);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 7e6c3ee82f5b..a896de290307 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -23,9 +23,11 @@
 
 #include <linux/slab.h>
 #include <linux/list.h>
+#include <linux/cgroup_drm.h>
 #include "kfd_device_queue_manager.h"
 #include "kfd_priv.h"
 #include "kfd_kernel_queue.h"
+#include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 
 static inline struct process_queue_node *get_queue_by_qid(
@@ -167,6 +169,7 @@ static int create_cp_queue(struct process_queue_manager *pqm,
 				struct queue_properties *q_properties,
 				struct file *f, unsigned int qid)
 {
+	struct drmcg *drmcg;
 	int retval;
 
 	/* Doorbell initialized in user space*/
@@ -180,6 +183,36 @@ static int create_cp_queue(struct process_queue_manager *pqm,
 	if (retval != 0)
 		return retval;
 
+
+	drmcg = drmcg_get(pqm->process->lead_thread);
+	if (drmcg) {
+		struct amdgpu_device *adev;
+		struct drmcg_device_resource *ddr;
+		int mask_size;
+		u32 *mask;
+
+		adev = (struct amdgpu_device *) dev->kgd;
+
+		mask_size = adev->ddev->drmcg_props.lgpu_capacity;
+		mask = kzalloc(sizeof(u32) * round_up(mask_size, 32) / 32,
+				GFP_KERNEL);
+
+		if (!mask) {
+			drmcg_put(drmcg);
+			uninit_queue(*q);
+			return -ENOMEM;
+		}
+
+		ddr = drmcg->dev_resources[adev->ddev->primary->index];
+
+		bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size);
+
+		(*q)->properties.cu_mask_count = mask_size;
+		(*q)->properties.cu_mask = mask;
+
+		drmcg_put(drmcg);
+	}
+
 	(*q)->device = dev;
 	(*q)->process = pqm->process;
 
@@ -495,6 +528,113 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 						       save_area_used_size);
 }
 
+bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
+		unsigned int cu_mask_size)
+{
+	DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
+	struct drmcg_device_resource *ddr;
+	struct process_queue_node *pqn;
+	struct amdgpu_device *adev;
+	struct drmcg *drmcg;
+	bool result;
+
+	if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
+		return false;
+
+	bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
+
+	pqn = get_queue_by_qid(&p->pqm, qid);
+	if (!pqn)
+		return false;
+
+	adev = (struct amdgpu_device *)pqn->q->device->kgd;
+
+	drmcg = drmcg_get(p->lead_thread);
+	ddr = drmcg->dev_resources[adev->ddev->primary->index];
+
+	if (bitmap_subset(curr_mask, ddr->lgpu_allocated,
+				MAX_DRMCG_LGPU_CAPACITY))
+		result = true;
+	else
+		result = false;
+
+	drmcg_put(drmcg);
+
+	return result;
+}
+
+int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
+		struct amdgpu_device *adev, unsigned long *lgpu_bm,
+		unsigned int lgpu_bm_size)
+{
+	struct kfd_dev *kdev = adev->kfd.dev;
+	struct process_queue_node *pqn;
+	struct kfd_process *kfdproc;
+	size_t size_in_bytes;
+	u32 *cu_mask;
+	int rc = 0;
+
+	if ((lgpu_bm_size % 32) != 0) {
+		pr_warn("lgpu_bm_size %u must be a multiple of 32\n",
+				lgpu_bm_size);
+		return -EINVAL;
+	}
+
+	kfdproc = kfd_get_process(task);
+
+	if (IS_ERR(kfdproc))
+		return -ESRCH;
+
+	size_in_bytes = sizeof(u32) * round_up(lgpu_bm_size, 32) / 32;
+
+	mutex_lock(&kfdproc->mutex);
+	list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
+		if (pqn->q && pqn->q->device == kdev) {
+			/* update cu_mask accordingly */
+			cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
+			if (!cu_mask) {
+				rc = -ENOMEM;
+				break;
+			}
+
+			if (pqn->q->properties.cu_mask) {
+				DECLARE_BITMAP(curr_mask,
+						MAX_DRMCG_LGPU_CAPACITY);
+
+				if (pqn->q->properties.cu_mask_count >
+						lgpu_bm_size) {
+					rc = -EINVAL;
+					kfree(cu_mask);
+					break;
+				}
+
+				bitmap_from_arr32(curr_mask,
+						pqn->q->properties.cu_mask,
+						pqn->q->properties.cu_mask_count);
+
+				bitmap_and(curr_mask, curr_mask, lgpu_bm,
+						lgpu_bm_size);
+
+				bitmap_to_arr32(cu_mask, curr_mask,
+						lgpu_bm_size);
+
+				/* curr_mask is on the stack; nothing to free */
+			} else
+				bitmap_to_arr32(cu_mask, lgpu_bm,
+						lgpu_bm_size);
+			kfree(pqn->q->properties.cu_mask);
+			pqn->q->properties.cu_mask = cu_mask;
+			pqn->q->properties.cu_mask_count = lgpu_bm_size;
+
+			rc = pqn->q->device->dqm->ops.update_queue(
+					pqn->q->device->dqm, pqn->q);
+		}
+	}
+	mutex_unlock(&kfdproc->mutex);
+
+	return rc;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int pqm_debugfs_mqds(struct seq_file *m, void *data)
-- 
2.22.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim
  2019-08-29  6:05   ` [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
@ 2019-08-29  7:08     ` Koenig, Christian
  2019-08-29 14:07       ` Kenny Ho
  0 siblings, 1 reply; 89+ messages in thread
From: Koenig, Christian @ 2019-08-29  7:08 UTC (permalink / raw)
  To: Ho, Kenny, y2kenny, cgroups, dri-devel, amd-gfx, tj, Deucher,
	Alexander, Kuehling, Felix, Greathouse, Joseph, jsparks, lkaplan,
	daniel

Am 29.08.19 um 08:05 schrieb Kenny Ho:
> Allow DRM TTM memory manager to register a work_struct, such that, when
> a drmcgrp is under memory pressure, memory reclaiming can be triggered
> immediately.
>
> Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c    | 49 +++++++++++++++++++++++++++++++++
>   include/drm/drm_cgroup.h        | 16 +++++++++++
>   include/drm/ttm/ttm_bo_driver.h |  2 ++
>   kernel/cgroup/drm.c             | 30 ++++++++++++++++++++
>   4 files changed, 97 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d7e3d3128ebb..72efae694b7e 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
>   }
>   EXPORT_SYMBOL(ttm_bo_evict_mm);
>   
> +static void ttm_bo_reclaim_wq(struct work_struct *work)
> +{
> +	struct ttm_operation_ctx ctx = {
> +		.interruptible = false,
> +		.no_wait_gpu = false,
> +		.flags = TTM_OPT_FLAG_FORCE_ALLOC
> +	};
> +	struct ttm_mem_type_manager *man =
> +	    container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> +	struct ttm_bo_device *bdev = man->bdev;
> +	struct dma_fence *fence;
> +	int mem_type;
> +	int ret;
> +
> +	for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> +		if (&bdev->man[mem_type] == man)
> +			break;
> +
> +	WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
> +	if (mem_type >= TTM_NUM_MEM_TYPES)
> +		return;
> +
> +	if (!drmcg_mem_pressure_scan(bdev, mem_type))
> +		return;
> +
> +	ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
> +	if (ret)
> +		return;
> +
> +	spin_lock(&man->move_lock);
> +	fence = dma_fence_get(man->move);
> +	spin_unlock(&man->move_lock);
> +
> +	if (fence) {
> +		ret = dma_fence_wait(fence, false);
> +		dma_fence_put(fence);
> +	}

Why do you want to block for the fence here? That is a rather bad idea 
and would break pipe-lining.

Apart from that I don't think we should put that into TTM.

Instead drmcg_register_device_mm() should get a function pointer which 
is called from a work item when the group is under pressure.

TTM can then provide the function to be called, but the actual 
registration is the job of the device and not TTM.

Regards,
Christian.

> +
> +}
> +
>   int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>   			unsigned long p_size)
>   {
> @@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>   		INIT_LIST_HEAD(&man->lru[i]);
>   	man->move = NULL;
>   
> +	pr_err("drmcg %p type %d\n", bdev->ddev, type);
> +
> +	if (type <= TTM_PL_VRAM) {
> +		INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> +		drmcg_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
> +	}
> +
>   	return 0;
>   }
>   EXPORT_SYMBOL(ttm_bo_init_mm);
> @@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
>   		man = &bdev->man[i];
>   		if (man->has_type) {
>   			man->use_type = false;
> +			drmcg_unregister_device_mm(bdev->ddev, i);
> +			cancel_work_sync(&man->reclaim_wq);
>   			if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
>   				ret = -EBUSY;
>   				pr_err("DRM memory manager type %d is not clean\n",
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index c11df388fdf2..6d9707e1eb72 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -5,6 +5,7 @@
>   #define __DRM_CGROUP_H__
>   
>   #include <linux/cgroup_drm.h>
> +#include <linux/workqueue.h>
>   #include <drm/ttm/ttm_bo_api.h>
>   #include <drm/ttm/ttm_bo_driver.h>
>   
> @@ -25,12 +26,17 @@ struct drmcg_props {
>   	s64			mem_bw_avg_bytes_per_us_default;
>   
>   	s64			mem_highs_default[TTM_PL_PRIV+1];
> +
> +	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
>   };
>   
>   #ifdef CONFIG_CGROUP_DRM
>   
>   void drmcg_device_update(struct drm_device *device);
>   void drmcg_device_early_init(struct drm_device *device);
> +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
> +		struct work_struct *wq);
> +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type);
>   bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
>   		size_t size);
>   void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
> @@ -53,6 +59,16 @@ static inline void drmcg_device_early_init(struct drm_device *device)
>   {
>   }
>   
> +static inline void drmcg_register_device_mm(struct drm_device *dev,
> +		unsigned int type, struct work_struct *wq)
> +{
> +}
> +
> +static inline void drmcg_unregister_device_mm(struct drm_device *dev,
> +		unsigned int type)
> +{
> +}
> +
>   static inline void drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
>   		struct drm_device *dev,	size_t size)
>   {
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index e1a805d65b83..529cef92bcf6 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
>   	 * Protected by @move_lock.
>   	 */
>   	struct dma_fence *move;
> +
> +	struct work_struct reclaim_wq;
>   };
>   
>   /**
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 04fb9a398740..0ea7f0619e25 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -804,6 +804,29 @@ void drmcg_device_early_init(struct drm_device *dev)
>   }
>   EXPORT_SYMBOL(drmcg_device_early_init);
>   
> +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
> +		struct work_struct *wq)
> +{
> +	if (dev == NULL || type >= TTM_PL_PRIV)
> +		return;
> +
> +	mutex_lock(&drmcg_mutex);
> +	dev->drmcg_props.mem_reclaim_wq[type] = wq;
> +	mutex_unlock(&drmcg_mutex);
> +}
> +EXPORT_SYMBOL(drmcg_register_device_mm);
> +
> +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type)
> +{
> +	if (dev == NULL || type >= TTM_PL_PRIV)
> +		return;
> +
> +	mutex_lock(&drmcg_mutex);
> +	dev->drmcg_props.mem_reclaim_wq[type] = NULL;
> +	mutex_unlock(&drmcg_mutex);
> +}
> +EXPORT_SYMBOL(drmcg_unregister_device_mm);
> +
>   /**
>    * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
>    * @drmcg: the DRM cgroup to be charged to
> @@ -1013,6 +1036,13 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>   
>   		ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
>   			-= move_in_bytes;
> +
> +		if (dev->drmcg_props.mem_reclaim_wq[new_mem_type]
> +			!= NULL &&
> +			ddr->mem_stats[new_mem_type] >
> +				ddr->mem_highs[new_mem_type])
> +			schedule_work(dev->
> +				drmcg_props.mem_reclaim_wq[new_mem_type]);
>   	}
>   	mutex_unlock(&dev->drmcg_mutex);
>   }


* Re: [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim
  2019-08-29  7:08     ` Koenig, Christian
@ 2019-08-29 14:07       ` Kenny Ho
       [not found]         ` <CAOWid-dzJiqjH9+=36fFYh87OKOzToMDcJZpepOWdjoXpBSF8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-08-29 14:07 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Ho, Kenny, Kuehling, Felix, jsparks, amd-gfx, lkaplan, Deucher,
	Alexander, dri-devel, Greathouse, Joseph, tj, cgroups



Thanks for the feedback, Christian.  I am still digging into this one.
Daniel suggested leveraging the shrinker API for the functionality of
this commit in RFC v3, but I am still trying to figure out how/if TTM
fits with the shrinker (though the idea behind the shrinker API seems
fairly straightforward as far as I currently understand it).

Regards,
Kenny

On Thu, Aug 29, 2019 at 3:08 AM Koenig, Christian <Christian.Koenig@amd.com>
wrote:

> Am 29.08.19 um 08:05 schrieb Kenny Ho:
> > Allow DRM TTM memory manager to register a work_struct, such that, when
> > a drmcgrp is under memory pressure, memory reclaiming can be triggered
> > immediately.
> >
> > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c    | 49 +++++++++++++++++++++++++++++++++
> >   include/drm/drm_cgroup.h        | 16 +++++++++++
> >   include/drm/ttm/ttm_bo_driver.h |  2 ++
> >   kernel/cgroup/drm.c             | 30 ++++++++++++++++++++
> >   4 files changed, 97 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index d7e3d3128ebb..72efae694b7e 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev,
> unsigned mem_type)
> >   }
> >   EXPORT_SYMBOL(ttm_bo_evict_mm);
> >
> > +static void ttm_bo_reclaim_wq(struct work_struct *work)
> > +{
> > +     struct ttm_operation_ctx ctx = {
> > +             .interruptible = false,
> > +             .no_wait_gpu = false,
> > +             .flags = TTM_OPT_FLAG_FORCE_ALLOC
> > +     };
> > +     struct ttm_mem_type_manager *man =
> > +         container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> > +     struct ttm_bo_device *bdev = man->bdev;
> > +     struct dma_fence *fence;
> > +     int mem_type;
> > +     int ret;
> > +
> > +     for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> > +             if (&bdev->man[mem_type] == man)
> > +                     break;
> > +
> > +     WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
> > +     if (mem_type >= TTM_NUM_MEM_TYPES)
> > +             return;
> > +
> > +     if (!drmcg_mem_pressure_scan(bdev, mem_type))
> > +             return;
> > +
> > +     ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
> > +     if (ret)
> > +             return;
> > +
> > +     spin_lock(&man->move_lock);
> > +     fence = dma_fence_get(man->move);
> > +     spin_unlock(&man->move_lock);
> > +
> > +     if (fence) {
> > +             ret = dma_fence_wait(fence, false);
> > +             dma_fence_put(fence);
> > +     }
>
> Why do you want to block for the fence here? That is a rather bad idea
> and would break pipe-lining.
>
> Apart from that I don't think we should put that into TTM.
>
> Instead drmcg_register_device_mm() should get a function pointer which
> is called from a work item when the group is under pressure.
>
> TTM can then provides the function which can be called, but the actually
> registration is job of the device and not TTM.
>
> Regards,
> Christian.
>
> > +
> > +}
> > +
> >   int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> >                       unsigned long p_size)
> >   {
> > @@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev,
> unsigned type,
> >               INIT_LIST_HEAD(&man->lru[i]);
> >       man->move = NULL;
> >
> > +     pr_err("drmcg %p type %d\n", bdev->ddev, type);
> > +
> > +     if (type <= TTM_PL_VRAM) {
> > +             INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> > +             drmcg_register_device_mm(bdev->ddev, type,
> &man->reclaim_wq);
> > +     }
> > +
> >       return 0;
> >   }
> >   EXPORT_SYMBOL(ttm_bo_init_mm);
> > @@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device
> *bdev)
> >               man = &bdev->man[i];
> >               if (man->has_type) {
> >                       man->use_type = false;
> > +                     drmcg_unregister_device_mm(bdev->ddev, i);
> > +                     cancel_work_sync(&man->reclaim_wq);
> >                       if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev,
> i)) {
> >                               ret = -EBUSY;
> >                               pr_err("DRM memory manager type %d is not
> clean\n",
> > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > index c11df388fdf2..6d9707e1eb72 100644
> > --- a/include/drm/drm_cgroup.h
> > +++ b/include/drm/drm_cgroup.h
> > @@ -5,6 +5,7 @@
> >   #define __DRM_CGROUP_H__
> >
> >   #include <linux/cgroup_drm.h>
> > +#include <linux/workqueue.h>
> >   #include <drm/ttm/ttm_bo_api.h>
> >   #include <drm/ttm/ttm_bo_driver.h>
> >
> > @@ -25,12 +26,17 @@ struct drmcg_props {
> >       s64                     mem_bw_avg_bytes_per_us_default;
> >
> >       s64                     mem_highs_default[TTM_PL_PRIV+1];
> > +
> > +     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
> >   };
> >
> >   #ifdef CONFIG_CGROUP_DRM
> >
> >   void drmcg_device_update(struct drm_device *device);
> >   void drmcg_device_early_init(struct drm_device *device);
> > +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
> > +             struct work_struct *wq);
> > +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int
> type);
> >   bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device
> *dev,
> >               size_t size);
> >   void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
> > @@ -53,6 +59,16 @@ static inline void drmcg_device_early_init(struct
> drm_device *device)
> >   {
> >   }
> >
> > +static inline void drmcg_register_device_mm(struct drm_device *dev,
> > +             unsigned int type, struct work_struct *wq)
> > +{
> > +}
> > +
> > +static inline void drmcg_unregister_device_mm(struct drm_device *dev,
> > +             unsigned int type)
> > +{
> > +}
> > +
> >   static inline void drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
> >               struct drm_device *dev, size_t size)
> >   {
> > diff --git a/include/drm/ttm/ttm_bo_driver.h
> b/include/drm/ttm/ttm_bo_driver.h
> > index e1a805d65b83..529cef92bcf6 100644
> > --- a/include/drm/ttm/ttm_bo_driver.h
> > +++ b/include/drm/ttm/ttm_bo_driver.h
> > @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
> >        * Protected by @move_lock.
> >        */
> >       struct dma_fence *move;
> > +
> > +     struct work_struct reclaim_wq;
> >   };
> >
> >   /**
> > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > index 04fb9a398740..0ea7f0619e25 100644
> > --- a/kernel/cgroup/drm.c
> > +++ b/kernel/cgroup/drm.c
> > @@ -804,6 +804,29 @@ void drmcg_device_early_init(struct drm_device *dev)
> >   }
> >   EXPORT_SYMBOL(drmcg_device_early_init);
> >
> > +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
> > +             struct work_struct *wq)
> > +{
> > +     if (dev == NULL || type >= TTM_PL_PRIV)
> > +             return;
> > +
> > +     mutex_lock(&drmcg_mutex);
> > +     dev->drmcg_props.mem_reclaim_wq[type] = wq;
> > +     mutex_unlock(&drmcg_mutex);
> > +}
> > +EXPORT_SYMBOL(drmcg_register_device_mm);
> > +
> > +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int
> type)
> > +{
> > +     if (dev == NULL || type >= TTM_PL_PRIV)
> > +             return;
> > +
> > +     mutex_lock(&drmcg_mutex);
> > +     dev->drmcg_props.mem_reclaim_wq[type] = NULL;
> > +     mutex_unlock(&drmcg_mutex);
> > +}
> > +EXPORT_SYMBOL(drmcg_unregister_device_mm);
> > +
> >   /**
> >    * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and
> cgroup
> >    * @drmcg: the DRM cgroup to be charged to
> > @@ -1013,6 +1036,13 @@ void drmcg_mem_track_move(struct
> ttm_buffer_object *old_bo, bool evict,
> >
> >               ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
> >                       -= move_in_bytes;
> > +
> > +             if (dev->drmcg_props.mem_reclaim_wq[new_mem_type]
> > +                     != NULL &&
> > +                     ddr->mem_stats[new_mem_type] >
> > +                             ddr->mem_highs[new_mem_type])
> > +                     schedule_work(dev->
> > +                             drmcg_props.mem_reclaim_wq[new_mem_type]);
> >       }
> >       mutex_unlock(&dev->drmcg_mutex);
> >   }
>
>


* Re: [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim
       [not found]         ` <CAOWid-dzJiqjH9+=36fFYh87OKOzToMDcJZpepOWdjoXpBSF8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-08-29 14:12           ` Koenig, Christian
       [not found]             ` <f6963293-bebe-0dca-b509-799f9096ca91-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Koenig, Christian @ 2019-08-29 14:12 UTC (permalink / raw)
  To: Kenny Ho
  Cc: daniel-/w4YWyX8dFk, Ho, Kenny, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, Deucher,  Alexander,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Greathouse, Joseph,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA



Yeah, that's a really good idea as well.

The problem with the shrinker API is that it currently only applies to system memory.

So you won't have a distinction of which domain you need to evict from.

Regards,
Christian.

Am 29.08.19 um 16:07 schrieb Kenny Ho:
Thanks for the feedback Christian.  I am still digging into this one.  Daniel suggested leveraging the Shrinker API for the functionality of this commit in RFC v3 but I am still trying to figure it out how/if ttm fit with shrinker (though the idea behind the shrinker API seems fairly straightforward as far as I understand it currently.)

Regards,
Kenny

On Thu, Aug 29, 2019 at 3:08 AM Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>> wrote:
Am 29.08.19 um 08:05 schrieb Kenny Ho:
> Allow DRM TTM memory manager to register a work_struct, such that, when
> a drmcgrp is under memory pressure, memory reclaiming can be triggered
> immediately.
>
> Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com<mailto:Kenny.Ho@amd.com>>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c    | 49 +++++++++++++++++++++++++++++++++
>   include/drm/drm_cgroup.h        | 16 +++++++++++
>   include/drm/ttm/ttm_bo_driver.h |  2 ++
>   kernel/cgroup/drm.c             | 30 ++++++++++++++++++++
>   4 files changed, 97 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d7e3d3128ebb..72efae694b7e 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
>   }
>   EXPORT_SYMBOL(ttm_bo_evict_mm);
>
> +static void ttm_bo_reclaim_wq(struct work_struct *work)
> +{
> +     struct ttm_operation_ctx ctx = {
> +             .interruptible = false,
> +             .no_wait_gpu = false,
> +             .flags = TTM_OPT_FLAG_FORCE_ALLOC
> +     };
> +     struct ttm_mem_type_manager *man =
> +         container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> +     struct ttm_bo_device *bdev = man->bdev;
> +     struct dma_fence *fence;
> +     int mem_type;
> +     int ret;
> +
> +     for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> +             if (&bdev->man[mem_type] == man)
> +                     break;
> +
> +     WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
> +     if (mem_type >= TTM_NUM_MEM_TYPES)
> +             return;
> +
> +     if (!drmcg_mem_pressure_scan(bdev, mem_type))
> +             return;
> +
> +     ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
> +     if (ret)
> +             return;
> +
> +     spin_lock(&man->move_lock);
> +     fence = dma_fence_get(man->move);
> +     spin_unlock(&man->move_lock);
> +
> +     if (fence) {
> +             ret = dma_fence_wait(fence, false);
> +             dma_fence_put(fence);
> +     }

Why do you want to block for the fence here? That is a rather bad idea
and would break pipe-lining.

Apart from that I don't think we should put that into TTM.

Instead drmcg_register_device_mm() should get a function pointer which
is called from a work item when the group is under pressure.

TTM can then provides the function which can be called, but the actually
registration is job of the device and not TTM.

Regards,
Christian.

> +
> +}
> +
>   int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>                       unsigned long p_size)
>   {
> @@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>               INIT_LIST_HEAD(&man->lru[i]);
>       man->move = NULL;
>
> +     pr_err("drmcg %p type %d\n", bdev->ddev, type);
> +
> +     if (type <= TTM_PL_VRAM) {
> +             INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> +             drmcg_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
> +     }
> +
>       return 0;
>   }
>   EXPORT_SYMBOL(ttm_bo_init_mm);
> @@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
>               man = &bdev->man[i];
>               if (man->has_type) {
>                       man->use_type = false;
> +                     drmcg_unregister_device_mm(bdev->ddev, i);
> +                     cancel_work_sync(&man->reclaim_wq);
>                       if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
>                               ret = -EBUSY;
>                               pr_err("DRM memory manager type %d is not clean\n",
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index c11df388fdf2..6d9707e1eb72 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -5,6 +5,7 @@
>   #define __DRM_CGROUP_H__
>
>   #include <linux/cgroup_drm.h>
> +#include <linux/workqueue.h>
>   #include <drm/ttm/ttm_bo_api.h>
>   #include <drm/ttm/ttm_bo_driver.h>
>
> @@ -25,12 +26,17 @@ struct drmcg_props {
>       s64                     mem_bw_avg_bytes_per_us_default;
>
>       s64                     mem_highs_default[TTM_PL_PRIV+1];
> +
> +     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
>   };
>
>   #ifdef CONFIG_CGROUP_DRM
>
>   void drmcg_device_update(struct drm_device *device);
>   void drmcg_device_early_init(struct drm_device *device);
> +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
> +             struct work_struct *wq);
> +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type);
>   bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
>               size_t size);
>   void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
> @@ -53,6 +59,16 @@ static inline void drmcg_device_early_init(struct drm_device *device)
>   {
>   }
>
> +static inline void drmcg_register_device_mm(struct drm_device *dev,
> +             unsigned int type, struct work_struct *wq)
> +{
> +}
> +
> +static inline void drmcg_unregister_device_mm(struct drm_device *dev,
> +             unsigned int type)
> +{
> +}
> +
>   static inline void drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
>               struct drm_device *dev, size_t size)
>   {
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index e1a805d65b83..529cef92bcf6 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
>        * Protected by @move_lock.
>        */
>       struct dma_fence *move;
> +
> +     struct work_struct reclaim_wq;
>   };
>
>   /**
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 04fb9a398740..0ea7f0619e25 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -804,6 +804,29 @@ void drmcg_device_early_init(struct drm_device *dev)
>   }
>   EXPORT_SYMBOL(drmcg_device_early_init);
>
> +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
> +             struct work_struct *wq)
> +{
> +     if (dev == NULL || type >= TTM_PL_PRIV)
> +             return;
> +
> +     mutex_lock(&drmcg_mutex);
> +     dev->drmcg_props.mem_reclaim_wq[type] = wq;
> +     mutex_unlock(&drmcg_mutex);
> +}
> +EXPORT_SYMBOL(drmcg_register_device_mm);
> +
> +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type)
> +{
> +     if (dev == NULL || type >= TTM_PL_PRIV)
> +             return;
> +
> +     mutex_lock(&drmcg_mutex);
> +     dev->drmcg_props.mem_reclaim_wq[type] = NULL;
> +     mutex_unlock(&drmcg_mutex);
> +}
> +EXPORT_SYMBOL(drmcg_unregister_device_mm);
> +
>   /**
>    * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
>    * @drmcg: the DRM cgroup to be charged to
> @@ -1013,6 +1036,13 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>
>               ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
>                       -= move_in_bytes;
> +
> +             if (dev->drmcg_props.mem_reclaim_wq[new_mem_type]
> +                     != NULL &&
> +                     ddr->mem_stats[new_mem_type] >
> +                             ddr->mem_highs[new_mem_type])
> +                     schedule_work(dev->
> +                             drmcg_props.mem_reclaim_wq[new_mem_type]);
>       }
>       mutex_unlock(&dev->drmcg_mutex);
>   }




_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim
       [not found]             ` <f6963293-bebe-0dca-b509-799f9096ca91-5C7GfCeVMHo@public.gmane.org>
@ 2019-08-29 14:39               ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-08-29 14:39 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: daniel-/w4YWyX8dFk, Ho, Kenny, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, Deucher, Alexander,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Greathouse, Joseph,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA

Yes, and I think it has quite a lot of coupling with mm's page and
pressure mechanisms.  My current thought is to just copy the API but
have a separate implementation of "ttm_shrinker" and
"ttm_shrinker_control" or something like that.  I am certainly happy
to listen to additional feedback and suggestions.

Regards,
Kenny
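A rough shape for the "copy the shrinker API" idea — a TTM-specific count/scan pair scoped to one memory domain — sketched below as stand-alone user-space C. `ttm_shrinker` and `ttm_shrinker_control` are the hypothetical names from the email; every field and function here is an assumption, modeled loosely on the kernel's `struct shrinker`, not actual kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of struct shrink_control, scoped to a single
 * TTM memory domain instead of system memory. */
struct ttm_shrinker_control {
	unsigned int mem_type;    /* domain to reclaim from (e.g. VRAM) */
	unsigned long nr_to_scan; /* pages the caller wants freed */
};

/* Hypothetical analogue of struct shrinker: count and scan callbacks. */
struct ttm_shrinker {
	unsigned long (*count_objects)(struct ttm_shrinker *s,
				       struct ttm_shrinker_control *sc);
	unsigned long (*scan_objects)(struct ttm_shrinker *s,
				      struct ttm_shrinker_control *sc);
	void *private; /* driver data, e.g. the ttm_bo_device */
};

/* Toy backend: pretend each domain holds 100 reclaimable pages. */
static unsigned long toy_count(struct ttm_shrinker *s,
			       struct ttm_shrinker_control *sc)
{
	(void)s; (void)sc;
	return 100;
}

static unsigned long toy_scan(struct ttm_shrinker *s,
			      struct ttm_shrinker_control *sc)
{
	unsigned long avail = toy_count(s, sc);
	return sc->nr_to_scan < avail ? sc->nr_to_scan : avail;
}

/* What the drmcg side would do when a cgroup exceeds its high mark:
 * ask how much is reclaimable, then scan up to the excess. */
static unsigned long ttm_shrink(struct ttm_shrinker *s,
				unsigned int mem_type,
				unsigned long excess_pages)
{
	struct ttm_shrinker_control sc = {
		.mem_type = mem_type,
		.nr_to_scan = excess_pages,
	};

	if (s->count_objects(s, &sc) == 0)
		return 0;
	return s->scan_objects(s, &sc);
}
```

The split into count and scan mirrors the mm shrinker contract, so a per-domain implementation could later converge with the generic one if TTM grows that support.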


On Thu, Aug 29, 2019 at 10:12 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Yeah, that's also a really good idea as well.
>
> The problem with the shrinker API is that it only applies to system memory currently.
>
> So you won't have a distinction which domain you need to evict stuff from.
>
> Regards,
> Christian.
>
> Am 29.08.19 um 16:07 schrieb Kenny Ho:
>
> Thanks for the feedback Christian.  I am still digging into this one.  Daniel suggested leveraging the Shrinker API for the functionality of this commit in RFC v3, but I am still trying to figure out how/if ttm fits with the shrinker (though the idea behind the shrinker API seems fairly straightforward as far as I understand it currently).
>
> Regards,
> Kenny
>
> On Thu, Aug 29, 2019 at 3:08 AM Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>
>> Am 29.08.19 um 08:05 schrieb Kenny Ho:
>> > Allow DRM TTM memory manager to register a work_struct, such that, when
>> > a drmcgrp is under memory pressure, memory reclaiming can be triggered
>> > immediately.
>> >
>> > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
>> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
>> > ---
>> >   drivers/gpu/drm/ttm/ttm_bo.c    | 49 +++++++++++++++++++++++++++++++++
>> >   include/drm/drm_cgroup.h        | 16 +++++++++++
>> >   include/drm/ttm/ttm_bo_driver.h |  2 ++
>> >   kernel/cgroup/drm.c             | 30 ++++++++++++++++++++
>> >   4 files changed, 97 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> > index d7e3d3128ebb..72efae694b7e 100644
>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> > @@ -1590,6 +1590,46 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
>> >   }
>> >   EXPORT_SYMBOL(ttm_bo_evict_mm);
>> >
>> > +static void ttm_bo_reclaim_wq(struct work_struct *work)
>> > +{
>> > +     struct ttm_operation_ctx ctx = {
>> > +             .interruptible = false,
>> > +             .no_wait_gpu = false,
>> > +             .flags = TTM_OPT_FLAG_FORCE_ALLOC
>> > +     };
>> > +     struct ttm_mem_type_manager *man =
>> > +         container_of(work, struct ttm_mem_type_manager, reclaim_wq);
>> > +     struct ttm_bo_device *bdev = man->bdev;
>> > +     struct dma_fence *fence;
>> > +     int mem_type;
>> > +     int ret;
>> > +
>> > +     for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
>> > +             if (&bdev->man[mem_type] == man)
>> > +                     break;
>> > +
>> > +     WARN_ON(mem_type >= TTM_NUM_MEM_TYPES);
>> > +     if (mem_type >= TTM_NUM_MEM_TYPES)
>> > +             return;
>> > +
>> > +     if (!drmcg_mem_pressure_scan(bdev, mem_type))
>> > +             return;
>> > +
>> > +     ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx, NULL);
>> > +     if (ret)
>> > +             return;
>> > +
>> > +     spin_lock(&man->move_lock);
>> > +     fence = dma_fence_get(man->move);
>> > +     spin_unlock(&man->move_lock);
>> > +
>> > +     if (fence) {
>> > +             ret = dma_fence_wait(fence, false);
>> > +             dma_fence_put(fence);
>> > +     }
>>
>> Why do you want to block for the fence here? That is a rather bad idea
>> and would break pipe-lining.
>>
>> Apart from that I don't think we should put that into TTM.
>>
>> Instead drmcg_register_device_mm() should get a function pointer which
>> is called from a work item when the group is under pressure.
>>
>> TTM can then provides the function which can be called, but the actually
>> registration is job of the device and not TTM.
>>
>> Regards,
>> Christian.
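Christian's alternative — have drmcg_register_device_mm() take a function pointer that the cgroup core invokes from its own work item, with TTM merely providing a candidate callback — could look roughly like the following. This is a stand-alone user-space sketch; the struct layout and all names are illustrative stand-ins, not the actual kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define TTM_PL_PRIV 3 /* placeholder for the real placement count */

struct drm_device;

/* Per-device table of reclaim callbacks, one per memory type, filled in
 * by the driver (which may point them at a TTM-provided helper). */
typedef void (*drmcg_reclaim_fn)(struct drm_device *dev, unsigned int type);

struct drmcg_props {
	drmcg_reclaim_fn reclaim[TTM_PL_PRIV];
};

struct drm_device {
	struct drmcg_props drmcg_props;
};

static void drmcg_register_device_mm(struct drm_device *dev,
				     unsigned int type, drmcg_reclaim_fn fn)
{
	if (dev == NULL || type >= TTM_PL_PRIV)
		return;
	dev->drmcg_props.reclaim[type] = fn;
}

/* In the real design this would run from a work item owned by the
 * cgroup core when a group is over its high mark for @type; here it
 * just dispatches synchronously.  Returns 1 if a callback ran. */
static int drmcg_handle_pressure(struct drm_device *dev, unsigned int type)
{
	if (type >= TTM_PL_PRIV || dev->drmcg_props.reclaim[type] == NULL)
		return 0;
	dev->drmcg_props.reclaim[type](dev, type);
	return 1;
}

/* Toy callback standing in for a TTM eviction helper. */
static int reclaim_calls;
static void toy_reclaim(struct drm_device *dev, unsigned int type)
{
	(void)dev; (void)type;
	reclaim_calls++;
}
```

The key difference from the posted patch is ownership: the cgroup core owns the deferral mechanism, the driver only supplies policy-free reclaim code, so no work_struct leaks into drmcg_props.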
>>
>> > +
>> > +}
>> > +
>> >   int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>> >                       unsigned long p_size)
>> >   {
>> > @@ -1624,6 +1664,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>> >               INIT_LIST_HEAD(&man->lru[i]);
>> >       man->move = NULL;
>> >
>> > +     pr_err("drmcg %p type %d\n", bdev->ddev, type);
>> > +
>> > +     if (type <= TTM_PL_VRAM) {
>> > +             INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
>> > +             drmcg_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
>> > +     }
>> > +
>> >       return 0;
>> >   }
>> >   EXPORT_SYMBOL(ttm_bo_init_mm);
>> > @@ -1701,6 +1748,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
>> >               man = &bdev->man[i];
>> >               if (man->has_type) {
>> >                       man->use_type = false;
>> > +                     drmcg_unregister_device_mm(bdev->ddev, i);
>> > +                     cancel_work_sync(&man->reclaim_wq);
>> >                       if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
>> >                               ret = -EBUSY;
>> >                               pr_err("DRM memory manager type %d is not clean\n",
>> > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
>> > index c11df388fdf2..6d9707e1eb72 100644
>> > --- a/include/drm/drm_cgroup.h
>> > +++ b/include/drm/drm_cgroup.h
>> > @@ -5,6 +5,7 @@
>> >   #define __DRM_CGROUP_H__
>> >
>> >   #include <linux/cgroup_drm.h>
>> > +#include <linux/workqueue.h>
>> >   #include <drm/ttm/ttm_bo_api.h>
>> >   #include <drm/ttm/ttm_bo_driver.h>
>> >
>> > @@ -25,12 +26,17 @@ struct drmcg_props {
>> >       s64                     mem_bw_avg_bytes_per_us_default;
>> >
>> >       s64                     mem_highs_default[TTM_PL_PRIV+1];
>> > +
>> > +     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
>> >   };
>> >
>> >   #ifdef CONFIG_CGROUP_DRM
>> >
>> >   void drmcg_device_update(struct drm_device *device);
>> >   void drmcg_device_early_init(struct drm_device *device);
>> > +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
>> > +             struct work_struct *wq);
>> > +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type);
>> >   bool drmcg_try_chg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
>> >               size_t size);
>> >   void drmcg_unchg_bo_alloc(struct drmcg *drmcg, struct drm_device *dev,
>> > @@ -53,6 +59,16 @@ static inline void drmcg_device_early_init(struct drm_device *device)
>> >   {
>> >   }
>> >
>> > +static inline void drmcg_register_device_mm(struct drm_device *dev,
>> > +             unsigned int type, struct work_struct *wq)
>> > +{
>> > +}
>> > +
>> > +static inline void drmcg_unregister_device_mm(struct drm_device *dev,
>> > +             unsigned int type)
>> > +{
>> > +}
>> > +
>> >   static inline void drmcg_try_chg_bo_alloc(struct drmcg *drmcg,
>> >               struct drm_device *dev, size_t size)
>> >   {
>> > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
>> > index e1a805d65b83..529cef92bcf6 100644
>> > --- a/include/drm/ttm/ttm_bo_driver.h
>> > +++ b/include/drm/ttm/ttm_bo_driver.h
>> > @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
>> >        * Protected by @move_lock.
>> >        */
>> >       struct dma_fence *move;
>> > +
>> > +     struct work_struct reclaim_wq;
>> >   };
>> >
>> >   /**
>> > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
>> > index 04fb9a398740..0ea7f0619e25 100644
>> > --- a/kernel/cgroup/drm.c
>> > +++ b/kernel/cgroup/drm.c
>> > @@ -804,6 +804,29 @@ void drmcg_device_early_init(struct drm_device *dev)
>> >   }
>> >   EXPORT_SYMBOL(drmcg_device_early_init);
>> >
>> > +void drmcg_register_device_mm(struct drm_device *dev, unsigned int type,
>> > +             struct work_struct *wq)
>> > +{
>> > +     if (dev == NULL || type >= TTM_PL_PRIV)
>> > +             return;
>> > +
>> > +     mutex_lock(&drmcg_mutex);
>> > +     dev->drmcg_props.mem_reclaim_wq[type] = wq;
>> > +     mutex_unlock(&drmcg_mutex);
>> > +}
>> > +EXPORT_SYMBOL(drmcg_register_device_mm);
>> > +
>> > +void drmcg_unregister_device_mm(struct drm_device *dev, unsigned int type)
>> > +{
>> > +     if (dev == NULL || type >= TTM_PL_PRIV)
>> > +             return;
>> > +
>> > +     mutex_lock(&drmcg_mutex);
>> > +     dev->drmcg_props.mem_reclaim_wq[type] = NULL;
>> > +     mutex_unlock(&drmcg_mutex);
>> > +}
>> > +EXPORT_SYMBOL(drmcg_unregister_device_mm);
>> > +
>> >   /**
>> >    * drmcg_try_chg_bo_alloc - charge GEM buffer usage for a device and cgroup
>> >    * @drmcg: the DRM cgroup to be charged to
>> > @@ -1013,6 +1036,13 @@ void drmcg_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>> >
>> >               ddr->mem_bw_stats[DRMCG_MEM_BW_ATTR_BYTE_CREDIT]
>> >                       -= move_in_bytes;
>> > +
>> > +             if (dev->drmcg_props.mem_reclaim_wq[new_mem_type]
>> > +                     != NULL &&
>> > +                     ddr->mem_stats[new_mem_type] >
>> > +                             ddr->mem_highs[new_mem_type])
>> > +                     schedule_work(dev->
>> > +                             drmcg_props.mem_reclaim_wq[new_mem_type]);
>> >       }
>> >       mutex_unlock(&dev->drmcg_mutex);
>> >   }
>>
>


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2019-08-29  6:05   ` [PATCH RFC v4 15/16] drm, cgroup: add update trigger after limit change Kenny Ho
@ 2019-08-31  4:28   ` Tejun Heo
       [not found]     ` <20190831042857.GD2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
  9 siblings, 1 reply; 89+ messages in thread
From: Tejun Heo @ 2019-08-31  4:28 UTC (permalink / raw)
  To: Kenny Ho
  Cc: daniel-/w4YWyX8dFk, felix.kuehling-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, alexander.deucher-5C7GfCeVMHo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

Hello,

I just glanced through the interface and don't have enough context to
give any kind of detailed review yet.  I'll try to read up and
understand more, and would greatly appreciate it if you could give me
some pointers to read up on the resources being controlled and what
the actual use cases would look like.  That said, I have some basic
concerns.

* TTM vs. GEM distinction seems to be internal implementation detail
  rather than anything relating to underlying physical resources.
  Provided that's the case, I'm afraid these internal constructs being
  used as primary resource control objects likely isn't the right
  approach.  Whether a given driver uses one or the other internal
  abstraction layer shouldn't determine how resources are represented
  at the userland interface layer.

* While breaking up and applying control to different types of
  internal objects may seem attractive to folks who work day in and
  day out with the subsystem, they aren't all that useful to users and
  the siloed controls are likely to make the whole mechanism a lot
  less useful.  We had the same problem with cgroup1 memcg - putting
  control of different uses of memory under separate knobs.  It made
  the whole thing pretty useless.  e.g. if you constrain all knobs
  tight enough to control the overall usage, overall utilization
  suffers, but if you don't, you really don't have control over actual
  usage.  For memcg, what has to be allocated and controlled is
  physical memory, no matter how they're used.  It's not like you can
  go buy more "socket" memory.  At least from the looks of it, I'm
  afraid gpu controller is repeating the same mistakes.

Thanks.

-- 
tejun


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]     ` <20190831042857.GD2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
@ 2019-09-03  7:55       ` Daniel Vetter
       [not found]         ` <20190903075550.GJ2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-03  7:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: daniel-/w4YWyX8dFk, Kenny Ho, felix.kuehling-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, alexander.deucher-5C7GfCeVMHo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Fri, Aug 30, 2019 at 09:28:57PM -0700, Tejun Heo wrote:
> Hello,
> 
> I just glanced through the interface and don't have enough context to
> give any kind of detailed review yet.  I'll try to read up and
> understand more and would greatly appreciate if you can give me some
> pointers to read up on the resources being controlled and how the
> actual use cases would look like.  That said, I have some basic
> concerns.
> 
> * TTM vs. GEM distinction seems to be internal implementation detail
>   rather than anything relating to underlying physical resources.
>   Provided that's the case, I'm afraid these internal constructs being
>   used as primary resource control objects likely isn't the right
>   approach.  Whether a given driver uses one or the other internal
>   abstraction layer shouldn't determine how resources are represented
>   at the userland interface layer.

Yeah there's another RFC series from Brian Welty to abstract this away as
a memory region concept for gpus.

> * While breaking up and applying control to different types of
>   internal objects may seem attractive to folks who work day in and
>   day out with the subsystem, they aren't all that useful to users and
>   the siloed controls are likely to make the whole mechanism a lot
>   less useful.  We had the same problem with cgroup1 memcg - putting
>   control of different uses of memory under separate knobs.  It made
>   the whole thing pretty useless.  e.g. if you constrain all knobs
>   tight enough to control the overall usage, overall utilization
>   suffers, but if you don't, you really don't have control over actual
>   usage.  For memcg, what has to be allocated and controlled is
>   physical memory, no matter how they're used.  It's not like you can
>   go buy more "socket" memory.  At least from the looks of it, I'm
>   afraid gpu controller is repeating the same mistakes.

We do have quite a pile of different memories and ranges, so I don't
think we're making the same mistake here. But it is maybe a bit too
complicated, and exposes stuff that most users really don't care about.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]   ` <20190829060533.32315-2-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-03  7:57     ` Daniel Vetter
  2019-09-03 19:45       ` Kenny Ho
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-03  7:57 UTC (permalink / raw)
  To: Kenny Ho
  Cc: daniel-/w4YWyX8dFk, felix.kuehling-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, alexander.deucher-5C7GfCeVMHo,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Thu, Aug 29, 2019 at 02:05:18AM -0400, Kenny Ho wrote:
> To allow other subsystems to iterate through all stored DRM minors and
> act upon them.
> 
> Also exposes drm_minor_acquire and drm_minor_release for other subsystems
> to handle drm_minor.  The DRM cgroup controller is the initial consumer of
> these new features.
> 
> Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b8144ca5
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

Iterating over minors for cgroups sounds very, very wrong. Why do we care
whether a buffer was allocated through kms dumb vs render nodes?

I'd expect all the cgroup stuff to only work on drm_device, if it does
care about devices.

(I didn't look through the patch series to find out where exactly you're
using this, so maybe I'm off the rails here).
-Daniel

> ---
>  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
>  drivers/gpu/drm/drm_internal.h |  4 ----
>  include/drm/drm_drv.h          |  4 ++++
>  3 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 862621494a93..000cddabd970 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
>  
>  	return minor;
>  }
> +EXPORT_SYMBOL(drm_minor_acquire);
>  
>  void drm_minor_release(struct drm_minor *minor)
>  {
>  	drm_dev_put(minor->dev);
>  }
> +EXPORT_SYMBOL(drm_minor_release);
>  
>  /**
>   * DOC: driver instance overview
> @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
>  }
>  EXPORT_SYMBOL(drm_dev_set_unique);
>  
> +/**
> + * drm_minor_for_each - Iterate through all stored DRM minors
> + * @fn: Function to be called for each pointer.
> + * @data: Data passed to callback function.
> + *
> + * The callback function will be called for each @drm_minor entry, passing
> + * the minor, the entry and @data.
> + *
> + * If @fn returns anything other than %0, the iteration stops and that
> + * value is returned from this function.
> + */
> +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> +{
> +	return idr_for_each(&drm_minors_idr, fn, data);
> +}
> +EXPORT_SYMBOL(drm_minor_for_each);
> +
>  /*
>   * DRM Core
>   * The DRM core module initializes all global DRM objects and makes them
> diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> index e19ac7ca602d..6bfad76f8e78 100644
> --- a/drivers/gpu/drm/drm_internal.h
> +++ b/drivers/gpu/drm/drm_internal.h
> @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
>  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
>  					struct dma_buf *dma_buf);
>  
> -/* drm_drv.c */
> -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> -void drm_minor_release(struct drm_minor *minor);
> -
>  /* drm_vblank.c */
>  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
>  void drm_vblank_cleanup(struct drm_device *dev);
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 68ca736c548d..24f8d054c570 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
>  
>  int drm_dev_set_unique(struct drm_device *dev, const char *name);
>  
> +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> +
> +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> +void drm_minor_release(struct drm_minor *minor);
>  
>  #endif
> -- 
> 2.22.0
> 
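The iteration contract drm_minor_for_each() inherits from idr_for_each() — invoke the callback once per occupied id, stop early and propagate the first non-zero return — can be illustrated with a self-contained mock. The array-backed "IDR" below is purely illustrative; the real iterator walks drm_minors_idr.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for an IDR: id -> pointer slots, NULL means empty. */
#define NSLOTS 8
static void *slots[NSLOTS];

/* Same contract as idr_for_each(): call fn for every occupied slot,
 * stop and return fn's value as soon as it is non-zero. */
static int mock_minor_for_each(int (*fn)(int id, void *p, void *data),
			       void *data)
{
	int id, ret;

	for (id = 0; id < NSLOTS; id++) {
		if (slots[id] == NULL)
			continue;
		ret = fn(id, slots[id], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count minors, abort with -1 on a sentinel value
 * standing in for a minor the caller wants to reject. */
static int count_minors(int id, void *p, void *data)
{
	(void)id;
	if (p == (void *)0x1) /* pretend this minor is unusable */
		return -1;
	(*(int *)data)++;
	return 0;
}
```

A cgroup-side consumer would use the early-return path to stop once it has found (or failed to charge) the device it cares about.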

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (7 preceding siblings ...)
  2019-08-29  6:05 ` [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup Kenny Ho
@ 2019-09-03  8:02 ` Daniel Vetter
       [not found]   ` <20190903080217.GL2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  8 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-03  8:02 UTC (permalink / raw)
  To: Kenny Ho
  Cc: felix.kuehling, jsparks, amd-gfx, lkaplan, alexander.deucher,
	y2kenny, dri-devel, joseph.greathouse, tj, cgroups,
	christian.koenig

On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
> This is a follow up to the RFC I made previously to introduce a cgroup
> controller for the GPU/DRM subsystem [v1,v2,v3].  The goal is to be able to
> provide resource management to GPU resources using things like containers.
> 
> With this RFC v4, I am hoping to have some consensus on a merge plan.  I believe
> the GEM related resources (drm.buffer.*) introduced in previous RFC and,
> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC are
> uncontroversial and ready to move out of RFC and into a more formal review.  I
> will continue to work on the memory backend resources (drm.memory.*).
> 
> The cover letter from v1 is copied below for reference.
> 
> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html

So looking at all this, not much seems to have changed, and the old
discussion didn't really conclude anywhere (aside from some details).

One more open though that crossed my mind, having read a ton of ttm again
recently: How does this all interact with ttm global limits? I'd say the
ttm global limits is the ur-cgroups we have in drm, and not looking at
that seems kinda bad.
-Daniel
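(For context: the TTM global limit Daniel mentions caps total TTM allocations system-wide, independent of any cgroup. How a per-cgroup charge might compose with such a global cap can be sketched as a toy two-budget model — the names, the charge order, and the rollback policy here are all assumptions, not the posted patches.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one device-wide budget (standing in for the TTM global
 * limit) plus one per-cgroup budget; a charge must fit both. */
struct budget {
	size_t used;
	size_t max;
};

static bool budget_try_charge(struct budget *b, size_t size)
{
	if (b->used + size > b->max)
		return false;
	b->used += size;
	return true;
}

/* Charge the global budget first; on cgroup failure, roll the global
 * charge back so both counters stay consistent. */
static bool try_charge(struct budget *global, struct budget *cg, size_t size)
{
	if (!budget_try_charge(global, size))
		return false;
	if (!budget_try_charge(cg, size)) {
		global->used -= size; /* undo the global charge */
		return false;
	}
	return true;
}
```

Whichever layer is checked first, the point is that the two limits compose: the cgroup knob can only ever tighten what the global limit already allows.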

> 
> v4:
> Unchanged (no review needed)
> * drm.memory.*/ttm resources (Patch 9-13, I am still working on memory bandwidth
> and shrinker)
> Base on feedbacks on v3:
> * update nomenclature to drmcg
> * embed per device drmcg properties into drm_device
> * split GEM buffer related commits into stats and limit
> * rename function name to align with convention
> * combined buffer accounting and check into a try_charge function
> * support buffer stats without limit enforcement
> * removed GEM buffer sharing limitation
> * updated documentations
> New features:
> * introducing logical GPU concept
> * example implementation with AMD KFD
> 
> v3:
> Base on feedbacks on v2:
> * removed .help type file from v2
> * conform to cgroup convention for default and max handling
> * conform to cgroup convention for addressing device specific limits (with major:minor)
> New function:
> * adopted memparse for memory size related attributes
> * added macro to marshal drmcgrp cftype private (DRMCG_CTF_PRIV, etc.)
> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> * added ttm buffer usage limit (per cgroup, for vram.)
> * added per cgroup bandwidth stats and limiting (burst and average bandwidth)
> 
> v2:
> * Removed the vendoring concepts
> * Add limit to total buffer allocation
> * Add limit to the maximum size of a buffer allocation
> 
> v1: cover letter
> 
> The purpose of this patch series is to start a discussion for a generic cgroup
> controller for the drm subsystem.  The design proposed here is a very early one.
> We are hoping to engage the community as we develop the idea.
> 
> 
> Backgrounds
> ==========
> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
> tasks, and all their future children, into hierarchical groups with specialized
> behaviour, such as accounting/limiting the resources which processes in a cgroup
> can access[1].  Weights, limits, protections, allocations are the main resource
> distribution models.  Existing cgroup controllers includes cpu, memory, io,
> rdma, and more.  cgroup is one of the foundational technologies that enables the
> popular container application deployment and management method.
> 
> Direct Rendering Manager/drm contains code intended to support the needs of
> complex graphics devices. Graphics drivers in the kernel may make use of DRM
> functions to make tasks like memory management, interrupt handling and DMA
> easier, and provide a uniform interface to applications.  The DRM has also
> developed beyond traditional graphics applications to support compute/GPGPU
> applications.
> 
> 
> Motivations
> =========
> As GPUs grow beyond the realm of desktop/workstation graphics into areas like
> data center clusters and IoT, there are increasing needs to monitor and regulate
> GPU as a resource like cpu, memory and io.
> 
> Matt Roper from Intel began working on similar idea in early 2018 [2] for the
> purpose of managing GPU priority using the cgroup hierarchy.  While that
> particular use case may not warrant a standalone drm cgroup controller, there
> are other use cases where having one can be useful [3].  Monitoring GPU
> resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
> (execution unit [Intel's nomenclature]), GPU job scheduling [4] can help
> sysadmins get a better understanding of the applications usage profile.  Further
> usage regulations of the aforementioned resources can also help sysadmins
> optimize workload deployment on limited GPU resources.
> 
> With the increased importance of machine learning, data science and other
> cloud-based applications, GPUs are already in production use in data centers
> today [5,6,7].  Existing GPU resource management is very coarse-grained, however,
> as sysadmins are only able to distribute workload on a per-GPU basis [8].  An
> alternative is to use GPU virtualization (with or without SRIOV) but it
> generally acts on the entire GPU instead of the specific resources in a GPU.
> With a drm cgroup controller, we can enable alternate, fine-grain, sub-GPU
> resource management (in addition to what may be available via GPU
> virtualization.)
> 
> In addition to production use, the DRM cgroup can also help with testing
> graphics application robustness by providing a means to artificially limit DRM
> resources available to the applications.
> 
> 
> Challenges
> ========
> While there is common infrastructure in DRM that is shared across many vendors
> (the scheduler [4] for example), there are also aspects of DRM that are vendor
> specific.  To accommodate this, we borrowed the mechanism used by the cgroup to
> handle different kinds of cgroup controller.
> 
> Resources for DRM are also often device (GPU) specific instead of system
> specific and a system may contain more than one GPU.  For this, we borrowed some
> of the ideas from RDMA cgroup controller.
> 
> Approach
> =======
> To experiment with the idea of a DRM cgroup, we would like to start with basic
> accounting and statistics, then continue to iterate and add regulating
> mechanisms into the driver.
> 
> [1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
> [2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
> [3] https://www.spinics.net/lists/cgroups/msg20720.html
> [4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
> [5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
> [6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
> [7] https://github.com/RadeonOpenCompute/k8s-device-plugin
> [8] https://github.com/kubernetes/kubernetes/issues/52757
> 
> Kenny Ho (16):
>   drm: Add drm_minor_for_each
>   cgroup: Introduce cgroup for drm subsystem
>   drm, cgroup: Initialize drmcg properties
>   drm, cgroup: Add total GEM buffer allocation stats
>   drm, cgroup: Add peak GEM buffer allocation stats
>   drm, cgroup: Add GEM buffer allocation count stats
>   drm, cgroup: Add total GEM buffer allocation limit
>   drm, cgroup: Add peak GEM buffer allocation limit
>   drm, cgroup: Add TTM buffer allocation stats
>   drm, cgroup: Add TTM buffer peak usage stats
>   drm, cgroup: Add per cgroup bw measure and control
>   drm, cgroup: Add soft VRAM limit
>   drm, cgroup: Allow more aggressive memory reclaim
>   drm, cgroup: Introduce lgpu as DRM cgroup resource
>   drm, cgroup: add update trigger after limit change
>   drm/amdgpu: Integrate with DRM cgroup
> 
>  Documentation/admin-guide/cgroup-v2.rst       |  163 +-
>  Documentation/cgroup-v1/drm.rst               |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |    4 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   29 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |    6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |    3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |    6 +
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |    3 +
>  .../amd/amdkfd/kfd_process_queue_manager.c    |  140 ++
>  drivers/gpu/drm/drm_drv.c                     |   26 +
>  drivers/gpu/drm/drm_gem.c                     |   16 +-
>  drivers/gpu/drm/drm_internal.h                |    4 -
>  drivers/gpu/drm/ttm/ttm_bo.c                  |   93 ++
>  drivers/gpu/drm/ttm/ttm_bo_util.c             |    4 +
>  include/drm/drm_cgroup.h                      |  122 ++
>  include/drm/drm_device.h                      |    7 +
>  include/drm/drm_drv.h                         |   23 +
>  include/drm/drm_gem.h                         |   13 +-
>  include/drm/ttm/ttm_bo_api.h                  |    2 +
>  include/drm/ttm/ttm_bo_driver.h               |   10 +
>  include/linux/cgroup_drm.h                    |  151 ++
>  include/linux/cgroup_subsys.h                 |    4 +
>  init/Kconfig                                  |    5 +
>  kernel/cgroup/Makefile                        |    1 +
>  kernel/cgroup/drm.c                           | 1367 +++++++++++++++++
>  25 files changed, 2193 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/cgroup-v1/drm.rst
>  create mode 100644 include/drm/drm_cgroup.h
>  create mode 100644 include/linux/cgroup_drm.h
>  create mode 100644 kernel/cgroup/drm.c
> 
> -- 
> 2.22.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]   ` <20190903080217.GL2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-09-03  8:24     ` Koenig, Christian
  2019-09-03  9:19       ` Daniel Vetter
  0 siblings, 1 reply; 89+ messages in thread
From: Koenig, Christian @ 2019-09-03  8:24 UTC (permalink / raw)
  To: Daniel Vetter, Ho, Kenny
  Cc: Kuehling, Felix, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher,  Alexander, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Greathouse, Joseph,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA

Am 03.09.19 um 10:02 schrieb Daniel Vetter:
> On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
>> This is a follow up to the RFC I made previously to introduce a cgroup
>> controller for the GPU/DRM subsystem [v1,v2,v3].  The goal is to be able to
>> provide resource management to GPU resources using things like container.
>>
>> With this RFC v4, I am hoping to have some consensus on a merge plan.  I believe
>> the GEM related resources (drm.buffer.*) introduced in previous RFC and,
>> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC are
>> uncontroversial and ready to move out of RFC and into a more formal review.  I
>> will continue to work on the memory backend resources (drm.memory.*).
>>
>> The cover letter from v1 is copied below for reference.
>>
>> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
>> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
>> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> So looking at all this doesn't seem to have changed much, and the old
> discussion didn't really conclude anywhere (aside from some details).
>
> One more open though that crossed my mind, having read a ton of ttm again
> recently: How does this all interact with ttm global limits? I'd say the
> ttm global limits is the ur-cgroups we have in drm, and not looking at
> that seems kinda bad.

At least my hope was to completely replace the ttm globals with the 
limitations here once they are ready.

Christian.


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-09-03  8:24     ` Koenig, Christian
@ 2019-09-03  9:19       ` Daniel Vetter
  2019-09-03 19:30         ` Kenny Ho
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-03  9:19 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Ho, Kenny, Kuehling, Felix, jsparks, amd-gfx, lkaplan, Deucher,
	Alexander, y2kenny, dri-devel, Greathouse, Joseph, tj, cgroups

On Tue, Sep 3, 2019 at 10:24 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Am 03.09.19 um 10:02 schrieb Daniel Vetter:
> > On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
> >> This is a follow up to the RFC I made previously to introduce a cgroup
> >> controller for the GPU/DRM subsystem [v1,v2,v3].  The goal is to be able to
> >> provide resource management to GPU resources using things like container.
> >>
> >> With this RFC v4, I am hoping to have some consensus on a merge plan.  I believe
> >> the GEM related resources (drm.buffer.*) introduced in previous RFC and,
> >> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC are
> >> uncontroversial and ready to move out of RFC and into a more formal review.  I
> >> will continue to work on the memory backend resources (drm.memory.*).
> >>
> >> The cover letter from v1 is copied below for reference.
> >>
> >> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> >> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
> >> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> > So looking at all this doesn't seem to have changed much, and the old
> > discussion didn't really conclude anywhere (aside from some details).
> >
> > One more open though that crossed my mind, having read a ton of ttm again
> > recently: How does this all interact with ttm global limits? I'd say the
> > ttm global limits is the ur-cgroups we have in drm, and not looking at
> > that seems kinda bad.
>
> At least my hope was to completely replace ttm globals with those
> limitations here when it is ready.

You need more than that, at least some kind of shrinker to cut down
BOs placed in system memory when we're under memory pressure, which
drags in a pretty epic amount of locking fun (see i915's shrinker,
where we attempt that). It would probably be another good idea to
share at least some concepts, maybe even code.
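
To illustrate the idea, here is a purely userspace toy, not the kernel
shrinker API (the real interface is a `struct shrinker` with
`count_objects()`/`scan_objects()` callbacks; every name and number
below is made up):

```python
# Toy userspace model of shrinker-style reclaim (illustrative only; the
# real kernel interface is a struct shrinker with count_objects() and
# scan_objects() callbacks).  BOs resident in system memory are evicted
# in LRU order until usage drops back under the limit.
from collections import OrderedDict

class ToyShrinker:
    def __init__(self, limit):
        self.limit = limit        # bytes of system memory BOs may occupy
        self.bos = OrderedDict()  # name -> size, oldest (LRU) first
        self.usage = 0

    def place(self, name, size):
        """Place a BO in system memory, reclaiming older BOs as needed."""
        self.bos[name] = size
        self.usage += size
        return self.shrink()

    def shrink(self):
        evicted = []
        # Keep at least the just-placed BO; evict the least recently used
        # until we are back under the limit.
        while self.usage > self.limit and len(self.bos) > 1:
            name, size = self.bos.popitem(last=False)
            self.usage -= size
            evicted.append(name)
        return evicted

s = ToyShrinker(limit=100)
s.place("a", 60)
s.place("b", 30)
print(s.place("c", 50))   # ['a']: placing "c" pushed usage to 140
print(sorted(s.bos))      # ['b', 'c']
print(s.usage)            # 80
```

The locking pain referred to above comes from doing this eviction loop
concurrently with allocation paths, which the toy conveniently ignores.
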
-Daniel


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]         ` <20190903075550.GJ2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-09-03 18:50           ` Tejun Heo
  2019-09-03 19:23             ` Kenny Ho
  2019-09-03 19:48             ` Daniel Vetter
  0 siblings, 2 replies; 89+ messages in thread
From: Tejun Heo @ 2019-09-03 18:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, felix.kuehling-5C7GfCeVMHo, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, alexander.deucher-5C7GfCeVMHo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

Hello, Daniel.

On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote:
> > * While breaking up and applying control to different types of
> >   internal objects may seem attractive to folks who work day in and
> >   day out with the subsystem, they aren't all that useful to users and
> >   the siloed controls are likely to make the whole mechanism a lot
> >   less useful.  We had the same problem with cgroup1 memcg - putting
> >   control of different uses of memory under separate knobs.  It made
> >   the whole thing pretty useless.  e.g. if you constrain all knobs
> >   tight enough to control the overall usage, overall utilization
> >   suffers, but if you don't, you really don't have control over actual
> >   usage.  For memcg, what has to be allocated and controlled is
> >   physical memory, no matter how they're used.  It's not like you can
> >   go buy more "socket" memory.  At least from the looks of it, I'm
> >   afraid gpu controller is repeating the same mistakes.
> 
> We do have quite a pile of different memories and ranges, so I don't
> thinkt we're doing the same mistake here. But it is maybe a bit too

I see.  One thing which caught my eye was the system memory control.
Shouldn't that be controlled by memcg?  Is there something special
about system memory used by GPUs?

> complicated, and exposes stuff that most users really don't care about.

This could be me not knowing much about GPUs, but it definitely looks
too complex to me.  I don't see how users would be able to allocate
VRAM, system memory and GART with reasonable accuracy.  memcg on
cgroup2 deals with just a single number and that's already plenty
challenging.
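
The siloed-knobs pitfall described earlier in the thread can be
sketched with toy numbers (all values here are hypothetical, not
measurements of any real controller):

```python
# Toy numbers for the siloed-knobs pitfall: the same 100 units of
# physical memory split into per-use knobs must be tuned per workload,
# while a single total limit accommodates both.  All values are made up.
def fits(demand, limits):
    return all(demand[k] <= limits[k] for k in demand)

def fits_total(demand, budget):
    return sum(demand.values()) <= budget

total_budget = 100
workload_a = {"kmem": 80, "umem": 20}   # kernel-memory heavy
workload_b = {"kmem": 20, "umem": 80}   # user-memory heavy

siloed = {"kmem": 50, "umem": 50}       # per-use knobs summing to the budget
print(fits(workload_a, siloed))          # False: blocked despite fitting overall
print(fits(workload_b, siloed))          # False
print(fits_total(workload_a, total_budget))  # True
print(fits_total(workload_b, total_budget))  # True
```
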

Thanks.

-- 
tejun

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-09-03 18:50           ` Tejun Heo
@ 2019-09-03 19:23             ` Kenny Ho
  2019-09-03 19:48             ` Daniel Vetter
  1 sibling, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-09-03 19:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix, jsparks,
	amd-gfx list, lkaplan, dri-devel, Alex Deucher, cgroups,
	Christian König

Hi Tejun,

Thanks for looking into this.  I can definitely help where I can and I
am sure other experts will jump in if I start misrepresenting the
reality :) (as Daniel has already done.)

Regarding your points, my understanding is that there isn't really a
TTM vs GEM situation anymore (there is an lwn.net article about that,
but it is more than a decade old.)  I believe GEM is the common
interface at this point and more and more features are being
refactored into it.  For example, AMD's driver uses TTM internally but
things are exposed via the GEM interface.

This GEM resource is actually the single-number resource you just
referred to.  A GEM buffer (the drm.buffer.* resources) can be backed
by VRAM, system memory or another type of memory.  The finer-grained
control is the drm.memory.* resources, which still need more
discussion.  (Some of the functionality in TTM is being refactored
into the GEM level; I have seen some patches that make TTM a subclass
of GEM.)

This RFC can be grouped into three fairly independent areas, so they
can be reviewed separately: high-level device memory control
(buffer.*), fine-grained memory control and bandwidth (memory.*), and
compute resources (lgpu.*).  I think the memory.* resources are the
most controversial part, but I believe they are still needed.

Perhaps an analogy may help.  A system has CPUs and memory, and that
memory can be backed by RAM or swap.  Similarly, each GPU device can
have lgpus and buffers, and those buffers can be backed by VRAM,
system RAM or even swap.
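
A rough sketch of the charge model (the v4 changelog mentions
combining buffer accounting and limit checking into a try_charge
function; the names, structure and numbers below are illustrative, not
the actual drmcg code):

```python
# Illustrative sketch of hierarchical try_charge accounting, loosely
# modelled on how cgroup charging generally works (all names are
# hypothetical, not the real kernel/drmcg implementation).
class Cgroup:
    def __init__(self, name, limit=float("inf"), parent=None):
        self.name, self.limit, self.parent = name, limit, parent
        self.usage = 0

    def try_charge(self, size):
        """Charge `size` against this cgroup and every ancestor,
        failing (and rolling back) if any limit would be exceeded."""
        charged = []
        node = self
        while node:
            if node.usage + size > node.limit:
                for n in charged:       # roll back partial charges
                    n.usage -= size
                return False
            node.usage += size
            charged.append(node)
            node = node.parent
        return True

root = Cgroup("root")
jobs = Cgroup("gpu-jobs", limit=100, parent=root)
a = Cgroup("job-a", limit=80, parent=jobs)
b = Cgroup("job-b", limit=80, parent=jobs)

print(a.try_charge(70))   # True: within job-a's and gpu-jobs' limits
print(b.try_charge(50))   # False: job-b is fine, but gpu-jobs would hit 120
print(jobs.usage)         # 70
```
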

As for setting the right amount, I think that's where the profiling
aspect of the *.stats comes in.  And while one can't necessarily buy
more VRAM, it is still a useful knob to adjust if the intention is to
pack more work into a GPU device with predictable performance.  This
research on various GPU workloads may be of interest:

A Taxonomy of GPGPU Performance Scaling
http://www.computermachines.org/joe/posters/iiswc2015_taxonomy.pdf
http://www.computermachines.org/joe/publications/pdfs/iiswc2015_taxonomy.pdf

(summary: GPU workloads can be memory bound or compute bound, so it is
possible to pack different workloads together to improve utilization.)
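
A toy version of that packing argument, with hypothetical utilization
numbers rather than anything from the cited papers:

```python
# Hypothetical per-resource utilization (fraction of one device) showing
# why a memory-bound and a compute-bound workload can share a GPU while
# two memory-bound workloads cannot.  Numbers are made up.
mem_bound = {"mem_bw": 0.8, "compute": 0.3}
compute_bound = {"mem_bw": 0.2, "compute": 0.6}

def colocatable(a, b):
    # Both fit if no resource dimension is oversubscribed.
    return all(a[k] + b[k] <= 1.0 for k in a)

print(colocatable(mem_bound, compute_bound))  # True
print(colocatable(mem_bound, mem_bound))      # False (mem_bw would be 1.6)
```
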

Regards,
Kenny


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-09-03  9:19       ` Daniel Vetter
@ 2019-09-03 19:30         ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-09-03 19:30 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ho, Kenny, Kuehling, Felix, jsparks, amd-gfx, lkaplan, Deucher,
	Alexander, dri-devel, Greathouse, Joseph, tj, cgroups, Koenig,
	Christian

On Tue, Sep 3, 2019 at 5:20 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 3, 2019 at 10:24 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
> >
> > Am 03.09.19 um 10:02 schrieb Daniel Vetter:
> > > On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
> > >> With this RFC v4, I am hoping to have some consensus on a merge plan.  I believe
> > >> the GEM related resources (drm.buffer.*) introduced in previous RFC and,
> > >> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC are
> > >> uncontroversial and ready to move out of RFC and into a more formal review.  I
> > >> will continue to work on the memory backend resources (drm.memory.*).
> > >>
> > >> The cover letter from v1 is copied below for reference.
> > >>
> > >> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> > >> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
> > >> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> > > So looking at all this doesn't seem to have changed much, and the old
> > > discussion didn't really conclude anywhere (aside from some details).
> > >
> > > One more open though that crossed my mind, having read a ton of ttm again
> > > recently: How does this all interact with ttm global limits? I'd say the
> > > ttm global limits is the ur-cgroups we have in drm, and not looking at
> > > that seems kinda bad.
> >
> > At least my hope was to completely replace ttm globals with those
> > limitations here when it is ready.
>
> You need more, at least some kind of shrinker to cut down bo placed in
> system memory when we're under memory pressure. Which drags in a
> pretty epic amount of locking lols (see i915's shrinker fun, where we
> attempt that). Probably another good idea to share at least some
> concepts, maybe even code.

I am still looking into your shrinker suggestion, so the memory.*
resources are untouched from RFC v3.  The main change for the buffer.*
resources is the removal of the buffer-sharing restriction, as you
suggested, plus additional documentation of that behaviour.  (I may have
neglected to mention it in the cover letter.)  The other key part of RFC
v4 is the "logical GPU/lgpu" concept.  I am hoping to get it out there
early for feedback while I continue to work on the memory.* parts.

Kenny

> -Daniel
>
> >
> > Christian.
> >
> > > -Daniel
> > >
> > >> v4:
> > >> Unchanged (no review needed)
> > >> * drm.memory.*/ttm resources (Patch 9-13, I am still working on memory bandwidth
> > >> and shrinker)
> > >> Based on feedback on v3:
> > >> * update nomenclature to drmcg
> > >> * embed per device drmcg properties into drm_device
> > >> * split GEM buffer related commits into stats and limit
> > >> * rename function name to align with convention
> > >> * combined buffer accounting and check into a try_charge function
> > >> * support buffer stats without limit enforcement
> > >> * removed GEM buffer sharing limitation
> > >> * updated documentations
> > >> New features:
> > >> * introducing logical GPU concept
> > >> * example implementation with AMD KFD
> > >>
> > >> v3:
> > >> Based on feedback on v2:
> > >> * removed .help type file from v2
> > >> * conform to cgroup convention for default and max handling
> > >> * conform to cgroup convention for addressing device specific limits (with major:minor)
> > >> New function:
> > >> * adopted memparse for memory size related attributes
> > >> * added macro to marshall drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
> > >> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> > >> * added ttm buffer usage limit (per cgroup, for vram.)
> > >> * added per cgroup bandwidth stats and limiting (burst and average bandwidth)
> > >>
> > >> v2:
> > >> * Removed the vendoring concepts
> > >> * Add limit to total buffer allocation
> > >> * Add limit to the maximum size of a buffer allocation
> > >>
> > >> v1: cover letter
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
  2019-09-03  7:57     ` Daniel Vetter
@ 2019-09-03 19:45       ` Kenny Ho
       [not found]         ` <CAOWid-dxxDhyxP2+0R0oKAk29rR-1TbMyhshR1+gbcpGJCAW6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-09-03 19:45 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, dri-devel, Greathouse, Joseph, Tejun Heo, cgroups,
	Christian König

On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Aug 29, 2019 at 02:05:18AM -0400, Kenny Ho wrote:
> > To allow other subsystems to iterate through all stored DRM minors and
> > act upon them.
> >
> > Also exposes drm_minor_acquire and drm_minor_release for other subsystems
> > to handle drm_minor.  The DRM cgroup controller is the initial consumer of
> > these new features.
> >
> > Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b8144ca5
> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
>
> Iterating over minors for cgroups sounds very, very wrong. Why do we care
> whether a buffer was allocated through kms dumb vs render nodes?
>
> I'd expect all the cgroup stuff to only work on drm_device, if it does
> care about devices.
>
> (I didn't look through the patch series to find out where exactly you're
> using this, so maybe I'm off the rails here).

I am exposing this to remove the need to keep track of a separate list
of available drm_device in the system (to remove the registering and
unregistering of drm_device to the cgroup subsystem and just use
drm_minor as the single source of truth.)  I am only filtering out the
render-node minors because they point to the same drm_device, which is
confusing.

Perhaps I missed an obvious way to list the drm devices without
iterating through the drm_minors?  (I probably jumped to the minors
because $major:$minor is the convention to address devices in cgroup.)

Kenny

> -Daniel
>
> > ---
> >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> >  drivers/gpu/drm/drm_internal.h |  4 ----
> >  include/drm/drm_drv.h          |  4 ++++
> >  3 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 862621494a93..000cddabd970 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> >
> >       return minor;
> >  }
> > +EXPORT_SYMBOL(drm_minor_acquire);
> >
> >  void drm_minor_release(struct drm_minor *minor)
> >  {
> >       drm_dev_put(minor->dev);
> >  }
> > +EXPORT_SYMBOL(drm_minor_release);
> >
> >  /**
> >   * DOC: driver instance overview
> > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> >  }
> >  EXPORT_SYMBOL(drm_dev_set_unique);
> >
> > +/**
> > + * drm_minor_for_each - Iterate through all stored DRM minors
> > + * @fn: Function to be called for each pointer.
> > + * @data: Data passed to callback function.
> > + *
> > + * The callback function will be called for each @drm_minor entry, passing
> > + * the minor, the entry and @data.
> > + *
> > + * If @fn returns anything other than %0, the iteration stops and that
> > + * value is returned from this function.
> > + */
> > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > +{
> > +     return idr_for_each(&drm_minors_idr, fn, data);
> > +}
> > +EXPORT_SYMBOL(drm_minor_for_each);
> > +
> >  /*
> >   * DRM Core
> >   * The DRM core module initializes all global DRM objects and makes them
> > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > index e19ac7ca602d..6bfad76f8e78 100644
> > --- a/drivers/gpu/drm/drm_internal.h
> > +++ b/drivers/gpu/drm/drm_internal.h
> > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> >                                       struct dma_buf *dma_buf);
> >
> > -/* drm_drv.c */
> > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > -void drm_minor_release(struct drm_minor *minor);
> > -
> >  /* drm_vblank.c */
> >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> >  void drm_vblank_cleanup(struct drm_device *dev);
> > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > index 68ca736c548d..24f8d054c570 100644
> > --- a/include/drm/drm_drv.h
> > +++ b/include/drm/drm_drv.h
> > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> >
> >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> >
> > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > +
> > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > +void drm_minor_release(struct drm_minor *minor);
> >
> >  #endif
> > --
> > 2.22.0
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-09-03 18:50           ` Tejun Heo
  2019-09-03 19:23             ` Kenny Ho
@ 2019-09-03 19:48             ` Daniel Vetter
       [not found]               ` <CAKMK7uE5Bj-3cJH895iqnLpwUV+GBDM1Y=n4Z4A3xervMdJKXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-03 19:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Kenny Ho, dri-devel, joseph.greathouse, Alex Deucher, cgroups,
	Christian König

On Tue, Sep 3, 2019 at 8:50 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Daniel.
>
> On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote:
> > > * While breaking up and applying control to different types of
> > >   internal objects may seem attractive to folks who work day in and
> > >   day out with the subsystem, they aren't all that useful to users and
> > >   the siloed controls are likely to make the whole mechanism a lot
> > >   less useful.  We had the same problem with cgroup1 memcg - putting
> > >   control of different uses of memory under separate knobs.  It made
> > >   the whole thing pretty useless.  e.g. if you constrain all knobs
> > >   tight enough to control the overall usage, overall utilization
> > >   suffers, but if you don't, you really don't have control over actual
> > >   usage.  For memcg, what has to be allocated and controlled is
> > >   physical memory, no matter how they're used.  It's not like you can
> > >   go buy more "socket" memory.  At least from the looks of it, I'm
> > >   afraid gpu controller is repeating the same mistakes.
> >
> > We do have quite a pile of different memories and ranges, so I don't
> > think we're making the same mistake here. But it is maybe a bit too
>
> I see.  One thing which caught my eyes was the system memory control.
> Shouldn't that be controlled by memcg?  Is there something special
> about system memory used by gpus?

I think system memory separate from vram makes sense. For one, vram is
like 10x+ faster than system memory, so we definitely want to have
good control on that. But maybe we only want one vram bucket overall
for the entire system?

The trouble with system memory is that gpu tasks pin that memory to
prep execution. There's two solutions:
- i915 has a shrinker. Lots (and I really mean lots) of pain with
direct reclaim recursion, which often means we can't free memory, and
we're angering the oom killer a lot. Plus it introduces real bad
latency spikes everywhere (gpu workloads are occasionally really slow,
think "worse than pageout to spinning rust" to get memory freed).
- ttm just has a global limit, set to 50% of system memory.

I do think a global system memory limit to tame the shrinker, without
the ttm approach of possible just wasting half your memory, could be
useful.

> > complicated, and exposes stuff that most users really don't care about.
>
> Could be from me not knowing much about gpus but definitely looks too
> complex to me.  I don't see how users would be able to allocate vram,
> system memory and GART with reasonable accuracy.  memcg on cgroup2
> deals with just single number and that's already plenty challenging.

Yeah, especially wrt GART and some of the other more specialized
things I don't think there's any modern gpu where you can actually run
out of that stuff. At least not before you run out of every other kind
of memory (GART is just a remapping table to make system memory
visible to the gpu).

I'm also not sure of the bw limits, given all the fun we have on the
block io cgroups side. Aside from that the current bw limit only
controls the bw the kernel uses, userspace can submit unlimited
amounts of copying commands that use the same pcie links directly to
the gpu, bypassing this cg knob. Also, controlling execution time for
gpus is very tricky, since they work a lot more like a block io device
or maybe a network controller with packet scheduling, than a cpu.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]         ` <CAOWid-dxxDhyxP2+0R0oKAk29rR-1TbMyhshR1+gbcpGJCAW6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-03 20:12           ` Daniel Vetter
       [not found]             ` <CAKMK7uEofjdVURu+meonh_YdV5eX8vfNALkW3A_+kLapCV8j+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-03 20:12 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel, Greathouse, Joseph,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
>
> On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Aug 29, 2019 at 02:05:18AM -0400, Kenny Ho wrote:
> > > To allow other subsystems to iterate through all stored DRM minors and
> > > act upon them.
> > >
> > > Also exposes drm_minor_acquire and drm_minor_release for other subsystems
> > > to handle drm_minor.  The DRM cgroup controller is the initial consumer of
> > > these new features.
> > >
> > > Change-Id: I7c4b67ce6b31f06d1037b03435386ff5b8144ca5
> > > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> >
> > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > whether a buffer was allocated through kms dumb vs render nodes?
> >
> > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > care about devices.
> >
> > (I didn't look through the patch series to find out where exactly you're
> > using this, so maybe I'm off the rails here).
>
> I am exposing this to remove the need to keep track of a separate list
> of available drm_device in the system (to remove the registering and
> unregistering of drm_device to the cgroup subsystem and just use
> drm_minor as the single source of truth.)  I am only filtering out the
> render-node minors because they point to the same drm_device, which is
> confusing.
>
> Perhaps I missed an obvious way to list the drm devices without
> iterating through the drm_minors?  (I probably jumped to the minors
> because $major:$minor is the convention to address devices in cgroup.)

Create your own if there's nothing, because you need to anyway:
- You need special locking anyway, we can't just block on the idr lock
for everything.
- This needs to refcount drm_device, not the minors.

Iterating over stuff still feels kinda wrong still, because normally
the way we register/unregister userspace api (and cgroups isn't
anything else from a drm driver pov) is by adding more calls to
drm_dev_register/unregister. If you put a drm_cg_register/unregister
call in there we have a clean separation, and you can track all the
currently active devices however you want. Iterating over objects that
can be hotunplugged any time tends to get really complicated really
quickly.
-Daniel


>
> Kenny
>
> > -Daniel
> >
> > > ---
> > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > >  include/drm/drm_drv.h          |  4 ++++
> > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > index 862621494a93..000cddabd970 100644
> > > --- a/drivers/gpu/drm/drm_drv.c
> > > +++ b/drivers/gpu/drm/drm_drv.c
> > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > >
> > >       return minor;
> > >  }
> > > +EXPORT_SYMBOL(drm_minor_acquire);
> > >
> > >  void drm_minor_release(struct drm_minor *minor)
> > >  {
> > >       drm_dev_put(minor->dev);
> > >  }
> > > +EXPORT_SYMBOL(drm_minor_release);
> > >
> > >  /**
> > >   * DOC: driver instance overview
> > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > >  }
> > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > >
> > > +/**
> > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > + * @fn: Function to be called for each pointer.
> > > + * @data: Data passed to callback function.
> > > + *
> > > + * The callback function will be called for each @drm_minor entry, passing
> > > + * the minor, the entry and @data.
> > > + *
> > > + * If @fn returns anything other than %0, the iteration stops and that
> > > + * value is returned from this function.
> > > + */
> > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > +{
> > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > +}
> > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > +
> > >  /*
> > >   * DRM Core
> > >   * The DRM core module initializes all global DRM objects and makes them
> > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > index e19ac7ca602d..6bfad76f8e78 100644
> > > --- a/drivers/gpu/drm/drm_internal.h
> > > +++ b/drivers/gpu/drm/drm_internal.h
> > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > >                                       struct dma_buf *dma_buf);
> > >
> > > -/* drm_drv.c */
> > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > -void drm_minor_release(struct drm_minor *minor);
> > > -
> > >  /* drm_vblank.c */
> > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > index 68ca736c548d..24f8d054c570 100644
> > > --- a/include/drm/drm_drv.h
> > > +++ b/include/drm/drm_drv.h
> > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > >
> > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > >
> > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > +
> > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > +void drm_minor_release(struct drm_minor *minor);
> > >
> > >  #endif
> > > --
> > > 2.22.0
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]             ` <CAKMK7uEofjdVURu+meonh_YdV5eX8vfNALkW3A_+kLapCV8j+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-03 20:43               ` Kenny Ho
       [not found]                 ` <CAOWid-eUVztW4hNVpznnJRcwHcjCirGL2aS75p4OY8XoGuJqUg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-09-03 20:43 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel, Greathouse, Joseph,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > whether a buffer was allocated through kms dumb vs render nodes?
> > >
> > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > care about devices.
> > >
> > > (I didn't look through the patch series to find out where exactly you're
> > > using this, so maybe I'm off the rails here).
> >
> > I am exposing this to remove the need to keep track of a separate list
> > of available drm_device in the system (to remove the registering and
> > unregistering of drm_device to the cgroup subsystem and just use
> > drm_minor as the single source of truth.)  I am only filtering out the
> > render-node minors because they point to the same drm_device, which is
> > confusing.
> >
> > Perhaps I missed an obvious way to list the drm devices without
> > iterating through the drm_minors?  (I probably jumped to the minors
> > because $major:$minor is the convention to address devices in cgroup.)
>
> Create your own if there's nothing, because you need to anyway:
> - You need special locking anyway, we can't just block on the idr lock
> for everything.
> - This needs to refcount drm_device, not the minors.
>
> Iterating over stuff still feels kinda wrong still, because normally
> the way we register/unregister userspace api (and cgroups isn't
> anything else from a drm driver pov) is by adding more calls to
> drm_dev_register/unregister. If you put a drm_cg_register/unregister
> call in there we have a clean separation, and you can track all the
> currently active devices however you want. Iterating over objects that
> can be hotunplugged any time tends to get really complicated really
> quickly.

Um... I thought this is what I had previously.  Did I misunderstand
your feedback from v3?  Doesn't drm_minor already include all these
facilities, so isn't creating my own a kind of reinventing the wheel
(as I did previously)?  drm_minor_register is called inside
drm_dev_register, so isn't leveraging the existing drm_minor
facilities a much better solution?

Kenny

>
>
> >
> > Kenny
> >
> > > -Daniel
> > >
> > > > ---
> > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > >  include/drm/drm_drv.h          |  4 ++++
> > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > index 862621494a93..000cddabd970 100644
> > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > > >
> > > >       return minor;
> > > >  }
> > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > >
> > > >  void drm_minor_release(struct drm_minor *minor)
> > > >  {
> > > >       drm_dev_put(minor->dev);
> > > >  }
> > > > +EXPORT_SYMBOL(drm_minor_release);
> > > >
> > > >  /**
> > > >   * DOC: driver instance overview
> > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > > >  }
> > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > >
> > > > +/**
> > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > + * @fn: Function to be called for each pointer.
> > > > + * @data: Data passed to callback function.
> > > > + *
> > > > + * The callback function will be called for each @drm_minor entry, passing
> > > > + * the minor, the entry and @data.
> > > > + *
> > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > + * value is returned from this function.
> > > > + */
> > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > > +{
> > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > +}
> > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > +
> > > >  /*
> > > >   * DRM Core
> > > >   * The DRM core module initializes all global DRM objects and makes them
> > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > > >                                       struct dma_buf *dma_buf);
> > > >
> > > > -/* drm_drv.c */
> > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > -void drm_minor_release(struct drm_minor *minor);
> > > > -
> > > >  /* drm_vblank.c */
> > > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > index 68ca736c548d..24f8d054c570 100644
> > > > --- a/include/drm/drm_drv.h
> > > > +++ b/include/drm/drm_drv.h
> > > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > >
> > > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > > >
> > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > > +
> > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > +void drm_minor_release(struct drm_minor *minor);
> > > >
> > > >  #endif
> > > > --
> > > > 2.22.0
> > > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                 ` <CAOWid-eUVztW4hNVpznnJRcwHcjCirGL2aS75p4OY8XoGuJqUg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-04  8:54                   ` Daniel Vetter
       [not found]                     ` <20190904085434.GF2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2019-09-06 15:29                     ` Tejun Heo
  0 siblings, 2 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-09-04  8:54 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, Daniel Vetter, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > >
> > > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > > care about devices.
> > > >
> > > > (I didn't look through the patch series to find out where exactly you're
> > > > using this, so maybe I'm off the rails here).
> > >
> > > I am exposing this to remove the need to keep track of a separate list
> > > of available drm_device in the system (to remove the registering and
> > > unregistering of drm_device to the cgroup subsystem and just use
> > > drm_minor as the single source of truth.)  I am only filtering out the
> > > render-node minors because they point to the same drm_device, which is
> > > confusing.
> > >
> > > Perhaps I missed an obvious way to list the drm devices without
> > > iterating through the drm_minors?  (I probably jumped to the minors
> > > because $major:$minor is the convention to address devices in cgroup.)
> >
> > Create your own if there's nothing, because you need to anyway:
> > - You need special locking anyway, we can't just block on the idr lock
> > for everything.
> > - This needs to refcount drm_device, not the minors.
> >
> > Iterating over stuff still feels kinda wrong still, because normally
> > the way we register/unregister userspace api (and cgroups isn't
> > anything else from a drm driver pov) is by adding more calls to
> > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > call in there we have a clean separation, and you can track all the
> > currently active devices however you want. Iterating over objects that
> > can be hotunplugged any time tends to get really complicated really
> > quickly.
> 
> Um... I thought this is what I had previously.  Did I misunderstand
> your feedback from v3?  Doesn't drm_minor already include all these
> facilities, so isn't creating my own a kind of reinventing the wheel
> (as I did previously)?  drm_minor_register is called inside
> drm_dev_register, so isn't leveraging the existing drm_minor
> facilities a much better solution?

Hm, the previous version already dropped out of my inbox, so it's hard
to find it again, and I couldn't find this in the archives. Do you have
pointers?

I thought the previous version did cgroup init separately from drm_device
setup, and I guess I suggested that it should be moved into
drm_dev_register/unregister?

Anyway, I don't think reusing the drm_minor registration makes sense,
since we want to be on the drm_device, not on the minor. Which is a bit
awkward for cgroups, which wants to identify devices using major.minor
pairs. But I guess drm is the first subsystem where 1 device can be
exposed through multiple minors ...

Tejun, any suggestions on this?

Anyway, I think just leveraging existing code because it can be abused to
make it fit for us doesn't make sense. E.g. for the kms side we also don't
piggy-back on top of drm_minor_register (it would be technically
possible), but instead we have drm_modeset_register_all().
-Daniel

> 
> Kenny
> 
> >
> >
> > >
> > > Kenny
> > >
> > > > -Daniel
> > > >
> > > > > ---
> > > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > > >  include/drm/drm_drv.h          |  4 ++++
> > > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > > index 862621494a93..000cddabd970 100644
> > > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > > > >
> > > > >       return minor;
> > > > >  }
> > > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > > >
> > > > >  void drm_minor_release(struct drm_minor *minor)
> > > > >  {
> > > > >       drm_dev_put(minor->dev);
> > > > >  }
> > > > > +EXPORT_SYMBOL(drm_minor_release);
> > > > >
> > > > >  /**
> > > > >   * DOC: driver instance overview
> > > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > > > >  }
> > > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > > >
> > > > > +/**
> > > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > > + * @fn: Function to be called for each pointer.
> > > > > + * @data: Data passed to callback function.
> > > > > + *
> > > > > + * The callback function will be called for each @drm_minor entry, passing
> > > > > + * the minor, the entry and @data.
> > > > > + *
> > > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > > + * value is returned from this function.
> > > > > + */
> > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > > > +{
> > > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > > +
> > > > >  /*
> > > > >   * DRM Core
> > > > >   * The DRM core module initializes all global DRM objects and makes them
> > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > > > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > > > >                                       struct dma_buf *dma_buf);
> > > > >
> > > > > -/* drm_drv.c */
> > > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > -void drm_minor_release(struct drm_minor *minor);
> > > > > -
> > > > >  /* drm_vblank.c */
> > > > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > index 68ca736c548d..24f8d054c570 100644
> > > > > --- a/include/drm/drm_drv.h
> > > > > +++ b/include/drm/drm_drv.h
> > > > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > > >
> > > > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > > > >
> > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > > > +
> > > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > +void drm_minor_release(struct drm_minor *minor);
> > > > >
> > > > >  #endif
> > > > > --
> > > > > 2.22.0
> > > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                     ` <20190904085434.GF2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-09-05 18:27                       ` Kenny Ho
  2019-09-05 18:28                       ` Kenny Ho
  1 sibling, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-09-05 18:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel, Greathouse, Joseph,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

Hi Daniel,

This is the previous patch relevant to this discussion:
https://patchwork.freedesktop.org/patch/314343/

So before I refactored the code to leverage drm_minor, I kept my own list
of "known" drm_device inside the controller and had explicit register and
unregister functions to init per-device cgroup defaults.  For v4, I
refactored the per-device cgroup properties and embedded them into the
drm_device, and I continue to use only the primary minor as a way to index
the device, as in v3.

Regards,
Kenny

On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter <daniel-/w4YWyX8dFk@public.gmane.org> wrote:

> On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel-/w4YWyX8dFk@public.gmane.org> wrote:
> > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel-/w4YWyX8dFk@public.gmane.org>
> wrote:
> > > > > Iterating over minors for cgroups sounds very, very wrong. Why do
> we care
> > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > >
> > > > > I'd expect all the cgroup stuff to only work on drm_device, if it
> does
> > > > > care about devices.
> > > > >
> > > > > (I didn't look through the patch series to find out where exactly
> you're
> > > > > using this, so maybe I'm off the rails here).
> > > >
> > > > I am exposing this to remove the need to keep track of a separate
> list
> > > > of available drm_device in the system (to remove the registering and
> > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > drm_minor as the single source of truth.)  I am only filtering out
> the
> > > > render nodes minor because they point to the same drm_device and is
> > > > confusing.
> > > >
> > > > Perhaps I missed an obvious way to list the drm devices without
> > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > because $major:$minor is the convention to address devices in
> cgroup.)
> > >
> > > Create your own if there's nothing, because you need to anyway:
> > > - You need special locking anyway, we can't just block on the idr lock
> > > for everything.
> > > - This needs to refcount drm_device, not the minors.
> > >
> > > Iterating over stuff still feels kinda wrong still, because normally
> > > the way we register/unregister userspace api (and cgroups isn't
> > > anything else from a drm driver pov) is by adding more calls to
> > > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > > call in there we have a clean separation, and you can track all the
> > > currently active devices however you want. Iterating over objects that
> > > can be hotunplugged any time tends to get really complicated really
> > > quickly.
> >
> > Um... I thought this is what I had previously.  Did I misunderstand
> > your feedback from v3?  Doesn't drm_minor already include all these
> > facilities so isn't creating my own kind of reinventing the wheel?
> > (as I did previously?)  drm_minor_register is called inside
> > drm_dev_register so isn't leveraging existing drm_minor facilities
> > much better solution?
>
> Hm the previous version already dropped out of my inbox, so hard to find
> it again. And I couldn't find this in archives. Do you have pointers?
>
> I thought the previous version did cgroup init separately from drm_device
> setup, and I guess I suggested that it should be moved into
> drm_dev_register/unregister?
>
> Anyway, I don't think reusing the drm_minor registration makes sense,
> since we want to be on the drm_device, not on the minor. Which is a bit
> awkward for cgroups, which wants to identify devices using major.minor
> pairs. But I guess drm is the first subsystem where 1 device can be
> exposed through multiple minors ...
>
> Tejun, any suggestions on this?
>
> Anyway, I think just leveraging existing code because it can be abused to
> make it fit for us doesn't make sense. E.g. for the kms side we also don't
> piggy-back on top of drm_minor_register (it would be technically
> possible), but instead we have drm_modeset_register_all().
> -Daniel
>
> >
> > Kenny
> >
> > >
> > >
> > > >
> > > > Kenny
> > > >
> > > > > -Daniel
> > > > >
> > > > > > ---
> > > > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > > > >  include/drm/drm_drv.h          |  4 ++++
> > > > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/drm_drv.c
> b/drivers/gpu/drm/drm_drv.c
> > > > > > index 862621494a93..000cddabd970 100644
> > > > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > > > @@ -254,11 +254,13 @@ struct drm_minor
> *drm_minor_acquire(unsigned int minor_id)
> > > > > >
> > > > > >       return minor;
> > > > > >  }
> > > > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > > > >
> > > > > >  void drm_minor_release(struct drm_minor *minor)
> > > > > >  {
> > > > > >       drm_dev_put(minor->dev);
> > > > > >  }
> > > > > > +EXPORT_SYMBOL(drm_minor_release);
> > > > > >
> > > > > >  /**
> > > > > >   * DOC: driver instance overview
> > > > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device
> *dev, const char *name)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > > > >
> > > > > > +/**
> > > > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > > > + * @fn: Function to be called for each pointer.
> > > > > > + * @data: Data passed to callback function.
> > > > > > + *
> > > > > > + * The callback function will be called for each @drm_minor
> entry, passing
> > > > > > + * the minor, the entry and @data.
> > > > > > + *
> > > > > > + * If @fn returns anything other than %0, the iteration stops
> and that
> > > > > > + * value is returned from this function.
> > > > > > + */
> > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data),
> void *data)
> > > > > > +{
> > > > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > > > +
> > > > > >  /*
> > > > > >   * DRM Core
> > > > > >   * The DRM core module initializes all global DRM objects and
> makes them
> > > > > > diff --git a/drivers/gpu/drm/drm_internal.h
> b/drivers/gpu/drm/drm_internal.h
> > > > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct
> drm_prime_file_private *prime_fpriv);
> > > > > >  void drm_prime_remove_buf_handle_locked(struct
> drm_prime_file_private *prime_fpriv,
> > > > > >                                       struct dma_buf *dma_buf);
> > > > > >
> > > > > > -/* drm_drv.c */
> > > > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > -void drm_minor_release(struct drm_minor *minor);
> > > > > > -
> > > > > >  /* drm_vblank.c */
> > > > > >  void drm_vblank_disable_and_save(struct drm_device *dev,
> unsigned int pipe);
> > > > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > index 68ca736c548d..24f8d054c570 100644
> > > > > > --- a/include/drm/drm_drv.h
> > > > > > +++ b/include/drm/drm_drv.h
> > > > > > @@ -799,5 +799,9 @@ static inline bool
> drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > > > >
> > > > > >  int drm_dev_set_unique(struct drm_device *dev, const char
> *name);
> > > > > >
> > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data),
> void *data);
> > > > > > +
> > > > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > +void drm_minor_release(struct drm_minor *minor);
> > > > > >
> > > > > >  #endif
> > > > > > --
> > > > > > 2.22.0
> > > > > >
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > >
> > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                     ` <20190904085434.GF2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2019-09-05 18:27                       ` Kenny Ho
@ 2019-09-05 18:28                       ` Kenny Ho
  2019-09-05 20:06                         ` Daniel Vetter
  1 sibling, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-09-05 18:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel, Greathouse, Joseph,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

(resent in plain text mode)

Hi Daniel,

This is the previous patch relevant to this discussion:
https://patchwork.freedesktop.org/patch/314343/

So before I refactored the code to leverage drm_minor, I kept my own
list of "known" drm_device inside the controller and had explicit
register and unregister functions to init per-device cgroup defaults.
For v4, I refactored the per-device cgroup properties and embedded
them into the drm_device, and I continue to use only the primary minor
as a way to index the device, as in v3.
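As an aside, the embedding idea can be illustrated with a small user-space
sketch (the struct members and the init helper below are hypothetical
stand-ins for illustration, not the actual patch): the per-device cgroup
defaults live inside the device object itself, so there is no
controller-side list that can fall out of sync with the drm_device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device cgroup state, embedded in the device
 * instead of being tracked in a separate controller-side list. */
struct drmcg_props {
	int limit_enforced;   /* whether drm.buffer.* limits apply */
	long bo_limits_total; /* -1 means unlimited */
};

struct drm_device {
	int primary_minor;        /* $major:$minor index used by cgroup */
	struct drmcg_props drmcg; /* embedded, lifetime tied to the device */
};

/* Defaults are initialized when the device itself is set up, so the
 * cgroup state can never outlive or lag behind the drm_device. */
static void drmcg_device_init(struct drm_device *dev, int minor)
{
	dev->primary_minor = minor;
	dev->drmcg.limit_enforced = 0;
	dev->drmcg.bo_limits_total = -1;
}
```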

Regards,
Kenny


On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > >
> > > > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > > > care about devices.
> > > > >
> > > > > (I didn't look through the patch series to find out where exactly you're
> > > > > using this, so maybe I'm off the rails here).
> > > >
> > > > I am exposing this to remove the need to keep track of a separate list
> > > > of available drm_device in the system (to remove the registering and
> > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > drm_minor as the single source of truth.)  I am only filtering out the
> > > > render nodes minor because they point to the same drm_device and is
> > > > confusing.
> > > >
> > > > Perhaps I missed an obvious way to list the drm devices without
> > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > because $major:$minor is the convention to address devices in cgroup.)
> > >
> > > Create your own if there's nothing, because you need to anyway:
> > > - You need special locking anyway, we can't just block on the idr lock
> > > for everything.
> > > - This needs to refcount drm_device, not the minors.
> > >
> > > Iterating over stuff still feels kinda wrong still, because normally
> > > the way we register/unregister userspace api (and cgroups isn't
> > > anything else from a drm driver pov) is by adding more calls to
> > > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > > call in there we have a clean separation, and you can track all the
> > > currently active devices however you want. Iterating over objects that
> > > can be hotunplugged any time tends to get really complicated really
> > > quickly.
> >
> > Um... I thought this is what I had previously.  Did I misunderstand
> > your feedback from v3?  Doesn't drm_minor already include all these
> > facilities so isn't creating my own kind of reinventing the wheel?
> > (as I did previously?)  drm_minor_register is called inside
> > drm_dev_register so isn't leveraging existing drm_minor facilities
> > much better solution?
>
> Hm the previous version already dropped out of my inbox, so hard to find
> it again. And I couldn't find this in archives. Do you have pointers?
>
> I thought the previous version did cgroup init separately from drm_device
> setup, and I guess I suggested that it should be moved into
> drm_dev_register/unregister?
>
> Anyway, I don't think reusing the drm_minor registration makes sense,
> since we want to be on the drm_device, not on the minor. Which is a bit
> awkward for cgroups, which wants to identify devices using major.minor
> pairs. But I guess drm is the first subsystem where 1 device can be
> exposed through multiple minors ...
>
> Tejun, any suggestions on this?
>
> Anyway, I think just leveraging existing code because it can be abused to
> make it fit for us doesn't make sense. E.g. for the kms side we also don't
> piggy-back on top of drm_minor_register (it would be technically
> possible), but instead we have drm_modeset_register_all().
> -Daniel
>
> >
> > Kenny
> >
> > >
> > >
> > > >
> > > > Kenny
> > > >
> > > > > -Daniel
> > > > >
> > > > > > ---
> > > > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > > > >  include/drm/drm_drv.h          |  4 ++++
> > > > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > > > index 862621494a93..000cddabd970 100644
> > > > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > > > > >
> > > > > >       return minor;
> > > > > >  }
> > > > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > > > >
> > > > > >  void drm_minor_release(struct drm_minor *minor)
> > > > > >  {
> > > > > >       drm_dev_put(minor->dev);
> > > > > >  }
> > > > > > +EXPORT_SYMBOL(drm_minor_release);
> > > > > >
> > > > > >  /**
> > > > > >   * DOC: driver instance overview
> > > > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > > > >
> > > > > > +/**
> > > > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > > > + * @fn: Function to be called for each pointer.
> > > > > > + * @data: Data passed to callback function.
> > > > > > + *
> > > > > > + * The callback function will be called for each @drm_minor entry, passing
> > > > > > + * the minor, the entry and @data.
> > > > > > + *
> > > > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > > > + * value is returned from this function.
> > > > > > + */
> > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > > > > +{
> > > > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > > > +
> > > > > >  /*
> > > > > >   * DRM Core
> > > > > >   * The DRM core module initializes all global DRM objects and makes them
> > > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > > > > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > > > > >                                       struct dma_buf *dma_buf);
> > > > > >
> > > > > > -/* drm_drv.c */
> > > > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > -void drm_minor_release(struct drm_minor *minor);
> > > > > > -
> > > > > >  /* drm_vblank.c */
> > > > > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > > > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > index 68ca736c548d..24f8d054c570 100644
> > > > > > --- a/include/drm/drm_drv.h
> > > > > > +++ b/include/drm/drm_drv.h
> > > > > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > > > >
> > > > > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > > > > >
> > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > > > > +
> > > > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > +void drm_minor_release(struct drm_minor *minor);
> > > > > >
> > > > > >  #endif
> > > > > > --
> > > > > > 2.22.0
> > > > > >
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > >
> > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
  2019-09-05 18:28                       ` Kenny Ho
@ 2019-09-05 20:06                         ` Daniel Vetter
       [not found]                           ` <CAKMK7uGSrscs-WAv0pYfcxaUGXvx7M6JYbiPHTY=1hxRbFK1sg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-05 20:06 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, dri-devel, Greathouse, Joseph, Tejun Heo, cgroups,
	Christian König

On Thu, Sep 5, 2019 at 8:28 PM Kenny Ho <y2kenny@gmail.com> wrote:
>
> (resent in plain text mode)
>
> Hi Daniel,
>
> This is the previous patch relevant to this discussion:
> https://patchwork.freedesktop.org/patch/314343/

Ah yes, thanks for finding that.

> So before I refactored the code to leverage drm_minor, I kept my own
> list of "known" drm_device inside the controller and had explicit
> register and unregister functions to init per-device cgroup defaults.
> For v4, I refactored the per-device cgroup properties and embedded
> them into the drm_device, and I continue to use only the primary minor
> as a way to index the device, as in v3.

I didn't really like the explicit registration step, at least for the
basic cgroup controls (like gem buffer limits), and suggested that it
should happen automatically at drm_dev_register/unregister time. I
also talked about picking a consistent minor (if we have to use
minors, I would still like Tejun to confirm what we should do here),
but that was an unrelated comment. So doing auto-registration on
drm_minor was one step too far.

Just doing a drm_cg_register/unregister pair that's called from
drm_dev_register/unregister, and then if you want, looking up the
right minor (I think always picking the render node makes sense for
this, and skipping if there's no render node) would make the most
sense, at least for the basic cgroup controllers, which are generic
across drivers.
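To make the suggested wiring concrete, here is a rough user-space sketch
(the struct members and all function bodies are hypothetical stand-ins
for illustration, not the actual DRM code): the cgroup hook becomes just
another call inside drm_dev_register/unregister, analogous to how
drm_modeset_register_all() is wired in, and devices without a render
node are skipped.

```c
#include <assert.h>
#include <stddef.h>

struct drm_device {
	int render_minor;  /* <0 if the device has no render node */
	int cg_registered;
};

/* Hypothetical cgroup hooks: called only from device register/unregister,
 * so the controller never has to iterate minors to discover devices. */
static void drm_cg_register(struct drm_device *dev)
{
	if (dev->render_minor < 0)
		return; /* no render node: skip, per the suggestion above */
	dev->cg_registered = 1;
}

static void drm_cg_unregister(struct drm_device *dev)
{
	dev->cg_registered = 0;
}

static int drm_dev_register(struct drm_device *dev)
{
	/* ... modeset, minors, etc. would be registered here ... */
	drm_cg_register(dev); /* cgroup setup is part of uapi registration */
	return 0;
}

static void drm_dev_unregister(struct drm_device *dev)
{
	drm_cg_unregister(dev);
	/* ... tear down the rest ... */
}
```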
-Daniel



>
> Regards,
> Kenny
>
>
> On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > > >
> > > > > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > > > > care about devices.
> > > > > >
> > > > > > (I didn't look through the patch series to find out where exactly you're
> > > > > > using this, so maybe I'm off the rails here).
> > > > >
> > > > > I am exposing this to remove the need to keep track of a separate list
> > > > > of available drm_device in the system (to remove the registering and
> > > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > > drm_minor as the single source of truth.)  I am only filtering out the
> > > > > render nodes minor because they point to the same drm_device and is
> > > > > confusing.
> > > > >
> > > > > Perhaps I missed an obvious way to list the drm devices without
> > > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > > because $major:$minor is the convention to address devices in cgroup.)
> > > >
> > > > Create your own if there's nothing, because you need to anyway:
> > > > - You need special locking anyway, we can't just block on the idr lock
> > > > for everything.
> > > > - This needs to refcount drm_device, not the minors.
> > > >
> > > > Iterating over stuff still feels kinda wrong still, because normally
> > > > the way we register/unregister userspace api (and cgroups isn't
> > > > anything else from a drm driver pov) is by adding more calls to
> > > > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > > > call in there we have a clean separation, and you can track all the
> > > > currently active devices however you want. Iterating over objects that
> > > > can be hotunplugged any time tends to get really complicated really
> > > > quickly.
> > >
> > > Um... I thought this is what I had previously.  Did I misunderstand
> > > your feedback from v3?  Doesn't drm_minor already include all these
> > > facilities so isn't creating my own kind of reinventing the wheel?
> > > (as I did previously?)  drm_minor_register is called inside
> > > drm_dev_register so isn't leveraging existing drm_minor facilities
> > > much better solution?
> >
> > Hm the previous version already dropped out of my inbox, so hard to find
> > it again. And I couldn't find this in archives. Do you have pointers?
> >
> > I thought the previous version did cgroup init separately from drm_device
> > setup, and I guess I suggested that it should be moved into
> > drm_dev_register/unregister?
> >
> > Anyway, I don't think reusing the drm_minor registration makes sense,
> > since we want to be on the drm_device, not on the minor. Which is a bit
> > awkward for cgroups, which wants to identify devices using major.minor
> > pairs. But I guess drm is the first subsystem where 1 device can be
> > exposed through multiple minors ...
> >
> > Tejun, any suggestions on this?
> >
> > Anyway, I think just leveraging existing code because it can be abused to
> > make it fit for us doesn't make sense. E.g. for the kms side we also don't
> > piggy-back on top of drm_minor_register (it would be technically
> > possible), but instead we have drm_modeset_register_all().
> > -Daniel
> >
> > >
> > > Kenny
> > >
> > > >
> > > >
> > > > >
> > > > > Kenny
> > > > >
> > > > > > -Daniel
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > > > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > > > > >  include/drm/drm_drv.h          |  4 ++++
> > > > > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > > > > index 862621494a93..000cddabd970 100644
> > > > > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > > > > > >
> > > > > > >       return minor;
> > > > > > >  }
> > > > > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > > > > >
> > > > > > >  void drm_minor_release(struct drm_minor *minor)
> > > > > > >  {
> > > > > > >       drm_dev_put(minor->dev);
> > > > > > >  }
> > > > > > > +EXPORT_SYMBOL(drm_minor_release);
> > > > > > >
> > > > > > >  /**
> > > > > > >   * DOC: driver instance overview
> > > > > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > > > > >
> > > > > > > +/**
> > > > > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > > > > + * @fn: Function to be called for each pointer.
> > > > > > > + * @data: Data passed to callback function.
> > > > > > > + *
> > > > > > > + * The callback function will be called for each @drm_minor entry, passing
> > > > > > > + * the minor, the entry and @data.
> > > > > > > + *
> > > > > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > > > > + * value is returned from this function.
> > > > > > > + */
> > > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > > > > > +{
> > > > > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > > > > +
> > > > > > >  /*
> > > > > > >   * DRM Core
> > > > > > >   * The DRM core module initializes all global DRM objects and makes them
> > > > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > > > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > > > > > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > > > > > >                                       struct dma_buf *dma_buf);
> > > > > > >
> > > > > > > -/* drm_drv.c */
> > > > > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > > -void drm_minor_release(struct drm_minor *minor);
> > > > > > > -
> > > > > > >  /* drm_vblank.c */
> > > > > > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > > > > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > > index 68ca736c548d..24f8d054c570 100644
> > > > > > > --- a/include/drm/drm_drv.h
> > > > > > > +++ b/include/drm/drm_drv.h
> > > > > > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > > > > >
> > > > > > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > > > > > >
> > > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > > > > > +
> > > > > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > > +void drm_minor_release(struct drm_minor *minor);
> > > > > > >
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.22.0
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Daniel Vetter
> > > > > > Software Engineer, Intel Corporation
> > > > > > http://blog.ffwll.ch
> > > >
> > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                           ` <CAKMK7uGSrscs-WAv0pYfcxaUGXvx7M6JYbiPHTY=1hxRbFK1sg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-05 20:20                             ` Kenny Ho
  2019-09-05 20:32                               ` Daniel Vetter
  0 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-09-05 20:20 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel, Greathouse, Joseph,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Thu, Sep 5, 2019 at 4:06 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Sep 5, 2019 at 8:28 PM Kenny Ho <y2kenny@gmail.com> wrote:
> >
> > (resent in plain text mode)
> >
> > Hi Daniel,
> >
> > This is the previous patch relevant to this discussion:
> > https://patchwork.freedesktop.org/patch/314343/
>
> Ah yes, thanks for finding that.
>
> > So before I refactored the code to leverage drm_minor, I kept my own
> > list of "known" drm_device inside the controller and had explicit
> > register and unregister functions to init per-device cgroup defaults.
> > For v4, I refactored the per-device cgroup properties and embedded
> > them into the drm_device, and I continue to use only the primary minor
> > as a way to index the device, as in v3.
>
> I didn't really like the explicit registration step, at least for the
> basic cgroup controls (like gem buffer limits), and suggested that
> should happen automatically at drm_dev_register/unregister time. I
> also talked about picking a consistent minor (if we have to use
> minors, still would like Tejun to confirm what we should do here), but
> that was an unrelated comment. So doing auto-registration on drm_minor
> was one step too far.

What about your comments on embedding properties into drm_device?  I am
actually still not clear on the downside of using drm_minor this way.
With this implementation in v4, there isn't additional state that can
go out of sync with the ground truth of drm_device from the
perspective of drm_minor.  Wouldn't the issue with hotplugging drm
devices you described earlier get worse if the cgroup controller keeps
its own list?

> Just doing a drm_cg_register/unregister pair that's called from
> drm_dev_register/unregister, and then if you want, looking up the
> right minor (I think always picking the render node makes sense for
> this, and skipping if there's no render node) would make most sense.
> At least for the basic cgroup controllers which are generic across
> drivers.

Why do we want to skip drm devices that do not have a render node
rather than just using the primary?

Kenny



> -Daniel
>
>
>
> >
> > Regards,
> > Kenny
> >
> >
> > On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > > > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > > > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > > > >
> > > > > > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > > > > > care about devices.
> > > > > > >
> > > > > > > (I didn't look through the patch series to find out where exactly you're
> > > > > > > using this, so maybe I'm off the rails here).
> > > > > >
> > > > > > I am exposing this to remove the need to keep track of a separate list
> > > > > > of available drm_device in the system (to remove the registering and
> > > > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > > > drm_minor as the single source of truth.)  I am only filtering out the
> > > > > > render node minors because they point to the same drm_device and are
> > > > > > confusing.
> > > > > >
> > > > > > Perhaps I missed an obvious way to list the drm devices without
> > > > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > > > because $major:$minor is the convention to address devices in cgroup.)
> > > > >
> > > > > Create your own if there's nothing, because you need to anyway:
> > > > > - You need special locking anyway, we can't just block on the idr lock
> > > > > for everything.
> > > > > - This needs to refcount drm_device, not the minors.
> > > > >
> > > > > Iterating over stuff still feels kinda wrong, because normally
> > > > > the way we register/unregister userspace api (and cgroups isn't
> > > > > anything else from a drm driver pov) is by adding more calls to
> > > > > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > > > > call in there we have a clean separation, and you can track all the
> > > > > currently active devices however you want. Iterating over objects that
> > > > > can be hotunplugged any time tends to get really complicated really
> > > > > quickly.
> > > >
> > > > Um... I thought this is what I had previously.  Did I misunderstand
> > > > your feedback from v3?  Doesn't drm_minor already include all these
> > > > facilities, so isn't creating my own kind of reinventing the wheel
> > > > (as I did previously)?  drm_minor_register is called inside
> > > > drm_dev_register, so isn't leveraging existing drm_minor facilities
> > > > a much better solution?
> > >
> > > Hm, the previous version already dropped out of my inbox, so it's hard to find
> > > it again. And I couldn't find this in the archives. Do you have pointers?
> > >
> > > I thought the previous version did cgroup init separately from drm_device
> > > setup, and I guess I suggested that it should be moved into
> > > drm_dev_register/unregister?
> > >
> > > Anyway, I don't think reusing the drm_minor registration makes sense,
> > > since we want to be on the drm_device, not on the minor. Which is a bit
> > > awkward for cgroups, which wants to identify devices using major.minor
> > > pairs. But I guess drm is the first subsystem where 1 device can be
> > > exposed through multiple minors ...
> > >
> > > Tejun, any suggestions on this?
> > >
> > > Anyway, I think just leveraging existing code because it can be abused to
> > > make it fit for us doesn't make sense. E.g. for the kms side we also don't
> > > piggy-back on top of drm_minor_register (it would be technically
> > > possible), but instead we have drm_modeset_register_all().
> > > -Daniel
> > >
> > > >
> > > > Kenny
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Kenny
> > > > > >
> > > > > > > -Daniel
> > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > > > > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > > > > > >  include/drm/drm_drv.h          |  4 ++++
> > > > > > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > > > > > index 862621494a93..000cddabd970 100644
> > > > > > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > > > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > > > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > > > > > > >
> > > > > > > >       return minor;
> > > > > > > >  }
> > > > > > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > > > > > >
> > > > > > > >  void drm_minor_release(struct drm_minor *minor)
> > > > > > > >  {
> > > > > > > >       drm_dev_put(minor->dev);
> > > > > > > >  }
> > > > > > > > +EXPORT_SYMBOL(drm_minor_release);
> > > > > > > >
> > > > > > > >  /**
> > > > > > > >   * DOC: driver instance overview
> > > > > > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > > > > > > >  }
> > > > > > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > > > > > + * @fn: Function to be called for each pointer.
> > > > > > > > + * @data: Data passed to callback function.
> > > > > > > > + *
> > > > > > > > + * The callback function will be called for each @drm_minor entry, passing
> > > > > > > > + * the minor, the entry and @data.
> > > > > > > > + *
> > > > > > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > > > > > + * value is returned from this function.
> > > > > > > > + */
> > > > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > > > > > > +{
> > > > > > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * DRM Core
> > > > > > > >   * The DRM core module initializes all global DRM objects and makes them
> > > > > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > > > > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > > > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > > > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > > > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > > > > > > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > > > > > > >                                       struct dma_buf *dma_buf);
> > > > > > > >
> > > > > > > > -/* drm_drv.c */
> > > > > > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > > > -void drm_minor_release(struct drm_minor *minor);
> > > > > > > > -
> > > > > > > >  /* drm_vblank.c */
> > > > > > > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > > > > > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > > > index 68ca736c548d..24f8d054c570 100644
> > > > > > > > --- a/include/drm/drm_drv.h
> > > > > > > > +++ b/include/drm/drm_drv.h
> > > > > > > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > > > > > >
> > > > > > > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > > > > > > >
> > > > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > > > > > > +
> > > > > > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > > > +void drm_minor_release(struct drm_minor *minor);
> > > > > > > >
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.22.0
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Daniel Vetter
> > > > > > > Software Engineer, Intel Corporation
> > > > > > > http://blog.ffwll.ch
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
  2019-09-05 20:20                             ` Kenny Ho
@ 2019-09-05 20:32                               ` Daniel Vetter
       [not found]                                 ` <CAKMK7uHy+GRAcpLDuz6STCBW+GNfNWr-i=ZERF3uqkO7jfynnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-05 20:32 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, dri-devel, Greathouse, Joseph, Tejun Heo, cgroups,
	Christian König

On Thu, Sep 5, 2019 at 10:21 PM Kenny Ho <y2kenny@gmail.com> wrote:
>
> On Thu, Sep 5, 2019 at 4:06 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Sep 5, 2019 at 8:28 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > >
> > > (resent in plain text mode)
> > >
> > > Hi Daniel,
> > >
> > > This is the previous patch relevant to this discussion:
> > > https://patchwork.freedesktop.org/patch/314343/
> >
> > Ah yes, thanks for finding that.
> >
> > > So before I refactored the code to leverage drm_minor, I kept my own
> > > list of "known" drm_device inside the controller and have explicit
> > > register and unregister function to init per device cgroup defaults.
> > > For v4, I refactored the per device cgroup properties and embedded
> > > them into the drm_device and continue to only use the primary minor as
> > > a way to index the device as v3.
> >
> > I didn't really like the explicit registration step, at least for the
> > basic cgroup controls (like gem buffer limits), and suggested that
> > should happen automatically at drm_dev_register/unregister time. I
> > also talked about picking a consistent minor (if we have to use
> > minors, still would like Tejun to confirm what we should do here), but
> > that was an unrelated comment. So doing auto-registration on drm_minor
> > was one step too far.
>
> What about your comments on embedding properties into drm_device?  I am
> actually still not clear on the downside of using drm_minor this way.
> With this implementation in v4, there isn't additional state that can
> go out of sync with the ground truth of drm_device from the
> perspective of drm_minor.  Wouldn't the issue with hotplugging drm
> devices you described earlier get worse if the cgroup controller keeps
> its own list?

drm_dev_unregister gets called on hotunplug, so your cgroup-internal
tracking won't get out of sync any more than the drm_minor list gets
out of sync with drm_devices. The trouble with drm_minor is just that
cgroup doesn't track allocations on drm_minor (that's just the uapi
flavour), but on the underlying drm_device. So really doesn't make
much sense to attach cgroup tracking to the drm_minor.
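
The register/unregister pairing described above can be sketched in
simplified userspace C. Everything below is a hypothetical stand-in (a
fixed-size array instead of a locked list, toy structures rather than the
real DRM types); it only illustrates the lifetime guarantee: the
cgroup-side tracking is updated exclusively from drm_dev_register/
unregister, so it can never hold a device the DRM core has already
dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for drm_device and the controller's per-device list.
 * The real implementation would use proper locking and kernel list
 * primitives; names here are illustrative only. */

#define MAX_DEVICES 8

struct drm_device {
	int id;
	int registered;
};

static struct drm_device *drmcg_devices[MAX_DEVICES];

/* Called only from drm_dev_register(): add the device to the
 * controller's tracking list. */
static void drm_cg_register(struct drm_device *dev)
{
	for (int i = 0; i < MAX_DEVICES; i++) {
		if (!drmcg_devices[i]) {
			drmcg_devices[i] = dev;
			return;
		}
	}
}

/* Called only from drm_dev_unregister(): drop the device, so a
 * hotunplug removes it from the controller at the same moment it
 * disappears from the rest of the DRM core. */
static void drm_cg_unregister(struct drm_device *dev)
{
	for (int i = 0; i < MAX_DEVICES; i++)
		if (drmcg_devices[i] == dev)
			drmcg_devices[i] = NULL;
}

static void drm_dev_register(struct drm_device *dev)
{
	dev->registered = 1;
	drm_cg_register(dev);
}

static void drm_dev_unregister(struct drm_device *dev)
{
	drm_cg_unregister(dev);
	dev->registered = 0;
}

/* How many devices the controller currently tracks. */
static int drm_cg_device_count(void)
{
	int n = 0;

	for (int i = 0; i < MAX_DEVICES; i++)
		if (drmcg_devices[i])
			n++;
	return n;
}
```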

> > Just doing a drm_cg_register/unregister pair that's called from
> > drm_dev_register/unregister, and then if you want, looking up the
> > right minor (I think always picking the render node makes sense for
> > this, and skipping if there's no render node) would make most sense.
> > At least for the basic cgroup controllers which are generic across
> > drivers.
>
> Why do we want to skip drm devices that do not have a render node
> rather than just using the primary?

I guess we could also take the primary node, but drivers with only a
primary node are generally display-only drm drivers. Not sure we want
cgroups on those (but I guess it can't hurt, and it's more consistent). But
then we'd always need to pick the primary node for cgroup
identification purposes.
-Daniel
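
That selection rule is small enough to sketch. The types below are
simplified stand-ins, not the real drm_minor/drm_device: the point is only
that one consistent minor per drm_device is chosen, preferring the render
node and falling back to the primary for display-only drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in types; field names are illustrative only. */

struct drm_minor {
	int index; /* minor number as seen in $major:$minor */
};

struct drm_device {
	struct drm_minor *primary; /* always present once registered */
	struct drm_minor *render;  /* NULL for display-only drivers */
};

/* Pick one consistent minor per drm_device for cgroup identification:
 * the render node when it exists, otherwise the primary node. */
static struct drm_minor *drmcg_id_minor(struct drm_device *dev)
{
	return dev->render ? dev->render : dev->primary;
}
```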

>
> Kenny
>
>
>
> > -Daniel
> >
> >
> >
> > >
> > > Regards,
> > > Kenny
> > >
> > >
> > > On Wed, Sep 4, 2019 at 4:54 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Tue, Sep 03, 2019 at 04:43:45PM -0400, Kenny Ho wrote:
> > > > > On Tue, Sep 3, 2019 at 4:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > On Tue, Sep 3, 2019 at 9:45 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > > > > > > On Tue, Sep 3, 2019 at 3:57 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > Iterating over minors for cgroups sounds very, very wrong. Why do we care
> > > > > > > > whether a buffer was allocated through kms dumb vs render nodes?
> > > > > > > >
> > > > > > > > I'd expect all the cgroup stuff to only work on drm_device, if it does
> > > > > > > > care about devices.
> > > > > > > >
> > > > > > > > (I didn't look through the patch series to find out where exactly you're
> > > > > > > > using this, so maybe I'm off the rails here).
> > > > > > >
> > > > > > > I am exposing this to remove the need to keep track of a separate list
> > > > > > > of available drm_device in the system (to remove the registering and
> > > > > > > unregistering of drm_device to the cgroup subsystem and just use
> > > > > > > drm_minor as the single source of truth.)  I am only filtering out the
> > > > > > > render node minors because they point to the same drm_device and are
> > > > > > > confusing.
> > > > > > >
> > > > > > > Perhaps I missed an obvious way to list the drm devices without
> > > > > > > iterating through the drm_minors?  (I probably jumped to the minors
> > > > > > > because $major:$minor is the convention to address devices in cgroup.)
> > > > > >
> > > > > > Create your own if there's nothing, because you need to anyway:
> > > > > > - You need special locking anyway, we can't just block on the idr lock
> > > > > > for everything.
> > > > > > - This needs to refcount drm_device, not the minors.
> > > > > >
> > > > > > Iterating over stuff still feels kinda wrong, because normally
> > > > > > the way we register/unregister userspace api (and cgroups isn't
> > > > > > anything else from a drm driver pov) is by adding more calls to
> > > > > > drm_dev_register/unregister. If you put a drm_cg_register/unregister
> > > > > > call in there we have a clean separation, and you can track all the
> > > > > > currently active devices however you want. Iterating over objects that
> > > > > > can be hotunplugged any time tends to get really complicated really
> > > > > > quickly.
> > > > >
> > > > > Um... I thought this is what I had previously.  Did I misunderstand
> > > > > your feedback from v3?  Doesn't drm_minor already include all these
> > > > > facilities, so isn't creating my own kind of reinventing the wheel
> > > > > (as I did previously)?  drm_minor_register is called inside
> > > > > drm_dev_register, so isn't leveraging existing drm_minor facilities
> > > > > a much better solution?
> > > >
> > > > Hm, the previous version already dropped out of my inbox, so it's hard to find
> > > > it again. And I couldn't find this in the archives. Do you have pointers?
> > > >
> > > > I thought the previous version did cgroup init separately from drm_device
> > > > setup, and I guess I suggested that it should be moved into
> > > > drm_dev_register/unregister?
> > > >
> > > > Anyway, I don't think reusing the drm_minor registration makes sense,
> > > > since we want to be on the drm_device, not on the minor. Which is a bit
> > > > awkward for cgroups, which wants to identify devices using major.minor
> > > > pairs. But I guess drm is the first subsystem where 1 device can be
> > > > exposed through multiple minors ...
> > > >
> > > > Tejun, any suggestions on this?
> > > >
> > > > Anyway, I think just leveraging existing code because it can be abused to
> > > > make it fit for us doesn't make sense. E.g. for the kms side we also don't
> > > > piggy-back on top of drm_minor_register (it would be technically
> > > > possible), but instead we have drm_modeset_register_all().
> > > > -Daniel
> > > >
> > > > >
> > > > > Kenny
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Kenny
> > > > > > >
> > > > > > > > -Daniel
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/drm_drv.c      | 19 +++++++++++++++++++
> > > > > > > > >  drivers/gpu/drm/drm_internal.h |  4 ----
> > > > > > > > >  include/drm/drm_drv.h          |  4 ++++
> > > > > > > > >  3 files changed, 23 insertions(+), 4 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > > > > > > index 862621494a93..000cddabd970 100644
> > > > > > > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > > > > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > > > > > > @@ -254,11 +254,13 @@ struct drm_minor *drm_minor_acquire(unsigned int minor_id)
> > > > > > > > >
> > > > > > > > >       return minor;
> > > > > > > > >  }
> > > > > > > > > +EXPORT_SYMBOL(drm_minor_acquire);
> > > > > > > > >
> > > > > > > > >  void drm_minor_release(struct drm_minor *minor)
> > > > > > > > >  {
> > > > > > > > >       drm_dev_put(minor->dev);
> > > > > > > > >  }
> > > > > > > > > +EXPORT_SYMBOL(drm_minor_release);
> > > > > > > > >
> > > > > > > > >  /**
> > > > > > > > >   * DOC: driver instance overview
> > > > > > > > > @@ -1078,6 +1080,23 @@ int drm_dev_set_unique(struct drm_device *dev, const char *name)
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL(drm_dev_set_unique);
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * drm_minor_for_each - Iterate through all stored DRM minors
> > > > > > > > > + * @fn: Function to be called for each pointer.
> > > > > > > > > + * @data: Data passed to callback function.
> > > > > > > > > + *
> > > > > > > > > + * The callback function will be called for each @drm_minor entry, passing
> > > > > > > > > + * the minor, the entry and @data.
> > > > > > > > > + *
> > > > > > > > > + * If @fn returns anything other than %0, the iteration stops and that
> > > > > > > > > + * value is returned from this function.
> > > > > > > > > + */
> > > > > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data)
> > > > > > > > > +{
> > > > > > > > > +     return idr_for_each(&drm_minors_idr, fn, data);
> > > > > > > > > +}
> > > > > > > > > +EXPORT_SYMBOL(drm_minor_for_each);
> > > > > > > > > +
> > > > > > > > >  /*
> > > > > > > > >   * DRM Core
> > > > > > > > >   * The DRM core module initializes all global DRM objects and makes them
> > > > > > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> > > > > > > > > index e19ac7ca602d..6bfad76f8e78 100644
> > > > > > > > > --- a/drivers/gpu/drm/drm_internal.h
> > > > > > > > > +++ b/drivers/gpu/drm/drm_internal.h
> > > > > > > > > @@ -54,10 +54,6 @@ void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
> > > > > > > > >  void drm_prime_remove_buf_handle_locked(struct drm_prime_file_private *prime_fpriv,
> > > > > > > > >                                       struct dma_buf *dma_buf);
> > > > > > > > >
> > > > > > > > > -/* drm_drv.c */
> > > > > > > > > -struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > > > > -void drm_minor_release(struct drm_minor *minor);
> > > > > > > > > -
> > > > > > > > >  /* drm_vblank.c */
> > > > > > > > >  void drm_vblank_disable_and_save(struct drm_device *dev, unsigned int pipe);
> > > > > > > > >  void drm_vblank_cleanup(struct drm_device *dev);
> > > > > > > > > diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> > > > > > > > > index 68ca736c548d..24f8d054c570 100644
> > > > > > > > > --- a/include/drm/drm_drv.h
> > > > > > > > > +++ b/include/drm/drm_drv.h
> > > > > > > > > @@ -799,5 +799,9 @@ static inline bool drm_drv_uses_atomic_modeset(struct drm_device *dev)
> > > > > > > > >
> > > > > > > > >  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> > > > > > > > >
> > > > > > > > > +int drm_minor_for_each(int (*fn)(int id, void *p, void *data), void *data);
> > > > > > > > > +
> > > > > > > > > +struct drm_minor *drm_minor_acquire(unsigned int minor_id);
> > > > > > > > > +void drm_minor_release(struct drm_minor *minor);
> > > > > > > > >
> > > > > > > > >  #endif
> > > > > > > > > --
> > > > > > > > > 2.22.0
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Daniel Vetter
> > > > > > > > Software Engineer, Intel Corporation
> > > > > > > > http://blog.ffwll.ch
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Daniel Vetter
> > > > > > Software Engineer, Intel Corporation
> > > > > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                                 ` <CAKMK7uHy+GRAcpLDuz6STCBW+GNfNWr-i=ZERF3uqkO7jfynnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-05 21:26                                   ` Kenny Ho
       [not found]                                     ` <CAOWid-cRP1T2gr2U_ZN+QhS7jFM0kFTWiYy8JPPXXmGW7xBPzA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-09-05 21:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel, Greathouse, Joseph,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Thu, Sep 5, 2019 at 4:32 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
*snip*
> drm_dev_unregister gets called on hotunplug, so your cgroup-internal
> tracking won't get out of sync any more than the drm_minor list gets
> out of sync with drm_devices. The trouble with drm_minor is just that
> cgroup doesn't track allocations on drm_minor (that's just the uapi
> flavour), but on the underlying drm_device. So really doesn't make
> much sense to attach cgroup tracking to the drm_minor.

Um... I think I get what you are saying, but isn't this a matter of
the cgroup controller doing a drm_dev_get when using the drm_minor?
Or would that not work because it's possible to have a valid drm_minor
with an invalid drm_device in it? I understand it's an extra level of
indirection, but since the convention for addressing devices in cgroup
is $major:$minor I don't see a way to escape this.  (Tejun
actually already made a comment on my earlier RFC where I didn't
follow the major:minor convention strictly.)
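
The refcounting in question can be pictured with a simplified userspace
sketch. Note that the real drm_minor_acquire() takes a minor id, looks it
up in an idr, and returns an error pointer on failure; the fields and
signatures below are stand-ins chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy versions of the structures involved; not the real kernel API. */

struct drm_device {
	int refcount;
	int unplugged; /* set on hotunplug */
};

struct drm_minor {
	struct drm_device *dev;
};

static void drm_dev_get(struct drm_device *dev) { dev->refcount++; }
static void drm_dev_put(struct drm_device *dev) { dev->refcount--; }

/* Pin the device behind a minor; fail if the device is already gone.
 * This is the "valid drm_minor, invalid drm_device" case: the minor
 * entry still exists, but acquiring it must not hand out the device. */
static struct drm_minor *drm_minor_acquire(struct drm_minor *minor)
{
	if (!minor || minor->dev->unplugged)
		return NULL;
	drm_dev_get(minor->dev);
	return minor;
}

/* Drop the device reference taken by drm_minor_acquire(). */
static void drm_minor_release(struct drm_minor *minor)
{
	drm_dev_put(minor->dev);
}
```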

Kenny

> > > Just doing a drm_cg_register/unregister pair that's called from
> > > drm_dev_register/unregister, and then if you want, looking up the
> > > right minor (I think always picking the render node makes sense for
> > > this, and skipping if there's no render node) would make most sense.
> > > At least for the basic cgroup controllers which are generic across
> > > drivers.
> >
> > Why do we want to skip drm devices that do not have a render node
> > rather than just using the primary?
>
> I guess we could also take the primary node, but drivers with only a
> primary node are generally display-only drm drivers. Not sure we want
> cgroups on those (but I guess it can't hurt, and it's more consistent). But
> then we'd always need to pick the primary node for cgroup
> identification purposes.
> -Daniel
>
> >
> > Kenny
> >
> >
> >
> > > -Daniel


* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                                     ` <CAOWid-cRP1T2gr2U_ZN+QhS7jFM0kFTWiYy8JPPXXmGW7xBPzA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-06  9:12                                       ` Daniel Vetter
  0 siblings, 0 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-09-06  9:12 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, Daniel Vetter, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Thu, Sep 05, 2019 at 05:26:08PM -0400, Kenny Ho wrote:
> On Thu, Sep 5, 2019 at 4:32 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> *snip*
> > drm_dev_unregister gets called on hotunplug, so your cgroup-internal
> > tracking won't get out of sync any more than the drm_minor list gets
> > out of sync with drm_devices. The trouble with drm_minor is just that
> > cgroup doesn't track allocations on drm_minor (that's just the uapi
> > flavour), but on the underlying drm_device. So really doesn't make
> > much sense to attach cgroup tracking to the drm_minor.
> 
> Um... I think I get what you are saying, but isn't this a matter of
> the cgroup controller doing a drm_dev_get when using the drm_minor?
> Or would that not work because it's possible to have a valid drm_minor
> with an invalid drm_device in it? I understand it's an extra level of
> indirection, but since the convention for addressing devices in cgroup
> is $major:$minor I don't see a way to escape this.  (Tejun
> actually already made a comment on my earlier RFC where I didn't
> follow the major:minor convention strictly.)

drm_device is the object that controls lifetime and everything, that's why
you need to do a drm_dev_get and all that in some places. Going through
the minor really feels like a distraction.

And yes, we have a bit of a mess between cgroups insisting on using the minor,
and drm_device having more than 1 minor for the same underlying physical
resource. Just because the uapi is a bit of a mess in that regard doesn't
mean we should pull that mess into the kernel implementation imo.
-Daniel

> 
> Kenny
> 
> > > > Just doing a drm_cg_register/unregister pair that's called from
> > > > drm_dev_register/unregister, and then if you want, looking up the
> > > > right minor (I think always picking the render node makes sense for
> > > > this, and skipping if there's no render node) would make most sense.
> > > > At least for the basic cgroup controllers which are generic across
> > > > drivers.
> > >
> > > Why do we want to skip drm devices that do not have a render node
> > > rather than just using the primary?
> >
> > I guess we could also take the primary node, but drivers with only a
> > primary node are generally display-only drm drivers. Not sure we want
> > cgroups on those (but I guess it can't hurt, and it's more consistent). But
> > then we'd always need to pick the primary node for cgroup
> > identification purposes.
> > -Daniel
> >
> > >
> > > Kenny
> > >
> > >
> > >
> > > > -Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]               ` <CAKMK7uE5Bj-3cJH895iqnLpwUV+GBDM1Y=n4Z4A3xervMdJKXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-06 15:23                 ` Tejun Heo
       [not found]                   ` <20190906152320.GM2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Tejun Heo @ 2019-09-06 15:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Kenny Ho, dri-devel,
	joseph.greathouse-5C7GfCeVMHo, Alex Deucher,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

Hello, Daniel.

On Tue, Sep 03, 2019 at 09:48:22PM +0200, Daniel Vetter wrote:
> I think system memory separate from vram makes sense. For one, vram is
> like 10x+ faster than system memory, so we definitely want to have
> good control on that. But maybe we only want one vram bucket overall
> for the entire system?
> 
> The trouble with system memory is that gpu tasks pin that memory to
> prep execution. There's two solutions:
> - i915 has a shrinker. Lots (and I really mean lots) of pain with
> direct reclaim recursion, which often means we can't free memory, and
> we're angering the oom killer a lot. Plus it introduces real bad
> latency spikes everywhere (gpu workloads are occasionally really slow,
> think "worse than pageout to spinning rust" to get memory freed).
> - ttm just has a global limit, set to 50% of system memory.
> 
> I do think a global system memory limit to tame the shrinker, without
> the ttm approach of possible just wasting half your memory, could be
> useful.

Hmm... what'd be the fundamental difference from slab or socket memory,
which are handled through memcg?  Does system memory used by GPUs have
further global restrictions in addition to the amount of physical
memory used?

> I'm also not sure of the bw limits, given all the fun we have on the
> block io cgroups side. Aside from that the current bw limit only
> controls the bw the kernel uses, userspace can submit unlimited
> amounts of copying commands that use the same pcie links directly to
> the gpu, bypassing this cg knob. Also, controlling execution time for
> gpus is very tricky, since they work a lot more like a block io device
> or maybe a network controller with packet scheduling, than a cpu.

At the system level, it just gets folded into cpu time, which isn't
perfect but is usually a good enough approximation of compute related
dynamic resources.  Can gpus do something similar, or at least start with
that?

Thanks.

-- 
tejun


* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
  2019-09-04  8:54                   ` Daniel Vetter
       [not found]                     ` <20190904085434.GF2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-09-06 15:29                     ` Tejun Heo
  2019-09-06 15:36                       ` Daniel Vetter
  1 sibling, 1 reply; 89+ messages in thread
From: Tejun Heo @ 2019-09-06 15:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Kenny Ho, dri-devel, Greathouse, Joseph, Alex Deucher, cgroups,
	Christian König

Hello,

On Wed, Sep 04, 2019 at 10:54:34AM +0200, Daniel Vetter wrote:
> Anyway, I don't think reusing the drm_minor registration makes sense,
> since we want to be on the drm_device, not on the minor. Which is a bit
> awkward for cgroups, which wants to identify devices using major.minor
> pairs. But I guess drm is the first subsystem where 1 device can be
> exposed through multiple minors ...
> 
> Tejun, any suggestions on this?

I'm not extremely attached to maj:min.  It's nice in that it'd be
consistent with blkcg but it already isn't the nicest of identifiers
for block devices.  If using maj:min is reasonably straightforward
for gpus even if not perfect, I'd prefer going with maj:min.
Otherwise, please feel free to use the ID best for GPUs - hopefully
something which is easy to understand, consistent with IDs used
elsewhere and easy to build tooling around.

Thanks.

-- 
tejun
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]                   ` <20190906152320.GM2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
@ 2019-09-06 15:34                     ` Daniel Vetter
       [not found]                       ` <CAKMK7uEXP7XLFB2aFU6+E0TH_DepFRkfCoKoHwkXtjZRDyhHig-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-06 15:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Kenny Ho, dri-devel, Greathouse, Joseph,
	Alex Deucher, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Fri, Sep 6, 2019 at 5:23 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Daniel.
>
> On Tue, Sep 03, 2019 at 09:48:22PM +0200, Daniel Vetter wrote:
> > I think system memory separate from vram makes sense. For one, vram is
> > like 10x+ faster than system memory, so we definitely want to have
> > good control on that. But maybe we only want one vram bucket overall
> > for the entire system?
> >
> > The trouble with system memory is that gpu tasks pin that memory to
> > prep execution. There's two solutions:
> > - i915 has a shrinker. Lots (and I really mean lots) of pain with
> > direct reclaim recursion, which often means we can't free memory, and
> > we're angering the oom killer a lot. Plus it introduces real bad
> > latency spikes everywhere (gpu workloads are occasionally really slow,
> > think "worse than pageout to spinning rust" to get memory freed).
> > - ttm just has a global limit, set to 50% of system memory.
> >
> > I do think a global system memory limit to tame the shrinker, without
> > the ttm approach of possibly just wasting half your memory, could be
> > useful.
>
> Hmm... what'd be the fundamental difference from slab or socket memory
> which are handled through memcg?  Does system memory used by GPUs have
> further global restrictions in addition to the amount of physical
> memory used?

Sometimes, but that would be specific resources (kinda like vram),
e.g. CMA regions used by a gpu. But probably not something you'll run
in a datacenter and want cgroups for ...

I guess we could try to integrate with the memcg group controller. One
trouble is that aside from i915 most gpu drivers do not really have a
full shrinker, so not sure how that would all integrate.

The overall gpu memory controller would still be outside of memcg I
think, since that would include swapped-out gpu objects, and stuff in
special memory regions like vram.

> > I'm also not sure of the bw limits, given all the fun we have on the
> > block io cgroups side. Aside from that the current bw limit only
> > controls the bw the kernel uses, userspace can submit unlimited
> > amounts of copying commands that use the same pcie links directly to
> > the gpu, bypassing this cg knob. Also, controlling execution time for
> > gpus is very tricky, since they work a lot more like a block io device
> > or maybe a network controller with packet scheduling, than a cpu.
>
> At the system level, it just gets folded into cpu time, which isn't
> perfect but is usually a good enough approximation of compute related
> dynamic resources.  Can gpu do something similar or at least start with
> that?

So generally there's a pile of engines, often of different type (e.g.
amd hw has an entire pile of copy engines), with some ill-defined
sharing characteristics for some (often compute/render engines use the
same shader cores underneath), kinda like hyperthreading. So at that
detail it's all extremely hw specific, and probably too hard to
control in a useful way for users. And I'm not sure we can really do a
reasonable knob for overall gpu usage, e.g. if we include all the copy
engines, but the workloads are only running on compute engines, then
you might only get 10% overall utilization by engine-time. While the
shaders (which is most of the chip area/power consumption) are
actually at 100%. On top, with many userspace apis those engines are
an internal implementation detail of a more abstract gpu device (e.g.
opengl), but with others, this is all fully exposed (like vulkan).
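The 10% figure can be made concrete with a toy calculation (all numbers hypothetical): averaging busy time over every engine hides that the shader cores behind a single saturated compute engine are at 100%.

```c
/* Naive "overall gpu usage" as utilization-by-engine-time, in percent:
 * mean busy time across all engines over a sampling window. */
static unsigned overall_util_pct(const unsigned *busy_ms, int n_engines,
				 unsigned window_ms)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < n_engines; i++)
		total += busy_ms[i];
	return (unsigned)(100 * total / ((unsigned long)n_engines * window_ms));
}
```

One compute engine busy for a full 1000 ms window plus nine idle copy engines reads as 10% "overall" utilization, even though the chip's shader cores may be fully occupied the whole time.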

Plus the kernel needs to use at least copy engines for vram management
itself, and you really can't take that away. Although Kenny here has
some proposal for a separate cgroup resource just for that.

I just think it's all a bit too ill-defined, and we might be better
off nailing the memory side first and get some real world experience
on this stuff. For context, there's not even a cross-driver standard
for how priorities are handled, that's all driver-specific interfaces.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
  2019-09-06 15:29                     ` Tejun Heo
@ 2019-09-06 15:36                       ` Daniel Vetter
       [not found]                         ` <CAKMK7uFQqAMB1DbiEy-o2bzr_F25My93imNcg1Qh9DHe=uWQug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Daniel Vetter @ 2019-09-06 15:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Kenny Ho, dri-devel, Greathouse, Joseph, Alex Deucher, cgroups,
	Christian König

On Fri, Sep 6, 2019 at 5:29 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Wed, Sep 04, 2019 at 10:54:34AM +0200, Daniel Vetter wrote:
> > Anyway, I don't think reusing the drm_minor registration makes sense,
> > since we want to be on the drm_device, not on the minor. Which is a bit
> > awkward for cgroups, which wants to identify devices using major.minor
> > pairs. But I guess drm is the first subsystem where 1 device can be
> > exposed through multiple minors ...
> >
> > Tejun, any suggestions on this?
>
> I'm not extremely attached to maj:min.  It's nice in that it'd be
> consistent with blkcg but it already isn't the nicest of identifiers
> for block devices.  If using maj:min is reasonably straightforward
> for gpus even if not perfect, I'd prefer going with maj:min.
> Otherwise, please feel free to use the ID best for GPUs - hopefully
> something which is easy to understand, consistent with IDs used
> elsewhere and easy to build tooling around.

Block devices are a great example I think. How do you handle the
partitions on that? For drm we also have a main minor interface, and
then the render-only interface on drivers that support it. So if blkcg
handles that by only exposing the primary maj:min pair, I think we can
go with that and it's all nicely consistent.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each
       [not found]                         ` <CAKMK7uFQqAMB1DbiEy-o2bzr_F25My93imNcg1Qh9DHe=uWQug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-06 15:38                           ` Tejun Heo
  0 siblings, 0 replies; 89+ messages in thread
From: Tejun Heo @ 2019-09-06 15:38 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Kenny Ho, dri-devel, Greathouse, Joseph,
	Alex Deucher, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

Hello, Daniel.

On Fri, Sep 06, 2019 at 05:36:02PM +0200, Daniel Vetter wrote:
> Block devices are a great example I think. How do you handle the
> partitions on that? For drm we also have a main minor interface, and

cgroup IO controllers only distribute hardware IO capacity and are
blind to partitions.  As there's always the whole device MAJ:MIN for
block devices, we only use that.

> then the render-only interface on drivers that support it. So if blkcg
> handles that by only exposing the primary maj:min pair, I think we can
> go with that and it's all nicely consistent.

Ah yeah, that sounds equivalent.  Great.

Thanks.

-- 
tejun

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]                       ` <CAKMK7uEXP7XLFB2aFU6+E0TH_DepFRkfCoKoHwkXtjZRDyhHig-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-09-06 15:45                         ` Tejun Heo
       [not found]                           ` <20190906154539.GP2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Tejun Heo @ 2019-09-06 15:45 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Kenny Ho, dri-devel, Greathouse, Joseph,
	Alex Deucher, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

Hello, Daniel.

On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote:
> > Hmm... what'd be the fundamental difference from slab or socket memory
> > which are handled through memcg?  Does system memory used by GPUs have
> > further global restrictions in addition to the amount of physical
> > memory used?
> 
> Sometimes, but that would be specific resources (kinda like vram),
> e.g. CMA regions used by a gpu. But probably not something you'll run
> in a datacenter and want cgroups for ...
> 
> I guess we could try to integrate with the memcg group controller. One
> trouble is that aside from i915 most gpu drivers do not really have a
> full shrinker, so not sure how that would all integrate.

So, while it'd be great to have shrinkers in the longer term, it's not a
strict requirement to be accounted in memcg.  It already accounts a
lot of memory which isn't reclaimable (a lot of slabs and socket
buffer).

> The overall gpu memory controller would still be outside of memcg I
> think, since that would include swapped-out gpu objects, and stuff in
> special memory regions like vram.

Yeah, for resources which are on the GPU itself or hard limitations
arising from it.  In general, we wanna make cgroup controllers control
something real and concrete as in physical resources.

> > At the system level, it just gets folded into cpu time, which isn't
> > perfect but is usually a good enough approximation of compute related
> > dynamic resources.  Can gpu do something similar or at least start with
> > that?
> 
> So generally there's a pile of engines, often of different type (e.g.
> amd hw has an entire pile of copy engines), with some ill-defined
> sharing characteristics for some (often compute/render engines use the
> same shader cores underneath), kinda like hyperthreading. So at that
> detail it's all extremely hw specific, and probably too hard to
> control in a useful way for users. And I'm not sure we can really do a
> reasonable knob for overall gpu usage, e.g. if we include all the copy
> engines, but the workloads are only running on compute engines, then
> you might only get 10% overall utilization by engine-time. While the
> shaders (which is most of the chip area/power consumption) are
> actually at 100%. On top, with many userspace apis those engines are
> an internal implementation detail of a more abstract gpu device (e.g.
> opengl), but with others, this is all fully exposed (like vulkan).
> 
> Plus the kernel needs to use at least copy engines for vram management
> itself, and you really can't take that away. Although Kenny here has
> some proposal for a separate cgroup resource just for that.
> 
> I just think it's all a bit too ill-defined, and we might be better
> off nailing the memory side first and get some real world experience
> on this stuff. For context, there's not even a cross-driver standard
> for how priorities are handled, that's all driver-specific interfaces.

I see.  Yeah, figuring it out as this develops makes sense to me.  One
thing I wanna raise is that in general we don't want to expose device
or implementation details in the cgroup interface.  What we want expressed
there are the intentions of the user.  The more internal details we
expose, the more we end up getting tied down to the specific
implementation, which we should avoid especially given the early stage
of development.

Thanks.

-- 
tejun

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]                           ` <20190906154539.GP2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
@ 2019-09-10 11:54                             ` Michal Hocko
       [not found]                               ` <20190910115448.GT2063-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2019-09-17 12:21                               ` Daniel Vetter
  0 siblings, 2 replies; 89+ messages in thread
From: Michal Hocko @ 2019-09-10 11:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc, Kenny Ho,
	dri-devel, Daniel Vetter, Alex Deucher,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Fri 06-09-19 08:45:39, Tejun Heo wrote:
> Hello, Daniel.
> 
> On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote:
> > > Hmm... what'd be the fundamental difference from slab or socket memory
> > > which are handled through memcg?  Does system memory used by GPUs have
> > > further global restrictions in addition to the amount of physical
> > > memory used?
> > 
> > Sometimes, but that would be specific resources (kinda like vram),
> > e.g. CMA regions used by a gpu. But probably not something you'll run
> > in a datacenter and want cgroups for ...
> > 
> > I guess we could try to integrate with the memcg group controller. One
> > trouble is that aside from i915 most gpu drivers do not really have a
> > full shrinker, so not sure how that would all integrate.
> 
> So, while it'd be great to have shrinkers in the longer term, it's not a
> strict requirement to be accounted in memcg.  It already accounts a
> lot of memory which isn't reclaimable (a lot of slabs and socket
> buffer).

Yeah, having a shrinker is preferred, but the memory had better be
reclaimable in some form. If not by any other means, then at least bound
to a user process context so that it goes away when the task is killed
by the OOM killer. If that is not the case then we cannot really charge
it, because then the memcg controller is of no use. We can tolerate it to
some degree if the amount of memory charged like that is negligible
compared to the overall size. But from the discussion it seems that these
buffers are really large.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
       [not found]                               ` <20190910115448.GT2063-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2019-09-10 16:03                                 ` Tejun Heo
  2019-09-10 17:25                                   ` Michal Hocko
  0 siblings, 1 reply; 89+ messages in thread
From: Tejun Heo @ 2019-09-10 16:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc, Kenny Ho,
	dri-devel, Daniel Vetter, Alex Deucher,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

Hello, Michal.

On Tue, Sep 10, 2019 at 01:54:48PM +0200, Michal Hocko wrote:
> > So, while it'd be great to have shrinkers in the longer term, it's not a
> > strict requirement to be accounted in memcg.  It already accounts a
> > lot of memory which isn't reclaimable (a lot of slabs and socket
> > buffer).
> 
> > Yeah, having a shrinker is preferred but the memory had better be
> reclaimable in some form. If not by any other means then at least bound
> to a user process context so that it goes away with a task being killed
> by the OOM killer. If that is not the case then we cannot really charge
> it because then the memcg controller is of no use. We can tolerate it to
> some degree if the amount of memory charged like that is negligible to
> the overall size. But from the discussion it seems that these buffers
> are really large.

Yeah, oom kills should be able to reduce the usage; however, please
note that tmpfs, among other things, can already escape this
restriction and we can have cgroups which are over max and empty.
It's obviously not ideal but the system doesn't melt down from it
either.

Thanks.

-- 
tejun

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-09-10 16:03                                 ` Tejun Heo
@ 2019-09-10 17:25                                   ` Michal Hocko
  0 siblings, 0 replies; 89+ messages in thread
From: Michal Hocko @ 2019-09-10 17:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix, jsparks,
	amd-gfx list, lkaplan, Kenny Ho, dri-devel, Alex Deucher,
	cgroups, Christian König

On Tue 10-09-19 09:03:29, Tejun Heo wrote:
> Hello, Michal.
> 
> On Tue, Sep 10, 2019 at 01:54:48PM +0200, Michal Hocko wrote:
> > > So, while it'd be great to have shrinkers in the longer term, it's not a
> > > strict requirement to be accounted in memcg.  It already accounts a
> > > lot of memory which isn't reclaimable (a lot of slabs and socket
> > > buffer).
> > 
> > Yeah, having a shrinker is preferred but the memory had better be
> > reclaimable in some form. If not by any other means then at least bound
> > to a user process context so that it goes away with a task being killed
> > by the OOM killer. If that is not the case then we cannot really charge
> > it because then the memcg controller is of no use. We can tolerate it to
> > some degree if the amount of memory charged like that is negligible to
> > the overall size. But from the discussion it seems that these buffers
> > are really large.
> 
> Yeah, oom kills should be able to reduce the usage; however, please
> note that tmpfs, among other things, can already escape this
> restriction and we can have cgroups which are over max and empty.
> It's obviously not ideal but the system doesn't melt down from it
> either.

Right, and that is a reason why access to tmpfs should be restricted
when containing a workload with memcg. My understanding of this particular
feature is that memcg should be the primary containment method, and
that's why I brought this up.

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
  2019-09-10 11:54                             ` Michal Hocko
       [not found]                               ` <20190910115448.GT2063-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2019-09-17 12:21                               ` Daniel Vetter
  1 sibling, 0 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-09-17 12:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix, jsparks,
	dri-devel, lkaplan, Alex Deucher, Kenny Ho, amd-gfx list,
	Tejun Heo, cgroups, Christian König

On Tue, Sep 10, 2019 at 01:54:48PM +0200, Michal Hocko wrote:
> On Fri 06-09-19 08:45:39, Tejun Heo wrote:
> > Hello, Daniel.
> > 
> > On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote:
> > > > Hmm... what'd be the fundamental difference from slab or socket memory
> > > > which are handled through memcg?  Does system memory used by GPUs have
> > > > further global restrictions in addition to the amount of physical
> > > > memory used?
> > > 
> > > Sometimes, but that would be specific resources (kinda like vram),
> > > e.g. CMA regions used by a gpu. But probably not something you'll run
> > > in a datacenter and want cgroups for ...
> > > 
> > > I guess we could try to integrate with the memcg group controller. One
> > > trouble is that aside from i915 most gpu drivers do not really have a
> > > full shrinker, so not sure how that would all integrate.
> > 
> > So, while it'd be great to have shrinkers in the longer term, it's not a
> > strict requirement to be accounted in memcg.  It already accounts a
> > lot of memory which isn't reclaimable (a lot of slabs and socket
> > buffer).
> 
> Yeah, having a shrinker is preferred but the memory had better be
> reclaimable in some form. If not by any other means then at least bound
> to a user process context so that it goes away with a task being killed
> by the OOM killer. If that is not the case then we cannot really charge
> it because then the memcg controller is of no use. We can tolerate it to
> some degree if the amount of memory charged like that is negligible to
> the overall size. But from the discussion it seems that these buffers
> are really large.

I think we can just make "must have a shrinker" a requirement for the
system memory cgroup thing for gpu buffers. There's mild locking-inversion
fun to be had when writing one, but I think the problem is well-understood
enough that this isn't a huge hurdle to climb over. And it should give admins
an easier-to-manage system, since it works more like what they already
know.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit
       [not found]   ` <20190829060533.32315-8-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-01 14:30     ` Michal Koutný
  2019-11-29  7:18         ` Kenny Ho
  0 siblings, 1 reply; 89+ messages in thread
From: Michal Koutný @ 2019-10-01 14:30 UTC (permalink / raw)
  To: Kenny Ho
  Cc: daniel-/w4YWyX8dFk, felix.kuehling-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, tj-DgEjT+Ai2ygdnm+yROfE0A,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, alexander.deucher-5C7GfCeVMHo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo



Hello.

On Thu, Aug 29, 2019 at 02:05:24AM -0400, Kenny Ho <Kenny.Ho-5C7GfCeVMHo@public.gmane.org> wrote:
> drm.buffer.default
>         A read-only flat-keyed file which exists on the root cgroup.
>         Each entry is keyed by the drm device's major:minor.
> 
>         Default limits on the total GEM buffer allocation in bytes.
What is the purpose of this attribute (and its counterparts for other resources)?
I can't see it being set differently but S64_MAX in
drmcg_device_early_init.

> +static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> [...]
> +		switch (type) {
> +		case DRMCG_TYPE_BO_TOTAL:
> +			p_max = parent == NULL ? S64_MAX :
> +				parent->dev_resources[minor]->
> +				bo_limits_total_allocated;
> +
> +			rc = drmcg_process_limit_s64_val(sattr, true,
> +				props->bo_limits_total_allocated_default,
> +				p_max,
> +				&val);
IIUC, this allows initializing the particular limit value from either the
parent or the per-device default. This is, alas, rather an
antipattern. The most stringent limit on the path from a cgroup to the
root should be applied at charging time. However, the child should
not inherit the verbatim value from the parent (it may race with the
parent and won't be updated when the parent's value changes).
You already do the appropriate hierarchical check in
drmcg_try_chb_bo_alloc, so the parent propagation could be simply
dropped if I'm not mistaken.
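The charge-time check being referred to can be sketched in a few lines of C (names and layout are hypothetical, not the patch's actual API): every ancestor's own limit is tested when charging, so no value ever needs to be copied from parent to child.

```c
/* Hypothetical sketch of hierarchical charging: walk from the charging
 * cgroup to the root, test each ancestor's own limit, and only then
 * commit the charge all the way up. */
struct drmcg {
	long limit;           /* this cgroup's own buffer limit */
	long usage;
	struct drmcg *parent; /* NULL for the root */
};

static int drmcg_try_charge(struct drmcg *cg, long size)
{
	struct drmcg *p;

	for (p = cg; p; p = p->parent)
		if (p->usage + size > p->limit)
			return 0; /* some ancestor's limit would be exceeded */
	for (p = cg; p; p = p->parent)
		p->usage += size;
	return 1;
}
```

With this shape, a child whose own limit is looser than an ancestor's is still capped by that ancestor at charge time, with no inherited value to go stale.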


Also, I can't find how the read of
parent->dev_resources[minor]->bo_limits_total_allocated and its
concurrent update are synchronized (i.e. someone writing
buffer.total.max for parent and child in parallel). (It may just be my
oversight.)

I'm posting this to the buffer knobs patch but similar applies to lgpu
resource controls as well.

HTH,
Michal


* Re: [PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control
       [not found]     ` <20190829060533.32315-12-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-01 14:30       ` Michal Koutný
  0 siblings, 0 replies; 89+ messages in thread
From: Michal Koutný @ 2019-10-01 14:30 UTC (permalink / raw)
  To: Kenny Ho
  Cc: daniel-/w4YWyX8dFk, felix.kuehling-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, tj-DgEjT+Ai2ygdnm+yROfE0A,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, alexander.deucher-5C7GfCeVMHo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo



Hi.

On Thu, Aug 29, 2019 at 02:05:28AM -0400, Kenny Ho <Kenny.Ho-5C7GfCeVMHo@public.gmane.org> wrote:
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1256,6 +1257,12 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
> [...]
> +		move_delay /= 2000; /* check every half period in ms*/
> [...]
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> [...]
> @@ -382,6 +548,25 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> [...]
> +			if (rc || val < 2000) {
This just caught my eye, and it may simply be due to the RFC-ness of the
series, but I'd suggest turning this into a constant with a descriptive
name.
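A minimal illustration of the suggestion (the constant name and helpers are made up, not from the patch): both call sites would share one descriptively named definition instead of each repeating the bare 2000.

```c
/* Hypothetical names, sketching the suggestion: give the magic number
 * 2000 a single descriptive definition shared by both call sites. */
#define DRMCG_BW_MIN_PERIOD_MS 2000 /* smallest accepted period, in ms */

/* drmcg_limit_write() would reject values below the minimum ... */
static int drmcg_bw_period_valid(long period_ms)
{
	return period_ms >= DRMCG_BW_MIN_PERIOD_MS;
}

/* ... and the "check every half period" delay would be derived from
 * the period rather than from a second bare literal. */
static long drmcg_bw_check_interval_ms(long period_ms)
{
	return period_ms / 2;
}
```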

My 2 cents,
Michal


* Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem
       [not found]   ` <20190829060533.32315-3-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-01 14:31     ` Michal Koutný
       [not found]       ` <20191001143106.GA4749-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Michal Koutný @ 2019-10-01 14:31 UTC (permalink / raw)
  To: Kenny Ho
  Cc: daniel-/w4YWyX8dFk, felix.kuehling-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, tj-DgEjT+Ai2ygdnm+yROfE0A,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, alexander.deucher-5C7GfCeVMHo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo



Hi.

On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho <Kenny.Ho-5C7GfCeVMHo@public.gmane.org> wrote:
> +struct cgroup_subsys drm_cgrp_subsys = {
> +	.css_alloc	= drmcg_css_alloc,
> +	.css_free	= drmcg_css_free,
> +	.early_init	= false,
> +	.legacy_cftypes	= files,
Do you really want to expose the DRM controller on v1 hierarchies (where
threads of one process can be in different cgroups, or child cgroups
compete with their parents)?

> +	.dfl_cftypes	= files,
> +};

Just asking,
Michal


* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]   ` <20190829060533.32315-15-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-08 18:53     ` Kuehling, Felix
       [not found]       ` <b3d2b3c1-8854-10ca-3e39-b3bef35bdfa9-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kuehling, Felix @ 2019-10-08 18:53 UTC (permalink / raw)
  To: Ho, Kenny, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, Deucher, Alexander, Koenig, Christian,
	Greathouse, Joseph, jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc,
	daniel-/w4YWyX8dFk

On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> drm.lgpu
>          A read-write nested-keyed file which exists on all cgroups.
>          Each entry is keyed by the DRM device's major:minor.
>
>          lgpu stands for logical GPU, it is an abstraction used to
>          subdivide a physical DRM device for the purpose of resource
>          management.
>
>          The lgpu is a discrete quantity that is device specific (i.e.
>          some DRM devices may have 64 lgpus while others may have 100
>          lgpus.)  The lgpu is a single quantity with two representations
>          denoted by the following nested keys.
>
>            =====     ========================================
>            count     Representing lgpu as anonymous resource
>            list      Representing lgpu as named resource
>            =====     ========================================
>
>          For example:
>          226:0 count=256 list=0-255
>          226:1 count=4 list=0,2,4,6
>          226:2 count=32 list=32-63
>
>          lgpu is represented by a bitmap and uses the bitmap_parselist
>          kernel function, so the list key input format is a
>          comma-separated list of decimal numbers and ranges.
>
>          Consecutively set bits are shown as two hyphen-separated decimal
>          numbers, the smallest and largest bit numbers set in the range.
>          Optionally each range can be postfixed to denote that only parts
>          of it should be set.  The range will be divided into groups of
>          a specific size.
>          Syntax: range:used_size/group_size
>          Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>
>          The count key is the Hamming weight (hweight) of the bitmap.
>
>          Both count and list accept the max and default keywords.
>
>          Some DRM devices may only support lgpu as anonymous resources.
>          In such cases, the significance of the position of the set bits
>          in list will be ignored.
>
>          This lgpu resource supports the 'allocation' resource
>          distribution model.
>
> Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

The description sounds reasonable to me and maps well to the CU masking 
feature in our GPUs.

It would also allow us to do more coarse-grained masking, for example to 
guarantee balanced allocation of CUs across shader engines or 
partitioning of memory bandwidth or CP pipes (if that is supported by 
the hardware/firmware).

I can't comment on the code as I'm unfamiliar with the details of the 
cgroup code.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
>   include/drm/drm_cgroup.h                |   4 +
>   include/linux/cgroup_drm.h              |   6 ++
>   kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
>   4 files changed, 191 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 87a195133eaa..57f18469bd76 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1958,6 +1958,52 @@ DRM Interface Files
>   	Set largest allocation for /dev/dri/card1 to 4MB
>   	echo "226:1 4m" > drm.buffer.peak.max
>   
> +  drm.lgpu
> +	A read-write nested-keyed file which exists on all cgroups.
> +	Each entry is keyed by the DRM device's major:minor.
> +
> +	lgpu stands for logical GPU; it is an abstraction used to
> +	subdivide a physical DRM device for the purpose of resource
> +	management.
> +
> +	The lgpu is a discrete quantity that is device specific (e.g.
> +	some DRM devices may have 64 lgpus while others may have 100
> +	lgpus).  The lgpu is a single quantity with two representations
> +	denoted by the following nested keys.
> +
> +	  =====     ========================================
> +	  count     Representing lgpu as anonymous resource
> +	  list      Representing lgpu as named resource
> +	  =====     ========================================
> +
> +	For example:
> +	226:0 count=256 list=0-255
> +	226:1 count=4 list=0,2,4,6
> +	226:2 count=32 list=32-63
> +
> +	lgpu is represented by a bitmap and uses the bitmap_parselist
> +	kernel function, so the list key input format is a
> +	comma-separated list of decimal numbers and ranges.
> +
> +	Consecutively set bits are shown as two hyphen-separated decimal
> +	numbers, the smallest and largest bit numbers set in the range.
> +	Optionally each range can be postfixed to denote that only parts
> +	of it should be set.  The range will be divided into groups of
> +	a specific size.
> +	Syntax: range:used_size/group_size
> +	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> +
> +	The count key is the Hamming weight (hweight) of the bitmap.
> +
> +	Both count and list accept the max and default keywords.
> +
> +	Some DRM devices may only support lgpu as anonymous resources.
> +	In such cases, the significance of the position of the set bits
> +	in list will be ignored.
> +
> +	This lgpu resource supports the 'allocation' resource
> +	distribution model.
> +
>   GEM Buffer Ownership
>   ~~~~~~~~~~~~~~~~~~~~
>   
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index 6d9707e1eb72..a8d6be0b075b 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -6,6 +6,7 @@
>   
>   #include <linux/cgroup_drm.h>
>   #include <linux/workqueue.h>
> +#include <linux/types.h>
>   #include <drm/ttm/ttm_bo_api.h>
>   #include <drm/ttm/ttm_bo_driver.h>
>   
> @@ -28,6 +29,9 @@ struct drmcg_props {
>   	s64			mem_highs_default[TTM_PL_PRIV+1];
>   
>   	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
> +
> +	int			lgpu_capacity;
> +        DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>   };
>   
>   #ifdef CONFIG_CGROUP_DRM
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index c56cfe74d1a6..7b1cfc4ce4c3 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -14,6 +14,8 @@
>   /* limit defined per the way drm_minor_alloc operates */
>   #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>   
> +#define MAX_DRMCG_LGPU_CAPACITY 256
> +
>   enum drmcg_mem_bw_attr {
>   	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calculating 'instantaneous' bw */
>   	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calculating 'instantaneous' bw */
> @@ -32,6 +34,7 @@ enum drmcg_res_type {
>   	DRMCG_TYPE_MEM_PEAK,
>   	DRMCG_TYPE_BANDWIDTH,
>   	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> +	DRMCG_TYPE_LGPU,
>   	__DRMCG_TYPE_LAST,
>   };
>   
> @@ -58,6 +61,9 @@ struct drmcg_device_resource {
>   	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
>   	s64			mem_bw_limits_bytes_in_period;
>   	s64			mem_bw_limits_avg_bytes_per_us;
> +
> +	s64			lgpu_used;
> +	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>   };
>   
>   /**
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 0ea7f0619e25..18c4368e2c29 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -9,6 +9,7 @@
>   #include <linux/cgroup_drm.h>
>   #include <linux/ktime.h>
>   #include <linux/kernel.h>
> +#include <linux/bitmap.h>
>   #include <drm/drm_file.h>
>   #include <drm/drm_drv.h>
>   #include <drm/ttm/ttm_bo_api.h>
> @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
>   #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
>   #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
>   
> +#define LGPU_LIMITS_NAME_LIST "list"
> +#define LGPU_LIMITS_NAME_COUNT "count"
> +
>   static struct drmcg *root_drmcg __read_mostly;
>   
>   static int drmcg_css_free_fn(int id, void *ptr, void *data)
> @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
>   	for (i = 0; i <= TTM_PL_PRIV; i++)
>   		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
>   
> +	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> +			MAX_DRMCG_LGPU_CAPACITY);
> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> +
>   	mutex_unlock(&dev->drmcg_mutex);
>   	return 0;
>   }
> @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
>   				MEM_BW_LIMITS_NAME_AVG,
>   				ddr->mem_bw_limits_avg_bytes_per_us);
>   		break;
> +	case DRMCG_TYPE_LGPU:
> +		seq_printf(sf, "%s=%lld %s=%*pbl\n",
> +				LGPU_LIMITS_NAME_COUNT,
> +				ddr->lgpu_used,
> +				LGPU_LIMITS_NAME_LIST,
> +				dev->drmcg_props.lgpu_capacity,
> +				ddr->lgpu_allocated);
> +		break;
>   	default:
>   		seq_puts(sf, "\n");
>   		break;
> @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
>   				MEM_BW_LIMITS_NAME_AVG,
>   				props->mem_bw_avg_bytes_per_us_default);
>   		break;
> +	case DRMCG_TYPE_LGPU:
> +		seq_printf(sf, "%s=%d %s=%*pbl\n",
> +				LGPU_LIMITS_NAME_COUNT,
> +				bitmap_weight(props->lgpu_slots,
> +					props->lgpu_capacity),
> +				LGPU_LIMITS_NAME_LIST,
> +				props->lgpu_capacity,
> +				props->lgpu_slots);
> +		break;
>   	default:
>   		seq_puts(sf, "\n");
>   		break;
> @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
>   	mutex_unlock(&dev->drmcg_mutex);
>   }
>   
> +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> +		struct drmcg_device_resource *ddr, unsigned long *val)
> +{
> +
> +	mutex_lock(&dev->drmcg_mutex);
> +	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> +	mutex_unlock(&dev->drmcg_mutex);
> +}
> +
>   static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>   		struct drm_device *dev, char *attrs)
>   {
> +	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> +	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>   	enum drmcg_res_type type =
>   		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
>   	struct drmcg *drmcg = css_to_drmcg(of_css(of));
> @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>   				continue;
>   			}
>   			break; /* DRMCG_TYPE_MEM */
> +		case DRMCG_TYPE_LGPU:
> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> +				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
> +				continue;
> +
> +                        if (!strcmp("max", sval) ||
> +					!strcmp("default", sval)) {
> +				if (parent != NULL)
> +					drmcg_lgpu_values_apply(dev, ddr,
> +						parent->dev_resources[minor]->
> +						lgpu_allocated);
> +				else
> +					drmcg_lgpu_values_apply(dev, ddr,
> +						props->lgpu_slots);
> +
> +				continue;
> +			}
> +
> +			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> +				p_max = parent == NULL ? props->lgpu_capacity:
> +					bitmap_weight(
> +					parent->dev_resources[minor]->
> +					lgpu_allocated, props->lgpu_capacity);
> +
> +				rc = drmcg_process_limit_s64_val(sval,
> +					false, p_max, p_max, &val);
> +
> +				if (rc || val < 0) {
> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> +							minor);
> +					continue;
> +				}
> +
> +				bitmap_zero(tmp_bitmap,
> +						MAX_DRMCG_LGPU_CAPACITY);
> +				bitmap_set(tmp_bitmap, 0, val);
> +			}
> +
> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> +				rc = bitmap_parselist(sval, tmp_bitmap,
> +						MAX_DRMCG_LGPU_CAPACITY);
> +
> +				if (rc) {
> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> +							minor);
> +					continue;
> +				}
> +
> +                        	bitmap_andnot(chk_bitmap, tmp_bitmap,
> +					props->lgpu_slots,
> +					MAX_DRMCG_LGPU_CAPACITY);
> +
> +                        	if (!bitmap_empty(chk_bitmap,
> +						MAX_DRMCG_LGPU_CAPACITY)) {
> +					drmcg_pr_cft_err(drmcg, 0, cft_name,
> +							minor);
> +					continue;
> +				}
> +			}
> +
> +
> +                        if (parent != NULL) {
> +				bitmap_and(chk_bitmap, tmp_bitmap,
> +				parent->dev_resources[minor]->lgpu_allocated,
> +				props->lgpu_capacity);
> +
> +				if (bitmap_empty(chk_bitmap,
> +						props->lgpu_capacity)) {
> +					drmcg_pr_cft_err(drmcg, 0,
> +							cft_name, minor);
> +					continue;
> +				}
> +			}
> +
> +			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> +
> +			break; /* DRMCG_TYPE_LGPU */
>   		default:
>   			break;
>   		} /* switch (type) */
> @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
>   			break;
>   		case DRMCG_TYPE_BANDWIDTH:
>   		case DRMCG_TYPE_MEM:
> +		case DRMCG_TYPE_LGPU:
>   			drmcg_nested_limit_parse(of, dm->dev, sattr);
>   			break;
>   		default:
> @@ -731,6 +846,20 @@ struct cftype files[] = {
>   		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
>   						DRMCG_FTYPE_DEFAULT),
>   	},
> +	{
> +		.name = "lgpu",
> +		.seq_show = drmcg_seq_show,
> +		.write = drmcg_limit_write,
> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> +						DRMCG_FTYPE_LIMIT),
> +	},
> +	{
> +		.name = "lgpu.default",
> +		.seq_show = drmcg_seq_show,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> +						DRMCG_FTYPE_DEFAULT),
> +	},
>   	{ }	/* terminate */
>   };
>   
> @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
>   
>   static inline void drmcg_update_cg_tree(struct drm_device *dev)
>   {
> +        bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> +        bitmap_fill(dev->drmcg_props.lgpu_slots,
> +			dev->drmcg_props.lgpu_capacity);
> +
>   	/* init cgroups created before registration (i.e. root cgroup) */
>   	if (root_drmcg != NULL) {
>   		struct cgroup_subsys_state *pos;
> @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
>   	for (i = 0; i <= TTM_PL_PRIV; i++)
>   		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
>   
> +	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> +
>   	drmcg_update_cg_tree(dev);
>   }
>   EXPORT_SYMBOL(drmcg_device_early_init);


* Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
       [not found]   ` <20190829060533.32315-17-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-08 19:11       ` Kuehling, Felix
  0 siblings, 0 replies; 89+ messages in thread
From: Kuehling, Felix @ 2019-10-08 19:11 UTC (permalink / raw)
  To: Ho, Kenny, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, Deucher, Alexander, Koenig, Christian,
	Greathouse, Joseph, jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc,
	daniel-/w4YWyX8dFk

On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> The number of logical gpus (lgpu) is defined to be the number of compute
> units (CU) for a device.  The lgpu allocation limit only applies to
> compute workloads for the moment (enforced via kfd queue creation).  Any
> cu_mask update is validated against the availability of the compute units
> as defined by the drmcg the kfd process belongs to.

There is something missing here. There is an API for the application to 
specify a CU mask. Right now it looks like the application-specified and 
CGroup-specified CU masks would clobber each other. Instead the two 
should be merged.

The CGroup-specified mask should specify a subset of CUs available for 
application-specified CU masks. When the cgroup CU mask changes, you'd 
need to take any application-specified CU masks into account before 
updating the hardware.

The KFD topology APIs report the number of available CUs to the 
application. CGroups would change that number at runtime and 
applications would not expect that. I think the best way to deal with 
that would be to have multiple bits in the application-specified CU mask 
map to the same CU. How to do that in a fair way is not obvious. I guess 
a more coarse-grained division of the GPU into LGPUs would make this 
somewhat easier.

How is this problem handled for CPU cores and the interaction with CPU 
pthread_setaffinity_np?

Regards,
   Felix


>
> Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
>   .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
>   5 files changed, 174 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 55cb1b2094fd..369915337213 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
>   		valid;							\
>   	})
>   
> +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> +		struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> +		unsigned int nbits);
> +
>   /* GPUVM API */
>   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
>   					void **vm, void **process_info,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 163a4fbf0611..8abeffdd2e5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
>   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
>   	struct drmcg_props *props)
>   {
> +	struct amdgpu_device *adev = dev->dev_private;
> +
> +	props->lgpu_capacity = adev->gfx.cu_info.number;
> +
>   	props->limit_enforced = true;
>   }
>   
> +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> +		struct task_struct *task, struct drmcg_device_resource *ddr,
> +		enum drmcg_res_type res_type)
> +{
> +	struct amdgpu_device *adev = dev->dev_private;
> +
> +	switch (res_type) {
> +	case DRMCG_TYPE_LGPU:
> +		amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> +                        ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
>   static struct drm_driver kms_driver = {
>   	.driver_features =
>   	    DRIVER_USE_AGP | DRIVER_ATOMIC |
> @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
>   	.gem_prime_mmap = amdgpu_gem_prime_mmap,
>   
>   	.drmcg_custom_init = amdgpu_drmcg_custom_init,
> +	.drmcg_limit_updated = amdgpu_drmcg_limit_updated,
>   
>   	.name = DRIVER_NAME,
>   	.desc = DRIVER_DESC,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 138c70454e2b..fa765b803f97 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
>   		return -EFAULT;
>   	}
>   
> +	if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
> +		pr_debug("CU mask not permitted by DRM Cgroup");
> +		kfree(properties.cu_mask);
> +		return -EACCES;
> +	}
> +
>   	mutex_lock(&p->mutex);
>   
>   	retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 8b0eee5b3521..88881bec7550 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
>   		       u32 *ctl_stack_used_size,
>   		       u32 *save_area_used_size);
>   
> +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> +		unsigned int cu_mask_size);
> +
>   int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
>   				unsigned int fence_value,
>   				unsigned int timeout_ms);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 7e6c3ee82f5b..a896de290307 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -23,9 +23,11 @@
>   
>   #include <linux/slab.h>
>   #include <linux/list.h>
> +#include <linux/cgroup_drm.h>
>   #include "kfd_device_queue_manager.h"
>   #include "kfd_priv.h"
>   #include "kfd_kernel_queue.h"
> +#include "amdgpu.h"
>   #include "amdgpu_amdkfd.h"
>   
>   static inline struct process_queue_node *get_queue_by_qid(
> @@ -167,6 +169,7 @@ static int create_cp_queue(struct process_queue_manager *pqm,
>   				struct queue_properties *q_properties,
>   				struct file *f, unsigned int qid)
>   {
> +	struct drmcg *drmcg;
>   	int retval;
>   
>   	/* Doorbell initialized in user space*/
> @@ -180,6 +183,36 @@ static int create_cp_queue(struct process_queue_manager *pqm,
>   	if (retval != 0)
>   		return retval;
>   
> +
> +	drmcg = drmcg_get(pqm->process->lead_thread);
> +	if (drmcg) {
> +		struct amdgpu_device *adev;
> +		struct drmcg_device_resource *ddr;
> +		int mask_size;
> +		u32 *mask;
> +
> +		adev = (struct amdgpu_device *) dev->kgd;
> +
> +		mask_size = adev->ddev->drmcg_props.lgpu_capacity;
> +		mask = kzalloc(sizeof(u32) * round_up(mask_size, 32),
> +				GFP_KERNEL);
> +
> +		if (!mask) {
> +			drmcg_put(drmcg);
> +			uninit_queue(*q);
> +			return -ENOMEM;
> +		}
> +
> +		ddr = drmcg->dev_resources[adev->ddev->primary->index];
> +
> +		bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size);
> +
> +		(*q)->properties.cu_mask_count = mask_size;
> +		(*q)->properties.cu_mask = mask;
> +
> +		drmcg_put(drmcg);
> +	}
> +
>   	(*q)->device = dev;
>   	(*q)->process = pqm->process;
>   
> @@ -495,6 +528,113 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
>   						       save_area_used_size);
>   }
>   
> +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> +		unsigned int cu_mask_size)
> +{
> +	DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
> +	struct drmcg_device_resource *ddr;
> +	struct process_queue_node *pqn;
> +	struct amdgpu_device *adev;
> +	struct drmcg *drmcg;
> +	bool result;
> +
> +	if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
> +		return false;
> +
> +	bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
> +
> +	pqn = get_queue_by_qid(&p->pqm, qid);
> +	if (!pqn)
> +		return false;
> +
> +	adev = (struct amdgpu_device *)pqn->q->device->kgd;
> +
> +	drmcg = drmcg_get(p->lead_thread);
> +	ddr = drmcg->dev_resources[adev->ddev->primary->index];
> +
> +	if (bitmap_subset(curr_mask, ddr->lgpu_allocated,
> +				MAX_DRMCG_LGPU_CAPACITY))
> +		result = true;
> +	else
> +		result = false;
> +
> +	drmcg_put(drmcg);
> +
> +	return result;
> +}
> +
> +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> +		struct amdgpu_device *adev, unsigned long *lgpu_bm,
> +		unsigned int lgpu_bm_size)
> +{
> +	struct kfd_dev *kdev = adev->kfd.dev;
> +	struct process_queue_node *pqn;
> +	struct kfd_process *kfdproc;
> +	size_t size_in_bytes;
> +	u32 *cu_mask;
> +	int rc = 0;
> +
> +	if ((lgpu_bm_size % 32) != 0) {
> +		pr_warn("lgpu_bm_size %d must be a multiple of 32",
> +				lgpu_bm_size);
> +		return -EINVAL;
> +	}
> +
> +	kfdproc = kfd_get_process(task);
> +
> +	if (IS_ERR(kfdproc))
> +		return -ESRCH;
> +
> +	size_in_bytes = sizeof(u32) * round_up(lgpu_bm_size, 32);
> +
> +	mutex_lock(&kfdproc->mutex);
> +	list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
> +		if (pqn->q && pqn->q->device == kdev) {
> +			/* update cu_mask accordingly */
> +			cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
> +			if (!cu_mask) {
> +				rc = -ENOMEM;
> +				break;
> +			}
> +
> +			if (pqn->q->properties.cu_mask) {
> +				DECLARE_BITMAP(curr_mask,
> +						MAX_DRMCG_LGPU_CAPACITY);
> +
> +				if (pqn->q->properties.cu_mask_count >
> +						lgpu_bm_size) {
> +					rc = -EINVAL;
> +					kfree(cu_mask);
> +					break;
> +				}
> +
> +				bitmap_from_arr32(curr_mask,
> +						pqn->q->properties.cu_mask,
> +						pqn->q->properties.cu_mask_count);
> +
> +				bitmap_and(curr_mask, curr_mask, lgpu_bm,
> +						lgpu_bm_size);
> +
> +				bitmap_to_arr32(cu_mask, curr_mask,
> +						lgpu_bm_size);
> +
> +				kfree(curr_mask);
> +			} else
> +				bitmap_to_arr32(cu_mask, lgpu_bm,
> +						lgpu_bm_size);
> +
> +			pqn->q->properties.cu_mask = cu_mask;
> +			pqn->q->properties.cu_mask_count = lgpu_bm_size;
> +
> +			rc = pqn->q->device->dqm->ops.update_queue(
> +					pqn->q->device->dqm, pqn->q);
> +		}
> +	}
> +	mutex_unlock(&kfdproc->mutex);
> +
> +	return rc;
> +}
> +
>   #if defined(CONFIG_DEBUG_FS)
>   
>   int pqm_debugfs_mqds(struct seq_file *m, void *data)


* Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
@ 2019-10-08 19:11       ` Kuehling, Felix
  0 siblings, 0 replies; 89+ messages in thread
From: Kuehling, Felix @ 2019-10-08 19:11 UTC (permalink / raw)
  To: Ho, Kenny, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, Deucher, Alexander, Koenig, Christian,
	Greathouse, Joseph, jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc,
	daniel-/w4YWyX8dFk

On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> The number of logical gpu (lgpu) is defined to be the number of compute
> unit (CU) for a device.  The lgpu allocation limit only applies to
> compute workload for the moment (enforced via kfd queue creation.)  Any
> cu_mask update is validated against the availability of the compute unit
> as defined by the drmcg the kfd process belongs to.

There is something missing here. There is an API for the application to 
specify a CU mask. Right now it looks like the application-specified and 
CGroup-specified CU masks would clobber each other. Instead the two 
should be merged.

The CGroup-specified mask should specify a subset of CUs available for 
application-specified CU masks. When the cgroup CU mask changes, you'd 
need to take any application-specified CU masks into account before 
updating the hardware.

The KFD topology APIs report the number of available CUs to the 
application. CGroups would change that number at runtime and 
applications would not expect that. I think the best way to deal with 
that would be to have multiple bits in the application-specified CU mask 
map to the same CU. How to do that in a fair way is not obvious. I guess 
a more coarse-grain division of the GPU into LGPUs would make this 
somewhat easier.

How is this problem handled for CPU cores and the interaction with CPU 
pthread_setaffinity_np?

Regards,
   Felix


>
> Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
>   .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
>   5 files changed, 174 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 55cb1b2094fd..369915337213 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
>   		valid;							\
>   	})
>   
> +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> +		struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> +		unsigned int nbits);
> +
>   /* GPUVM API */
>   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
>   					void **vm, void **process_info,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 163a4fbf0611..8abeffdd2e5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
>   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
>   	struct drmcg_props *props)
>   {
> +	struct amdgpu_device *adev = dev->dev_private;
> +
> +	props->lgpu_capacity = adev->gfx.cu_info.number;
> +
>   	props->limit_enforced = true;
>   }
>   
> +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> +		struct task_struct *task, struct drmcg_device_resource *ddr,
> +		enum drmcg_res_type res_type)
> +{
> +	struct amdgpu_device *adev = dev->dev_private;
> +
> +	switch (res_type) {
> +	case DRMCG_TYPE_LGPU:
> +		amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> +                        ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
>   static struct drm_driver kms_driver = {
>   	.driver_features =
>   	    DRIVER_USE_AGP | DRIVER_ATOMIC |
> @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
>   	.gem_prime_mmap = amdgpu_gem_prime_mmap,
>   
>   	.drmcg_custom_init = amdgpu_drmcg_custom_init,
> +	.drmcg_limit_updated = amdgpu_drmcg_limit_updated,
>   
>   	.name = DRIVER_NAME,
>   	.desc = DRIVER_DESC,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 138c70454e2b..fa765b803f97 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
>   		return -EFAULT;
>   	}
>   
> +	if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
> +		pr_debug("CU mask not permitted by DRM Cgroup");
> +		kfree(properties.cu_mask);
> +		return -EACCES;
> +	}
> +
>   	mutex_lock(&p->mutex);
>   
>   	retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 8b0eee5b3521..88881bec7550 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
>   		       u32 *ctl_stack_used_size,
>   		       u32 *save_area_used_size);
>   
> +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> +		unsigned int cu_mask_size);
> +
>   int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
>   				unsigned int fence_value,
>   				unsigned int timeout_ms);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 7e6c3ee82f5b..a896de290307 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -23,9 +23,11 @@
>   
>   #include <linux/slab.h>
>   #include <linux/list.h>
> +#include <linux/cgroup_drm.h>
>   #include "kfd_device_queue_manager.h"
>   #include "kfd_priv.h"
>   #include "kfd_kernel_queue.h"
> +#include "amdgpu.h"
>   #include "amdgpu_amdkfd.h"
>   
>   static inline struct process_queue_node *get_queue_by_qid(
> @@ -167,6 +169,7 @@ static int create_cp_queue(struct process_queue_manager *pqm,
>   				struct queue_properties *q_properties,
>   				struct file *f, unsigned int qid)
>   {
> +	struct drmcg *drmcg;
>   	int retval;
>   
>   	/* Doorbell initialized in user space*/
> @@ -180,6 +183,36 @@ static int create_cp_queue(struct process_queue_manager *pqm,
>   	if (retval != 0)
>   		return retval;
>   
> +
> +	drmcg = drmcg_get(pqm->process->lead_thread);
> +	if (drmcg) {
> +		struct amdgpu_device *adev;
> +		struct drmcg_device_resource *ddr;
> +		int mask_size;
> +		u32 *mask;
> +
> +		adev = (struct amdgpu_device *) dev->kgd;
> +
> +		mask_size = adev->ddev->drmcg_props.lgpu_capacity;
> +		mask = kzalloc(sizeof(u32) * round_up(mask_size, 32),
> +				GFP_KERNEL);
> +
> +		if (!mask) {
> +			drmcg_put(drmcg);
> +			uninit_queue(*q);
> +			return -ENOMEM;
> +		}
> +
> +		ddr = drmcg->dev_resources[adev->ddev->primary->index];
> +
> +		bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size);
> +
> +		(*q)->properties.cu_mask_count = mask_size;
> +		(*q)->properties.cu_mask = mask;
> +
> +		drmcg_put(drmcg);
> +	}
> +
>   	(*q)->device = dev;
>   	(*q)->process = pqm->process;
>   
> @@ -495,6 +528,113 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
>   						       save_area_used_size);
>   }
>   
> +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> +		unsigned int cu_mask_size)
> +{
> +	DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
> +	struct drmcg_device_resource *ddr;
> +	struct process_queue_node *pqn;
> +	struct amdgpu_device *adev;
> +	struct drmcg *drmcg;
> +	bool result;
> +
> +	if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
> +		return false;
> +
> +	bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
> +
> +	pqn = get_queue_by_qid(&p->pqm, qid);
> +	if (!pqn)
> +		return false;
> +
> +	adev = (struct amdgpu_device *)pqn->q->device->kgd;
> +
> +	drmcg = drmcg_get(p->lead_thread);
> +	if (!drmcg)
> +		return false;
> +
> +	ddr = drmcg->dev_resources[adev->ddev->primary->index];
> +
> +	if (bitmap_subset(curr_mask, ddr->lgpu_allocated,
> +				MAX_DRMCG_LGPU_CAPACITY))
> +		result = true;
> +	else
> +		result = false;
> +
> +	drmcg_put(drmcg);
> +
> +	return result;
> +}
> +
> +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> +		struct amdgpu_device *adev, unsigned long *lgpu_bm,
> +		unsigned int lgpu_bm_size)
> +{
> +	struct kfd_dev *kdev = adev->kfd.dev;
> +	struct process_queue_node *pqn;
> +	struct kfd_process *kfdproc;
> +	size_t size_in_bytes;
> +	u32 *cu_mask;
> +	int rc = 0;
> +
> +	if ((lgpu_bm_size % 32) != 0) {
> +		pr_warn("lgpu_bm_size %u must be a multiple of 32\n",
> +				lgpu_bm_size);
> +		return -EINVAL;
> +	}
> +
> +	kfdproc = kfd_get_process(task);
> +
> +	if (IS_ERR(kfdproc))
> +		return -ESRCH;
> +
> +	size_in_bytes = sizeof(u32) * (round_up(lgpu_bm_size, 32) / 32);
> +
> +	mutex_lock(&kfdproc->mutex);
> +	list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
> +		if (pqn->q && pqn->q->device == kdev) {
> +			/* update cu_mask accordingly */
> +			cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
> +			if (!cu_mask) {
> +				rc = -ENOMEM;
> +				break;
> +			}
> +
> +			if (pqn->q->properties.cu_mask) {
> +				DECLARE_BITMAP(curr_mask,
> +						MAX_DRMCG_LGPU_CAPACITY);
> +
> +				if (pqn->q->properties.cu_mask_count >
> +						lgpu_bm_size) {
> +					rc = -EINVAL;
> +					kfree(cu_mask);
> +					break;
> +				}
> +
> +				bitmap_from_arr32(curr_mask,
> +						pqn->q->properties.cu_mask,
> +						pqn->q->properties.cu_mask_count);
> +
> +				bitmap_and(curr_mask, curr_mask, lgpu_bm,
> +						lgpu_bm_size);
> +
> +				bitmap_to_arr32(cu_mask, curr_mask,
> +						lgpu_bm_size);
> +
> +			} else {
> +				bitmap_to_arr32(cu_mask, lgpu_bm,
> +						lgpu_bm_size);
> +			}
> +
> +			pqn->q->properties.cu_mask = cu_mask;
> +			pqn->q->properties.cu_mask_count = lgpu_bm_size;
> +
> +			rc = pqn->q->device->dqm->ops.update_queue(
> +					pqn->q->device->dqm, pqn->q);
> +		}
> +	}
> +	mutex_unlock(&kfdproc->mutex);
> +
> +	return rc;
> +}
> +
>   #if defined(CONFIG_DEBUG_FS)
>   
>   int pqm_debugfs_mqds(struct seq_file *m, void *data)
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]       ` <b3d2b3c1-8854-10ca-3e39-b3bef35bdfa9-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-09 10:31         ` Daniel Vetter
       [not found]           ` <20191009103153.GU16989-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2019-10-09 15:25             ` Kuehling, Felix
  0 siblings, 2 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-10-09 10:31 UTC (permalink / raw)
  To: Kuehling, Felix
  Cc: daniel-/w4YWyX8dFk, Ho, Kenny, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher, Alexander, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Greathouse, Joseph,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Koenig, Christian

On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > drm.lgpu
> >          A read-write nested-keyed file which exists on all cgroups.
> >          Each entry is keyed by the DRM device's major:minor.
> >
> >          lgpu stands for logical GPU; it is an abstraction used to
> >          subdivide a physical DRM device for the purpose of resource
> >          management.
> >
> >          The lgpu is a discrete quantity that is device specific (i.e.
> >          some DRM devices may have 64 lgpus while others may have 100
> >          lgpus.)  The lgpu is a single quantity with two representations
> >          denoted by the following nested keys.
> >
> >            =====     ========================================
> >            count     Representing lgpu as anonymous resource
> >            list      Representing lgpu as named resource
> >            =====     ========================================
> >
> >          For example:
> >          226:0 count=256 list=0-255
> >          226:1 count=4 list=0,2,4,6
> >          226:2 count=32 list=32-63
> >
> >          lgpu is represented by a bitmap and uses the bitmap_parselist
> >          kernel function so the list key input format is a
> >          comma-separated list of decimal numbers and ranges.
> >
> >          Consecutively set bits are shown as two hyphen-separated decimal
> >          numbers, the smallest and largest bit numbers set in the range.
> >          Optionally, each range can be postfixed to denote that only
> >          parts of it should be set.  The range is divided into groups
> >          of a specific size.
> >          Syntax: range:used_size/group_size
> >          Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >
> >          The count key is the Hamming weight (hweight) of the bitmap.
> >
> >          Both count and list accept the max and default keywords.
> >
> >          Some DRM devices may only support lgpu as anonymous resources.
> >          In that case, the significance of the position of the set
> >          bits in list is ignored.
> >
> >          This lgpu resource supports the 'allocation' resource
> >          distribution model.
> >
> > Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> 
> The description sounds reasonable to me and maps well to the CU masking 
> feature in our GPUs.
> 
> It would also allow us to do more coarse-grained masking for example to 
> guarantee balanced allocation of CUs across shader engines or 
> partitioning of memory bandwidth or CP pipes (if that is supported by 
> the hardware/firmware).

Hm, so this sounds like the definition for how this cgroup is supposed to
work is "amd CU masking" (whatever that exactly is). And the abstract
description is just prettification on top, but not actually the real
definition you guys want.

I think adding a cgroup which is that much depending upon the hw
implementation of the first driver supporting it is not a good idea.
-Daniel

> 
> I can't comment on the code as I'm unfamiliar with the details of the 
> cgroup code.
> 
> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> 
> 
> > ---
> >   Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
> >   include/drm/drm_cgroup.h                |   4 +
> >   include/linux/cgroup_drm.h              |   6 ++
> >   kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
> >   4 files changed, 191 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 87a195133eaa..57f18469bd76 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1958,6 +1958,52 @@ DRM Interface Files
> >   	Set largest allocation for /dev/dri/card1 to 4MB
> >   	echo "226:1 4m" > drm.buffer.peak.max
> >   
> > +  drm.lgpu
> > +	A read-write nested-keyed file which exists on all cgroups.
> > +	Each entry is keyed by the DRM device's major:minor.
> > +
> > +	lgpu stands for logical GPU; it is an abstraction used to
> > +	subdivide a physical DRM device for the purpose of resource
> > +	management.
> > +
> > +	The lgpu is a discrete quantity that is device specific (i.e.
> > +	some DRM devices may have 64 lgpus while others may have 100
> > +	lgpus.)  The lgpu is a single quantity with two representations
> > +	denoted by the following nested keys.
> > +
> > +	  =====     ========================================
> > +	  count     Representing lgpu as anonymous resource
> > +	  list      Representing lgpu as named resource
> > +	  =====     ========================================
> > +
> > +	For example:
> > +	226:0 count=256 list=0-255
> > +	226:1 count=4 list=0,2,4,6
> > +	226:2 count=32 list=32-63
> > +
> > +	lgpu is represented by a bitmap and uses the bitmap_parselist
> > +	kernel function so the list key input format is a
> > +	comma-separated list of decimal numbers and ranges.
> > +
> > +	Consecutively set bits are shown as two hyphen-separated decimal
> > +	numbers, the smallest and largest bit numbers set in the range.
> > +	Optionally, each range can be postfixed to denote that only
> > +	parts of it should be set.  The range is divided into groups
> > +	of a specific size.
> > +	Syntax: range:used_size/group_size
> > +	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > +
> > +	The count key is the Hamming weight (hweight) of the bitmap.
> > +
> > +	Both count and list accept the max and default keywords.
> > +
> > +	Some DRM devices may only support lgpu as anonymous resources.
> > +	In that case, the significance of the position of the set
> > +	bits in list is ignored.
> > +
> > +	This lgpu resource supports the 'allocation' resource
> > +	distribution model.
> > +
> >   GEM Buffer Ownership
> >   ~~~~~~~~~~~~~~~~~~~~
> >   
> > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > index 6d9707e1eb72..a8d6be0b075b 100644
> > --- a/include/drm/drm_cgroup.h
> > +++ b/include/drm/drm_cgroup.h
> > @@ -6,6 +6,7 @@
> >   
> >   #include <linux/cgroup_drm.h>
> >   #include <linux/workqueue.h>
> > +#include <linux/types.h>
> >   #include <drm/ttm/ttm_bo_api.h>
> >   #include <drm/ttm/ttm_bo_driver.h>
> >   
> > @@ -28,6 +29,9 @@ struct drmcg_props {
> >   	s64			mem_highs_default[TTM_PL_PRIV+1];
> >   
> >   	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
> > +
> > +	int			lgpu_capacity;
> > +	DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >   };
> >   
> >   #ifdef CONFIG_CGROUP_DRM
> > diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> > index c56cfe74d1a6..7b1cfc4ce4c3 100644
> > --- a/include/linux/cgroup_drm.h
> > +++ b/include/linux/cgroup_drm.h
> > @@ -14,6 +14,8 @@
> >   /* limit defined per the way drm_minor_alloc operates */
> >   #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >   
> > +#define MAX_DRMCG_LGPU_CAPACITY 256
> > +
> >   enum drmcg_mem_bw_attr {
> >   	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> >   	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
> > @@ -32,6 +34,7 @@ enum drmcg_res_type {
> >   	DRMCG_TYPE_MEM_PEAK,
> >   	DRMCG_TYPE_BANDWIDTH,
> >   	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> > +	DRMCG_TYPE_LGPU,
> >   	__DRMCG_TYPE_LAST,
> >   };
> >   
> > @@ -58,6 +61,9 @@ struct drmcg_device_resource {
> >   	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
> >   	s64			mem_bw_limits_bytes_in_period;
> >   	s64			mem_bw_limits_avg_bytes_per_us;
> > +
> > +	s64			lgpu_used;
> > +	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >   };
> >   
> >   /**
> > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > index 0ea7f0619e25..18c4368e2c29 100644
> > --- a/kernel/cgroup/drm.c
> > +++ b/kernel/cgroup/drm.c
> > @@ -9,6 +9,7 @@
> >   #include <linux/cgroup_drm.h>
> >   #include <linux/ktime.h>
> >   #include <linux/kernel.h>
> > +#include <linux/bitmap.h>
> >   #include <drm/drm_file.h>
> >   #include <drm/drm_drv.h>
> >   #include <drm/ttm/ttm_bo_api.h>
> > @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
> >   #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> >   #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> >   
> > +#define LGPU_LIMITS_NAME_LIST "list"
> > +#define LGPU_LIMITS_NAME_COUNT "count"
> > +
> >   static struct drmcg *root_drmcg __read_mostly;
> >   
> >   static int drmcg_css_free_fn(int id, void *ptr, void *data)
> > @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >   	for (i = 0; i <= TTM_PL_PRIV; i++)
> >   		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
> >   
> > +	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> > +			MAX_DRMCG_LGPU_CAPACITY);
> > +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > +
> >   	mutex_unlock(&dev->drmcg_mutex);
> >   	return 0;
> >   }
> > @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >   				MEM_BW_LIMITS_NAME_AVG,
> >   				ddr->mem_bw_limits_avg_bytes_per_us);
> >   		break;
> > +	case DRMCG_TYPE_LGPU:
> > +		seq_printf(sf, "%s=%lld %s=%*pbl\n",
> > +				LGPU_LIMITS_NAME_COUNT,
> > +				ddr->lgpu_used,
> > +				LGPU_LIMITS_NAME_LIST,
> > +				dev->drmcg_props.lgpu_capacity,
> > +				ddr->lgpu_allocated);
> > +		break;
> >   	default:
> >   		seq_puts(sf, "\n");
> >   		break;
> > @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
> >   				MEM_BW_LIMITS_NAME_AVG,
> >   				props->mem_bw_avg_bytes_per_us_default);
> >   		break;
> > +	case DRMCG_TYPE_LGPU:
> > +		seq_printf(sf, "%s=%d %s=%*pbl\n",
> > +				LGPU_LIMITS_NAME_COUNT,
> > +				bitmap_weight(props->lgpu_slots,
> > +					props->lgpu_capacity),
> > +				LGPU_LIMITS_NAME_LIST,
> > +				props->lgpu_capacity,
> > +				props->lgpu_slots);
> > +		break;
> >   	default:
> >   		seq_puts(sf, "\n");
> >   		break;
> > @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
> >   	mutex_unlock(&dev->drmcg_mutex);
> >   }
> >   
> > +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> > +		struct drmcg_device_resource *ddr, unsigned long *val)
> > +{
> > +
> > +	mutex_lock(&dev->drmcg_mutex);
> > +	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> > +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > +	mutex_unlock(&dev->drmcg_mutex);
> > +}
> > +
> >   static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >   		struct drm_device *dev, char *attrs)
> >   {
> > +	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > +	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >   	enum drmcg_res_type type =
> >   		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >   	struct drmcg *drmcg = css_to_drmcg(of_css(of));
> > @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >   				continue;
> >   			}
> >   			break; /* DRMCG_TYPE_MEM */
> > +		case DRMCG_TYPE_LGPU:
> > +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> > +				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256))
> > +				continue;
> > +
> > +			if (!strcmp("max", sval) ||
> > +					!strcmp("default", sval)) {
> > +				if (parent != NULL)
> > +					drmcg_lgpu_values_apply(dev, ddr,
> > +						parent->dev_resources[minor]->
> > +						lgpu_allocated);
> > +				else
> > +					drmcg_lgpu_values_apply(dev, ddr,
> > +						props->lgpu_slots);
> > +
> > +				continue;
> > +			}
> > +
> > +			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> > +				p_max = parent == NULL ? props->lgpu_capacity:
> > +					bitmap_weight(
> > +					parent->dev_resources[minor]->
> > +					lgpu_allocated, props->lgpu_capacity);
> > +
> > +				rc = drmcg_process_limit_s64_val(sval,
> > +					false, p_max, p_max, &val);
> > +
> > +				if (rc || val < 0) {
> > +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> > +							minor);
> > +					continue;
> > +				}
> > +
> > +				bitmap_zero(tmp_bitmap,
> > +						MAX_DRMCG_LGPU_CAPACITY);
> > +				bitmap_set(tmp_bitmap, 0, val);
> > +			}
> > +
> > +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> > +				rc = bitmap_parselist(sval, tmp_bitmap,
> > +						MAX_DRMCG_LGPU_CAPACITY);
> > +
> > +				if (rc) {
> > +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> > +							minor);
> > +					continue;
> > +				}
> > +
> > +			bitmap_andnot(chk_bitmap, tmp_bitmap,
> > +					props->lgpu_slots,
> > +					MAX_DRMCG_LGPU_CAPACITY);
> > +
> > +			if (!bitmap_empty(chk_bitmap,
> > +						MAX_DRMCG_LGPU_CAPACITY)) {
> > +					drmcg_pr_cft_err(drmcg, 0, cft_name,
> > +							minor);
> > +					continue;
> > +				}
> > +			}
> > +
> > +			if (parent != NULL) {
> > +				bitmap_and(chk_bitmap, tmp_bitmap,
> > +				parent->dev_resources[minor]->lgpu_allocated,
> > +				props->lgpu_capacity);
> > +
> > +				if (bitmap_empty(chk_bitmap,
> > +						props->lgpu_capacity)) {
> > +					drmcg_pr_cft_err(drmcg, 0,
> > +							cft_name, minor);
> > +					continue;
> > +				}
> > +			}
> > +
> > +			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> > +
> > +			break; /* DRMCG_TYPE_LGPU */
> >   		default:
> >   			break;
> >   		} /* switch (type) */
> > @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >   			break;
> >   		case DRMCG_TYPE_BANDWIDTH:
> >   		case DRMCG_TYPE_MEM:
> > +		case DRMCG_TYPE_LGPU:
> >   			drmcg_nested_limit_parse(of, dm->dev, sattr);
> >   			break;
> >   		default:
> > @@ -731,6 +846,20 @@ struct cftype files[] = {
> >   		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
> >   						DRMCG_FTYPE_DEFAULT),
> >   	},
> > +	{
> > +		.name = "lgpu",
> > +		.seq_show = drmcg_seq_show,
> > +		.write = drmcg_limit_write,
> > +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > +						DRMCG_FTYPE_LIMIT),
> > +	},
> > +	{
> > +		.name = "lgpu.default",
> > +		.seq_show = drmcg_seq_show,
> > +		.flags = CFTYPE_ONLY_ON_ROOT,
> > +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > +						DRMCG_FTYPE_DEFAULT),
> > +	},
> >   	{ }	/* terminate */
> >   };
> >   
> > @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
> >   
> >   static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >   {
> > +	bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > +	bitmap_fill(dev->drmcg_props.lgpu_slots,
> > +			dev->drmcg_props.lgpu_capacity);
> > +
> >   	/* init cgroups created before registration (i.e. root cgroup) */
> >   	if (root_drmcg != NULL) {
> >   		struct cgroup_subsys_state *pos;
> > @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
> >   	for (i = 0; i <= TTM_PL_PRIV; i++)
> >   		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
> >   
> > +	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> > +
> >   	drmcg_update_cg_tree(dev);
> >   }
> >   EXPORT_SYMBOL(drmcg_device_early_init);

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]           ` <20191009103153.GU16989-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-10-09 15:08             ` Kenny Ho
       [not found]               ` <CAOWid-fLurBT6-h5WjQsEPA+dq1fgfWztbyZuLV4ypmWH8SC9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kenny Ho @ 2019-10-09 15:08 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ho, Kenny, Kuehling, Felix, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher, Alexander, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Greathouse, Joseph, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Koenig, Christian

Hi Daniel,

Can you elaborate on what you mean in more detail?  The goal of lgpu is
to provide the ability to subdivide a GPU device and give those slices
to different users as needed.  I don't think there is anything
controversial or vendor specific here as requests for this are well
documented.  The underlying representation is just a bitmap, which is
neither unprecedented nor vendor specific (bitmap is used in cpuset
for instance.)
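To make the list syntax concrete, here is a rough sketch of the parsing
semantics in Python (illustrative only — the kernel uses bitmap_parselist
for this, and parse_lgpu_list is a made-up name, not a kernel function).
It covers plain values, ranges, and the documented
range:used_size/group_size form:

```python
def parse_lgpu_list(spec, capacity):
    """Sketch of cpuset/bitmap_parselist-style list parsing.

    Supports "N", "N-M", and "N-M:used/group" (set the first `used`
    bits of every `group`-sized chunk of the range).
    """
    bits = set()
    for token in spec.split(","):
        rng, _, grouping = token.partition(":")
        lo_s, _, hi_s = rng.partition("-")
        lo = int(lo_s)
        hi = int(hi_s) if hi_s else lo
        if grouping:
            used_s, group_s = grouping.split("/")
            used, group = int(used_s), int(group_s)
            for base in range(lo, hi + 1, group):
                bits.update(range(base, min(base + used, hi + 1)))
        else:
            bits.update(range(lo, hi + 1))
    if bits and max(bits) >= capacity:
        raise ValueError("bit index exceeds lgpu capacity")
    return bits

# The documented example: 0-1023:2/256
print(sorted(parse_lgpu_list("0-1023:2/256", 1024)))
# -> [0, 1, 256, 257, 512, 513, 768, 769]
```

The count key then corresponds to len(bits), i.e. the Hamming weight of
the resulting bitmap.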

An implementation of this abstraction is not hardware specific either.
For example, one can associate a virtual function in SRIOV as a lgpu.
Alternatively, a device can also declare to have 100 lgpus and treat
the lgpu quantity as a percentage representation of GPU subdivision.
The fact that an abstraction works well with a vendor implementation
does not make it a "prettification" of a vendor feature (by this
logic, I hope you are not implying an abstraction is only valid if it
does not work with amd CU masking because that seems fairly partisan.)
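For reference, the enforcement in the amdkfd patch earlier in the series
reduces to two bitmap operations; here is a rough Python sketch with
integers standing in for the kernel bitmaps (the function names are
illustrative, not the kernel API):

```python
def lgpu_subset_ok(cu_mask, lgpu_allocated):
    # Analogue of the bitmap_subset() check in pqm_drmcg_lgpu_validate():
    # every bit in the requested CU mask must be in the cgroup allocation.
    return (cu_mask & ~lgpu_allocated) == 0

def effective_cu_mask(cu_mask, lgpu_allocated):
    # Analogue of the bitmap_and() step in
    # amdgpu_amdkfd_update_cu_mask_for_process(): a queue keeps only
    # the CUs its cgroup's lgpu allocation still permits.
    return cu_mask & lgpu_allocated

# Suppose the cgroup holds lgpus 0-3 and 8-11 of a 16-lgpu device:
allocated = 0x0F0F
print(lgpu_subset_ok(0x000F, allocated))          # True: within allocation
print(lgpu_subset_ok(0x00F0, allocated))          # False: uses bits 4-7
print(hex(effective_cu_mask(0xFFFF, allocated)))  # 0xf0f
```

Nothing in those two operations is specific to CU masking; the same
subset/intersection logic applies whether a bit names a CU, an SRIOV
virtual function, or a percentage slice.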

Did I misread your characterization of this patch?

Regards,
Kenny


On Wed, Oct 9, 2019 at 6:31 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> > [quoted patch description and comments snipped; quoted in full upthread]
> Hm, so this sounds like the definition for how this cgroup is supposed to
> work is "amd CU masking" (whatever that exactly is). And the abstract
> description is just prettification on top, but not actually the real
> definition you guys want.
>
> I think adding a cgroup which is that much depending upon the hw
> implementation of the first driver supporting it is not a good idea.
> -Daniel
>
> >
> > I can't comment on the code as I'm unfamiliar with the details of the
> > cgroup code.
> >
> > Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >
> >
> > > ---
> > >   Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
> > >   include/drm/drm_cgroup.h                |   4 +
> > >   include/linux/cgroup_drm.h              |   6 ++
> > >   kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
> > >   4 files changed, 191 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 87a195133eaa..57f18469bd76 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1958,6 +1958,52 @@ DRM Interface Files
> > >     Set largest allocation for /dev/dri/card1 to 4MB
> > >     echo "226:1 4m" > drm.buffer.peak.max
> > >
> > > +  drm.lgpu
> > > +   A read-write nested-keyed file which exists on all cgroups.
> > > +   Each entry is keyed by the DRM device's major:minor.
> > > +
> > > +   lgpu stands for logical GPU, it is an abstraction used to
> > > +   subdivide a physical DRM device for the purpose of resource
> > > +   management.
> > > +
> > > +   The lgpu is a discrete quantity that is device specific (i.e.
> > > +   some DRM devices may have 64 lgpus while others may have 100
> > > +   lgpus.)  The lgpu is a single quantity with two representations
> > > +   denoted by the following nested keys.
> > > +
> > > +     =====     ========================================
> > > +     count     Representing lgpu as anonymous resource
> > > +     list      Representing lgpu as named resource
> > > +     =====     ========================================
> > > +
> > > +   For example:
> > > +   226:0 count=256 list=0-255
> > > +   226:1 count=4 list=0,2,4,6
> > > +   226:2 count=32 list=32-63
> > > +
> > > +   lgpu is represented by a bitmap and uses the bitmap_parselist
> > > +   kernel function so the list key input format is a
> > > +   comma-separated list of decimal numbers and ranges.
> > > +
> > > +   Consecutively set bits are shown as two hyphen-separated decimal
> > > +   numbers, the smallest and largest bit numbers set in the range.
> > > +   Optionally each range can be postfixed to denote that only parts
> > > +   of it should be set.  The range will divided to groups of
> > > +   specific size.
> > > +   Syntax: range:used_size/group_size
> > > +   Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > > +
> > > +   The count key is the hamming weight / hweight of the bitmap.
> > > +
> > > +   Both count and list accept the max and default keywords.
> > > +
> > > +   Some DRM devices may only support lgpu as anonymous resources.
> > > +   In such case, the significance of the position of the set bits
> > > +   in list will be ignored.
> > > +
> > > +   This lgpu resource supports the 'allocation' resource
> > > +   distribution model.
> > > +
> > >   GEM Buffer Ownership
> > >   ~~~~~~~~~~~~~~~~~~~~
> > >
> > > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > > index 6d9707e1eb72..a8d6be0b075b 100644
> > > --- a/include/drm/drm_cgroup.h
> > > +++ b/include/drm/drm_cgroup.h
> > > @@ -6,6 +6,7 @@
> > >
> > >   #include <linux/cgroup_drm.h>
> > >   #include <linux/workqueue.h>
> > > +#include <linux/types.h>
> > >   #include <drm/ttm/ttm_bo_api.h>
> > >   #include <drm/ttm/ttm_bo_driver.h>
> > >
> > > @@ -28,6 +29,9 @@ struct drmcg_props {
> > >     s64                     mem_highs_default[TTM_PL_PRIV+1];
> > >
> > >     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
> > > +
> > > +   int                     lgpu_capacity;
> > > +        DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > >   };
> > >
> > >   #ifdef CONFIG_CGROUP_DRM
> > > diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> > > index c56cfe74d1a6..7b1cfc4ce4c3 100644
> > > --- a/include/linux/cgroup_drm.h
> > > +++ b/include/linux/cgroup_drm.h
> > > @@ -14,6 +14,8 @@
> > >   /* limit defined per the way drm_minor_alloc operates */
> > >   #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> > >
> > > +#define MAX_DRMCG_LGPU_CAPACITY 256
> > > +
> > >   enum drmcg_mem_bw_attr {
> > >     DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> > >     DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
> > > @@ -32,6 +34,7 @@ enum drmcg_res_type {
> > >     DRMCG_TYPE_MEM_PEAK,
> > >     DRMCG_TYPE_BANDWIDTH,
> > >     DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> > > +   DRMCG_TYPE_LGPU,
> > >     __DRMCG_TYPE_LAST,
> > >   };
> > >
> > > @@ -58,6 +61,9 @@ struct drmcg_device_resource {
> > >     s64                     mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
> > >     s64                     mem_bw_limits_bytes_in_period;
> > >     s64                     mem_bw_limits_avg_bytes_per_us;
> > > +
> > > +   s64                     lgpu_used;
> > > +   DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > >   };
> > >
> > >   /**
> > > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > > index 0ea7f0619e25..18c4368e2c29 100644
> > > --- a/kernel/cgroup/drm.c
> > > +++ b/kernel/cgroup/drm.c
> > > @@ -9,6 +9,7 @@
> > >   #include <linux/cgroup_drm.h>
> > >   #include <linux/ktime.h>
> > >   #include <linux/kernel.h>
> > > +#include <linux/bitmap.h>
> > >   #include <drm/drm_file.h>
> > >   #include <drm/drm_drv.h>
> > >   #include <drm/ttm/ttm_bo_api.h>
> > > @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
> > >   #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> > >   #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> > >
> > > +#define LGPU_LIMITS_NAME_LIST "list"
> > > +#define LGPU_LIMITS_NAME_COUNT "count"
> > > +
> > >   static struct drmcg *root_drmcg __read_mostly;
> > >
> > >   static int drmcg_css_free_fn(int id, void *ptr, void *data)
> > > @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> > >     for (i = 0; i <= TTM_PL_PRIV; i++)
> > >             ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
> > >
> > > +   bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> > > +                   MAX_DRMCG_LGPU_CAPACITY);
> > > +   ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > >     mutex_unlock(&dev->drmcg_mutex);
> > >     return 0;
> > >   }
> > > @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> > >                             MEM_BW_LIMITS_NAME_AVG,
> > >                             ddr->mem_bw_limits_avg_bytes_per_us);
> > >             break;
> > > +   case DRMCG_TYPE_LGPU:
> > > +           seq_printf(sf, "%s=%lld %s=%*pbl\n",
> > > +                           LGPU_LIMITS_NAME_COUNT,
> > > +                           ddr->lgpu_used,
> > > +                           LGPU_LIMITS_NAME_LIST,
> > > +                           dev->drmcg_props.lgpu_capacity,
> > > +                           ddr->lgpu_allocated);
> > > +           break;
> > >     default:
> > >             seq_puts(sf, "\n");
> > >             break;
> > > @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
> > >                             MEM_BW_LIMITS_NAME_AVG,
> > >                             props->mem_bw_avg_bytes_per_us_default);
> > >             break;
> > > +   case DRMCG_TYPE_LGPU:
> > > +           seq_printf(sf, "%s=%d %s=%*pbl\n",
> > > +                           LGPU_LIMITS_NAME_COUNT,
> > > +                           bitmap_weight(props->lgpu_slots,
> > > +                                   props->lgpu_capacity),
> > > +                           LGPU_LIMITS_NAME_LIST,
> > > +                           props->lgpu_capacity,
> > > +                           props->lgpu_slots);
> > > +           break;
> > >     default:
> > >             seq_puts(sf, "\n");
> > >             break;
> > > @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
> > >     mutex_unlock(&dev->drmcg_mutex);
> > >   }
> > >
> > > +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> > > +           struct drmcg_device_resource *ddr, unsigned long *val)
> > > +{
> > > +
> > > +   mutex_lock(&dev->drmcg_mutex);
> > > +   bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> > > +   ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > +   mutex_unlock(&dev->drmcg_mutex);
> > > +}
> > > +
> > >   static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > >             struct drm_device *dev, char *attrs)
> > >   {
> > > +   DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > > +   DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > >     enum drmcg_res_type type =
> > >             DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> > >     struct drmcg *drmcg = css_to_drmcg(of_css(of));
> > > @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > >                             continue;
> > >                     }
> > >                     break; /* DRMCG_TYPE_MEM */
> > > +           case DRMCG_TYPE_LGPU:
> > > +                   if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> > > +                           strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
> > > +                           continue;
> > > +
> > > +                        if (!strcmp("max", sval) ||
> > > +                                   !strcmp("default", sval)) {
> > > +                           if (parent != NULL)
> > > +                                   drmcg_lgpu_values_apply(dev, ddr,
> > > +                                           parent->dev_resources[minor]->
> > > +                                           lgpu_allocated);
> > > +                           else
> > > +                                   drmcg_lgpu_values_apply(dev, ddr,
> > > +                                           props->lgpu_slots);
> > > +
> > > +                           continue;
> > > +                   }
> > > +
> > > +                   if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> > > +                           p_max = parent == NULL ? props->lgpu_capacity:
> > > +                                   bitmap_weight(
> > > +                                   parent->dev_resources[minor]->
> > > +                                   lgpu_allocated, props->lgpu_capacity);
> > > +
> > > +                           rc = drmcg_process_limit_s64_val(sval,
> > > +                                   false, p_max, p_max, &val);
> > > +
> > > +                           if (rc || val < 0) {
> > > +                                   drmcg_pr_cft_err(drmcg, rc, cft_name,
> > > +                                                   minor);
> > > +                                   continue;
> > > +                           }
> > > +
> > > +                           bitmap_zero(tmp_bitmap,
> > > +                                           MAX_DRMCG_LGPU_CAPACITY);
> > > +                           bitmap_set(tmp_bitmap, 0, val);
> > > +                   }
> > > +
> > > +                   if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> > > +                           rc = bitmap_parselist(sval, tmp_bitmap,
> > > +                                           MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > > +                           if (rc) {
> > > +                                   drmcg_pr_cft_err(drmcg, rc, cft_name,
> > > +                                                   minor);
> > > +                                   continue;
> > > +                           }
> > > +
> > > +                           bitmap_andnot(chk_bitmap, tmp_bitmap,
> > > +                                   props->lgpu_slots,
> > > +                                   MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > > +                           if (!bitmap_empty(chk_bitmap,
> > > +                                           MAX_DRMCG_LGPU_CAPACITY)) {
> > > +                                   drmcg_pr_cft_err(drmcg, 0, cft_name,
> > > +                                                   minor);
> > > +                                   continue;
> > > +                           }
> > > +                   }
> > > +
> > > +
> > > +                        if (parent != NULL) {
> > > +                           bitmap_and(chk_bitmap, tmp_bitmap,
> > > +                           parent->dev_resources[minor]->lgpu_allocated,
> > > +                           props->lgpu_capacity);
> > > +
> > > +                           if (bitmap_empty(chk_bitmap,
> > > +                                           props->lgpu_capacity)) {
> > > +                                   drmcg_pr_cft_err(drmcg, 0,
> > > +                                                   cft_name, minor);
> > > +                                   continue;
> > > +                           }
> > > +                   }
> > > +
> > > +                   drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> > > +
> > > +                   break; /* DRMCG_TYPE_LGPU */
> > >             default:
> > >                     break;
> > >             } /* switch (type) */
> > > @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > >                     break;
> > >             case DRMCG_TYPE_BANDWIDTH:
> > >             case DRMCG_TYPE_MEM:
> > > +           case DRMCG_TYPE_LGPU:
> > >                     drmcg_nested_limit_parse(of, dm->dev, sattr);
> > >                     break;
> > >             default:
> > > @@ -731,6 +846,20 @@ struct cftype files[] = {
> > >             .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
> > >                                             DRMCG_FTYPE_DEFAULT),
> > >     },
> > > +   {
> > > +           .name = "lgpu",
> > > +           .seq_show = drmcg_seq_show,
> > > +           .write = drmcg_limit_write,
> > > +           .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > > +                                           DRMCG_FTYPE_LIMIT),
> > > +   },
> > > +   {
> > > +           .name = "lgpu.default",
> > > +           .seq_show = drmcg_seq_show,
> > > +           .flags = CFTYPE_ONLY_ON_ROOT,
> > > +           .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > > +                                           DRMCG_FTYPE_DEFAULT),
> > > +   },
> > >     { }     /* terminate */
> > >   };
> > >
> > > @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
> > >
> > >   static inline void drmcg_update_cg_tree(struct drm_device *dev)
> > >   {
> > > +        bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > > +        bitmap_fill(dev->drmcg_props.lgpu_slots,
> > > +                   dev->drmcg_props.lgpu_capacity);
> > > +
> > >     /* init cgroups created before registration (i.e. root cgroup) */
> > >     if (root_drmcg != NULL) {
> > >             struct cgroup_subsys_state *pos;
> > > @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
> > >     for (i = 0; i <= TTM_PL_PRIV; i++)
> > >             dev->drmcg_props.mem_highs_default[i] = S64_MAX;
> > >
> > > +   dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> > > +
> > >     drmcg_update_cg_tree(dev);
> > >   }
> > >   EXPORT_SYMBOL(drmcg_device_early_init);
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]               ` <CAOWid-fLurBT6-h5WjQsEPA+dq1fgfWztbyZuLV4ypmWH8SC9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-10-09 15:23                 ` Daniel Vetter
  0 siblings, 0 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-10-09 15:23 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Greathouse, Joseph, Ho, Kenny, Kuehling, Felix,
	jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, Deucher, Alexander,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Koenig, Christian

On Wed, Oct 09, 2019 at 11:08:45AM -0400, Kenny Ho wrote:
> Hi Daniel,
> 
> Can you elaborate what you mean in more details?  The goal of lgpu is
> to provide the ability to subdivide a GPU device and give those slices
> to different users as needed.  I don't think there is anything
> controversial or vendor specific here as requests for this are well
> documented.  The underlying representation is just a bitmap, which is
> neither unprecedented nor vendor specific (bitmap is used in cpuset
> for instance.)
> 
> An implementation of this abstraction is not hardware specific either.
> For example, one can associate a virtual function in SRIOV as a lgpu.
> Alternatively, a device can also declare that it has 100 lgpus and treat
> the lgpu quantity as a percentage representation of GPU subdivision.
> The fact that an abstraction works well with a vendor implementation
> does not make it a "prettification" of a vendor feature (by this
> logic, I hope you are not implying an abstraction is only valid if it
> does not work with amd CU masking because that seems fairly partisan.)
> 
> Did I misread your characterization of this patch?

Scenario: I'm a gpgpu customer, and I write some gpgpu program (probably in
cuda, transpiled for amd using rocm).

How does the stuff I'm seeing in cuda (or vk compute, or whatever) map to
the bitmasks I can set in this cgroup controller?

That's the stuff which this spec needs to explain. Currently the answer is
"amd CU masking", and that's not going to work on e.g. nvidia hw. We need
to come up with end-user relevant resources/meanings for these bits which
works across vendors.

On cpu a "cpu core" is rather well-defined, and customers actually know
what it means on intel, amd, ibm powerpc or arm. Both on the program side
(e.g. what do I need to stuff into relevant system calls to run on a
specific "cpu core") and on the admin side.

We need to achieve the same for gpus. "it's a bitmask" is not even close
enough imo.
-Daniel

> 
> Regards,
> Kenny
> 
> 
> On Wed, Oct 9, 2019 at 6:31 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> > > On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > > > drm.lgpu
> > > >          A read-write nested-keyed file which exists on all cgroups.
> > > >          Each entry is keyed by the DRM device's major:minor.
> > > >
> > > >          lgpu stands for logical GPU; it is an abstraction used to
> > > >          subdivide a physical DRM device for the purpose of resource
> > > >          management.
> > > >
> > > >          The lgpu is a discrete quantity that is device specific (i.e.
> > > >          some DRM devices may have 64 lgpus while others may have 100
> > > >          lgpus.)  The lgpu is a single quantity with two representations
> > > >          denoted by the following nested keys.
> > > >
> > > >            =====     ========================================
> > > >            count     Representing lgpu as anonymous resource
> > > >            list      Representing lgpu as named resource
> > > >            =====     ========================================
> > > >
> > > >          For example:
> > > >          226:0 count=256 list=0-255
> > > >          226:1 count=4 list=0,2,4,6
> > > >          226:2 count=32 list=32-63
> > > >
> > > >          lgpu is represented by a bitmap and uses the bitmap_parselist
> > > >          kernel function so the list key input format is a
> > > >          comma-separated list of decimal numbers and ranges.
> > > >
> > > >          Consecutively set bits are shown as two hyphen-separated decimal
> > > >          numbers, the smallest and largest bit numbers set in the range.
> > > >          Optionally each range can be postfixed to denote that only parts
> > > >          of it should be set.  The range will be divided into groups of
> > > >          a specific size.
> > > >          Syntax: range:used_size/group_size
> > > >          Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > > >
> > > >          The count key is the hamming weight / hweight of the bitmap.
> > > >
> > > >          Both count and list accept the max and default keywords.
> > > >
> > > >          Some DRM devices may only support lgpu as anonymous resources.
> > > >          In such case, the significance of the position of the set bits
> > > >          in list will be ignored.
> > > >
> > > >          This lgpu resource supports the 'allocation' resource
> > > >          distribution model.
> > > >
> > > > Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
> > > > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > >
> > > The description sounds reasonable to me and maps well to the CU masking
> > > feature in our GPUs.
> > >
> > > It would also allow us to do more coarse-grained masking for example to
> > > guarantee balanced allocation of CUs across shader engines or
> > > partitioning of memory bandwidth or CP pipes (if that is supported by
> > > the hardware/firmware).
> >
> > Hm, so this sounds like the definition for how this cgroup is supposed to
> > work is "amd CU masking" (whatever that exactly is). And the abstract
> > description is just prettification on top, but not actually the real
> > definition you guys want.
> >
> > I think adding a cgroup which is that much depending upon the hw
> > implementation of the first driver supporting it is not a good idea.
> > -Daniel
> >
> > >
> > > I can't comment on the code as I'm unfamiliar with the details of the
> > > cgroup code.
> > >
> > > Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> > >
> > >
> > > > ---
> > > >   Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
> > > >   include/drm/drm_cgroup.h                |   4 +
> > > >   include/linux/cgroup_drm.h              |   6 ++
> > > >   kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
> > > >   4 files changed, 191 insertions(+)
> > > >
> > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > > index 87a195133eaa..57f18469bd76 100644
> > > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > > @@ -1958,6 +1958,52 @@ DRM Interface Files
> > > >     Set largest allocation for /dev/dri/card1 to 4MB
> > > >     echo "226:1 4m" > drm.buffer.peak.max
> > > >
> > > > +  drm.lgpu
> > > > +   A read-write nested-keyed file which exists on all cgroups.
> > > > +   Each entry is keyed by the DRM device's major:minor.
> > > > +
> > > > +   lgpu stands for logical GPU; it is an abstraction used to
> > > > +   subdivide a physical DRM device for the purpose of resource
> > > > +   management.
> > > > +
> > > > +   The lgpu is a discrete quantity that is device specific (i.e.
> > > > +   some DRM devices may have 64 lgpus while others may have 100
> > > > +   lgpus.)  The lgpu is a single quantity with two representations
> > > > +   denoted by the following nested keys.
> > > > +
> > > > +     =====     ========================================
> > > > +     count     Representing lgpu as anonymous resource
> > > > +     list      Representing lgpu as named resource
> > > > +     =====     ========================================
> > > > +
> > > > +   For example:
> > > > +   226:0 count=256 list=0-255
> > > > +   226:1 count=4 list=0,2,4,6
> > > > +   226:2 count=32 list=32-63
> > > > +
> > > > +   lgpu is represented by a bitmap and uses the bitmap_parselist
> > > > +   kernel function so the list key input format is a
> > > > +   comma-separated list of decimal numbers and ranges.
> > > > +
> > > > +   Consecutively set bits are shown as two hyphen-separated decimal
> > > > +   numbers, the smallest and largest bit numbers set in the range.
> > > > +   Optionally each range can be postfixed to denote that only parts
> > > > +   of it should be set.  The range will be divided into groups of
> > > > +   a specific size.
> > > > +   Syntax: range:used_size/group_size
> > > > +   Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > > > +
> > > > +   The count key is the hamming weight / hweight of the bitmap.
> > > > +
> > > > +   Both count and list accept the max and default keywords.
> > > > +
> > > > +   Some DRM devices may only support lgpu as anonymous resources.
> > > > +   In such case, the significance of the position of the set bits
> > > > +   in list will be ignored.
> > > > +
> > > > +   This lgpu resource supports the 'allocation' resource
> > > > +   distribution model.
> > > > +
> > > >   GEM Buffer Ownership
> > > >   ~~~~~~~~~~~~~~~~~~~~
> > > >
> > > > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > > > index 6d9707e1eb72..a8d6be0b075b 100644
> > > > --- a/include/drm/drm_cgroup.h
> > > > +++ b/include/drm/drm_cgroup.h
> > > > @@ -6,6 +6,7 @@
> > > >
> > > >   #include <linux/cgroup_drm.h>
> > > >   #include <linux/workqueue.h>
> > > > +#include <linux/types.h>
> > > >   #include <drm/ttm/ttm_bo_api.h>
> > > >   #include <drm/ttm/ttm_bo_driver.h>
> > > >
> > > > @@ -28,6 +29,9 @@ struct drmcg_props {
> > > >     s64                     mem_highs_default[TTM_PL_PRIV+1];
> > > >
> > > >     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
> > > > +
> > > > +   int                     lgpu_capacity;
> > > > +        DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > > >   };
> > > >
> > > >   #ifdef CONFIG_CGROUP_DRM
> > > > diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> > > > index c56cfe74d1a6..7b1cfc4ce4c3 100644
> > > > --- a/include/linux/cgroup_drm.h
> > > > +++ b/include/linux/cgroup_drm.h
> > > > @@ -14,6 +14,8 @@
> > > >   /* limit defined per the way drm_minor_alloc operates */
> > > >   #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> > > >
> > > > +#define MAX_DRMCG_LGPU_CAPACITY 256
> > > > +
> > > >   enum drmcg_mem_bw_attr {
> > > >     DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> > > >     DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
> > > > @@ -32,6 +34,7 @@ enum drmcg_res_type {
> > > >     DRMCG_TYPE_MEM_PEAK,
> > > >     DRMCG_TYPE_BANDWIDTH,
> > > >     DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> > > > +   DRMCG_TYPE_LGPU,
> > > >     __DRMCG_TYPE_LAST,
> > > >   };
> > > >
> > > > @@ -58,6 +61,9 @@ struct drmcg_device_resource {
> > > >     s64                     mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
> > > >     s64                     mem_bw_limits_bytes_in_period;
> > > >     s64                     mem_bw_limits_avg_bytes_per_us;
> > > > +
> > > > +   s64                     lgpu_used;
> > > > +   DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > >   };
> > > >
> > > >   /**
> > > > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > > > index 0ea7f0619e25..18c4368e2c29 100644
> > > > --- a/kernel/cgroup/drm.c
> > > > +++ b/kernel/cgroup/drm.c
> > > > @@ -9,6 +9,7 @@
> > > >   #include <linux/cgroup_drm.h>
> > > >   #include <linux/ktime.h>
> > > >   #include <linux/kernel.h>
> > > > +#include <linux/bitmap.h>
> > > >   #include <drm/drm_file.h>
> > > >   #include <drm/drm_drv.h>
> > > >   #include <drm/ttm/ttm_bo_api.h>
> > > > @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
> > > >   #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> > > >   #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> > > >
> > > > +#define LGPU_LIMITS_NAME_LIST "list"
> > > > +#define LGPU_LIMITS_NAME_COUNT "count"
> > > > +
> > > >   static struct drmcg *root_drmcg __read_mostly;
> > > >
> > > >   static int drmcg_css_free_fn(int id, void *ptr, void *data)
> > > > @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> > > >     for (i = 0; i <= TTM_PL_PRIV; i++)
> > > >             ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
> > > >
> > > > +   bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> > > > +                   MAX_DRMCG_LGPU_CAPACITY);
> > > > +   ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > > +
> > > >     mutex_unlock(&dev->drmcg_mutex);
> > > >     return 0;
> > > >   }
> > > > @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> > > >                             MEM_BW_LIMITS_NAME_AVG,
> > > >                             ddr->mem_bw_limits_avg_bytes_per_us);
> > > >             break;
> > > > +   case DRMCG_TYPE_LGPU:
> > > > +           seq_printf(sf, "%s=%lld %s=%*pbl\n",
> > > > +                           LGPU_LIMITS_NAME_COUNT,
> > > > +                           ddr->lgpu_used,
> > > > +                           LGPU_LIMITS_NAME_LIST,
> > > > +                           dev->drmcg_props.lgpu_capacity,
> > > > +                           ddr->lgpu_allocated);
> > > > +           break;
> > > >     default:
> > > >             seq_puts(sf, "\n");
> > > >             break;
> > > > @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
> > > >                             MEM_BW_LIMITS_NAME_AVG,
> > > >                             props->mem_bw_avg_bytes_per_us_default);
> > > >             break;
> > > > +   case DRMCG_TYPE_LGPU:
> > > > +           seq_printf(sf, "%s=%d %s=%*pbl\n",
> > > > +                           LGPU_LIMITS_NAME_COUNT,
> > > > +                           bitmap_weight(props->lgpu_slots,
> > > > +                                   props->lgpu_capacity),
> > > > +                           LGPU_LIMITS_NAME_LIST,
> > > > +                           props->lgpu_capacity,
> > > > +                           props->lgpu_slots);
> > > > +           break;
> > > >     default:
> > > >             seq_puts(sf, "\n");
> > > >             break;
> > > > @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
> > > >     mutex_unlock(&dev->drmcg_mutex);
> > > >   }
> > > >
> > > > +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> > > > +           struct drmcg_device_resource *ddr, unsigned long *val)
> > > > +{
> > > > +
> > > > +   mutex_lock(&dev->drmcg_mutex);
> > > > +   bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> > > > +   ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > > +   mutex_unlock(&dev->drmcg_mutex);
> > > > +}
> > > > +
> > > >   static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > > >             struct drm_device *dev, char *attrs)
> > > >   {
> > > > +   DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > > > +   DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > > >     enum drmcg_res_type type =
> > > >             DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> > > >     struct drmcg *drmcg = css_to_drmcg(of_css(of));
> > > > @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > > >                             continue;
> > > >                     }
> > > >                     break; /* DRMCG_TYPE_MEM */
> > > > +           case DRMCG_TYPE_LGPU:
> > > > +                   if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> > > > +                           strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
> > > > +                           continue;
> > > > +
> > > > +                        if (!strcmp("max", sval) ||
> > > > +                                   !strcmp("default", sval)) {
> > > > +                           if (parent != NULL)
> > > > +                                   drmcg_lgpu_values_apply(dev, ddr,
> > > > +                                           parent->dev_resources[minor]->
> > > > +                                           lgpu_allocated);
> > > > +                           else
> > > > +                                   drmcg_lgpu_values_apply(dev, ddr,
> > > > +                                           props->lgpu_slots);
> > > > +
> > > > +                           continue;
> > > > +                   }
> > > > +
> > > > +                   if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> > > > +                           p_max = parent == NULL ? props->lgpu_capacity:
> > > > +                                   bitmap_weight(
> > > > +                                   parent->dev_resources[minor]->
> > > > +                                   lgpu_allocated, props->lgpu_capacity);
> > > > +
> > > > +                           rc = drmcg_process_limit_s64_val(sval,
> > > > +                                   false, p_max, p_max, &val);
> > > > +
> > > > +                           if (rc || val < 0) {
> > > > +                                   drmcg_pr_cft_err(drmcg, rc, cft_name,
> > > > +                                                   minor);
> > > > +                                   continue;
> > > > +                           }
> > > > +
> > > > +                           bitmap_zero(tmp_bitmap,
> > > > +                                           MAX_DRMCG_LGPU_CAPACITY);
> > > > +                           bitmap_set(tmp_bitmap, 0, val);
> > > > +                   }
> > > > +
> > > > +                   if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> > > > +                           rc = bitmap_parselist(sval, tmp_bitmap,
> > > > +                                           MAX_DRMCG_LGPU_CAPACITY);
> > > > +
> > > > +                           if (rc) {
> > > > +                                   drmcg_pr_cft_err(drmcg, rc, cft_name,
> > > > +                                                   minor);
> > > > +                                   continue;
> > > > +                           }
> > > > +
> > > > +                           bitmap_andnot(chk_bitmap, tmp_bitmap,
> > > > +                                   props->lgpu_slots,
> > > > +                                   MAX_DRMCG_LGPU_CAPACITY);
> > > > +
> > > > +                           if (!bitmap_empty(chk_bitmap,
> > > > +                                           MAX_DRMCG_LGPU_CAPACITY)) {
> > > > +                                   drmcg_pr_cft_err(drmcg, 0, cft_name,
> > > > +                                                   minor);
> > > > +                                   continue;
> > > > +                           }
> > > > +                   }
> > > > +
> > > > +
> > > > +                        if (parent != NULL) {
> > > > +                           bitmap_and(chk_bitmap, tmp_bitmap,
> > > > +                           parent->dev_resources[minor]->lgpu_allocated,
> > > > +                           props->lgpu_capacity);
> > > > +
> > > > +                           if (bitmap_empty(chk_bitmap,
> > > > +                                           props->lgpu_capacity)) {
> > > > +                                   drmcg_pr_cft_err(drmcg, 0,
> > > > +                                                   cft_name, minor);
> > > > +                                   continue;
> > > > +                           }
> > > > +                   }
> > > > +
> > > > +                   drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> > > > +
> > > > +                   break; /* DRMCG_TYPE_LGPU */
> > > >             default:
> > > >                     break;
> > > >             } /* switch (type) */
> > > > @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > > >                     break;
> > > >             case DRMCG_TYPE_BANDWIDTH:
> > > >             case DRMCG_TYPE_MEM:
> > > > +           case DRMCG_TYPE_LGPU:
> > > >                     drmcg_nested_limit_parse(of, dm->dev, sattr);
> > > >                     break;
> > > >             default:
> > > > @@ -731,6 +846,20 @@ struct cftype files[] = {
> > > >             .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
> > > >                                             DRMCG_FTYPE_DEFAULT),
> > > >     },
> > > > +   {
> > > > +           .name = "lgpu",
> > > > +           .seq_show = drmcg_seq_show,
> > > > +           .write = drmcg_limit_write,
> > > > +           .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > > > +                                           DRMCG_FTYPE_LIMIT),
> > > > +   },
> > > > +   {
> > > > +           .name = "lgpu.default",
> > > > +           .seq_show = drmcg_seq_show,
> > > > +           .flags = CFTYPE_ONLY_ON_ROOT,
> > > > +           .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > > > +                                           DRMCG_FTYPE_DEFAULT),
> > > > +   },
> > > >     { }     /* terminate */
> > > >   };
> > > >
> > > > @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
> > > >
> > > >   static inline void drmcg_update_cg_tree(struct drm_device *dev)
> > > >   {
> > > > +        bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > > > +        bitmap_fill(dev->drmcg_props.lgpu_slots,
> > > > +                   dev->drmcg_props.lgpu_capacity);
> > > > +
> > > >     /* init cgroups created before registration (i.e. root cgroup) */
> > > >     if (root_drmcg != NULL) {
> > > >             struct cgroup_subsys_state *pos;
> > > > @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
> > > >     for (i = 0; i <= TTM_PL_PRIV; i++)
> > > >             dev->drmcg_props.mem_highs_default[i] = S64_MAX;
> > > >
> > > > +   dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> > > > +
> > > >     drmcg_update_cg_tree(dev);
> > > >   }
> > > >   EXPORT_SYMBOL(drmcg_device_early_init);
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2019-10-09 10:31         ` Daniel Vetter
@ 2019-10-09 15:25             ` Kuehling, Felix
  2019-10-09 15:25             ` Kuehling, Felix
  1 sibling, 0 replies; 89+ messages in thread
From: Kuehling, Felix @ 2019-10-09 15:25 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ho, Kenny, jsparks, amd-gfx, lkaplan, Deucher, Alexander,
	y2kenny, dri-devel, Greathouse, Joseph, tj, cgroups, Koenig,
	Christian

On 2019-10-09 6:31, Daniel Vetter wrote:
> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
>>
>> The description sounds reasonable to me and maps well to the CU masking
>> feature in our GPUs.
>>
>> It would also allow us to do more coarse-grained masking for example to
>> guarantee balanced allocation of CUs across shader engines or
>> partitioning of memory bandwidth or CP pipes (if that is supported by
>> the hardware/firmware).
> Hm, so this sounds like the definition for how this cgroup is supposed to
> work is "amd CU masking" (whatever that exactly is). And the abstract
> description is just prettification on top, but not actually the real
> definition you guys want.

I think you're reading this as the opposite of what I was trying to say. 
Using CU masking is one possible implementation of LGPUs on AMD 
hardware. It's the one that Kenny implemented at the end of this patch 
series, and I pointed out some problems with that approach. Other ways 
to partition the hardware into LGPUs are conceivable. For example we're 
considering splitting it along the lines of shader engines, which is 
more coarse-grained and would also affect memory bandwidth available to 
each partition.

We could also consider partitioning pipes in our command processor, 
although that is not supported by our current CP scheduler firmware.

The bottom line is, the LGPU model proposed by Kenny is quite abstract 
and allows drivers implementing it a lot of flexibility depending on the 
capability of their hardware and firmware. We haven't settled on a final 
implementation choice even for AMD.

Regards,
   Felix


>
> I think adding a cgroup which is that much depending upon the hw
> implementation of the first driver supporting it is not a good idea.
> -Daniel
>
>> I can't comment on the code as I'm unfamiliar with the details of the
>> cgroup code.
>>
>> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>
>>
>>> ---
>>>    Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
>>>    include/drm/drm_cgroup.h                |   4 +
>>>    include/linux/cgroup_drm.h              |   6 ++
>>>    kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
>>>    4 files changed, 191 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>> index 87a195133eaa..57f18469bd76 100644
>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>> @@ -1958,6 +1958,52 @@ DRM Interface Files
>>>    	Set largest allocation for /dev/dri/card1 to 4MB
>>>    	echo "226:1 4m" > drm.buffer.peak.max
>>>    
>>> +  drm.lgpu
>>> +	A read-write nested-keyed file which exists on all cgroups.
>>> +	Each entry is keyed by the DRM device's major:minor.
>>> +
>>> +	lgpu stands for logical GPU; it is an abstraction used to
>>> +	subdivide a physical DRM device for the purpose of resource
>>> +	management.
>>> +
>>> +	The lgpu is a discrete quantity that is device specific (e.g.
>>> +	some DRM devices may have 64 lgpus while others may have 100
>>> +	lgpus).  The lgpu is a single quantity with two representations
>>> +	denoted by the following nested keys.
>>> +
>>> +	  =====     ========================================
>>> +	  count     Representing lgpu as anonymous resource
>>> +	  list      Representing lgpu as named resource
>>> +	  =====     ========================================
>>> +
>>> +	For example:
>>> +	226:0 count=256 list=0-255
>>> +	226:1 count=4 list=0,2,4,6
>>> +	226:2 count=32 list=32-63
>>> +
>>> +	lgpu is represented by a bitmap and uses the bitmap_parselist
>>> +	kernel function so the list key input format is a
>>> +	comma-separated list of decimal numbers and ranges.
>>> +
>>> +	Consecutively set bits are shown as two hyphen-separated decimal
>>> +	numbers, the smallest and largest bit numbers set in the range.
>>> +	Optionally, each range can be postfixed to denote that only parts
>>> +	of it should be set.  The range will be divided into groups of
>>> +	the specified size.
>>> +	Syntax: range:used_size/group_size
>>> +	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>>> +
>>> +	The count key is the hamming weight / hweight of the bitmap.
>>> +
>>> +	Both count and list accept the max and default keywords.
>>> +
>>> +	Some DRM devices may only support lgpu as anonymous resources.
>>> +	In such cases, the position of the set bits in list is
>>> +	ignored.
>>> +
>>> +	This lgpu resource supports the 'allocation' resource
>>> +	distribution model.
>>> +
>>>    GEM Buffer Ownership
>>>    ~~~~~~~~~~~~~~~~~~~~
>>>    
>>> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
>>> index 6d9707e1eb72..a8d6be0b075b 100644
>>> --- a/include/drm/drm_cgroup.h
>>> +++ b/include/drm/drm_cgroup.h
>>> @@ -6,6 +6,7 @@
>>>    
>>>    #include <linux/cgroup_drm.h>
>>>    #include <linux/workqueue.h>
>>> +#include <linux/types.h>
>>>    #include <drm/ttm/ttm_bo_api.h>
>>>    #include <drm/ttm/ttm_bo_driver.h>
>>>    
>>> @@ -28,6 +29,9 @@ struct drmcg_props {
>>>    	s64			mem_highs_default[TTM_PL_PRIV+1];
>>>    
>>>    	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
>>> +
>>> +	int			lgpu_capacity;
>>> +	DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>>>    };
>>>    
>>>    #ifdef CONFIG_CGROUP_DRM
>>> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
>>> index c56cfe74d1a6..7b1cfc4ce4c3 100644
>>> --- a/include/linux/cgroup_drm.h
>>> +++ b/include/linux/cgroup_drm.h
>>> @@ -14,6 +14,8 @@
>>>    /* limit defined per the way drm_minor_alloc operates */
>>>    #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>>>    
>>> +#define MAX_DRMCG_LGPU_CAPACITY 256
>>> +
>>>    enum drmcg_mem_bw_attr {
>>>    	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calculating 'instantaneous' bw */
>>>    	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calculating 'instantaneous' bw */
>>> @@ -32,6 +34,7 @@ enum drmcg_res_type {
>>>    	DRMCG_TYPE_MEM_PEAK,
>>>    	DRMCG_TYPE_BANDWIDTH,
>>>    	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
>>> +	DRMCG_TYPE_LGPU,
>>>    	__DRMCG_TYPE_LAST,
>>>    };
>>>    
>>> @@ -58,6 +61,9 @@ struct drmcg_device_resource {
>>>    	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
>>>    	s64			mem_bw_limits_bytes_in_period;
>>>    	s64			mem_bw_limits_avg_bytes_per_us;
>>> +
>>> +	s64			lgpu_used;
>>> +	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>>>    };
>>>    
>>>    /**
>>> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
>>> index 0ea7f0619e25..18c4368e2c29 100644
>>> --- a/kernel/cgroup/drm.c
>>> +++ b/kernel/cgroup/drm.c
>>> @@ -9,6 +9,7 @@
>>>    #include <linux/cgroup_drm.h>
>>>    #include <linux/ktime.h>
>>>    #include <linux/kernel.h>
>>> +#include <linux/bitmap.h>
>>>    #include <drm/drm_file.h>
>>>    #include <drm/drm_drv.h>
>>>    #include <drm/ttm/ttm_bo_api.h>
>>> @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
>>>    #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
>>>    #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
>>>    
>>> +#define LGPU_LIMITS_NAME_LIST "list"
>>> +#define LGPU_LIMITS_NAME_COUNT "count"
>>> +
>>>    static struct drmcg *root_drmcg __read_mostly;
>>>    
>>>    static int drmcg_css_free_fn(int id, void *ptr, void *data)
>>> @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
>>>    	for (i = 0; i <= TTM_PL_PRIV; i++)
>>>    		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
>>>    
>>> +	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
>>> +			MAX_DRMCG_LGPU_CAPACITY);
>>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>>> +
>>>    	mutex_unlock(&dev->drmcg_mutex);
>>>    	return 0;
>>>    }
>>> @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
>>>    				MEM_BW_LIMITS_NAME_AVG,
>>>    				ddr->mem_bw_limits_avg_bytes_per_us);
>>>    		break;
>>> +	case DRMCG_TYPE_LGPU:
>>> +		seq_printf(sf, "%s=%lld %s=%*pbl\n",
>>> +				LGPU_LIMITS_NAME_COUNT,
>>> +				ddr->lgpu_used,
>>> +				LGPU_LIMITS_NAME_LIST,
>>> +				dev->drmcg_props.lgpu_capacity,
>>> +				ddr->lgpu_allocated);
>>> +		break;
>>>    	default:
>>>    		seq_puts(sf, "\n");
>>>    		break;
>>> @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
>>>    				MEM_BW_LIMITS_NAME_AVG,
>>>    				props->mem_bw_avg_bytes_per_us_default);
>>>    		break;
>>> +	case DRMCG_TYPE_LGPU:
>>> +		seq_printf(sf, "%s=%d %s=%*pbl\n",
>>> +				LGPU_LIMITS_NAME_COUNT,
>>> +				bitmap_weight(props->lgpu_slots,
>>> +					props->lgpu_capacity),
>>> +				LGPU_LIMITS_NAME_LIST,
>>> +				props->lgpu_capacity,
>>> +				props->lgpu_slots);
>>> +		break;
>>>    	default:
>>>    		seq_puts(sf, "\n");
>>>    		break;
>>> @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
>>>    	mutex_unlock(&dev->drmcg_mutex);
>>>    }
>>>    
>>> +static void drmcg_lgpu_values_apply(struct drm_device *dev,
>>> +		struct drmcg_device_resource *ddr, unsigned long *val)
>>> +{
>>> +
>>> +	mutex_lock(&dev->drmcg_mutex);
>>> +	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
>>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>>> +	mutex_unlock(&dev->drmcg_mutex);
>>> +}
>>> +
>>>    static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>>>    		struct drm_device *dev, char *attrs)
>>>    {
>>> +	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>>> +	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>>>    	enum drmcg_res_type type =
>>>    		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
>>>    	struct drmcg *drmcg = css_to_drmcg(of_css(of));
>>> @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>>>    				continue;
>>>    			}
>>>    			break; /* DRMCG_TYPE_MEM */
>>> +		case DRMCG_TYPE_LGPU:
>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
>>> +				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
>>> +				continue;
>>> +
>>> +			if (!strcmp("max", sval) ||
>>> +					!strcmp("default", sval)) {
>>> +				if (parent != NULL)
>>> +					drmcg_lgpu_values_apply(dev, ddr,
>>> +						parent->dev_resources[minor]->
>>> +						lgpu_allocated);
>>> +				else
>>> +					drmcg_lgpu_values_apply(dev, ddr,
>>> +						props->lgpu_slots);
>>> +
>>> +				continue;
>>> +			}
>>> +
>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
>>> +				p_max = parent == NULL ? props->lgpu_capacity:
>>> +					bitmap_weight(
>>> +					parent->dev_resources[minor]->
>>> +					lgpu_allocated, props->lgpu_capacity);
>>> +
>>> +				rc = drmcg_process_limit_s64_val(sval,
>>> +					false, p_max, p_max, &val);
>>> +
>>> +				if (rc || val < 0) {
>>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
>>> +							minor);
>>> +					continue;
>>> +				}
>>> +
>>> +				bitmap_zero(tmp_bitmap,
>>> +						MAX_DRMCG_LGPU_CAPACITY);
>>> +				bitmap_set(tmp_bitmap, 0, val);
>>> +			}
>>> +
>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
>>> +				rc = bitmap_parselist(sval, tmp_bitmap,
>>> +						MAX_DRMCG_LGPU_CAPACITY);
>>> +
>>> +				if (rc) {
>>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
>>> +							minor);
>>> +					continue;
>>> +				}
>>> +
>>> +				bitmap_andnot(chk_bitmap, tmp_bitmap,
>>> +					props->lgpu_slots,
>>> +					MAX_DRMCG_LGPU_CAPACITY);
>>> +
>>> +				if (!bitmap_empty(chk_bitmap,
>>> +						MAX_DRMCG_LGPU_CAPACITY)) {
>>> +					drmcg_pr_cft_err(drmcg, 0, cft_name,
>>> +							minor);
>>> +					continue;
>>> +				}
>>> +			}
>>> +
>>> +			if (parent != NULL) {
>>> +				bitmap_and(chk_bitmap, tmp_bitmap,
>>> +				parent->dev_resources[minor]->lgpu_allocated,
>>> +				props->lgpu_capacity);
>>> +
>>> +				if (bitmap_empty(chk_bitmap,
>>> +						props->lgpu_capacity)) {
>>> +					drmcg_pr_cft_err(drmcg, 0,
>>> +							cft_name, minor);
>>> +					continue;
>>> +				}
>>> +			}
>>> +
>>> +			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
>>> +
>>> +			break; /* DRMCG_TYPE_LGPU */
>>>    		default:
>>>    			break;
>>>    		} /* switch (type) */
>>> @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
>>>    			break;
>>>    		case DRMCG_TYPE_BANDWIDTH:
>>>    		case DRMCG_TYPE_MEM:
>>> +		case DRMCG_TYPE_LGPU:
>>>    			drmcg_nested_limit_parse(of, dm->dev, sattr);
>>>    			break;
>>>    		default:
>>> @@ -731,6 +846,20 @@ struct cftype files[] = {
>>>    		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
>>>    						DRMCG_FTYPE_DEFAULT),
>>>    	},
>>> +	{
>>> +		.name = "lgpu",
>>> +		.seq_show = drmcg_seq_show,
>>> +		.write = drmcg_limit_write,
>>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
>>> +						DRMCG_FTYPE_LIMIT),
>>> +	},
>>> +	{
>>> +		.name = "lgpu.default",
>>> +		.seq_show = drmcg_seq_show,
>>> +		.flags = CFTYPE_ONLY_ON_ROOT,
>>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
>>> +						DRMCG_FTYPE_DEFAULT),
>>> +	},
>>>    	{ }	/* terminate */
>>>    };
>>>    
>>> @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
>>>    
>>>    static inline void drmcg_update_cg_tree(struct drm_device *dev)
>>>    {
>>> +	bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>>> +	bitmap_fill(dev->drmcg_props.lgpu_slots,
>>> +			dev->drmcg_props.lgpu_capacity);
>>> +
>>>    	/* init cgroups created before registration (i.e. root cgroup) */
>>>    	if (root_drmcg != NULL) {
>>>    		struct cgroup_subsys_state *pos;
>>> @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
>>>    	for (i = 0; i <= TTM_PL_PRIV; i++)
>>>    		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
>>>    
>>> +	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
>>> +
>>>    	drmcg_update_cg_tree(dev);
>>>    }
>>>    EXPORT_SYMBOL(drmcg_device_early_init);
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 89+ messages in thread
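[Editor's note: the list key syntax documented in the patch above (the bitmap_parselist format, including the range:used_size/group_size postfix) can be modeled in userspace. The following Python sketch is illustrative only — the kernel uses bitmap_parselist() on fixed-size bitmaps — and the function name and the capacity default are assumptions made for this example.]

```python
def parse_lgpu_list(s, capacity=1024):
    """Parse a bitmap_parselist-style string into a set of bit numbers.

    Handles the forms shown in the drm.lgpu documentation: "N", "A-B",
    and the grouped postfix "A-B:used_size/group_size".  Userspace model
    only; not the kernel implementation.
    """
    bits = set()
    for tok in s.split(","):
        tok = tok.strip()
        if ":" in tok:
            rng, spec = tok.split(":")
            used, group = (int(x) for x in spec.split("/"))
        else:
            rng, used, group = tok, None, None
        if "-" in rng:
            lo, hi = (int(x) for x in rng.split("-"))
        else:
            lo = hi = int(rng)
        for b in range(lo, hi + 1):
            if b >= capacity:
                raise ValueError("bit %d beyond capacity %d" % (b, capacity))
            # in the grouped form, only the first `used` bits of each
            # group_size-sized chunk of the range are set
            if used is None or (b - lo) % group < used:
                bits.add(b)
    return bits

# The documented example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
assert sorted(parse_lgpu_list("0-1023:2/256")) == [0, 1, 256, 257,
                                                   512, 513, 768, 769]
```

The count key shown next to list is then simply len() of this set, i.e. the hamming weight of the bitmap.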
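[Editor's note: the write-path validation quoted in the patch (the DRMCG_TYPE_LGPU branch of drmcg_nested_limit_parse) can be summarized as a small userspace model. This Python sketch is illustrative only; the function name and the use of sets in place of fixed-size bitmaps are assumptions. Note that for a child cgroup the kernel rejects only an *empty* intersection with the parent's allocation, not bits falling outside it.]

```python
def lgpu_limit_parse(key, sval, device_slots, parent_allocated=None):
    """Model of the DRMCG_TYPE_LGPU branch of drmcg_nested_limit_parse.

    device_slots and parent_allocated are sets of bit numbers standing in
    for the kernel's bitmaps.  Returns the new allocation, or None where
    the kernel would take the drmcg_pr_cft_err path.
    """
    if sval in ("max", "default"):
        # fall back to the parent's allocation, or all device slots at root
        return set(parent_allocated if parent_allocated is not None
                   else device_slots)

    if key == "count":
        # count=N selects the lowest N bits (bitmap_set(tmp_bitmap, 0, val)),
        # clamped against the parent's weight (or device capacity at root)
        p_max = (len(parent_allocated) if parent_allocated is not None
                 else len(device_slots))
        val = int(sval)
        if val < 0 or val > p_max:
            return None
        requested = set(range(val))
    elif key == "list":
        requested = {int(b) for b in sval.split(",")}  # simplified parse
        # bitmap_andnot check: no requested bit may fall outside lgpu_slots
        if requested - device_slots:
            return None
    else:
        return None

    # bitmap_and check: a child must share at least one bit with its parent
    if parent_allocated is not None and not (requested & parent_allocated):
        return None
    return requested
```

For example, with an 8-slot device, writing list=0,2,4 succeeds, list=9 is rejected as out of range, and list=4,5 under a parent holding only bits 0-1 is rejected for having no overlap.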

* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]             ` <ee873e89-48fd-c4c9-1ce0-73965f4ad2ba-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-09 15:34                 ` Daniel Vetter
  0 siblings, 0 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-10-09 15:34 UTC (permalink / raw)
  To: Kuehling, Felix
  Cc: Greathouse, Joseph, Ho, Kenny, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher, Alexander, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Koenig, Christian

On Wed, Oct 09, 2019 at 03:25:22PM +0000, Kuehling, Felix wrote:
> On 2019-10-09 6:31, Daniel Vetter wrote:
> > On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> >>
> >> The description sounds reasonable to me and maps well to the CU masking
> >> feature in our GPUs.
> >>
> >> It would also allow us to do more coarse-grained masking for example to
> >> guarantee balanced allocation of CUs across shader engines or
> >> partitioning of memory bandwidth or CP pipes (if that is supported by
> >> the hardware/firmware).
> > Hm, so this sounds like the definition for how this cgroup is supposed to
> > work is "amd CU masking" (whatever that exactly is). And the abstract
> > description is just prettification on top, but not actually the real
> > definition you guys want.
> 
> I think you're reading this as the opposite of what I was trying to say. 
> Using CU masking is one possible implementation of LGPUs on AMD 
> hardware. It's the one that Kenny implemented at the end of this patch 
> series, and I pointed out some problems with that approach. Other ways 
> to partition the hardware into LGPUs are conceivable. For example we're 
> considering splitting it along the lines of shader engines, which is 
> more coarse-grained and would also affect memory bandwidth available to 
> each partition.

If this is supposed to be useful for admins then "other ways to partition
the hw are conceivable" is the problem. This should be unique and clear for
admins/end-users. Reading the implementation details and realizing that
the actual meaning is "amd CU masking" isn't good enough by far, since
that's meaningless on any other hw.

And if there's other ways to implement this cgroup for amd, it's also
meaningless (to sysadmins/users) for amd hw.

> We could also consider partitioning pipes in our command processor, 
> although that is not supported by our current CP scheduler firmware.
> 
> The bottom line is, the LGPU model proposed by Kenny is quite abstract 
> and allows drivers implementing it a lot of flexibility depending on the 
> capability of their hardware and firmware. We haven't settled on a final 
> implementation choice even for AMD.

That abstract model of essentially "anything goes" is the problem here
imo. E.g. for cpu cgroups this would be similar to allowing the bitmask to
mean "cpu core" on one machine, "physical die" on the next, and maybe
"hyperthread unit" on the 3rd. Useless for admins.

So if we have a gpu bitmask thing that might mean a command submission pipe
on one hw (maybe matching what vk exposes, maybe not), some compute unit
mask on the next and something entirely different (e.g. intel has so-called
GT slices with compute cores + more stuff around) on the 3rd vendor
then that's not useful for admins.
-Daniel

> 
> Regards,
>    Felix
> 
> 
> >
> > I think adding a cgroup which is that much depending upon the hw
> > implementation of the first driver supporting it is not a good idea.
> > -Daniel
> >
> >> I can't comment on the code as I'm unfamiliar with the details of the
> >> cgroup code.
> >>
> >> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >>
> >>
> >>> ---
> >>>    Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
> >>>    include/drm/drm_cgroup.h                |   4 +
> >>>    include/linux/cgroup_drm.h              |   6 ++
> >>>    kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
> >>>    4 files changed, 191 insertions(+)
> >>>
> >>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >>> index 87a195133eaa..57f18469bd76 100644
> >>> --- a/Documentation/admin-guide/cgroup-v2.rst
> >>> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >>> @@ -1958,6 +1958,52 @@ DRM Interface Files
> >>>    	Set largest allocation for /dev/dri/card1 to 4MB
> >>>    	echo "226:1 4m" > drm.buffer.peak.max
> >>>    
> >>> +  drm.lgpu
> >>> +	A read-write nested-keyed file which exists on all cgroups.
> >>> +	Each entry is keyed by the DRM device's major:minor.
> >>> +
> >>> +	lgpu stands for logical GPU; it is an abstraction used to
> >>> +	subdivide a physical DRM device for the purpose of resource
> >>> +	management.
> >>> +
> >>> +	The lgpu is a discrete quantity that is device specific (e.g.
> >>> +	some DRM devices may have 64 lgpus while others may have 100
> >>> +	lgpus).  The lgpu is a single quantity with two representations
> >>> +	denoted by the following nested keys.
> >>> +
> >>> +	  =====     ========================================
> >>> +	  count     Representing lgpu as anonymous resource
> >>> +	  list      Representing lgpu as named resource
> >>> +	  =====     ========================================
> >>> +
> >>> +	For example:
> >>> +	226:0 count=256 list=0-255
> >>> +	226:1 count=4 list=0,2,4,6
> >>> +	226:2 count=32 list=32-63
> >>> +
> >>> +	lgpu is represented by a bitmap and uses the bitmap_parselist
> >>> +	kernel function so the list key input format is a
> >>> +	comma-separated list of decimal numbers and ranges.
> >>> +
> >>> +	Consecutively set bits are shown as two hyphen-separated decimal
> >>> +	numbers, the smallest and largest bit numbers set in the range.
> >>> +	Optionally each range can be postfixed to denote that only parts
> >>> +	of it should be set.  The range will be divided into groups
> >>> +	of a specific size.
> >>> +	Syntax: range:used_size/group_size
> >>> +	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >>> +
> >>> +	The count key is the Hamming weight (hweight) of the bitmap.
> >>> +
> >>> +	Both count and list accept the max and default keywords.
> >>> +
> >>> +	Some DRM devices may only support lgpu as anonymous resources.
> >>> +	In such cases, the positions of the set bits in list
> >>> +	carry no significance.
> >>> +
> >>> +	This lgpu resource supports the 'allocation' resource
> >>> +	distribution model.
> >>> +
> >>>    GEM Buffer Ownership
> >>>    ~~~~~~~~~~~~~~~~~~~~
> >>>    
> >>> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >>> index 6d9707e1eb72..a8d6be0b075b 100644
> >>> --- a/include/drm/drm_cgroup.h
> >>> +++ b/include/drm/drm_cgroup.h
> >>> @@ -6,6 +6,7 @@
> >>>    
> >>>    #include <linux/cgroup_drm.h>
> >>>    #include <linux/workqueue.h>
> >>> +#include <linux/types.h>
> >>>    #include <drm/ttm/ttm_bo_api.h>
> >>>    #include <drm/ttm/ttm_bo_driver.h>
> >>>    
> >>> @@ -28,6 +29,9 @@ struct drmcg_props {
> >>>    	s64			mem_highs_default[TTM_PL_PRIV+1];
> >>>    
> >>>    	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
> >>> +
> >>> +	int			lgpu_capacity;
> >>> +	DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >>>    };
> >>>    
> >>>    #ifdef CONFIG_CGROUP_DRM
> >>> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >>> index c56cfe74d1a6..7b1cfc4ce4c3 100644
> >>> --- a/include/linux/cgroup_drm.h
> >>> +++ b/include/linux/cgroup_drm.h
> >>> @@ -14,6 +14,8 @@
> >>>    /* limit defined per the way drm_minor_alloc operates */
> >>>    #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>>    
> >>> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >>> +
> >>>    enum drmcg_mem_bw_attr {
> >>>    	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> >>>    	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
> >>> @@ -32,6 +34,7 @@ enum drmcg_res_type {
> >>>    	DRMCG_TYPE_MEM_PEAK,
> >>>    	DRMCG_TYPE_BANDWIDTH,
> >>>    	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> >>> +	DRMCG_TYPE_LGPU,
> >>>    	__DRMCG_TYPE_LAST,
> >>>    };
> >>>    
> >>> @@ -58,6 +61,9 @@ struct drmcg_device_resource {
> >>>    	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
> >>>    	s64			mem_bw_limits_bytes_in_period;
> >>>    	s64			mem_bw_limits_avg_bytes_per_us;
> >>> +
> >>> +	s64			lgpu_used;
> >>> +	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >>>    };
> >>>    
> >>>    /**
> >>> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >>> index 0ea7f0619e25..18c4368e2c29 100644
> >>> --- a/kernel/cgroup/drm.c
> >>> +++ b/kernel/cgroup/drm.c
> >>> @@ -9,6 +9,7 @@
> >>>    #include <linux/cgroup_drm.h>
> >>>    #include <linux/ktime.h>
> >>>    #include <linux/kernel.h>
> >>> +#include <linux/bitmap.h>
> >>>    #include <drm/drm_file.h>
> >>>    #include <drm/drm_drv.h>
> >>>    #include <drm/ttm/ttm_bo_api.h>
> >>> @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
> >>>    #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> >>>    #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> >>>    
> >>> +#define LGPU_LIMITS_NAME_LIST "list"
> >>> +#define LGPU_LIMITS_NAME_COUNT "count"
> >>> +
> >>>    static struct drmcg *root_drmcg __read_mostly;
> >>>    
> >>>    static int drmcg_css_free_fn(int id, void *ptr, void *data)
> >>> @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >>>    	for (i = 0; i <= TTM_PL_PRIV; i++)
> >>>    		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
> >>>    
> >>> +	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> >>> +			MAX_DRMCG_LGPU_CAPACITY);
> >>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >>> +
> >>>    	mutex_unlock(&dev->drmcg_mutex);
> >>>    	return 0;
> >>>    }
> >>> @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >>>    				MEM_BW_LIMITS_NAME_AVG,
> >>>    				ddr->mem_bw_limits_avg_bytes_per_us);
> >>>    		break;
> >>> +	case DRMCG_TYPE_LGPU:
> >>> +		seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >>> +				LGPU_LIMITS_NAME_COUNT,
> >>> +				ddr->lgpu_used,
> >>> +				LGPU_LIMITS_NAME_LIST,
> >>> +				dev->drmcg_props.lgpu_capacity,
> >>> +				ddr->lgpu_allocated);
> >>> +		break;
> >>>    	default:
> >>>    		seq_puts(sf, "\n");
> >>>    		break;
> >>> @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
> >>>    				MEM_BW_LIMITS_NAME_AVG,
> >>>    				props->mem_bw_avg_bytes_per_us_default);
> >>>    		break;
> >>> +	case DRMCG_TYPE_LGPU:
> >>> +		seq_printf(sf, "%s=%d %s=%*pbl\n",
> >>> +				LGPU_LIMITS_NAME_COUNT,
> >>> +				bitmap_weight(props->lgpu_slots,
> >>> +					props->lgpu_capacity),
> >>> +				LGPU_LIMITS_NAME_LIST,
> >>> +				props->lgpu_capacity,
> >>> +				props->lgpu_slots);
> >>> +		break;
> >>>    	default:
> >>>    		seq_puts(sf, "\n");
> >>>    		break;
> >>> @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
> >>>    	mutex_unlock(&dev->drmcg_mutex);
> >>>    }
> >>>    
> >>> +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> >>> +		struct drmcg_device_resource *ddr, unsigned long *val)
> >>> +{
> >>> +
> >>> +	mutex_lock(&dev->drmcg_mutex);
> >>> +	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> >>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >>> +	mutex_unlock(&dev->drmcg_mutex);
> >>> +}
> >>> +
> >>>    static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >>>    		struct drm_device *dev, char *attrs)
> >>>    {
> >>> +	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >>> +	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >>>    	enum drmcg_res_type type =
> >>>    		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >>>    	struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >>> @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >>>    				continue;
> >>>    			}
> >>>    			break; /* DRMCG_TYPE_MEM */
> >>> +		case DRMCG_TYPE_LGPU:
> >>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >>> +				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
> >>> +				continue;
> >>> +
> >>> +			if (!strcmp("max", sval) ||
> >>> +					!strcmp("default", sval)) {
> >>> +				if (parent != NULL)
> >>> +					drmcg_lgpu_values_apply(dev, ddr,
> >>> +						parent->dev_resources[minor]->
> >>> +						lgpu_allocated);
> >>> +				else
> >>> +					drmcg_lgpu_values_apply(dev, ddr,
> >>> +						props->lgpu_slots);
> >>> +
> >>> +				continue;
> >>> +			}
> >>> +
> >>> +			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >>> +				p_max = parent == NULL ? props->lgpu_capacity:
> >>> +					bitmap_weight(
> >>> +					parent->dev_resources[minor]->
> >>> +					lgpu_allocated, props->lgpu_capacity);
> >>> +
> >>> +				rc = drmcg_process_limit_s64_val(sval,
> >>> +					false, p_max, p_max, &val);
> >>> +
> >>> +				if (rc || val < 0) {
> >>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> >>> +							minor);
> >>> +					continue;
> >>> +				}
> >>> +
> >>> +				bitmap_zero(tmp_bitmap,
> >>> +						MAX_DRMCG_LGPU_CAPACITY);
> >>> +				bitmap_set(tmp_bitmap, 0, val);
> >>> +			}
> >>> +
> >>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >>> +				rc = bitmap_parselist(sval, tmp_bitmap,
> >>> +						MAX_DRMCG_LGPU_CAPACITY);
> >>> +
> >>> +				if (rc) {
> >>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> >>> +							minor);
> >>> +					continue;
> >>> +				}
> >>> +
> >>> +				bitmap_andnot(chk_bitmap, tmp_bitmap,
> >>> +					props->lgpu_slots,
> >>> +					MAX_DRMCG_LGPU_CAPACITY);
> >>> +
> >>> +				if (!bitmap_empty(chk_bitmap,
> >>> +						MAX_DRMCG_LGPU_CAPACITY)) {
> >>> +					drmcg_pr_cft_err(drmcg, 0, cft_name,
> >>> +							minor);
> >>> +					continue;
> >>> +				}
> >>> +			}
> >>> +
> >>> +
> >>> +			if (parent != NULL) {
> >>> +				bitmap_and(chk_bitmap, tmp_bitmap,
> >>> +				parent->dev_resources[minor]->lgpu_allocated,
> >>> +				props->lgpu_capacity);
> >>> +
> >>> +				if (bitmap_empty(chk_bitmap,
> >>> +						props->lgpu_capacity)) {
> >>> +					drmcg_pr_cft_err(drmcg, 0,
> >>> +							cft_name, minor);
> >>> +					continue;
> >>> +				}
> >>> +			}
> >>> +
> >>> +			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> >>> +
> >>> +			break; /* DRMCG_TYPE_LGPU */
> >>>    		default:
> >>>    			break;
> >>>    		} /* switch (type) */
> >>> @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>>    			break;
> >>>    		case DRMCG_TYPE_BANDWIDTH:
> >>>    		case DRMCG_TYPE_MEM:
> >>> +		case DRMCG_TYPE_LGPU:
> >>>    			drmcg_nested_limit_parse(of, dm->dev, sattr);
> >>>    			break;
> >>>    		default:
> >>> @@ -731,6 +846,20 @@ struct cftype files[] = {
> >>>    		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
> >>>    						DRMCG_FTYPE_DEFAULT),
> >>>    	},
> >>> +	{
> >>> +		.name = "lgpu",
> >>> +		.seq_show = drmcg_seq_show,
> >>> +		.write = drmcg_limit_write,
> >>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >>> +						DRMCG_FTYPE_LIMIT),
> >>> +	},
> >>> +	{
> >>> +		.name = "lgpu.default",
> >>> +		.seq_show = drmcg_seq_show,
> >>> +		.flags = CFTYPE_ONLY_ON_ROOT,
> >>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >>> +						DRMCG_FTYPE_DEFAULT),
> >>> +	},
> >>>    	{ }	/* terminate */
> >>>    };
> >>>    
> >>> @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
> >>>    
> >>>    static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >>>    {
> >>> +	bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >>> +	bitmap_fill(dev->drmcg_props.lgpu_slots,
> >>> +			dev->drmcg_props.lgpu_capacity);
> >>> +
> >>>    	/* init cgroups created before registration (i.e. root cgroup) */
> >>>    	if (root_drmcg != NULL) {
> >>>    		struct cgroup_subsys_state *pos;
> >>> @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
> >>>    	for (i = 0; i <= TTM_PL_PRIV; i++)
> >>>    		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
> >>>    
> >>> +	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >>> +
> >>>    	drmcg_update_cg_tree(dev);
> >>>    }
> >>>    EXPORT_SYMBOL(drmcg_device_early_init);

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]                 ` <20191009153429.GI16989-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-10-09 15:53                   ` Kuehling, Felix
       [not found]                     ` <c7812af4-7ec4-02bb-ff4c-21dd114cf38e-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 89+ messages in thread
From: Kuehling, Felix @ 2019-10-09 15:53 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ho, Kenny, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher, Alexander, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Greathouse, Joseph,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Koenig, Christian

On 2019-10-09 11:34, Daniel Vetter wrote:
> On Wed, Oct 09, 2019 at 03:25:22PM +0000, Kuehling, Felix wrote:
>> On 2019-10-09 6:31, Daniel Vetter wrote:
>>> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
>>>> The description sounds reasonable to me and maps well to the CU masking
>>>> feature in our GPUs.
>>>>
>>>> It would also allow us to do more coarse-grained masking for example to
>>>> guarantee balanced allocation of CUs across shader engines or
>>>> partitioning of memory bandwidth or CP pipes (if that is supported by
>>>> the hardware/firmware).
>>> Hm, so this sounds like the definition for how this cgroup is supposed to
>>> work is "amd CU masking" (whatever that exactly is). And the abstract
>>> description is just prettification on top, but not actually the real
>>> definition you guys want.
>> I think you're reading this as the opposite of what I was trying to say.
>> Using CU masking is one possible implementation of LGPUs on AMD
>> hardware. It's the one that Kenny implemented at the end of this patch
>> series, and I pointed out some problems with that approach. Other ways
>> to partition the hardware into LGPUs are conceivable. For example we're
>> considering splitting it along the lines of shader engines, which is
>> more coarse-grain and would also affect memory bandwidth available to
>> each partition.
> If this is supposed to be useful for admins then "other ways to partition
> the hw are conceivable" is the problem. This should be unique&clear for
> admins/end-users. Reading the implementation details and realizing that
> the actual meaning is "amd CU masking" isn't good enough by far, since
> that's meaningless on any other hw.
>
> And if there's other ways to implement this cgroup for amd, it's also
> meaningless (to sysadmins/users) for amd hw.
>
>> We could also consider partitioning pipes in our command processor,
>> although that is not supported by our current CP scheduler firmware.
>>
>> The bottom line is, the LGPU model proposed by Kenny is quite abstract
>> and allows drivers implementing it a lot of flexibility depending on the
>> capability of their hardware and firmware. We haven't settled on a final
>> implementation choice even for AMD.
> That abstract model of essentially "anything goes" is the problem here
> imo. E.g. for cpu cgroups this would be similar to allowing the bitmasks to
> mean "cpu core" on one machine, "physical die" on the next and maybe
> "hyperthread unit" on the 3rd. Useless for admins.
>
> So if we have a gpu bitmask thing that might mean a command submission pipe
> on one hw (maybe matching what vk exposed, maybe not), some compute unit
> mask on the next and something entirely different (e.g. intel has so-called
> GT slices with compute cores + more stuff around) on the 3rd vendor
> then that's not useful for admins.

The goal is to partition GPU compute resources to eliminate as much 
resource contention as possible between different partitions. Different 
hardware will have different capabilities to implement this. No 
implementation will be perfect. For example, even with CPU cores that 
are supposedly well defined, you can still have different behaviours 
depending on CPU cache architectures, NUMA and thermal management across 
CPU cores. The admin will need some knowledge of their hardware 
architecture to understand those effects that are not described by the 
abstract model of cgroups.

The LGPU model is deliberately flexible, because GPU architectures are 
much less standardized than CPU architectures. Expecting a common model 
that is both very specific and applicable to all GPUs is unrealistic, 
in my opinion.

Regards,
   Felix


> -Daniel
>
>> Regards,
>>     Felix
>>
>>
>>> I think adding a cgroup which is that much depending upon the hw
>>> implementation of the first driver supporting it is not a good idea.
>>> -Daniel
>>>
>>>> I can't comment on the code as I'm unfamiliar with the details of the
>>>> cgroup code.
>>>>
>>>> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>
>>>>
>>>>> ---
>>>>>     Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
>>>>>     include/drm/drm_cgroup.h                |   4 +
>>>>>     include/linux/cgroup_drm.h              |   6 ++
>>>>>     kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
>>>>>     4 files changed, 191 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>>>> index 87a195133eaa..57f18469bd76 100644
>>>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>>>> @@ -1958,6 +1958,52 @@ DRM Interface Files
>>>>>     	Set largest allocation for /dev/dri/card1 to 4MB
>>>>>     	echo "226:1 4m" > drm.buffer.peak.max
>>>>>     
>>>>> +  drm.lgpu
>>>>> +	A read-write nested-keyed file which exists on all cgroups.
>>>>> +	Each entry is keyed by the DRM device's major:minor.
>>>>> +
>>>>> +	lgpu stands for logical GPU; it is an abstraction used to
>>>>> +	subdivide a physical DRM device for the purpose of resource
>>>>> +	management.
>>>>> +
>>>>> +	The lgpu is a discrete quantity that is device specific (i.e.
>>>>> +	some DRM devices may have 64 lgpus while others may have 100
>>>>> +	lgpus.)  The lgpu is a single quantity with two representations
>>>>> +	denoted by the following nested keys.
>>>>> +
>>>>> +	  =====     ========================================
>>>>> +	  count     Representing lgpu as anonymous resource
>>>>> +	  list      Representing lgpu as named resource
>>>>> +	  =====     ========================================
>>>>> +
>>>>> +	For example:
>>>>> +	226:0 count=256 list=0-255
>>>>> +	226:1 count=4 list=0,2,4,6
>>>>> +	226:2 count=32 list=32-63
>>>>> +
>>>>> +	lgpu is represented by a bitmap and uses the bitmap_parselist
>>>>> +	kernel function so the list key input format is a
>>>>> +	comma-separated list of decimal numbers and ranges.
>>>>> +
>>>>> +	Consecutively set bits are shown as two hyphen-separated decimal
>>>>> +	numbers, the smallest and largest bit numbers set in the range.
>>>>> +	Optionally each range can be postfixed to denote that only parts
>>>>> +	of it should be set.  The range will be divided into groups of
>>>>> +	a specific size.
>>>>> +	Syntax: range:used_size/group_size
>>>>> +	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
>>>>> +
>>>>> +	The count key is the hamming weight / hweight of the bitmap.
>>>>> +
>>>>> +	Both count and list accept the max and default keywords.
>>>>> +
>>>>> +	Some DRM devices may only support lgpu as anonymous resources.
>>>>> +	In such cases, the significance of the position of the set bits
>>>>> +	in list will be ignored.
>>>>> +
>>>>> +	This lgpu resource supports the 'allocation' resource
>>>>> +	distribution model.
>>>>> +
>>>>>     GEM Buffer Ownership
>>>>>     ~~~~~~~~~~~~~~~~~~~~
>>>>>     
>>>>> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
>>>>> index 6d9707e1eb72..a8d6be0b075b 100644
>>>>> --- a/include/drm/drm_cgroup.h
>>>>> +++ b/include/drm/drm_cgroup.h
>>>>> @@ -6,6 +6,7 @@
>>>>>     
>>>>>     #include <linux/cgroup_drm.h>
>>>>>     #include <linux/workqueue.h>
>>>>> +#include <linux/types.h>
>>>>>     #include <drm/ttm/ttm_bo_api.h>
>>>>>     #include <drm/ttm/ttm_bo_driver.h>
>>>>>     
>>>>> @@ -28,6 +29,9 @@ struct drmcg_props {
>>>>>     	s64			mem_highs_default[TTM_PL_PRIV+1];
>>>>>     
>>>>>     	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
>>>>> +
>>>>> +	int			lgpu_capacity;
>>>>> +        DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>>>>>     };
>>>>>     
>>>>>     #ifdef CONFIG_CGROUP_DRM
>>>>> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
>>>>> index c56cfe74d1a6..7b1cfc4ce4c3 100644
>>>>> --- a/include/linux/cgroup_drm.h
>>>>> +++ b/include/linux/cgroup_drm.h
>>>>> @@ -14,6 +14,8 @@
>>>>>     /* limit defined per the way drm_minor_alloc operates */
>>>>>     #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>>>>>     
>>>>> +#define MAX_DRMCG_LGPU_CAPACITY 256
>>>>> +
>>>>>     enum drmcg_mem_bw_attr {
>>>>>     	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
>>>>>     	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
>>>>> @@ -32,6 +34,7 @@ enum drmcg_res_type {
>>>>>     	DRMCG_TYPE_MEM_PEAK,
>>>>>     	DRMCG_TYPE_BANDWIDTH,
>>>>>     	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
>>>>> +	DRMCG_TYPE_LGPU,
>>>>>     	__DRMCG_TYPE_LAST,
>>>>>     };
>>>>>     
>>>>> @@ -58,6 +61,9 @@ struct drmcg_device_resource {
>>>>>     	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
>>>>>     	s64			mem_bw_limits_bytes_in_period;
>>>>>     	s64			mem_bw_limits_avg_bytes_per_us;
>>>>> +
>>>>> +	s64			lgpu_used;
>>>>> +	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>>>>>     };
>>>>>     
>>>>>     /**
>>>>> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
>>>>> index 0ea7f0619e25..18c4368e2c29 100644
>>>>> --- a/kernel/cgroup/drm.c
>>>>> +++ b/kernel/cgroup/drm.c
>>>>> @@ -9,6 +9,7 @@
>>>>>     #include <linux/cgroup_drm.h>
>>>>>     #include <linux/ktime.h>
>>>>>     #include <linux/kernel.h>
>>>>> +#include <linux/bitmap.h>
>>>>>     #include <drm/drm_file.h>
>>>>>     #include <drm/drm_drv.h>
>>>>>     #include <drm/ttm/ttm_bo_api.h>
>>>>> @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
>>>>>     #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
>>>>>     #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
>>>>>     
>>>>> +#define LGPU_LIMITS_NAME_LIST "list"
>>>>> +#define LGPU_LIMITS_NAME_COUNT "count"
>>>>> +
>>>>>     static struct drmcg *root_drmcg __read_mostly;
>>>>>     
>>>>>     static int drmcg_css_free_fn(int id, void *ptr, void *data)
>>>>> @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
>>>>>     	for (i = 0; i <= TTM_PL_PRIV; i++)
>>>>>     		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
>>>>>     
>>>>> +	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
>>>>> +			MAX_DRMCG_LGPU_CAPACITY);
>>>>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>>>>> +
>>>>>     	mutex_unlock(&dev->drmcg_mutex);
>>>>>     	return 0;
>>>>>     }
>>>>> @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
>>>>>     				MEM_BW_LIMITS_NAME_AVG,
>>>>>     				ddr->mem_bw_limits_avg_bytes_per_us);
>>>>>     		break;
>>>>> +	case DRMCG_TYPE_LGPU:
>>>>> +		seq_printf(sf, "%s=%lld %s=%*pbl\n",
>>>>> +				LGPU_LIMITS_NAME_COUNT,
>>>>> +				ddr->lgpu_used,
>>>>> +				LGPU_LIMITS_NAME_LIST,
>>>>> +				dev->drmcg_props.lgpu_capacity,
>>>>> +				ddr->lgpu_allocated);
>>>>> +		break;
>>>>>     	default:
>>>>>     		seq_puts(sf, "\n");
>>>>>     		break;
>>>>> @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
>>>>>     				MEM_BW_LIMITS_NAME_AVG,
>>>>>     				props->mem_bw_avg_bytes_per_us_default);
>>>>>     		break;
>>>>> +	case DRMCG_TYPE_LGPU:
>>>>> +		seq_printf(sf, "%s=%d %s=%*pbl\n",
>>>>> +				LGPU_LIMITS_NAME_COUNT,
>>>>> +				bitmap_weight(props->lgpu_slots,
>>>>> +					props->lgpu_capacity),
>>>>> +				LGPU_LIMITS_NAME_LIST,
>>>>> +				props->lgpu_capacity,
>>>>> +				props->lgpu_slots);
>>>>> +		break;
>>>>>     	default:
>>>>>     		seq_puts(sf, "\n");
>>>>>     		break;
>>>>> @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
>>>>>     	mutex_unlock(&dev->drmcg_mutex);
>>>>>     }
>>>>>     
>>>>> +static void drmcg_lgpu_values_apply(struct drm_device *dev,
>>>>> +		struct drmcg_device_resource *ddr, unsigned long *val)
>>>>> +{
>>>>> +
>>>>> +	mutex_lock(&dev->drmcg_mutex);
>>>>> +	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
>>>>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
>>>>> +	mutex_unlock(&dev->drmcg_mutex);
>>>>> +}
>>>>> +
>>>>>     static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>>>>>     		struct drm_device *dev, char *attrs)
>>>>>     {
>>>>> +	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>>>>> +	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
>>>>>     	enum drmcg_res_type type =
>>>>>     		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
>>>>>     	struct drmcg *drmcg = css_to_drmcg(of_css(of));
>>>>> @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
>>>>>     				continue;
>>>>>     			}
>>>>>     			break; /* DRMCG_TYPE_MEM */
>>>>> +		case DRMCG_TYPE_LGPU:
>>>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
>>>>> +				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
>>>>> +				continue;
>>>>> +
>>>>> +                        if (!strcmp("max", sval) ||
>>>>> +					!strcmp("default", sval)) {
>>>>> +				if (parent != NULL)
>>>>> +					drmcg_lgpu_values_apply(dev, ddr,
>>>>> +						parent->dev_resources[minor]->
>>>>> +						lgpu_allocated);
>>>>> +				else
>>>>> +					drmcg_lgpu_values_apply(dev, ddr,
>>>>> +						props->lgpu_slots);
>>>>> +
>>>>> +				continue;
>>>>> +			}
>>>>> +
>>>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
>>>>> +				p_max = parent == NULL ? props->lgpu_capacity:
>>>>> +					bitmap_weight(
>>>>> +					parent->dev_resources[minor]->
>>>>> +					lgpu_allocated, props->lgpu_capacity);
>>>>> +
>>>>> +				rc = drmcg_process_limit_s64_val(sval,
>>>>> +					false, p_max, p_max, &val);
>>>>> +
>>>>> +				if (rc || val < 0) {
>>>>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
>>>>> +							minor);
>>>>> +					continue;
>>>>> +				}
>>>>> +
>>>>> +				bitmap_zero(tmp_bitmap,
>>>>> +						MAX_DRMCG_LGPU_CAPACITY);
>>>>> +				bitmap_set(tmp_bitmap, 0, val);
>>>>> +			}
>>>>> +
>>>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
>>>>> +				rc = bitmap_parselist(sval, tmp_bitmap,
>>>>> +						MAX_DRMCG_LGPU_CAPACITY);
>>>>> +
>>>>> +				if (rc) {
>>>>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
>>>>> +							minor);
>>>>> +					continue;
>>>>> +				}
>>>>> +
>>>>> +                        	bitmap_andnot(chk_bitmap, tmp_bitmap,
>>>>> +					props->lgpu_slots,
>>>>> +					MAX_DRMCG_LGPU_CAPACITY);
>>>>> +
>>>>> +                        	if (!bitmap_empty(chk_bitmap,
>>>>> +						MAX_DRMCG_LGPU_CAPACITY)) {
>>>>> +					drmcg_pr_cft_err(drmcg, 0, cft_name,
>>>>> +							minor);
>>>>> +					continue;
>>>>> +				}
>>>>> +			}
>>>>> +
>>>>> +
>>>>> +                        if (parent != NULL) {
>>>>> +				bitmap_and(chk_bitmap, tmp_bitmap,
>>>>> +				parent->dev_resources[minor]->lgpu_allocated,
>>>>> +				props->lgpu_capacity);
>>>>> +
>>>>> +				if (bitmap_empty(chk_bitmap,
>>>>> +						props->lgpu_capacity)) {
>>>>> +					drmcg_pr_cft_err(drmcg, 0,
>>>>> +							cft_name, minor);
>>>>> +					continue;
>>>>> +				}
>>>>> +			}
>>>>> +
>>>>> +			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
>>>>> +
>>>>> +			break; /* DRMCG_TYPE_LGPU */
>>>>>     		default:
>>>>>     			break;
>>>>>     		} /* switch (type) */
>>>>> @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
>>>>>     			break;
>>>>>     		case DRMCG_TYPE_BANDWIDTH:
>>>>>     		case DRMCG_TYPE_MEM:
>>>>> +		case DRMCG_TYPE_LGPU:
>>>>>     			drmcg_nested_limit_parse(of, dm->dev, sattr);
>>>>>     			break;
>>>>>     		default:
>>>>> @@ -731,6 +846,20 @@ struct cftype files[] = {
>>>>>     		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
>>>>>     						DRMCG_FTYPE_DEFAULT),
>>>>>     	},
>>>>> +	{
>>>>> +		.name = "lgpu",
>>>>> +		.seq_show = drmcg_seq_show,
>>>>> +		.write = drmcg_limit_write,
>>>>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
>>>>> +						DRMCG_FTYPE_LIMIT),
>>>>> +	},
>>>>> +	{
>>>>> +		.name = "lgpu.default",
>>>>> +		.seq_show = drmcg_seq_show,
>>>>> +		.flags = CFTYPE_ONLY_ON_ROOT,
>>>>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
>>>>> +						DRMCG_FTYPE_DEFAULT),
>>>>> +	},
>>>>>     	{ }	/* terminate */
>>>>>     };
>>>>>     
>>>>> @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
>>>>>     
>>>>>     static inline void drmcg_update_cg_tree(struct drm_device *dev)
>>>>>     {
>>>>> +        bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
>>>>> +        bitmap_fill(dev->drmcg_props.lgpu_slots,
>>>>> +			dev->drmcg_props.lgpu_capacity);
>>>>> +
>>>>>     	/* init cgroups created before registration (i.e. root cgroup) */
>>>>>     	if (root_drmcg != NULL) {
>>>>>     		struct cgroup_subsys_state *pos;
>>>>> @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
>>>>>     	for (i = 0; i <= TTM_PL_PRIV; i++)
>>>>>     		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
>>>>>     
>>>>> +	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
>>>>> +
>>>>>     	drmcg_update_cg_tree(dev);
>>>>>     }
>>>>>     EXPORT_SYMBOL(drmcg_device_early_init);
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]                     ` <c7812af4-7ec4-02bb-ff4c-21dd114cf38e-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-09 16:06                       ` Daniel Vetter
  2019-10-09 18:52                         ` Greathouse, Joseph
  2019-10-11 17:12                         ` tj
  0 siblings, 2 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-10-09 16:06 UTC (permalink / raw)
  To: Kuehling, Felix
  Cc: Greathouse, Joseph, Ho, Kenny, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher, Alexander, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Koenig, Christian

On Wed, Oct 09, 2019 at 03:53:42PM +0000, Kuehling, Felix wrote:
> On 2019-10-09 11:34, Daniel Vetter wrote:
> > On Wed, Oct 09, 2019 at 03:25:22PM +0000, Kuehling, Felix wrote:
> >> On 2019-10-09 6:31, Daniel Vetter wrote:
> >>> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> >>>> The description sounds reasonable to me and maps well to the CU masking
> >>>> feature in our GPUs.
> >>>>
> >>>> It would also allow us to do more coarse-grained masking for example to
> >>>> guarantee balanced allocation of CUs across shader engines or
> >>>> partitioning of memory bandwidth or CP pipes (if that is supported by
> >>>> the hardware/firmware).
> >>> Hm, so this sounds like the definition for how this cgroup is supposed to
> >>> work is "amd CU masking" (whatever that exactly is). And the abstract
> >>> description is just prettification on top, but not actually the real
> >>> definition you guys want.
> >> I think you're reading this as the opposite of what I was trying to say.
> >> Using CU masking is one possible implementation of LGPUs on AMD
> >> hardware. It's the one that Kenny implemented at the end of this patch
> >> series, and I pointed out some problems with that approach. Other ways
> >> to partition the hardware into LGPUs are conceivable. For example we're
> >> considering splitting it along the lines of shader engines, which is
> >> more coarse-grain and would also affect memory bandwidth available to
> >> each partition.
> > If this is supposed to be useful for admins then "other ways to partition
> > the hw are conceivable" is the problem. This should be unique&clear for
> > admins/end-users. Reading the implementation details and realizing that
> > the actual meaning is "amd CU masking" isn't good enough by far, since
> > that's meaningless on any other hw.
> >
> > And if there's other ways to implement this cgroup for amd, it's also
> > meaningless (to sysadmins/users) for amd hw.
> >
> >> We could also consider partitioning pipes in our command processor,
> >> although that is not supported by our current CP scheduler firmware.
> >>
> >> The bottom line is, the LGPU model proposed by Kenny is quite abstract
> >> and allows drivers implementing it a lot of flexibility depending on the
> >> capability of their hardware and firmware. We haven't settled on a final
> >> implementation choice even for AMD.
> > That abstract model of essentially "anything goes" is the problem here
> > imo. E.g. for cpu cgroups this would be similar to allowing the bitmasks to
> > mean "cpu core" on one machine, "physical die" on the next and maybe
> > "hyperthread unit" on the 3rd. Useless for admins.
> >
> > So if we have a gpu bitmask thing that might mean a command submission pipe
> > on one hw (maybe matching what vk exposed, maybe not), some compute unit
> > mask on the next and something entirely different (e.g. intel has so-called
> > GT slices with compute cores + more stuff around) on the 3rd vendor
> > then that's not useful for admins.
> 
> The goal is to partition GPU compute resources to eliminate as much 
> resource contention as possible between different partitions. Different 
> hardware will have different capabilities to implement this. No 
> implementation will be perfect. For example, even with CPU cores that 
> are supposedly well defined, you can still have different behaviours 
> depending on CPU cache architectures, NUMA and thermal management across 
> CPU cores. The admin will need some knowledge of their hardware 
> architecture to understand those effects that are not described by the 
> abstract model of cgroups.

That's not the point I was making. For cpu cgroups there's a very well
defined connection between the cpu bitmasks/numbers in cgroups and the cpu
bitmasks you use in various system calls (they match). And that stuff
works across vendors.

We need the same for gpus.

> The LGPU model is deliberately flexible, because GPU architectures are 
> much less standardized than CPU architectures. Expecting a common model 
> that is both very specific and applicable to all GPUs is unrealistic, 
> in my opinion.

So pure abstraction isn't useful, we need to know what these bits mean.
Since if they e.g. mean vk pipes, then maybe I shouldn't be using those vk
pipes in my application anymore. Or we need to define that the userspace
driver needs to filter out any pipes that aren't accessible (if that's
possible, no idea).

cgroups that essentially have pure hw-dependent meaning aren't useful.
Note: this is about the fundamental meaning, not about the more unclear
isolation guarantees (which are indeed hw specific on different cpu
platforms). We're not talking about "different gpus might have different
amounts of shared caches between different bitmasks". We're talking
"different gpus might assign completely different meanings to these
bitmasks".
-Daniel

> 
> Regards,
>    Felix
> 
> 
> > -Daniel
> >
> >> Regards,
> >>     Felix
> >>
> >>
> >>> I think adding a cgroup which is that much depending upon the hw
> >>> implementation of the first driver supporting it is not a good idea.
> >>> -Daniel
> >>>
> >>>> I can't comment on the code as I'm unfamiliar with the details of the
> >>>> cgroup code.
> >>>>
> >>>> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >>>>
> >>>>
> >>>>> ---
> >>>>>     Documentation/admin-guide/cgroup-v2.rst |  46 ++++++++
> >>>>>     include/drm/drm_cgroup.h                |   4 +
> >>>>>     include/linux/cgroup_drm.h              |   6 ++
> >>>>>     kernel/cgroup/drm.c                     | 135 ++++++++++++++++++++++++
> >>>>>     4 files changed, 191 insertions(+)
> >>>>>
> >>>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >>>>> index 87a195133eaa..57f18469bd76 100644
> >>>>> --- a/Documentation/admin-guide/cgroup-v2.rst
> >>>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >>>>> @@ -1958,6 +1958,52 @@ DRM Interface Files
> >>>>>     	Set largest allocation for /dev/dri/card1 to 4MB
> >>>>>     	echo "226:1 4m" > drm.buffer.peak.max
> >>>>>     
> >>>>> +  drm.lgpu
> >>>>> +	A read-write nested-keyed file which exists on all cgroups.
> >>>>> +	Each entry is keyed by the DRM device's major:minor.
> >>>>> +
> >>>>> +	lgpu stands for logical GPU; it is an abstraction used to
> >>>>> +	subdivide a physical DRM device for the purpose of resource
> >>>>> +	management.
> >>>>> +
> >>>>> +	The lgpu is a discrete quantity that is device specific (i.e.
> >>>>> +	some DRM devices may have 64 lgpus while others may have 100
> >>>>> +	lgpus.)  The lgpu is a single quantity with two representations
> >>>>> +	denoted by the following nested keys.
> >>>>> +
> >>>>> +	  =====     ========================================
> >>>>> +	  count     Representing lgpu as anonymous resource
> >>>>> +	  list      Representing lgpu as named resource
> >>>>> +	  =====     ========================================
> >>>>> +
> >>>>> +	For example:
> >>>>> +	226:0 count=256 list=0-255
> >>>>> +	226:1 count=4 list=0,2,4,6
> >>>>> +	226:2 count=32 list=32-63
> >>>>> +
> >>>>> +	lgpu is represented by a bitmap and uses the bitmap_parselist
> >>>>> +	kernel function so the list key input format is a
> >>>>> +	comma-separated list of decimal numbers and ranges.
> >>>>> +
> >>>>> +	Consecutively set bits are shown as two hyphen-separated decimal
> >>>>> +	numbers, the smallest and largest bit numbers set in the range.
> >>>>> +	Optionally each range can be postfixed to denote that only parts
> >>>>> +	of it should be set.  The range will be divided into groups of
> >>>>> +	a specific size.
> >>>>> +	Syntax: range:used_size/group_size
> >>>>> +	Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >>>>> +
> >>>>> +	The count key is the hamming weight / hweight of the bitmap.
> >>>>> +
> >>>>> +	Both count and list accept the max and default keywords.
> >>>>> +
> >>>>> +	Some DRM devices may only support lgpu as anonymous resources.
> >>>>> +	In such cases, the significance of the position of the set bits
> >>>>> +	in list will be ignored.
> >>>>> +
> >>>>> +	This lgpu resource supports the 'allocation' resource
> >>>>> +	distribution model.
> >>>>> +
> >>>>>     GEM Buffer Ownership
> >>>>>     ~~~~~~~~~~~~~~~~~~~~
> >>>>>     
> >>>>> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >>>>> index 6d9707e1eb72..a8d6be0b075b 100644
> >>>>> --- a/include/drm/drm_cgroup.h
> >>>>> +++ b/include/drm/drm_cgroup.h
> >>>>> @@ -6,6 +6,7 @@
> >>>>>     
> >>>>>     #include <linux/cgroup_drm.h>
> >>>>>     #include <linux/workqueue.h>
> >>>>> +#include <linux/types.h>
> >>>>>     #include <drm/ttm/ttm_bo_api.h>
> >>>>>     #include <drm/ttm/ttm_bo_driver.h>
> >>>>>     
> >>>>> @@ -28,6 +29,9 @@ struct drmcg_props {
> >>>>>     	s64			mem_highs_default[TTM_PL_PRIV+1];
> >>>>>     
> >>>>>     	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
> >>>>> +
> >>>>> +	int			lgpu_capacity;
> >>>>> +        DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >>>>>     };
> >>>>>     
> >>>>>     #ifdef CONFIG_CGROUP_DRM
> >>>>> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >>>>> index c56cfe74d1a6..7b1cfc4ce4c3 100644
> >>>>> --- a/include/linux/cgroup_drm.h
> >>>>> +++ b/include/linux/cgroup_drm.h
> >>>>> @@ -14,6 +14,8 @@
> >>>>>     /* limit defined per the way drm_minor_alloc operates */
> >>>>>     #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>>>>     
> >>>>> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >>>>> +
> >>>>>     enum drmcg_mem_bw_attr {
> >>>>>     	DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> >>>>>     	DRMCG_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
> >>>>> @@ -32,6 +34,7 @@ enum drmcg_res_type {
> >>>>>     	DRMCG_TYPE_MEM_PEAK,
> >>>>>     	DRMCG_TYPE_BANDWIDTH,
> >>>>>     	DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> >>>>> +	DRMCG_TYPE_LGPU,
> >>>>>     	__DRMCG_TYPE_LAST,
> >>>>>     };
> >>>>>     
> >>>>> @@ -58,6 +61,9 @@ struct drmcg_device_resource {
> >>>>>     	s64			mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
> >>>>>     	s64			mem_bw_limits_bytes_in_period;
> >>>>>     	s64			mem_bw_limits_avg_bytes_per_us;
> >>>>> +
> >>>>> +	s64			lgpu_used;
> >>>>> +	DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >>>>>     };
> >>>>>     
> >>>>>     /**
> >>>>> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >>>>> index 0ea7f0619e25..18c4368e2c29 100644
> >>>>> --- a/kernel/cgroup/drm.c
> >>>>> +++ b/kernel/cgroup/drm.c
> >>>>> @@ -9,6 +9,7 @@
> >>>>>     #include <linux/cgroup_drm.h>
> >>>>>     #include <linux/ktime.h>
> >>>>>     #include <linux/kernel.h>
> >>>>> +#include <linux/bitmap.h>
> >>>>>     #include <drm/drm_file.h>
> >>>>>     #include <drm/drm_drv.h>
> >>>>>     #include <drm/ttm/ttm_bo_api.h>
> >>>>> @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
> >>>>>     #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> >>>>>     #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> >>>>>     
> >>>>> +#define LGPU_LIMITS_NAME_LIST "list"
> >>>>> +#define LGPU_LIMITS_NAME_COUNT "count"
> >>>>> +
> >>>>>     static struct drmcg *root_drmcg __read_mostly;
> >>>>>     
> >>>>>     static int drmcg_css_free_fn(int id, void *ptr, void *data)
> >>>>> @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >>>>>     	for (i = 0; i <= TTM_PL_PRIV; i++)
> >>>>>     		ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
> >>>>>     
> >>>>> +	bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> >>>>> +			MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +
> >>>>>     	mutex_unlock(&dev->drmcg_mutex);
> >>>>>     	return 0;
> >>>>>     }
> >>>>> @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >>>>>     				MEM_BW_LIMITS_NAME_AVG,
> >>>>>     				ddr->mem_bw_limits_avg_bytes_per_us);
> >>>>>     		break;
> >>>>> +	case DRMCG_TYPE_LGPU:
> >>>>> +		seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >>>>> +				LGPU_LIMITS_NAME_COUNT,
> >>>>> +				ddr->lgpu_used,
> >>>>> +				LGPU_LIMITS_NAME_LIST,
> >>>>> +				dev->drmcg_props.lgpu_capacity,
> >>>>> +				ddr->lgpu_allocated);
> >>>>> +		break;
> >>>>>     	default:
> >>>>>     		seq_puts(sf, "\n");
> >>>>>     		break;
> >>>>> @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
> >>>>>     				MEM_BW_LIMITS_NAME_AVG,
> >>>>>     				props->mem_bw_avg_bytes_per_us_default);
> >>>>>     		break;
> >>>>> +	case DRMCG_TYPE_LGPU:
> >>>>> +		seq_printf(sf, "%s=%d %s=%*pbl\n",
> >>>>> +				LGPU_LIMITS_NAME_COUNT,
> >>>>> +				bitmap_weight(props->lgpu_slots,
> >>>>> +					props->lgpu_capacity),
> >>>>> +				LGPU_LIMITS_NAME_LIST,
> >>>>> +				props->lgpu_capacity,
> >>>>> +				props->lgpu_slots);
> >>>>> +		break;
> >>>>>     	default:
> >>>>>     		seq_puts(sf, "\n");
> >>>>>     		break;
> >>>>> @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
> >>>>>     	mutex_unlock(&dev->drmcg_mutex);
> >>>>>     }
> >>>>>     
> >>>>> +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> >>>>> +		struct drmcg_device_resource *ddr, unsigned long *val)
> >>>>> +{
> >>>>> +
> >>>>> +	mutex_lock(&dev->drmcg_mutex);
> >>>>> +	bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +	ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +	mutex_unlock(&dev->drmcg_mutex);
> >>>>> +}
> >>>>> +
> >>>>>     static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >>>>>     		struct drm_device *dev, char *attrs)
> >>>>>     {
> >>>>> +	DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +	DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >>>>>     	enum drmcg_res_type type =
> >>>>>     		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >>>>>     	struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >>>>> @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >>>>>     				continue;
> >>>>>     			}
> >>>>>     			break; /* DRMCG_TYPE_MEM */
> >>>>> +		case DRMCG_TYPE_LGPU:
> >>>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >>>>> +				strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
> >>>>> +				continue;
> >>>>> +
> >>>>> +                        if (!strcmp("max", sval) ||
> >>>>> +					!strcmp("default", sval)) {
> >>>>> +				if (parent != NULL)
> >>>>> +					drmcg_lgpu_values_apply(dev, ddr,
> >>>>> +						parent->dev_resources[minor]->
> >>>>> +						lgpu_allocated);
> >>>>> +				else
> >>>>> +					drmcg_lgpu_values_apply(dev, ddr,
> >>>>> +						props->lgpu_slots);
> >>>>> +
> >>>>> +				continue;
> >>>>> +			}
> >>>>> +
> >>>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >>>>> +				p_max = parent == NULL ? props->lgpu_capacity:
> >>>>> +					bitmap_weight(
> >>>>> +					parent->dev_resources[minor]->
> >>>>> +					lgpu_allocated, props->lgpu_capacity);
> >>>>> +
> >>>>> +				rc = drmcg_process_limit_s64_val(sval,
> >>>>> +					false, p_max, p_max, &val);
> >>>>> +
> >>>>> +				if (rc || val < 0) {
> >>>>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> >>>>> +							minor);
> >>>>> +					continue;
> >>>>> +				}
> >>>>> +
> >>>>> +				bitmap_zero(tmp_bitmap,
> >>>>> +						MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +				bitmap_set(tmp_bitmap, 0, val);
> >>>>> +			}
> >>>>> +
> >>>>> +			if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >>>>> +				rc = bitmap_parselist(sval, tmp_bitmap,
> >>>>> +						MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +
> >>>>> +				if (rc) {
> >>>>> +					drmcg_pr_cft_err(drmcg, rc, cft_name,
> >>>>> +							minor);
> >>>>> +					continue;
> >>>>> +				}
> >>>>> +
> >>>>> +                        	bitmap_andnot(chk_bitmap, tmp_bitmap,
> >>>>> +					props->lgpu_slots,
> >>>>> +					MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +
> >>>>> +                        	if (!bitmap_empty(chk_bitmap,
> >>>>> +						MAX_DRMCG_LGPU_CAPACITY)) {
> >>>>> +					drmcg_pr_cft_err(drmcg, 0, cft_name,
> >>>>> +							minor);
> >>>>> +					continue;
> >>>>> +				}
> >>>>> +			}
> >>>>> +
> >>>>> +
> >>>>> +                        if (parent != NULL) {
> >>>>> +				bitmap_and(chk_bitmap, tmp_bitmap,
> >>>>> +				parent->dev_resources[minor]->lgpu_allocated,
> >>>>> +				props->lgpu_capacity);
> >>>>> +
> >>>>> +				if (bitmap_empty(chk_bitmap,
> >>>>> +						props->lgpu_capacity)) {
> >>>>> +					drmcg_pr_cft_err(drmcg, 0,
> >>>>> +							cft_name, minor);
> >>>>> +					continue;
> >>>>> +				}
> >>>>> +			}
> >>>>> +
> >>>>> +			drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> >>>>> +
> >>>>> +			break; /* DRMCG_TYPE_LGPU */
> >>>>>     		default:
> >>>>>     			break;
> >>>>>     		} /* switch (type) */
> >>>>> @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>>>>     			break;
> >>>>>     		case DRMCG_TYPE_BANDWIDTH:
> >>>>>     		case DRMCG_TYPE_MEM:
> >>>>> +		case DRMCG_TYPE_LGPU:
> >>>>>     			drmcg_nested_limit_parse(of, dm->dev, sattr);
> >>>>>     			break;
> >>>>>     		default:
> >>>>> @@ -731,6 +846,20 @@ struct cftype files[] = {
> >>>>>     		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
> >>>>>     						DRMCG_FTYPE_DEFAULT),
> >>>>>     	},
> >>>>> +	{
> >>>>> +		.name = "lgpu",
> >>>>> +		.seq_show = drmcg_seq_show,
> >>>>> +		.write = drmcg_limit_write,
> >>>>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >>>>> +						DRMCG_FTYPE_LIMIT),
> >>>>> +	},
> >>>>> +	{
> >>>>> +		.name = "lgpu.default",
> >>>>> +		.seq_show = drmcg_seq_show,
> >>>>> +		.flags = CFTYPE_ONLY_ON_ROOT,
> >>>>> +		.private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >>>>> +						DRMCG_FTYPE_DEFAULT),
> >>>>> +	},
> >>>>>     	{ }	/* terminate */
> >>>>>     };
> >>>>>     
> >>>>> @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
> >>>>>     
> >>>>>     static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >>>>>     {
> >>>>> +	bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >>>>> +	bitmap_fill(dev->drmcg_props.lgpu_slots,
> >>>>> +			dev->drmcg_props.lgpu_capacity);
> >>>>> +
> >>>>>     	/* init cgroups created before registration (i.e. root cgroup) */
> >>>>>     	if (root_drmcg != NULL) {
> >>>>>     		struct cgroup_subsys_state *pos;
> >>>>> @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
> >>>>>     	for (i = 0; i <= TTM_PL_PRIV; i++)
> >>>>>     		dev->drmcg_props.mem_highs_default[i] = S64_MAX;
> >>>>>     
> >>>>> +	dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >>>>> +
> >>>>>     	drmcg_update_cg_tree(dev);
> >>>>>     }
> >>>>>     EXPORT_SYMBOL(drmcg_device_early_init);

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2019-10-09 16:06                       ` Daniel Vetter
@ 2019-10-09 18:52                         ` Greathouse, Joseph
       [not found]                           ` <CY4PR12MB17670EE9EE4A22663EB584E8F9950-rpdhrqHFk07NeWpHaHeGuQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2019-10-11 17:12                         ` tj
  1 sibling, 1 reply; 89+ messages in thread
From: Greathouse, Joseph @ 2019-10-09 18:52 UTC (permalink / raw)
  To: Daniel Vetter, Kuehling, Felix
  Cc: Ho, Kenny, jsparks, amd-gfx, lkaplan, Deucher, Alexander,
	y2kenny, dri-devel, tj, cgroups, Koenig, Christian

> From: Daniel Vetter <daniel.vetter@ffwll.ch> On Behalf Of Daniel Vetter
> Sent: Wednesday, October 9, 2019 11:07 AM
> On Wed, Oct 09, 2019 at 03:53:42PM +0000, Kuehling, Felix wrote:
> > On 2019-10-09 11:34, Daniel Vetter wrote:
> > > On Wed, Oct 09, 2019 at 03:25:22PM +0000, Kuehling, Felix wrote:
> > >> On 2019-10-09 6:31, Daniel Vetter wrote:
> > >>> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> > >>>> The description sounds reasonable to me and maps well to the CU masking
> > >>>> feature in our GPUs.
> > >>>>
> > >>>> It would also allow us to do more coarse-grained masking for example to
> > >>>> guarantee balanced allocation of CUs across shader engines or
> > >>>> partitioning of memory bandwidth or CP pipes (if that is supported by
> > >>>> the hardware/firmware).
> > >>> Hm, so this sounds like the definition for how this cgroup is supposed to
> > >>> work is "amd CU masking" (whatever that exactly is). And the abstract
> > >>> description is just prettification on top, but not actually the real
> > >>> definition you guys want.
> > >> I think you're reading this as the opposite of what I was trying to say.
> > >> Using CU masking is one possible implementation of LGPUs on AMD
> > >> hardware. It's the one that Kenny implemented at the end of this patch
> > >> series, and I pointed out some problems with that approach. Other ways
> > >> to partition the hardware into LGPUs are conceivable. For example we're
> > >> considering splitting it along the lines of shader engines, which is
> > >> more coarse-grain and would also affect memory bandwidth available to
> > >> each partition.
> > > If this is supposed to be useful for admins then "other ways to partition
> > > the hw are conceivable" is the problem. This should be unique&clear for
> > > admins/end-users. Reading the implementation details and realizing that
> > > the actual meaning is "amd CU masking" isn't good enough by far, since
> > > that's meaningless on any other hw.
> > >
> > > And if there's other ways to implement this cgroup for amd, it's also
> > > meaningless (to sysadmins/users) for amd hw.
> > >
> > >> We could also consider partitioning pipes in our command processor,
> > >> although that is not supported by our current CP scheduler firmware.
> > >>
> > >> The bottom line is, the LGPU model proposed by Kenny is quite abstract
> > >> and allows drivers implementing it a lot of flexibility depending on the
> > >> capability of their hardware and firmware. We haven't settled on a final
> > >> implementation choice even for AMD.
> > > That abstract model of essentially "anything goes" is the problem here
> > > imo. E.g. for cpu cgroups this would be similar to allowing the bitmasks to
> > > mean "cpu core" on one machine "physical die" on the next and maybe
> > > "hyperthread unit" on the 3rd. Useless for admins.
> > >
> > > So if we have a gpu bitmask thing that might mean a command submission pipe
> > > on one hw (maybe matching what vk exposed, maybe not), some compute unit
> > > mask on the next and something entirely different (e.g. intel has so
> > > called GT slices with compute cores + more stuff around) on the 3rd vendor
> > > then that's not useful for admins.
> >
> > The goal is to partition GPU compute resources to eliminate as much
> > resource contention as possible between different partitions. Different
> > hardware will have different capabilities to implement this. No
> > implementation will be perfect. For example, even with CPU cores that
> > are supposedly well defined, you can still have different behaviours
> > depending on CPU cache architectures, NUMA and thermal management across
> > CPU cores. The admin will need some knowledge of their hardware
> > architecture to understand those effects that are not described by the
> > abstract model of cgroups.
> 
> That's not the point I was making. For cpu cgroups there's a very well
> defined connection between the cpu bitmasks/numbers in cgroups and the cpu
> bitmasks you use in various system calls (they match). And that stuff
> works across vendors.
> 
> We need the same for gpus.
> 
> > The LGPU model is deliberately flexible, because GPU architectures are
> > much less standardized than CPU architectures. Expecting a common model
> > that is both very specific and applicable to all GPUs is unrealistic,
> > in my opinion.
> 
> So pure abstraction isn't useful, we need to know what these bits mean.
> Since if they e.g. mean vk pipes, then maybe I shouldn't be using those vk
> pipes in my application anymore. Or we need to define that the userspace
> driver needs to filter out any pipes that aren't accessible (if that's
> possible, no idea).
> 
> cgroups that essentially have pure hw-dependent meaning aren't useful.
> Note: this is about the fundamental meaning, not about the more unclear
> isolation guarantees (which are indeed hw specific on different cpu
> platforms). We're not talking about "different gpus might have different
> amounts of shared caches between different bitmasks". We're talking
> "different gpus might assign completely different meaning to these
> bitmasks".
> -Daniel
<snip>

One thing that comes to mind is the OpenCL 1.2+ SubDevices mechanism: https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateSubDevices.html

The concept of LGPU in cgroups seems to match up nicely with an OpenCL SubDevice, at least for compute tasks. We want to divide up the device and give some configurable subset of it to the user as a logical GPU or sub-device.
 
OpenCL defines Compute Units (CUs), and any GPU vendor that runs OpenCL has some mapping of their internal compute resources to this concept of CUs. Off the top of my head (I may be misremembering some of these):
- AMD: Compute Units (CUs)
- ARM: Shader Cores (SCs)
- Intel: Execution Units (EUs)
- Nvidia: Streaming Multiprocessors (SMs)
- Qualcomm: Shader Processors (SPs)

The clCreateSubDevices() API has a variety of ways to slice and dice these compute resources across sub-devices. PARTITION_EQUALLY and PARTITION_BY_COUNTS could possibly be handled by a simple high-level mechanism that just allows you to request some percentage of the available GPU compute resources.

PARTITION_BY_AFFINITY_DOMAIN, however, splits up the CUs based on lower-level information such as what cache levels are shared or what NUMA domain a collection of CUs is in. I would argue that a runtime that wants to do this needs to know a bit about the mapping of CUs to underlying hardware resources.

A cgroup implementation that presented a CU bitmap could sit at the bottom of all three of these partitioning schemes, and more advanced ones if they come up. We might be getting side-tracked by the fact that AMD calls its resources CUs. The OpenCL (or Vulkan, etc.) concept of a Compute Unit is cross-vendor. The concept of targeting work to [Khronos-defined] Compute Units isn't AMD-specific. A bitmap of [Khronos-defined] CUs could map to any of these broad vendor compute resources.

There may be other parts of the GPU that we want to divide up -- command queue resources, pipes, render backends, etc. I'm not sure if any of those have been "standardized" between GPUs to such an extent that they make sense to put into cgroups yet -- I'm ignorant outside of the compute world. But at least the concept of CUs (or SMs, or EUs, etc.) seems to be standard across GPUs and (to me anyway) seems like a reasonable place to allow administrators, developers, users, etc. to divide up their GPUs.

And whatever mechanisms a GPU vendor may put in place to do clCreateSubDevices() could then be additionally used inside the kernel for their cgroups LGPU partitioning.

Thanks
-Joe


* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
       [not found]                           ` <CY4PR12MB17670EE9EE4A22663EB584E8F9950-rpdhrqHFk07NeWpHaHeGuQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-09 19:07                             ` Daniel Vetter
  0 siblings, 0 replies; 89+ messages in thread
From: Daniel Vetter @ 2019-10-09 19:07 UTC (permalink / raw)
  To: Greathouse, Joseph, Karol Herbst, Ben Skeggs, Nouveau Dev
  Cc: Ho, Kenny, Kuehling, Felix, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Deucher, Alexander, y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Koenig, Christian

On Wed, Oct 9, 2019 at 8:52 PM Greathouse, Joseph
<Joseph.Greathouse@amd.com> wrote:
>
> <snip>
>
> One thing that comes to mind is the OpenCL 1.2+ SubDevices mechanism: https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateSubDevices.html
>
> The concept of LGPU in cgroups seems to match up nicely with an OpenCL SubDevice, at least for compute tasks. We want to divide up the device and give some configurable subset of it to the user as a logical GPU or sub-device.
>
> OpenCL defines Compute Units (CUs), and any GPU vendor that runs OpenCL has some mapping of their internal compute resources to this concept of CUs. Off the top of my head (I may be misremembering some of these):
> - AMD: Compute Units (CUs)
> - ARM: Shader Cores (SCs)
> - Intel: Execution Units (EUs)
> - Nvidia: Streaming Multiprocessors (SMs)
> - Qualcomm: Shader Processors (SPs)
>
> The clCreateSubDevices() API has a variety of ways to slice and dice these compute resources across sub-devices. PARTITION_EQUALLY and PARTITION_BY_COUNTS could possibly be handled by a simple high-level mechanism that just allows you to request some percentage of the available GPU compute resources.
>
> PARTITION_BY_AFFINITY_DOMAIN, however, splits up the CUs based on lower-level information such as what cache levels are shared or what NUMA domain a collection of CUs is in. I would argue that a runtime that wants to do this needs to know a bit about the mapping of CUs to underlying hardware resources.
>
> A cgroup implementation that presented a CU bitmap could sit at the bottom of all three of these partitioning schemes, and more advanced ones if they come up. We might be getting side-tracked by the fact that AMD calls its resources CUs. The OpenCL (or Vulkan, etc.) concept of a Compute Unit is cross-vendor. The concept of targeting work to [Khronos-defined] Compute Units isn't AMD-specific. A bitmap of [Khronos-defined] CUs could map to any of these broad vendor compute resources.
>
> There may be other parts of the GPU that we want to divide up -- command queue resources, pipes, render backends, etc. I'm not sure if any of those have been "standardized" between GPUs to such an extent that they make sense to put into cgroups yet -- I'm ignorant outside of the compute world. But at least the concept of CUs (or SMs, or EUs, etc.) seems to be standard across GPUs and (to me anyway) seems like a reasonable place to allow administrators, developers, users, etc. to divide up their GPUs.
>
> And whatever mechanisms a GPU vendor may put in place to do clCreateSubDevices() could then be additionally used inside the kernel for their cgroups LGPU partitioning.

Yeah this is the stuff I meant. I quickly checked intel's CL driver,
and from a quick look we don't support that. Adding Karol, who might
know whether this works on nvidia hw and how. If OpenCL CUs don't
really apply to more than amdgpu, then that's not really helping much
with making this stuff more broadly useful.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
  2019-10-09 16:06                       ` Daniel Vetter
  2019-10-09 18:52                         ` Greathouse, Joseph
@ 2019-10-11 17:12                         ` tj
       [not found]                           ` <20191011171247.GC18794-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
  1 sibling, 1 reply; 89+ messages in thread
From: tj @ 2019-10-11 17:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ho, Kenny, Kuehling, Felix, jsparks, amd-gfx, lkaplan, y2kenny,
	dri-devel, Greathouse, Joseph, Deucher, Alexander, cgroups,
	Koenig, Christian

Hello, Daniel.

On Wed, Oct 09, 2019 at 06:06:52PM +0200, Daniel Vetter wrote:
> That's not the point I was making. For cpu cgroups there's a very well
> defined connection between the cpu bitmasks/numbers in cgroups and the cpu
> bitmasks you use in various system calls (they match). And that stuff
> works across vendors.

Please note that there are a lot of limitations even to cpuset.
Affinity is easy to implement and seems attractive in terms of
absolute isolation but it's inherently cumbersome and limited in
granularity and can lead to surprising failure modes where contention
on one cpu can't be resolved by the load balancer and leads to system
wide slowdowns / stalls caused by the dependency chain anchored at the
affinity limited tasks.

Maybe this is a less of a problem for gpu workloads but in general the
more constraints are put on scheduling, the more likely is the system
to develop twisted dependency chains while other parts of the system
are sitting idle.

How does scheduling currently work when there are competing gpu
workloads?  There gotta be some fairness provision whether that's unit
allocation based or time slicing, right?  If that's the case, it might
be best to implement proportional control on top of that.
Work-conserving mechanisms are the most versatile, easiest to use and
least likely to cause regressions.

Thanks.

-- 
tejun


* Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
@ 2019-11-29  5:59           ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-11-29  5:59 UTC (permalink / raw)
  To: Kuehling, Felix
  Cc: Ho, Kenny, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Greathouse, Joseph, Deucher, Alexander, Koenig, Christian

Reducing audience since this is AMD specific.

On Tue, Oct 8, 2019 at 3:11 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
>
> On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > The number of logical gpu (lgpu) is defined to be the number of compute
> > unit (CU) for a device.  The lgpu allocation limit only applies to
> > compute workload for the moment (enforced via kfd queue creation.)  Any
> > cu_mask update is validated against the availability of the compute unit
> > as defined by the drmcg the kfd process belongs to.
>
> There is something missing here. There is an API for the application to
> specify a CU mask. Right now it looks like the application-specified and
> CGroup-specified CU masks would clobber each other. Instead the two
> should be merged.
>
> The CGroup-specified mask should specify a subset of CUs available for
> application-specified CU masks. When the cgroup CU mask changes, you'd
> need to take any application-specified CU masks into account before
> updating the hardware.
The idea behind the current implementation is to give sysadmin
priority over user application (as that is the definition of control
group.)  Mask specified by application/user is validated by
pqm_drmcg_lgpu_validate and rejected with EACCES if they are not
compatible.  The alternative is to ignore the difference and have the
kernel guess/redistribute the assignment but I am not sure if this is
a good approach since there is not enough information to allow the
kernel to guess the user's intention correctly and consistently.  (This is
based on multiple conversations with you and Joe that led me to
believe there are situations where spreading CU assignment across
multiple SEs is a good thing, but not always.)

If the cgroup-specified mask is changed after the application has set
the mask, the intersection of the two masks will be set instead.  It
is possible to have no intersection and in this case no CU is made
available to the application (just like the possibility for memcgroup
to starve the amount of memory needed by an application.)

> The KFD topology APIs report the number of available CUs to the
> application. CGroups would change that number at runtime and
> applications would not expect that. I think the best way to deal with
> that would be to have multiple bits in the application-specified CU mask
> map to the same CU. How to do that in a fair way is not obvious. I guess
> a more coarse-grain division of the GPU into LGPUs would make this
> somewhat easier.
Another possibility is to add namespace to the topology sysfs such
that the correct number of CUs changes accordingly.  Although that
wouldn't give the user the available mask that is made available by
this implementation via the cgroup sysfs.  Another possibility is to
modify the thunk similar to what was done for device cgroup (device
re-mapping.)

> How is this problem handled for CPU cores and the interaction with CPU
> pthread_setaffinity_np?
Per the documentation of pthread_setaffinity_np, "If the call is
successful, and the thread is not currently running on one of the CPUs
in cpuset, then it is migrated to one of those CPUs."
http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html

Regards,
Kenny



> Regards,
>    Felix
>
>
> >
> > Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
> >   .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
> >   5 files changed, 174 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > index 55cb1b2094fd..369915337213 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
> >               valid;                                                  \
> >       })
> >
> > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > +             struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> > +             unsigned int nbits);
> > +
> >   /* GPUVM API */
> >   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
> >                                       void **vm, void **process_info,
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 163a4fbf0611..8abeffdd2e5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
> >   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
> >       struct drmcg_props *props)
> >   {
> > +     struct amdgpu_device *adev = dev->dev_private;
> > +
> > +     props->lgpu_capacity = adev->gfx.cu_info.number;
> > +
> >       props->limit_enforced = true;
> >   }
> >
> > +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> > +             struct task_struct *task, struct drmcg_device_resource *ddr,
> > +             enum drmcg_res_type res_type)
> > +{
> > +     struct amdgpu_device *adev = dev->dev_private;
> > +
> > +     switch (res_type) {
> > +     case DRMCG_TYPE_LGPU:
> > +             amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> > +                        ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> > +             break;
> > +     default:
> > +             break;
> > +     }
> > +}
> > +
> >   static struct drm_driver kms_driver = {
> >       .driver_features =
> >           DRIVER_USE_AGP | DRIVER_ATOMIC |
> > @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
> >       .gem_prime_mmap = amdgpu_gem_prime_mmap,
> >
> >       .drmcg_custom_init = amdgpu_drmcg_custom_init,
> > +     .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
> >
> >       .name = DRIVER_NAME,
> >       .desc = DRIVER_DESC,
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > index 138c70454e2b..fa765b803f97 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
> >               return -EFAULT;
> >       }
> >
> > +     if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
> > +             pr_debug("CU mask not permitted by DRM Cgroup");
> > +             kfree(properties.cu_mask);
> > +             return -EACCES;
> > +     }
> > +
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
@ 2019-11-29  5:59           ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-11-29  5:59 UTC (permalink / raw)
  To: Kuehling, Felix
  Cc: Ho, Kenny, jsparks, amd-gfx, lkaplan, Greathouse, Joseph,
	Deucher, Alexander, Koenig, Christian

Reducing audience since this is AMD specific.

On Tue, Oct 8, 2019 at 3:11 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
>
> On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > The number of logical gpu (lgpu) is defined to be the number of compute
> > unit (CU) for a device.  The lgpu allocation limit only applies to
> > compute workload for the moment (enforced via kfd queue creation.)  Any
> > cu_mask update is validated against the availability of the compute unit
> > as defined by the drmcg the kfd process belongs to.
>
> There is something missing here. There is an API for the application to
> specify a CU mask. Right now it looks like the application-specified and
> CGroup-specified CU masks would clobber each other. Instead the two
> should be merged.
>
> The CGroup-specified mask should specify a subset of CUs available for
> application-specified CU masks. When the cgroup CU mask changes, you'd
> need to take any application-specified CU masks into account before
> updating the hardware.
The idea behind the current implementation is to give the sysadmin
priority over the user application (as that is the definition of a
control group.)  A mask specified by the application/user is validated
by pqm_drmcg_lgpu_validate and rejected with EACCES if it is not
compatible.  The alternative is to ignore the difference and have the
kernel guess/redistribute the assignment, but I am not sure that is a
good approach since the kernel does not have enough information to
guess the user's intention correctly and consistently.  (This is
based on multiple conversations with you and Joe that led me to
believe there are situations where spreading the CU assignment across
multiple SEs is a good thing, but not always.)

If the cgroup-specified mask is changed after the application has set
its mask, the intersection of the two masks is applied instead.  It is
possible for the intersection to be empty, in which case no CU is made
available to the application (just as memcg can starve an application
of the memory it needs.)

> The KFD topology APIs report the number of available CUs to the
> application. CGroups would change that number at runtime and
> applications would not expect that. I think the best way to deal with
> that would be to have multiple bits in the application-specified CU mask
> map to the same CU. How to do that in a fair way is not obvious. I guess
> a more coarse-grain division of the GPU into LGPUs would make this
> somewhat easier.
Another possibility is to add namespacing to the topology sysfs so
that the reported number of CUs changes accordingly, although that
would not give the user the available mask that this implementation
exposes via the cgroup sysfs.  Yet another option is to modify the
Thunk, similar to what was done for the device cgroup (device
re-mapping.)

> How is this problem handled for CPU cores and the interaction with CPU
> pthread_setaffinity_np?
Per the documentation of pthread_setaffinity_np, "If the call is
successful, and the thread is not currently running on one of the CPUs
in cpuset, then it is migrated to one of those CPUs."
http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html

Regards,
Kenny



> Regards,
>    Felix
>
>
> >
> > Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
> >   .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
> >   5 files changed, 174 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > index 55cb1b2094fd..369915337213 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
> >               valid;                                                  \
> >       })
> >
> > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > +             struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> > +             unsigned int nbits);
> > +
> >   /* GPUVM API */
> >   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
> >                                       void **vm, void **process_info,
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 163a4fbf0611..8abeffdd2e5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
> >   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
> >       struct drmcg_props *props)
> >   {
> > +     struct amdgpu_device *adev = dev->dev_private;
> > +
> > +     props->lgpu_capacity = adev->gfx.cu_info.number;
> > +
> >       props->limit_enforced = true;
> >   }
> >
> > +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> > +             struct task_struct *task, struct drmcg_device_resource *ddr,
> > +             enum drmcg_res_type res_type)
> > +{
> > +     struct amdgpu_device *adev = dev->dev_private;
> > +
> > +     switch (res_type) {
> > +     case DRMCG_TYPE_LGPU:
> > +             amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> > +                        ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> > +             break;
> > +     default:
> > +             break;
> > +     }
> > +}
> > +
> >   static struct drm_driver kms_driver = {
> >       .driver_features =
> >           DRIVER_USE_AGP | DRIVER_ATOMIC |
> > @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
> >       .gem_prime_mmap = amdgpu_gem_prime_mmap,
> >
> >       .drmcg_custom_init = amdgpu_drmcg_custom_init,
> > +     .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
> >
> >       .name = DRIVER_NAME,
> >       .desc = DRIVER_DESC,
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > index 138c70454e2b..fa765b803f97 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
> >               return -EFAULT;
> >       }
> >
> > +     if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
> > +             pr_debug("CU mask not permitted by DRM Cgroup");
> > +             kfree(properties.cu_mask);
> > +             return -EACCES;
> > +     }
> > +
> >       mutex_lock(&p->mutex);
> >
> >       retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > index 8b0eee5b3521..88881bec7550 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > @@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> >                      u32 *ctl_stack_used_size,
> >                      u32 *save_area_used_size);
> >
> > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> > +             unsigned int cu_mask_size);
> > +
> >   int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
> >                               unsigned int fence_value,
> >                               unsigned int timeout_ms);
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > index 7e6c3ee82f5b..a896de290307 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > @@ -23,9 +23,11 @@
> >
> >   #include <linux/slab.h>
> >   #include <linux/list.h>
> > +#include <linux/cgroup_drm.h>
> >   #include "kfd_device_queue_manager.h"
> >   #include "kfd_priv.h"
> >   #include "kfd_kernel_queue.h"
> > +#include "amdgpu.h"
> >   #include "amdgpu_amdkfd.h"
> >
> >   static inline struct process_queue_node *get_queue_by_qid(
> > @@ -167,6 +169,7 @@ static int create_cp_queue(struct process_queue_manager *pqm,
> >                               struct queue_properties *q_properties,
> >                               struct file *f, unsigned int qid)
> >   {
> > +     struct drmcg *drmcg;
> >       int retval;
> >
> >       /* Doorbell initialized in user space*/
> > @@ -180,6 +183,36 @@ static int create_cp_queue(struct process_queue_manager *pqm,
> >       if (retval != 0)
> >               return retval;
> >
> > +
> > +     drmcg = drmcg_get(pqm->process->lead_thread);
> > +     if (drmcg) {
> > +             struct amdgpu_device *adev;
> > +             struct drmcg_device_resource *ddr;
> > +             int mask_size;
> > +             u32 *mask;
> > +
> > +             adev = (struct amdgpu_device *) dev->kgd;
> > +
> > +             mask_size = adev->ddev->drmcg_props.lgpu_capacity;
> > +             mask = kzalloc(sizeof(u32) * round_up(mask_size, 32),
> > +                             GFP_KERNEL);
> > +
> > +             if (!mask) {
> > +                     drmcg_put(drmcg);
> > +                     uninit_queue(*q);
> > +                     return -ENOMEM;
> > +             }
> > +
> > +             ddr = drmcg->dev_resources[adev->ddev->primary->index];
> > +
> > +             bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size);
> > +
> > +             (*q)->properties.cu_mask_count = mask_size;
> > +             (*q)->properties.cu_mask = mask;
> > +
> > +             drmcg_put(drmcg);
> > +     }
> > +
> >       (*q)->device = dev;
> >       (*q)->process = pqm->process;
> >
> > @@ -495,6 +528,113 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> >                                                      save_area_used_size);
> >   }
> >
> > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> > +             unsigned int cu_mask_size)
> > +{
> > +     DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
> > +     struct drmcg_device_resource *ddr;
> > +     struct process_queue_node *pqn;
> > +     struct amdgpu_device *adev;
> > +     struct drmcg *drmcg;
> > +     bool result;
> > +
> > +     if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
> > +             return false;
> > +
> > +     bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
> > +
> > +     pqn = get_queue_by_qid(&p->pqm, qid);
> > +     if (!pqn)
> > +             return false;
> > +
> > +     adev = (struct amdgpu_device *)pqn->q->device->kgd;
> > +
> > +     drmcg = drmcg_get(p->lead_thread);
> > +     ddr = drmcg->dev_resources[adev->ddev->primary->index];
> > +
> > +     if (bitmap_subset(curr_mask, ddr->lgpu_allocated,
> > +                             MAX_DRMCG_LGPU_CAPACITY))
> > +             result = true;
> > +     else
> > +             result = false;
> > +
> > +     drmcg_put(drmcg);
> > +
> > +     return result;
> > +}
> > +
> > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > +             struct amdgpu_device *adev, unsigned long *lgpu_bm,
> > +             unsigned int lgpu_bm_size)
> > +{
> > +     struct kfd_dev *kdev = adev->kfd.dev;
> > +     struct process_queue_node *pqn;
> > +     struct kfd_process *kfdproc;
> > +     size_t size_in_bytes;
> > +     u32 *cu_mask;
> > +     int rc = 0;
> > +
> > +     if ((lgpu_bm_size % 32) != 0) {
> > +             pr_warn("lgpu_bm_size %d must be a multiple of 32",
> > +                             lgpu_bm_size);
> > +             return -EINVAL;
> > +     }
> > +
> > +     kfdproc = kfd_get_process(task);
> > +
> > +     if (IS_ERR(kfdproc))
> > +             return -ESRCH;
> > +
> > +     size_in_bytes = sizeof(u32) * round_up(lgpu_bm_size, 32);
> > +
> > +     mutex_lock(&kfdproc->mutex);
> > +     list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
> > +             if (pqn->q && pqn->q->device == kdev) {
> > +                     /* update cu_mask accordingly */
> > +                     cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
> > +                     if (!cu_mask) {
> > +                             rc = -ENOMEM;
> > +                             break;
> > +                     }
> > +
> > +                     if (pqn->q->properties.cu_mask) {
> > +                             DECLARE_BITMAP(curr_mask,
> > +                                             MAX_DRMCG_LGPU_CAPACITY);
> > +
> > +                             if (pqn->q->properties.cu_mask_count >
> > +                                             lgpu_bm_size) {
> > +                                     rc = -EINVAL;
> > +                                     kfree(cu_mask);
> > +                                     break;
> > +                             }
> > +
> > +                             bitmap_from_arr32(curr_mask,
> > +                                             pqn->q->properties.cu_mask,
> > +                                             pqn->q->properties.cu_mask_count);
> > +
> > +                             bitmap_and(curr_mask, curr_mask, lgpu_bm,
> > +                                             lgpu_bm_size);
> > +
> > +                             bitmap_to_arr32(cu_mask, curr_mask,
> > +                                             lgpu_bm_size);
> > +
> > +                             kfree(curr_mask);
> > +                     } else
> > +                             bitmap_to_arr32(cu_mask, lgpu_bm,
> > +                                             lgpu_bm_size);
> > +
> > +                     pqn->q->properties.cu_mask = cu_mask;
> > +                     pqn->q->properties.cu_mask_count = lgpu_bm_size;
> > +
> > +                     rc = pqn->q->device->dqm->ops.update_queue(
> > +                                     pqn->q->device->dqm, pqn->q);
> > +             }
> > +     }
> > +     mutex_unlock(&kfdproc->mutex);
> > +
> > +     return rc;
> > +}
> > +
> >   #if defined(CONFIG_DEBUG_FS)
> >
> >   int pqm_debugfs_mqds(struct seq_file *m, void *data)

* Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem
@ 2019-11-29  6:00           ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-11-29  6:00 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Daniel Vetter, Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Tejun Heo, dri-devel,
	Greathouse, Joseph, Alex Deucher, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Tue, Oct 1, 2019 at 10:31 AM Michal Koutný <mkoutny@suse.com> wrote:
> On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho <Kenny.Ho@amd.com> wrote:
> > +struct cgroup_subsys drm_cgrp_subsys = {
> > +     .css_alloc      = drmcg_css_alloc,
> > +     .css_free       = drmcg_css_free,
> > +     .early_init     = false,
> > +     .legacy_cftypes = files,
> Do you really want to expose the DRM controller on v1 hierarchies (where
> threads of one process can be in different cgroups, or children cgroups
> compete with their parents)?

(Sorry for the delay, I have been distracted by something else.)
Yes, I am hoping to make the functionality as widely available as
possible since the ecosystem is still transitioning to v2.  Do you see
any inherent problems with this approach?

Regards,
Kenny


>
> > +     .dfl_cftypes    = files,
> > +};
>
> Just asking,
> Michal

* Re: [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit
@ 2019-11-29  7:18         ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-11-29  7:18 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Kenny Ho, Kuehling, Felix, jsparks, amd-gfx list, lkaplan,
	Tejun Heo, dri-devel, Greathouse, Joseph, Alex Deucher, cgroups,
	Christian König

On Tue, Oct 1, 2019 at 10:30 AM Michal Koutný <mkoutny@suse.com> wrote:
> On Thu, Aug 29, 2019 at 02:05:24AM -0400, Kenny Ho <Kenny.Ho@amd.com> wrote:
> > drm.buffer.default
> >         A read-only flat-keyed file which exists on the root cgroup.
> >         Each entry is keyed by the drm device's major:minor.
> >
> >         Default limits on the total GEM buffer allocation in bytes.
> What is the purpose of this attribute (and alikes for other resources)?
> I can't see it being set differently but S64_MAX in
> drmcg_device_early_init.

cgroup has a number of conventions, one of which is the idea of a
default.  The idea here is to allow for device-specific defaults.  For
this specific resource I can probably avoid exposing it since it is
not particularly useful, but for other resources (such as the lgpu
resource) the concept of a default is useful (for example, different
devices can have different numbers of lgpus.)


> > +static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > [...]
> > +             switch (type) {
> > +             case DRMCG_TYPE_BO_TOTAL:
> > +                     p_max = parent == NULL ? S64_MAX :
> > +                             parent->dev_resources[minor]->
> > +                             bo_limits_total_allocated;
> > +
> > +                     rc = drmcg_process_limit_s64_val(sattr, true,
> > +                             props->bo_limits_total_allocated_default,
> > +                             p_max,
> > +                             &val);
> IIUC, this allows initiating the particular limit value based either on
> parent or the default per-device value. This is alas rather an
> antipattern. The most stringent limit on the path from a cgroup to the
> root should be applied at the charging time. However, the child should
> not inherit the verbatim value from the parent (may race with parent and
> it won't be updated upon parent change).
I think this was a mistake during one of my refactors: I shrank the
critical section protected by a mutex a bit too much.  But you are
right that I do not propagate the limits downward to the children when
the parent's limit is updated.  From the user-interface perspective,
though, wouldn't this be confusing?  When a sysadmin sets a limit
using the 'max' keyword, the value would be a global one even though
the actual allowable maximum for the particular cgroup is smaller in
reality because of the ancestor cgroups.  (If this is the established
norm I am OK to go along, but it seems confusing to me.)  I am
probably missing something, because as I implemented this the 'max'
and 'default' semantics have been confusing to me, especially for
children cgroups, due to the context of the ancestors.

> You already do the appropriate hierarchical check in
> drmcg_try_chb_bo_alloc, so the parent propagation could be simply
> dropped if I'm not mistaken.
I will need to double check.  But I think interaction between parent
and children (or perhaps between siblings) will be needed eventually,
because there seems to be a desire to implement a "weight" type of
resource.  Also, from a performance perspective, wouldn't it make more
sense to ensure the limits are set correctly at configuration time
than to have to check all the cgroups up through the parents on every
charge?  I don't have comprehensive knowledge of the implementations
of the other cgroup controllers, so it would be great if more
experienced folks could comment.  (Although I should probably just
pick one approach instead of doing both... or 1.5.)

>
> Also, I can't find how the read of
> parent->dev_resources[minor]->bo_limits_total_allocated and its
> concurrent update are synchronized (i.e. someone writing
> buffer.total.max for parent and child in parallel). (It may just my
> oversight.)
This is probably the refactor mistake I mentioned earlier.

Regards,
Kenny

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
@ 2019-11-29 20:10                               ` Felix Kuehling
  0 siblings, 0 replies; 89+ messages in thread
From: Felix Kuehling @ 2019-11-29 20:10 UTC (permalink / raw)
  To: tj-DgEjT+Ai2ygdnm+yROfE0A, Daniel Vetter
  Cc: Ho, Kenny, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Greathouse, Joseph,
	Deucher, Alexander, cgroups-u79uwXL29TY76Z2rM5mHXA, Koenig,
	Christian

On 2019-10-11 1:12 p.m., tj@kernel.org wrote:
> Hello, Daniel.
>
> On Wed, Oct 09, 2019 at 06:06:52PM +0200, Daniel Vetter wrote:
>> That's not the point I was making. For cpu cgroups there's a very well
>> defined connection between the cpu bitmasks/numbers in cgroups and the cpu
>> bitmasks you use in various system calls (they match). And that stuff
>> works across vendors.
> Please note that there are a lot of limitations even to cpuset.
> Affinity is easy to implement and seems attractive in terms of
> absolute isolation but it's inherently cumbersome and limited in
> granularity and can lead to surprising failure modes where contention
> on one cpu can't be resolved by the load balancer and leads to system
> wide slowdowns / stalls caused by the dependency chain anchored at the
> affinity limited tasks.
>
> Maybe this is less of a problem for gpu workloads but in general the
> more constraints are put on scheduling, the more likely is the system
> to develop twisted dependency chains while other parts of the system
> are sitting idle.
>
> How does scheduling currently work when there are competing gpu
> workloads?  There gotta be some fairness provision whether that's unit
> allocation based or time slicing, right?

The scheduling of competing workloads on GPUs is handled in hardware and 
firmware. The Linux kernel and driver are not really involved. We have 
some knobs we can tweak in the driver (queue and pipe priorities, 
resource reservations for certain types of workloads), but they are 
pretty HW-specific and I wouldn't make any claims about fairness.

Regards,
   Felix

>    If that's the case, it might
> be best to implement proportional control on top of that.
> Work-conserving mechanisms are the most versatile, easiest to use and
> least likely to cause regressions.
>
> Thanks.
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem
@ 2019-12-02 19:14               ` Tejun Heo
  0 siblings, 0 replies; 89+ messages in thread
From: Tejun Heo @ 2019-12-02 19:14 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Daniel Vetter, Kenny Ho, Kuehling, Felix, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Michal Koutný,
	dri-devel, Greathouse, Joseph, Alex Deucher,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Fri, Nov 29, 2019 at 01:00:36AM -0500, Kenny Ho wrote:
> On Tue, Oct 1, 2019 at 10:31 AM Michal Koutný <mkoutny@suse.com> wrote:
> > On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho <Kenny.Ho@amd.com> wrote:
> > > +struct cgroup_subsys drm_cgrp_subsys = {
> > > +     .css_alloc      = drmcg_css_alloc,
> > > +     .css_free       = drmcg_css_free,
> > > +     .early_init     = false,
> > > +     .legacy_cftypes = files,
> > Do you really want to expose the DRM controller on v1 hierarchies (where
> > threads of one process can be in different cgroups, or children cgroups
> > compete with their parents)?
> 
> (Sorry for the delay, I have been distracted by something else.)
> Yes, I am hoping to make the functionality as widely available as
> possible since the ecosystem is still transitioning to v2.  Do you see
> an inherent problem with this approach?

Integrating with memcg could be more challenging on cgroup1.  That's
one of the reasons why e.g. cgroup-aware pagecache writeback is only
on cgroup2.

-- 
tejun
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* RE: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
  2019-11-29  5:59           ` Kenny Ho
  (?)
@ 2019-12-02 22:05           ` Greathouse, Joseph
  2019-12-03 16:02             ` Kenny Ho
  -1 siblings, 1 reply; 89+ messages in thread
From: Greathouse, Joseph @ 2019-12-02 22:05 UTC (permalink / raw)
  To: Kenny Ho, Kuehling, Felix
  Cc: Ho, Kenny, jsparks, amd-gfx, lkaplan, Deucher, Alexander, Koenig,
	Christian

> -----Original Message-----
> From: Kenny Ho <y2kenny@gmail.com>
> Sent: Friday, November 29, 2019 12:00 AM
> 
> Reducing audience since this is AMD specific.
> 
> On Tue, Oct 8, 2019 at 3:11 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
> >
> > On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > > The number of logical gpu (lgpu) is defined to be the number of
> > > compute unit (CU) for a device.  The lgpu allocation limit only
> > > applies to compute workload for the moment (enforced via kfd queue
> > > creation.)  Any cu_mask update is validated against the availability
> > > of the compute unit as defined by the drmcg the kfd process belongs to.
> >
> > There is something missing here. There is an API for the application
> > to specify a CU mask. Right now it looks like the
> > application-specified and CGroup-specified CU masks would clobber each
> > other. Instead the two should be merged.
> >
> > The CGroup-specified mask should specify a subset of CUs available for
> > application-specified CU masks. When the cgroup CU mask changes, you'd
> > need to take any application-specified CU masks into account before
> > updating the hardware.
> The idea behind the current implementation is to give the sysadmin priority over the user application (as that is the definition of control
> group.)  A mask specified by the application/user is validated by pqm_drmcg_lgpu_validate and rejected with EACCES if they are not
> compatible.  The alternative is to ignore the difference and have the kernel guess/redistribute the assignment, but I am not sure if this
> is a good approach since there is not enough information to allow the kernel to guess the user's intention correctly and consistently.  (This
> is based on multiple conversations with you and Joe that led me to believe there are situations where spreading CU assignment across
> multiple SEs is a good thing, but not always.)
> 
> If the cgroup-specified mask is changed after the application has set the mask, the intersection of the two masks will be set instead.  It
> is possible to have no intersection and in this case no CU is made available to the application (just like the possibility for memcgroup to
> starve the amount of memory needed by an application.)

I don't disagree with forcing a user to work within an lgpu's allocation. But there are two minor problems here:

1) We will need a way for the process to query what the lgpu's bitmap looks like. You and Felix are somewhat discussing this below, but I don't think the KFD's "number of CUs" topology information is sufficient. I can know I have 32 CUs, but I don't know which 32 bits in the bitmask are turned on. But your code in pqm_drmcg_lgpu_validate() requires a subset when setting a CU mask on an lgpu. A user needs to know which bits are on in the lgpu for this to work.
2) Even if we have a query API, do we have an easy way to prevent a data race? Do we care? For instance, if I query the existing lgpu bitmap, then try to set a CU mask on a subset of that, it's possible that the lgpu will change between the query and the set. That would make the setting fail; maybe that's good enough (you can just try in a loop until it succeeds?)

Do empty CU masks actually work? This seems like something we would want to avoid. This could happen not infrequently if someone does something like:
* lgpu with half the CUs enabled
* User sets a mask to use half of those CUs
* lgpu is changed to enable the other half of the CUs --> now the user's mask is fully destroyed and everything dies. :\

> > The KFD topology APIs report the number of available CUs to the
> > application. CGroups would change that number at runtime and
> > applications would not expect that. I think the best way to deal with
> > that would be to have multiple bits in the application-specified CU
> > mask map to the same CU. How to do that in a fair way is not obvious.
> > I guess a more coarse-grain division of the GPU into LGPUs would make
> > this somewhat easier.
> Another possibility is to add a namespace to the topology sysfs such that the correct number of CUs changes accordingly.  Although that
> wouldn't give the user the available mask that is made available by this implementation via the cgroup sysfs.  Another possibility is to
> modify the thunk similar to what was done for device cgroup (device
> re-mapping.)

I'd vote for a set of mask query APIs in the Thunk. One for the process's current CU mask, and one for a queue's current CU mask. We have a setter API already. Since the KFD topology information is also mirrored in sysfs, I would worry that a process would see different KFD topology information if it's querying the Thunk (which would show the lgpu's number of CUs) vs. if it's reading sysfs (which would show the GPU's number of CUs).

As mentioned above, the KFD "num CUs" is insufficient for knowing how to set the CU bitmask, so I don't think we should rely on it in this case. IMO, KFD topology should describe the real hardware regardless of how cgroups is limiting things. I'm willing to be told this is a bad idea, though.

> > How is this problem handled for CPU cores and the interaction with CPU
> > pthread_setaffinity_np?
> Per the documentation of pthread_setaffinity_np, "If the call is successful, and the thread is not currently running on one of the CPUs
> in cpuset, then it is migrated to one of those CPUs."
> http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
>
> Regards,
> Kenny
> 
> 
> 
> > Regards,
> >    Felix
> >
> >
> > >
> > > Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> > > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
> > >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
> > >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
> > >   .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
> > >   5 files changed, 174 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > index 55cb1b2094fd..369915337213 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
> > >               valid;                                                  \
> > >       })
> > >
> > > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > > +             struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> > > +             unsigned int nbits);
> > > +
> > >   /* GPUVM API */
> > >   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
> > >                                       void **vm, void **process_info,
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 163a4fbf0611..8abeffdd2e5b 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
> > >   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
> > >       struct drmcg_props *props)
> > >   {
> > > +     struct amdgpu_device *adev = dev->dev_private;
> > > +
> > > +     props->lgpu_capacity = adev->gfx.cu_info.number;
> > > +
> > >       props->limit_enforced = true;
> > >   }
> > >
> > > +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> > > +             struct task_struct *task, struct drmcg_device_resource *ddr,
> > > +             enum drmcg_res_type res_type) {
> > > +     struct amdgpu_device *adev = dev->dev_private;
> > > +
> > > +     switch (res_type) {
> > > +     case DRMCG_TYPE_LGPU:
> > > +             amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> > > +                        ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> > > +             break;
> > > +     default:
> > > +             break;
> > > +     }
> > > +}
> > > +
> > >   static struct drm_driver kms_driver = {
> > >       .driver_features =
> > >           DRIVER_USE_AGP | DRIVER_ATOMIC |
> > >
> > > @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
> > >       .gem_prime_mmap = amdgpu_gem_prime_mmap,
> > >
> > >       .drmcg_custom_init = amdgpu_drmcg_custom_init,
> > > +     .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
> > >
> > >       .name = DRIVER_NAME,
> > >       .desc = DRIVER_DESC,
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > index 138c70454e2b..fa765b803f97 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
> > >               return -EFAULT;
> > >       }
> > >
> > > +     if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
> > > +             pr_debug("CU mask not permitted by DRM Cgroup");
> > > +             kfree(properties.cu_mask);
> > > +             return -EACCES;
> > > +     }
> > > +
> > >       mutex_lock(&p->mutex);
> > >
> > >       retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > index 8b0eee5b3521..88881bec7550 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > @@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> > >                      u32 *ctl_stack_used_size,
> > >                      u32 *save_area_used_size);
> > >
> > > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> > > +             unsigned int cu_mask_size);
> > > +
> > >   int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
> > >                               unsigned int fence_value,
> > >                               unsigned int timeout_ms);
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > > index 7e6c3ee82f5b..a896de290307 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > > @@ -23,9 +23,11 @@
> > >
> > >   #include <linux/slab.h>
> > >   #include <linux/list.h>
> > > +#include <linux/cgroup_drm.h>
> > >   #include "kfd_device_queue_manager.h"
> > >   #include "kfd_priv.h"
> > >   #include "kfd_kernel_queue.h"
> > > +#include "amdgpu.h"
> > >   #include "amdgpu_amdkfd.h"
> > >
> > >   static inline struct process_queue_node *get_queue_by_qid(
> > >
> > > @@ -167,6 +169,7 @@ static int create_cp_queue(struct process_queue_manager *pqm,
> > >                               struct queue_properties *q_properties,
> > >                               struct file *f, unsigned int qid)
> > >   {
> > > +     struct drmcg *drmcg;
> > >       int retval;
> > >
> > >       /* Doorbell initialized in user space*/
> > >
> > > @@ -180,6 +183,36 @@ static int create_cp_queue(struct process_queue_manager *pqm,
> > >       if (retval != 0)
> > >               return retval;
> > >
> > > +
> > > +     drmcg = drmcg_get(pqm->process->lead_thread);
> > > +     if (drmcg) {
> > > +             struct amdgpu_device *adev;
> > > +             struct drmcg_device_resource *ddr;
> > > +             int mask_size;
> > > +             u32 *mask;
> > > +
> > > +             adev = (struct amdgpu_device *) dev->kgd;
> > > +
> > > +             mask_size = adev->ddev->drmcg_props.lgpu_capacity;
> > > +             mask = kzalloc(sizeof(u32) * round_up(mask_size, 32),
> > > +                             GFP_KERNEL);
> > > +
> > > +             if (!mask) {
> > > +                     drmcg_put(drmcg);
> > > +                     uninit_queue(*q);
> > > +                     return -ENOMEM;
> > > +             }
> > > +
> > > +             ddr = drmcg->dev_resources[adev->ddev->primary->index];
> > > +
> > > +             bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size);
> > > +
> > > +             (*q)->properties.cu_mask_count = mask_size;
> > > +             (*q)->properties.cu_mask = mask;
> > > +
> > > +             drmcg_put(drmcg);
> > > +     }
> > > +
> > >       (*q)->device = dev;
> > >       (*q)->process = pqm->process;
> > >
> > > @@ -495,6 +528,113 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> > >                                                      save_area_used_size);
> > >   }
> > >
> > > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> > > +             unsigned int cu_mask_size) {
> > > +     DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
> > > +     struct drmcg_device_resource *ddr;
> > > +     struct process_queue_node *pqn;
> > > +     struct amdgpu_device *adev;
> > > +     struct drmcg *drmcg;
> > > +     bool result;
> > > +
> > > +     if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
> > > +             return false;
> > > +
> > > +     bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
> > > +
> > > +     pqn = get_queue_by_qid(&p->pqm, qid);
> > > +     if (!pqn)
> > > +             return false;
> > > +
> > > +     adev = (struct amdgpu_device *)pqn->q->device->kgd;
> > > +
> > > +     drmcg = drmcg_get(p->lead_thread);
> > > +     ddr = drmcg->dev_resources[adev->ddev->primary->index];
> > > +
> > > +     if (bitmap_subset(curr_mask, ddr->lgpu_allocated,
> > > +                             MAX_DRMCG_LGPU_CAPACITY))
> > > +             result = true;
> > > +     else
> > > +             result = false;
> > > +
> > > +     drmcg_put(drmcg);
> > > +
> > > +     return result;
> > > +}
> > > +
> > > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > > +             struct amdgpu_device *adev, unsigned long *lgpu_bm,
> > > +             unsigned int lgpu_bm_size) {
> > > +     struct kfd_dev *kdev = adev->kfd.dev;
> > > +     struct process_queue_node *pqn;
> > > +     struct kfd_process *kfdproc;
> > > +     size_t size_in_bytes;
> > > +     u32 *cu_mask;
> > > +     int rc = 0;
> > > +
> > > +     if ((lgpu_bm_size % 32) != 0) {
> > > +             pr_warn("lgpu_bm_size %d must be a multiple of 32",
> > > +                             lgpu_bm_size);
> > > +             return -EINVAL;
> > > +     }
> > > +
> > > +     kfdproc = kfd_get_process(task);
> > > +
> > > +     if (IS_ERR(kfdproc))
> > > +             return -ESRCH;
> > > +
> > > +     size_in_bytes = sizeof(u32) * round_up(lgpu_bm_size, 32);
> > > +
> > > +     mutex_lock(&kfdproc->mutex);
> > > +     list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
> > > +             if (pqn->q && pqn->q->device == kdev) {
> > > +                     /* update cu_mask accordingly */
> > > +                     cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
> > > +                     if (!cu_mask) {
> > > +                             rc = -ENOMEM;
> > > +                             break;
> > > +                     }
> > > +
> > > +                     if (pqn->q->properties.cu_mask) {
> > > +                             DECLARE_BITMAP(curr_mask,
> > > +                                             MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > > +                             if (pqn->q->properties.cu_mask_count >
> > > +                                             lgpu_bm_size) {
> > > +                                     rc = -EINVAL;
> > > +                                     kfree(cu_mask);
> > > +                                     break;
> > > +                             }
> > > +
> > > +                             bitmap_from_arr32(curr_mask,
> > > +                                             pqn->q->properties.cu_mask,
> > > +                                             pqn->q->properties.cu_mask_count);
> > > +
> > > +                             bitmap_and(curr_mask, curr_mask, lgpu_bm,
> > > +                                             lgpu_bm_size);
> > > +
> > > +                             bitmap_to_arr32(cu_mask, curr_mask,
> > > +                                             lgpu_bm_size);
> > > +
> > > +                             kfree(curr_mask);
> > > +                     } else
> > > +                             bitmap_to_arr32(cu_mask, lgpu_bm,
> > > +                                             lgpu_bm_size);
> > > +
> > > +                     pqn->q->properties.cu_mask = cu_mask;
> > > +                     pqn->q->properties.cu_mask_count = lgpu_bm_size;
> > > +
> > > +                     rc = pqn->q->device->dqm->ops.update_queue(
> > > +                                     pqn->q->device->dqm, pqn->q);
> > > +             }
> > > +     }
> > > +     mutex_unlock(&kfdproc->mutex);
> > > +
> > > +     return rc;
> > > +}
> > > +
> > >   #if defined(CONFIG_DEBUG_FS)
> > >
> > >   int pqm_debugfs_mqds(struct seq_file *m, void *data)
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup
  2019-12-02 22:05           ` Greathouse, Joseph
@ 2019-12-03 16:02             ` Kenny Ho
  0 siblings, 0 replies; 89+ messages in thread
From: Kenny Ho @ 2019-12-03 16:02 UTC (permalink / raw)
  To: Greathouse, Joseph
  Cc: Ho, Kenny, Kuehling, Felix, jsparks, amd-gfx, lkaplan, Deucher,
	Alexander, Koenig, Christian

Hey Joe,

I don't have all the answers right now, but one thing I want to
mention is that, with cgroup, there is always the possibility of a
user configuration that leaves the application under-resourced.  Your
comments certainly highlight the need to make an under-resourced
situation obvious to debug.  (I want to write this down so I don't
forget... :) I should probably emit some dmesg output for situations
like this.)  Thanks!

Regards,
Kenny
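The under-resource case mentioned above — a cgroup mask change that leaves an application's effective CU mask empty — comes down to plain bitmask arithmetic. A minimal sketch using Python integers as bitmaps (the function names are illustrative stand-ins, not the kernel API):

```python
def validate(requested: int, allocated: int) -> bool:
    # Mirrors bitmap_subset(): every bit the application requests must
    # also be set in the cgroup's lgpu_allocated mask.
    return requested & ~allocated == 0

def effective_mask(app_mask: int, allocated: int) -> int:
    # Mirrors bitmap_and(): when the cgroup mask changes after the app
    # has set its own mask, the intersection is what actually applies.
    return app_mask & allocated

allocated = 0x0F              # cgroup grants CUs 0-3
app_mask = 0x0C               # app asks for CUs 2-3 -> accepted
assert validate(app_mask, allocated)

allocated = 0xF0              # sysadmin flips the grant to CUs 4-7
print(hex(effective_mask(app_mask, allocated)))  # intersection is empty
```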

On Mon, Dec 2, 2019 at 5:05 PM Greathouse, Joseph
<Joseph.Greathouse@amd.com> wrote:
>
> > -----Original Message-----
> > From: Kenny Ho <y2kenny@gmail.com>
> > Sent: Friday, November 29, 2019 12:00 AM
> >
> > Reducing audience since this is AMD specific.
> >
> > On Tue, Oct 8, 2019 at 3:11 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
> > >
> > > On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > > > The number of logical gpu (lgpu) is defined to be the number of
> > > > compute unit (CU) for a device.  The lgpu allocation limit only
> > > > applies to compute workload for the moment (enforced via kfd queue
> > > > creation.)  Any cu_mask update is validated against the availability
> > > > of the compute unit as defined by the drmcg the kfd process belongs to.
> > >
> > > There is something missing here. There is an API for the application
> > > to specify a CU mask. Right now it looks like the
> > > application-specified and CGroup-specified CU masks would clobber each
> > > other. Instead the two should be merged.
> > >
> > > The CGroup-specified mask should specify a subset of CUs available for
> > > application-specified CU masks. When the cgroup CU mask changes, you'd
> > > need to take any application-specified CU masks into account before
> > > updating the hardware.
> > The idea behind the current implementation is to give the sysadmin priority over the user application (as that is the definition of a control
> > group.)  A mask specified by the application/user is validated by pqm_drmcg_lgpu_validate and rejected with EACCES if it is not
> > compatible.  The alternative is to ignore the difference and have the kernel guess/redistribute the assignment, but I am not sure this
> > is a good approach since there is not enough information to allow the kernel to guess the user's intention correctly and consistently.  (This
> > is based on multiple conversations with you and Joe that led me to believe there are situations where spreading the CU assignment across
> > multiple SEs is a good thing, but not always.)
> >
> > If the cgroup-specified mask is changed after the application has set its mask, the intersection of the two masks will be set instead.  It
> > is possible to have no intersection, in which case no CU is made available to the application (just as memcg can
> > starve an application of the memory it needs.)
>
> I don't disagree with forcing a user to work within an lgpu's allocation. But there's two minor problems here:
>
> 1) we will need a way for the process to query what the lgpu's bitmap looks like. You and Felix are somewhat discussing this below, but I don't think the KFD's "number of CUs" topology information is sufficient. I may know I have 32 CUs, but I don't know which 32 bits in the bitmask are turned on. But your code in pqm_drmcg_lgpu_validate() requires a subset when setting a CU mask on an lgpu. A user needs to know which bits are on in the lgpu for this to work.
> 2) Even if we have a query API, do we have an easy way to prevent a data race? Do we care? For instance, if I query the existing lgpu bitmap, then try to set a CU mask on a subset of that, it's possible that the lgpu will change between the query and set. That would make the setting fail, maybe that's good enough (you can just try in a loop until it succeeds?)
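The try-in-a-loop approach suggested in point 2 can be modelled in a few lines. The query_lgpu_mask()/set_queue_cu_mask() names below are hypothetical stand-ins for the Thunk query/set APIs under discussion, simulated here with a toy class whose lgpu mask changes once mid-race:

```python
# Toy model of the query-then-set race: the lgpu mask may change between
# the query and the set, so the setter retries until it succeeds.
class ToyLgpu:
    def __init__(self, mask):
        self.mask = mask
        self._flips = iter([0xF0])   # sysadmin changes the lgpu once

    def query_lgpu_mask(self):
        return self.mask

    def set_queue_cu_mask(self, requested):
        # The race: the lgpu mask may flip before the set is validated.
        self.mask = next(self._flips, self.mask)
        # Setting fails (like EACCES) unless requested is a subset.
        return requested & ~self.mask == 0

def set_one_cu(lgpu, max_tries=10):
    for _ in range(max_tries):
        allowed = lgpu.query_lgpu_mask()
        lowest = allowed & -allowed          # one CU out of the grant
        if lgpu.set_queue_cu_mask(lowest):
            return lowest                    # succeeded despite the race
    return None

print(hex(set_one_cu(ToyLgpu(0x0F))))
```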
>
> Do empty CU masks actually work? This seems like something we would want to avoid. This could happen not infrequently if someone does something like:
> * lgpu with half the CUs enabled
> * User sets a mask to use half of those CUs
> * lgpu is changed to enable the other half of the CUs --> now the user's mask is fully destroyed and everything dies. :\
>
> > > The KFD topology APIs report the number of available CUs to the
> > > application. CGroups would change that number at runtime and
> > > applications would not expect that. I think the best way to deal with
> > > that would be to have multiple bits in the application-specified CU
> > > mask map to the same CU. How to do that in a fair way is not obvious.
> > > I guess a more coarse-grain division of the GPU into LGPUs would make
> > > this somewhat easier.
> > Another possibility is to add a namespace to the topology sysfs such that the reported number of CUs changes accordingly, although that
> > wouldn't give the user the available mask this implementation exposes via the cgroup sysfs.  Another possibility is to
> > modify the thunk, similar to what was done for device cgroup (device
> > re-mapping.)
>
> I'd vote for a set of mask query APIs in the Thunk. One for the process's current CU mask, and one for a queue's current CU mask. We have a setter API already. Since the KFD topology information is also mirrored in sysfs, I would worry that a process would see different KFD topology information if it's querying the Thunk (which would show the lgpu's number of CUs) vs. if it's reading sysfs (which would show the GPU's number of CUs).
>
> As mentioned above, the KFD "num CUs" is insufficient for knowing how to set the CU bitmask, so I don't think we should rely on it in this case. IMO, KFD topology should describe the real hardware regardless of how cgroups is limiting things. I'm willing to be told this is a bad idea, though.
>
> > > How is this problem handled for CPU cores and the interaction with CPU
> > > pthread_setaffinity_np?
> > Per the documentation of pthread_setaffinity_np, "If the call is successful, and the thread is not currently running on one of the CPUs
> > in cpuset, then it is migrated to one of those CPUs."
> > http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
> >
> > Regards,
> > Kenny
> >
> >
> >
> > > Regards,
> > >    Felix
> > >
> > >
> > > >
> > > > Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e
> > > > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > > > ---
> > > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +
> > > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  21 +++
> > > >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   6 +
> > > >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +
> > > >   .../amd/amdkfd/kfd_process_queue_manager.c    | 140 ++++++++++++++++++
> > > >   5 files changed, 174 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > > index 55cb1b2094fd..369915337213 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> > > > @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
> > > >               valid;                                                  \
> > > >       })
> > > >
> > > > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > > > +             struct amdgpu_device *adev, unsigned long *lgpu_bitmap,
> > > > +             unsigned int nbits);
> > > > +
> > > >   /* GPUVM API */
> > > >   int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
> > > >                                       void **vm, void **process_info,
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > index 163a4fbf0611..8abeffdd2e5b 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
> > > >   static void amdgpu_drmcg_custom_init(struct drm_device *dev,
> > > >       struct drmcg_props *props)
> > > >   {
> > > > +     struct amdgpu_device *adev = dev->dev_private;
> > > > +
> > > > +     props->lgpu_capacity = adev->gfx.cu_info.number;
> > > > +
> > > >       props->limit_enforced = true;
> > > >   }
> > > >
> > > > +static void amdgpu_drmcg_limit_updated(struct drm_device *dev,
> > > > +             struct task_struct *task, struct drmcg_device_resource *ddr,
> > > > +             enum drmcg_res_type res_type) {
> > > > +     struct amdgpu_device *adev = dev->dev_private;
> > > > +
> > > > +     switch (res_type) {
> > > > +     case DRMCG_TYPE_LGPU:
> > > > +             amdgpu_amdkfd_update_cu_mask_for_process(task, adev,
> > > > +                        ddr->lgpu_allocated, dev->drmcg_props.lgpu_capacity);
> > > > +             break;
> > > > +     default:
> > > > +             break;
> > > > +     }
> > > > +}
> > > > +
> > > >   static struct drm_driver kms_driver = {
> > > >       .driver_features =
> > > >           DRIVER_USE_AGP | DRIVER_ATOMIC |
> > > >
> > > > @@ -1438,6 +1458,7 @@ static struct drm_driver kms_driver = {
> > > >       .gem_prime_mmap = amdgpu_gem_prime_mmap,
> > > >
> > > >       .drmcg_custom_init = amdgpu_drmcg_custom_init,
> > > > +     .drmcg_limit_updated = amdgpu_drmcg_limit_updated,
> > > >
> > > >       .name = DRIVER_NAME,
> > > >       .desc = DRIVER_DESC,
> > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > index 138c70454e2b..fa765b803f97 100644
> > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > > > @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
> > > >               return -EFAULT;
> > > >       }
> > > >
> > > > +     if (!pqm_drmcg_lgpu_validate(p, args->queue_id, properties.cu_mask, cu_mask_size)) {
> > > > +             pr_debug("CU mask not permitted by DRM Cgroup");
> > > > +             kfree(properties.cu_mask);
> > > > +             return -EACCES;
> > > > +     }
> > > > +
> > > >       mutex_lock(&p->mutex);
> > > >
> > > >       retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > index 8b0eee5b3521..88881bec7550 100644
> > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > > @@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> > > >                      u32 *ctl_stack_used_size,
> > > >                      u32 *save_area_used_size);
> > > >
> > > > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> > > > +             unsigned int cu_mask_size);
> > > > +
> > > >   int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
> > > >                               unsigned int fence_value,
> > > >                               unsigned int timeout_ms);
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > > > index 7e6c3ee82f5b..a896de290307 100644
> > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > > > @@ -23,9 +23,11 @@
> > > >
> > > >   #include <linux/slab.h>
> > > >   #include <linux/list.h>
> > > > +#include <linux/cgroup_drm.h>
> > > >   #include "kfd_device_queue_manager.h"
> > > >   #include "kfd_priv.h"
> > > >   #include "kfd_kernel_queue.h"
> > > > +#include "amdgpu.h"
> > > >   #include "amdgpu_amdkfd.h"
> > > >
> > > >   static inline struct process_queue_node *get_queue_by_qid(
> > > >
> > > > @@ -167,6 +169,7 @@ static int create_cp_queue(struct process_queue_manager *pqm,
> > > >                               struct queue_properties *q_properties,
> > > >                               struct file *f, unsigned int qid)
> > > >   {
> > > > +     struct drmcg *drmcg;
> > > >       int retval;
> > > >
> > > >       /* Doorbell initialized in user space*/
> > > >
> > > > @@ -180,6 +183,36 @@ static int create_cp_queue(struct process_queue_manager *pqm,
> > > >       if (retval != 0)
> > > >               return retval;
> > > >
> > > > +
> > > > +     drmcg = drmcg_get(pqm->process->lead_thread);
> > > > +     if (drmcg) {
> > > > +             struct amdgpu_device *adev;
> > > > +             struct drmcg_device_resource *ddr;
> > > > +             int mask_size;
> > > > +             u32 *mask;
> > > > +
> > > > +             adev = (struct amdgpu_device *) dev->kgd;
> > > > +
> > > > +             mask_size = adev->ddev->drmcg_props.lgpu_capacity;
> > > > +             mask = kzalloc(sizeof(u32) * round_up(mask_size, 32),
> > > > +                             GFP_KERNEL);
> > > > +
> > > > +             if (!mask) {
> > > > +                     drmcg_put(drmcg);
> > > > +                     uninit_queue(*q);
> > > > +                     return -ENOMEM;
> > > > +             }
> > > > +
> > > > +             ddr = drmcg->dev_resources[adev->ddev->primary->index];
> > > > +
> > > > +             bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size);
> > > > +
> > > > +             (*q)->properties.cu_mask_count = mask_size;
> > > > +             (*q)->properties.cu_mask = mask;
> > > > +
> > > > +             drmcg_put(drmcg);
> > > > +     }
> > > > +
> > > >       (*q)->device = dev;
> > > >       (*q)->process = pqm->process;
> > > >
> > > > @@ -495,6 +528,113 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
> > > >                                                      save_area_used_size);
> > > >   }
> > > >
> > > > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 *cu_mask,
> > > > +             unsigned int cu_mask_size) {
> > > > +     DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY);
> > > > +     struct drmcg_device_resource *ddr;
> > > > +     struct process_queue_node *pqn;
> > > > +     struct amdgpu_device *adev;
> > > > +     struct drmcg *drmcg;
> > > > +     bool result;
> > > > +
> > > > +     if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY)
> > > > +             return false;
> > > > +
> > > > +     bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size);
> > > > +
> > > > +     pqn = get_queue_by_qid(&p->pqm, qid);
> > > > +     if (!pqn)
> > > > +             return false;
> > > > +
> > > > +     adev = (struct amdgpu_device *)pqn->q->device->kgd;
> > > > +
> > > > +     drmcg = drmcg_get(p->lead_thread);
> > > > +     ddr = drmcg->dev_resources[adev->ddev->primary->index];
> > > > +
> > > > +     if (bitmap_subset(curr_mask, ddr->lgpu_allocated,
> > > > +                             MAX_DRMCG_LGPU_CAPACITY))
> > > > +             result = true;
> > > > +     else
> > > > +             result = false;
> > > > +
> > > > +     drmcg_put(drmcg);
> > > > +
> > > > +     return result;
> > > > +}
> > > > +
> > > > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task,
> > > > +             struct amdgpu_device *adev, unsigned long *lgpu_bm,
> > > > +             unsigned int lgpu_bm_size) {
> > > > +     struct kfd_dev *kdev = adev->kfd.dev;
> > > > +     struct process_queue_node *pqn;
> > > > +     struct kfd_process *kfdproc;
> > > > +     size_t size_in_bytes;
> > > > +     u32 *cu_mask;
> > > > +     int rc = 0;
> > > > +
> > > > +     if ((lgpu_bm_size % 32) != 0) {
> > > > +             pr_warn("lgpu_bm_size %d must be a multiple of 32",
> > > > +                             lgpu_bm_size);
> > > > +             return -EINVAL;
> > > > +     }
> > > > +
> > > > +     kfdproc = kfd_get_process(task);
> > > > +
> > > > +     if (IS_ERR(kfdproc))
> > > > +             return -ESRCH;
> > > > +
> > > > +     size_in_bytes = sizeof(u32) * round_up(lgpu_bm_size, 32);
> > > > +
> > > > +     mutex_lock(&kfdproc->mutex);
> > > > +     list_for_each_entry(pqn, &kfdproc->pqm.queues, process_queue_list) {
> > > > +             if (pqn->q && pqn->q->device == kdev) {
> > > > +                     /* update cu_mask accordingly */
> > > > +                     cu_mask = kzalloc(size_in_bytes, GFP_KERNEL);
> > > > +                     if (!cu_mask) {
> > > > +                             rc = -ENOMEM;
> > > > +                             break;
> > > > +                     }
> > > > +
> > > > +                     if (pqn->q->properties.cu_mask) {
> > > > +                             DECLARE_BITMAP(curr_mask,
> > > > +                                             MAX_DRMCG_LGPU_CAPACITY);
> > > > +
> > > > +                             if (pqn->q->properties.cu_mask_count >
> > > > +                                             lgpu_bm_size) {
> > > > +                                     rc = -EINVAL;
> > > > +                                     kfree(cu_mask);
> > > > +                                     break;
> > > > +                             }
> > > > +
> > > > +                             bitmap_from_arr32(curr_mask,
> > > > +                                             pqn->q->properties.cu_mask,
> > > > +                                             pqn->q->properties.cu_mask_count);
> > > > +
> > > > +                             bitmap_and(curr_mask, curr_mask, lgpu_bm,
> > > > +                                             lgpu_bm_size);
> > > > +
> > > > +                             bitmap_to_arr32(cu_mask, curr_mask,
> > > > +                                             lgpu_bm_size);
> > > > +
> > > > +                             kfree(curr_mask);
> > > > +                     } else
> > > > +                             bitmap_to_arr32(cu_mask, lgpu_bm,
> > > > +                                             lgpu_bm_size);
> > > > +
> > > > +                     pqn->q->properties.cu_mask = cu_mask;
> > > > +                     pqn->q->properties.cu_mask_count = lgpu_bm_size;
> > > > +
> > > > +                     rc = pqn->q->device->dqm->ops.update_queue(
> > > > +                                     pqn->q->device->dqm, pqn->q);
> > > > +             }
> > > > +     }
> > > > +     mutex_unlock(&kfdproc->mutex);
> > > > +
> > > > +     return rc;
> > > > +}
> > > > +
> > > >   #if defined(CONFIG_DEBUG_FS)
> > > >
> > > >   int pqm_debugfs_mqds(struct seq_file *m, void *data)

^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2019-12-03 16:02 UTC | newest]

Thread overview: 89+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-29  6:05 [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Kenny Ho
2019-08-29  6:05 ` [PATCH RFC v4 01/16] drm: Add drm_minor_for_each Kenny Ho
     [not found]   ` <20190829060533.32315-2-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-09-03  7:57     ` Daniel Vetter
2019-09-03 19:45       ` Kenny Ho
     [not found]         ` <CAOWid-dxxDhyxP2+0R0oKAk29rR-1TbMyhshR1+gbcpGJCAW6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-03 20:12           ` Daniel Vetter
     [not found]             ` <CAKMK7uEofjdVURu+meonh_YdV5eX8vfNALkW3A_+kLapCV8j+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-03 20:43               ` Kenny Ho
     [not found]                 ` <CAOWid-eUVztW4hNVpznnJRcwHcjCirGL2aS75p4OY8XoGuJqUg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-04  8:54                   ` Daniel Vetter
     [not found]                     ` <20190904085434.GF2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-09-05 18:27                       ` Kenny Ho
2019-09-05 18:28                       ` Kenny Ho
2019-09-05 20:06                         ` Daniel Vetter
     [not found]                           ` <CAKMK7uGSrscs-WAv0pYfcxaUGXvx7M6JYbiPHTY=1hxRbFK1sg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-05 20:20                             ` Kenny Ho
2019-09-05 20:32                               ` Daniel Vetter
     [not found]                                 ` <CAKMK7uHy+GRAcpLDuz6STCBW+GNfNWr-i=ZERF3uqkO7jfynnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-05 21:26                                   ` Kenny Ho
     [not found]                                     ` <CAOWid-cRP1T2gr2U_ZN+QhS7jFM0kFTWiYy8JPPXXmGW7xBPzA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-06  9:12                                       ` Daniel Vetter
2019-09-06 15:29                     ` Tejun Heo
2019-09-06 15:36                       ` Daniel Vetter
     [not found]                         ` <CAKMK7uFQqAMB1DbiEy-o2bzr_F25My93imNcg1Qh9DHe=uWQug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-06 15:38                           ` Tejun Heo
2019-08-29  6:05 ` [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem Kenny Ho
     [not found]   ` <20190829060533.32315-3-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-10-01 14:31     ` Michal Koutný
     [not found]       ` <20191001143106.GA4749-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2019-11-29  6:00         ` Kenny Ho
     [not found]           ` <CAOWid-ewvs-c-z_WW+Cx=Jaf0p8ZAwkWCkq2E8Xkj+2HvfNjaA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-12-02 19:14             ` Tejun Heo
2019-08-29  6:05 ` [PATCH RFC v4 03/16] drm, cgroup: Initialize drmcg properties Kenny Ho
2019-08-29  6:05 ` [PATCH RFC v4 07/16] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
     [not found]   ` <20190829060533.32315-8-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-10-01 14:30     ` Michal Koutný
2019-11-29  7:18       ` Kenny Ho
     [not found] ` <20190829060533.32315-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-08-29  6:05   ` [PATCH RFC v4 04/16] drm, cgroup: Add total GEM buffer allocation stats Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 05/16] drm, cgroup: Add peak " Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 06/16] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 08/16] drm, cgroup: Add peak GEM buffer allocation limit Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 09/16] drm, cgroup: Add TTM buffer allocation stats Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 10/16] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 11/16] drm, cgroup: Add per cgroup bw measure and control Kenny Ho
     [not found]     ` <20190829060533.32315-12-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-10-01 14:30       ` Michal Koutný
2019-08-29  6:05   ` [PATCH RFC v4 13/16] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
2019-08-29  7:08     ` Koenig, Christian
2019-08-29 14:07       ` Kenny Ho
     [not found]         ` <CAOWid-dzJiqjH9+=36fFYh87OKOzToMDcJZpepOWdjoXpBSF8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-08-29 14:12           ` Koenig, Christian
     [not found]             ` <f6963293-bebe-0dca-b509-799f9096ca91-5C7GfCeVMHo@public.gmane.org>
2019-08-29 14:39               ` Kenny Ho
2019-08-29  6:05   ` [PATCH RFC v4 15/16] drm, cgroup: add update trigger after limit change Kenny Ho
2019-08-31  4:28   ` [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Tejun Heo
     [not found]     ` <20190831042857.GD2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2019-09-03  7:55       ` Daniel Vetter
     [not found]         ` <20190903075550.GJ2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-09-03 18:50           ` Tejun Heo
2019-09-03 19:23             ` Kenny Ho
2019-09-03 19:48             ` Daniel Vetter
     [not found]               ` <CAKMK7uE5Bj-3cJH895iqnLpwUV+GBDM1Y=n4Z4A3xervMdJKXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-06 15:23                 ` Tejun Heo
     [not found]                   ` <20190906152320.GM2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2019-09-06 15:34                     ` Daniel Vetter
     [not found]                       ` <CAKMK7uEXP7XLFB2aFU6+E0TH_DepFRkfCoKoHwkXtjZRDyhHig-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-09-06 15:45                         ` Tejun Heo
     [not found]                           ` <20190906154539.GP2263813-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2019-09-10 11:54                             ` Michal Hocko
     [not found]                               ` <20190910115448.GT2063-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2019-09-10 16:03                                 ` Tejun Heo
2019-09-10 17:25                                   ` Michal Hocko
2019-09-17 12:21                               ` Daniel Vetter
2019-08-29  6:05 ` [PATCH RFC v4 12/16] drm, cgroup: Add soft VRAM limit Kenny Ho
2019-08-29  6:05 ` [PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource Kenny Ho
     [not found]   ` <20190829060533.32315-15-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-10-08 18:53     ` Kuehling, Felix
     [not found]       ` <b3d2b3c1-8854-10ca-3e39-b3bef35bdfa9-5C7GfCeVMHo@public.gmane.org>
2019-10-09 10:31         ` Daniel Vetter
     [not found]           ` <20191009103153.GU16989-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-10-09 15:08             ` Kenny Ho
     [not found]               ` <CAOWid-fLurBT6-h5WjQsEPA+dq1fgfWztbyZuLV4ypmWH8SC9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-10-09 15:23                 ` Daniel Vetter
2019-10-09 15:25           ` Kuehling, Felix
     [not found]             ` <ee873e89-48fd-c4c9-1ce0-73965f4ad2ba-5C7GfCeVMHo@public.gmane.org>
2019-10-09 15:34               ` Daniel Vetter
     [not found]                 ` <20191009153429.GI16989-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-10-09 15:53                   ` Kuehling, Felix
     [not found]                     ` <c7812af4-7ec4-02bb-ff4c-21dd114cf38e-5C7GfCeVMHo@public.gmane.org>
2019-10-09 16:06                       ` Daniel Vetter
2019-10-09 18:52                         ` Greathouse, Joseph
     [not found]                           ` <CY4PR12MB17670EE9EE4A22663EB584E8F9950-rpdhrqHFk07NeWpHaHeGuQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-10-09 19:07                             ` Daniel Vetter
2019-10-11 17:12                         ` tj
     [not found]                           ` <20191011171247.GC18794-LpCCV3molIbIZ9tKgghJQw2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2019-11-29 20:10                             ` Felix Kuehling
2019-08-29  6:05 ` [PATCH RFC v4 16/16] drm/amdgpu: Integrate with DRM cgroup Kenny Ho
     [not found]   ` <20190829060533.32315-17-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-10-08 19:11     ` Kuehling, Felix
     [not found]       ` <04abdc58-ae30-a13d-e7dc-f1020a1400b9-5C7GfCeVMHo@public.gmane.org>
2019-11-29  5:59         ` Kenny Ho
2019-12-02 22:05           ` Greathouse, Joseph
2019-12-03 16:02             ` Kenny Ho
2019-09-03  8:02 ` [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem Daniel Vetter
     [not found]   ` <20190903080217.GL2112-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-09-03  8:24     ` Koenig, Christian
2019-09-03  9:19       ` Daniel Vetter
2019-09-03 19:30         ` Kenny Ho