dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem
@ 2019-06-26 15:05 Kenny Ho
  2019-06-26 15:05 ` [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices Kenny Ho
                   ` (7 more replies)
  0 siblings, 8 replies; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, joseph.greathouse-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc
  Cc: Kenny Ho

This is a follow up to the RFC I made previously to introduce a cgroup
controller for the GPU/DRM subsystem [v1,v2].  The goal is to be able to
provide resource management to GPU resources using things like container.
The cover letter from v1 is copied below for reference.

[v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
[v2]: https://www.spinics.net/lists/cgroups/msg22074.html

v3:
Base on feedbacks on v2:
* removed .help type file from v2
* conform to cgroup convention for default and max handling
* conform to cgroup convention for addressing device specific limits (with major:minor)
New function:
* adopted memparse for memory size related attributes
* added macro to marshall drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
* added ttm buffer usage stats (per cgroup, for system, tt, vram.)
* added ttm buffer usage limit (per cgroup, for vram.)
* added per cgroup bandwidth stats and limiting (burst and average bandwidth)

v2:
* Removed the vendoring concepts
* Add limit to total buffer allocation
* Add limit to the maximum size of a buffer allocation

v1: cover letter

The purpose of this patch series is to start a discussion for a generic cgroup
controller for the drm subsystem.  The design proposed here is a very early one.
We are hoping to engage the community as we develop the idea.


Backgrounds
==========
Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with specialized
behaviour, such as accounting/limiting the resources which processes in a cgroup
can access[1].  Weights, limits, protections, allocations are the main resource
distribution models.  Existing cgroup controllers includes cpu, memory, io,
rdma, and more.  cgroup is one of the foundational technologies that enables the
popular container application deployment and management method.

Direct Rendering Manager/drm contains code intended to support the needs of
complex graphics devices. Graphics drivers in the kernel may make use of DRM
functions to make tasks like memory management, interrupt handling and DMA
easier, and provide a uniform interface to applications.  The DRM has also
developed beyond traditional graphics applications to support compute/GPGPU
applications.


Motivations
=========
As GPU grow beyond the realm of desktop/workstation graphics into areas like
data center clusters and IoT, there are increasing needs to monitor and regulate
GPU as a resource like cpu, memory and io.

Matt Roper from Intel began working on similar idea in early 2018 [2] for the
purpose of managing GPU priority using the cgroup hierarchy.  While that
particular use case may not warrant a standalone drm cgroup controller, there
are other use cases where having one can be useful [3].  Monitoring GPU
resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
(execution unit [Intel's nomenclature]), GPU job scheduling [4] can help
sysadmins get a better understanding of the applications usage profile.  Further
usage regulations of the aforementioned resources can also help sysadmins
optimize workload deployment on limited GPU resources.

With the increased importance of machine learning, data science and other
cloud-based applications, GPUs are already in production use in data centers
today [5,6,7].  Existing GPU resource management is very course grain, however,
as sysadmins are only able to distribute workload on a per-GPU basis [8].  An
alternative is to use GPU virtualization (with or without SRIOV) but it
generally acts on the entire GPU instead of the specific resources in a GPU.
With a drm cgroup controller, we can enable alternate, fine-grain, sub-GPU
resource management (in addition to what may be available via GPU
virtualization.)

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a mean to artificially limit DRM
resources availble to the applications.


Challenges
========
While there are common infrastructure in DRM that is shared across many vendors
(the scheduler [4] for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism used by the cgroup to
handle different kinds of cgroup controller.

Resources for DRM are also often device (GPU) specific instead of system
specific and a system may contain more than one GPU.  For this, we borrowed some
of the ideas from RDMA cgroup controller.

Approach
=======
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (11):
  cgroup: Introduce cgroup for drm subsystem
  cgroup: Add mechanism to register DRM devices
  drm/amdgpu: Register AMD devices for DRM cgroup
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit
  drm, cgroup: Add GEM buffer allocation count stats
  drm, cgroup: Add TTM buffer allocation stats
  drm, cgroup: Add TTM buffer peak usage stats
  drm, cgroup: Add per cgroup bw measure and control
  drm, cgroup: Add soft VRAM limit
  drm, cgroup: Allow more aggressive memory reclaim

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |    3 +-
 drivers/gpu/drm/drm_gem.c                  |    8 +
 drivers/gpu/drm/drm_prime.c                |    9 +
 drivers/gpu/drm/ttm/ttm_bo.c               |   91 ++
 drivers/gpu/drm/ttm/ttm_bo_util.c          |    4 +
 include/drm/drm_cgroup.h                   |  115 ++
 include/drm/drm_gem.h                      |   11 +
 include/drm/ttm/ttm_bo_api.h               |    2 +
 include/drm/ttm/ttm_bo_driver.h            |   10 +
 include/linux/cgroup_drm.h                 |  114 ++
 include/linux/cgroup_subsys.h              |    4 +
 init/Kconfig                               |    5 +
 kernel/cgroup/Makefile                     |    1 +
 kernel/cgroup/drm.c                        | 1171 ++++++++++++++++++++
 16 files changed, 1555 insertions(+), 1 deletion(-)
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.21.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 15:05   ` Kenny Ho
       [not found]     ` <20190626150522.11618-2-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-06-26 15:05   ` [RFC PATCH v3 05/11] drm, cgroup: Add peak GEM buffer allocation limit Kenny Ho
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, joseph.greathouse-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc
  Cc: Kenny Ho

Change-Id: I6830d3990f63f0c13abeba29b1d330cf28882831
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/linux/cgroup_drm.h    | 76 +++++++++++++++++++++++++++++++++++
 include/linux/cgroup_subsys.h |  4 ++
 init/Kconfig                  |  5 +++
 kernel/cgroup/Makefile        |  1 +
 kernel/cgroup/drm.c           | 42 +++++++++++++++++++
 5 files changed, 128 insertions(+)
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
new file mode 100644
index 000000000000..9928e60037a5
--- /dev/null
+++ b/include/linux/cgroup_drm.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef _CGROUP_DRM_H
+#define _CGROUP_DRM_H
+
+#ifdef CONFIG_CGROUP_DRM
+
+#include <linux/cgroup.h>
+
+struct drmcgrp {
+	struct cgroup_subsys_state	css;
+};
+
+static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct drmcgrp, css) : NULL;
+}
+
+static inline struct drmcgrp *drmcgrp_from(struct task_struct *task)
+{
+	return css_drmcgrp(task_get_css(task, drm_cgrp_id));
+}
+
+static inline struct drmcgrp *get_drmcgrp(struct task_struct *task)
+{
+	struct cgroup_subsys_state *css = task_get_css(task, drm_cgrp_id);
+
+	if (css)
+		css_get(css);
+
+	return css_drmcgrp(css);
+}
+
+static inline void put_drmcgrp(struct drmcgrp *drmcgrp)
+{
+	if (drmcgrp)
+		css_put(&drmcgrp->css);
+}
+
+static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)
+{
+	return css_drmcgrp(cg->css.parent);
+}
+
+#else /* CONFIG_CGROUP_DRM */
+
+struct drmcgrp {
+};
+
+static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct drmcgrp *drmcgrp_from(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline struct drmcgrp *get_drmcgrp(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void put_drmcgrp(struct drmcgrp *drmcgrp)
+{
+}
+
+static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)
+{
+	return NULL;
+}
+
+#endif	/* CONFIG_CGROUP_DRM */
+#endif	/* _CGROUP_DRM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..ddedad809e8b 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_DRM)
+SUBSYS(drm)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index d47cb77a220e..0b0f112eb23b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -839,6 +839,11 @@ config CGROUP_RDMA
 	  Attaching processes with active RDMA resources to the cgroup
 	  hierarchy is allowed even if can cross the hierarchy's limit.
 
+config CGROUP_DRM
+	bool "DRM controller (EXPERIMENTAL)"
+	help
+	  Provides accounting and enforcement of resources in the DRM subsystem.
+
 config CGROUP_FREEZER
 	bool "Freezer controller"
 	help
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index bfcdae896122..6af14bd93050 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,5 +4,6 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o
 obj-$(CONFIG_CGROUP_FREEZER) += freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
+obj-$(CONFIG_CGROUP_DRM) += drm.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
new file mode 100644
index 000000000000..66cb1dda023d
--- /dev/null
+++ b/kernel/cgroup/drm.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/slab.h>
+#include <linux/cgroup.h>
+#include <linux/cgroup_drm.h>
+
+static struct drmcgrp *root_drmcgrp __read_mostly;
+
+static void drmcgrp_css_free(struct cgroup_subsys_state *css)
+{
+	struct drmcgrp *drmcgrp = css_drmcgrp(css);
+
+	kfree(drmcgrp);
+}
+
+static struct cgroup_subsys_state *
+drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct drmcgrp *parent = css_drmcgrp(parent_css);
+	struct drmcgrp *drmcgrp;
+
+	drmcgrp = kzalloc(sizeof(struct drmcgrp), GFP_KERNEL);
+	if (!drmcgrp)
+		return ERR_PTR(-ENOMEM);
+
+	if (!parent)
+		root_drmcgrp = drmcgrp;
+
+	return &drmcgrp->css;
+}
+
+struct cftype files[] = {
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys drm_cgrp_subsys = {
+	.css_alloc	= drmcgrp_css_alloc,
+	.css_free	= drmcgrp_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
-- 
2.21.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
@ 2019-06-26 15:05 ` Kenny Ho
       [not found]   ` <20190626150522.11618-3-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-06-26 15:05 ` [RFC PATCH v3 03/11] drm/amdgpu: Register AMD devices for DRM cgroup Kenny Ho
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, joseph.greathouse, jsparks, lkaplan
  Cc: Kenny Ho

Change-Id: I908ee6975ea0585e4c30eafde4599f87094d8c65
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/drm/drm_cgroup.h   |  24 ++++++++
 include/linux/cgroup_drm.h |  10 ++++
 kernel/cgroup/drm.c        | 116 +++++++++++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h

diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
new file mode 100644
index 000000000000..ddb9eab64360
--- /dev/null
+++ b/include/drm/drm_cgroup.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ */
+#ifndef __DRM_CGROUP_H__
+#define __DRM_CGROUP_H__
+
+#ifdef CONFIG_CGROUP_DRM
+
+int drmcgrp_register_device(struct drm_device *device);
+
+int drmcgrp_unregister_device(struct drm_device *device);
+
+#else
+static inline int drmcgrp_register_device(struct drm_device *device)
+{
+	return 0;
+}
+
+static inline int drmcgrp_unregister_device(struct drm_device *device)
+{
+	return 0;
+}
+#endif /* CONFIG_CGROUP_DRM */
+#endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 9928e60037a5..27497f786c93 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -6,10 +6,20 @@
 
 #ifdef CONFIG_CGROUP_DRM
 
+#include <linux/mutex.h>
 #include <linux/cgroup.h>
+#include <drm/drm_file.h>
+
+/* limit defined per the way drm_minor_alloc operates */
+#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
+
+struct drmcgrp_device_resource {
+	/* for per device stats */
+};
 
 struct drmcgrp {
 	struct cgroup_subsys_state	css;
+	struct drmcgrp_device_resource	*dev_resources[MAX_DRM_DEV];
 };
 
 static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 66cb1dda023d..7da6e0d93991 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -1,28 +1,99 @@
 // SPDX-License-Identifier: MIT
 // Copyright 2019 Advanced Micro Devices, Inc.
+#include <linux/export.h>
 #include <linux/slab.h>
 #include <linux/cgroup.h>
+#include <linux/fs.h>
+#include <linux/seq_file.h>
+#include <linux/mutex.h>
 #include <linux/cgroup_drm.h>
+#include <linux/kernel.h>
+#include <drm/drm_device.h>
+#include <drm/drm_cgroup.h>
+
+static DEFINE_MUTEX(drmcgrp_mutex);
+
+struct drmcgrp_device {
+	struct drm_device	*dev;
+	struct mutex		mutex;
+};
+
+/* indexed by drm_minor for access speed */
+static struct drmcgrp_device	*known_drmcgrp_devs[MAX_DRM_DEV];
+
+static int max_minor;
+
 
 static struct drmcgrp *root_drmcgrp __read_mostly;
 
 static void drmcgrp_css_free(struct cgroup_subsys_state *css)
 {
 	struct drmcgrp *drmcgrp = css_drmcgrp(css);
+	int i;
+
+	for (i = 0; i <= max_minor; i++) {
+		if (drmcgrp->dev_resources[i] != NULL)
+			kfree(drmcgrp->dev_resources[i]);
+	}
 
 	kfree(drmcgrp);
 }
 
+static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
+{
+	struct drmcgrp_device_resource *ddr = drmcgrp->dev_resources[minor];
+
+	if (ddr == NULL) {
+		ddr = kzalloc(sizeof(struct drmcgrp_device_resource),
+			GFP_KERNEL);
+
+		if (!ddr)
+			return -ENOMEM;
+
+		drmcgrp->dev_resources[minor] = ddr;
+	}
+
+	/* set defaults here */
+
+	return 0;
+}
+
+static inline int init_drmcgrp(struct drmcgrp *drmcgrp, struct drm_device *dev)
+{
+	int rc = 0;
+	int i;
+
+	if (dev != NULL) {
+		rc = init_drmcgrp_single(drmcgrp, dev->primary->index);
+		return rc;
+	}
+
+	for (i = 0; i <= max_minor; i++) {
+		rc = init_drmcgrp_single(drmcgrp, i);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static struct cgroup_subsys_state *
 drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct drmcgrp *parent = css_drmcgrp(parent_css);
 	struct drmcgrp *drmcgrp;
+	int rc;
 
 	drmcgrp = kzalloc(sizeof(struct drmcgrp), GFP_KERNEL);
 	if (!drmcgrp)
 		return ERR_PTR(-ENOMEM);
 
+	rc = init_drmcgrp(drmcgrp, NULL);
+	if (rc) {
+		drmcgrp_css_free(&drmcgrp->css);
+		return ERR_PTR(rc);
+	}
+
 	if (!parent)
 		root_drmcgrp = drmcgrp;
 
@@ -40,3 +111,48 @@ struct cgroup_subsys drm_cgrp_subsys = {
 	.legacy_cftypes	= files,
 	.dfl_cftypes	= files,
 };
+
+int drmcgrp_register_device(struct drm_device *dev)
+{
+	struct drmcgrp_device *ddev;
+
+	ddev = kzalloc(sizeof(struct drmcgrp_device), GFP_KERNEL);
+	if (!ddev)
+		return -ENOMEM;
+
+	ddev->dev = dev;
+	mutex_init(&ddev->mutex);
+
+	mutex_lock(&drmcgrp_mutex);
+	known_drmcgrp_devs[dev->primary->index] = ddev;
+	max_minor = max(max_minor, dev->primary->index);
+	mutex_unlock(&drmcgrp_mutex);
+
+	/* init cgroups created before registration (i.e. root cgroup) */
+	if (root_drmcgrp != NULL) {
+		struct cgroup_subsys_state *pos;
+		struct drmcgrp *child;
+
+		rcu_read_lock();
+		css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
+			child = css_drmcgrp(pos);
+			init_drmcgrp(child, dev);
+		}
+		rcu_read_unlock();
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drmcgrp_register_device);
+
+int drmcgrp_unregister_device(struct drm_device *dev)
+{
+	mutex_lock(&drmcgrp_mutex);
+
+	kfree(known_drmcgrp_devs[dev->primary->index]);
+	known_drmcgrp_devs[dev->primary->index] = NULL;
+
+	mutex_unlock(&drmcgrp_mutex);
+	return 0;
+}
+EXPORT_SYMBOL(drmcgrp_unregister_device);
-- 
2.21.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 03/11] drm/amdgpu: Register AMD devices for DRM cgroup
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
  2019-06-26 15:05 ` [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices Kenny Ho
@ 2019-06-26 15:05 ` Kenny Ho
  2019-06-26 15:05 ` [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, joseph.greathouse, jsparks, lkaplan
  Cc: Kenny Ho

Change-Id: I3750fc657b956b52750a36cb303c54fa6a265b44
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index da7b4fe8ade3..2568fd730161 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -28,6 +28,7 @@
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_cgroup.h>
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -97,6 +98,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 
 	amdgpu_device_fini(adev);
 
+	drmcgrp_unregister_device(dev);
 done_free:
 	kfree(adev);
 	dev->dev_private = NULL;
@@ -141,6 +143,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	struct amdgpu_device *adev;
 	int r, acpi_status;
 
+	drmcgrp_register_device(dev);
+
 #ifdef CONFIG_DRM_AMDGPU_SI
 	if (!amdgpu_si_support) {
 		switch (flags & AMD_ASIC_MASK) {
-- 
2.21.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
  2019-06-26 15:05 ` [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices Kenny Ho
  2019-06-26 15:05 ` [RFC PATCH v3 03/11] drm/amdgpu: Register AMD devices for DRM cgroup Kenny Ho
@ 2019-06-26 15:05 ` Kenny Ho
       [not found]   ` <20190626150522.11618-5-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, joseph.greathouse, jsparks, lkaplan
  Cc: Kenny Ho

The drm resource being measured and limited here is the GEM buffer
objects.  User applications allocate and free these buffers.  In
addition, a process can allocate a buffer and share it with another
process.  The consumer of a shared buffer can also outlive the
allocator of the buffer.

For the purpose of cgroup accounting and limiting, ownership of the
buffer is deemed to be the cgroup for which the allocating process
belongs to.  There is one cgroup limit per drm device.

In order to prevent the buffer outliving the cgroup that owns it, a
process is prevented from importing buffers that are not own by the
process' cgroup or the ancestors of the process' cgroup.  In other
words, in order for a buffer to be shared between two cgroups, the
buffer must be created by the common ancestors of the cgroups.

drm.buffer.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total GEM buffer allocation in bytes.

drm.buffer.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default limits on the total GEM buffer allocation in bytes.

drm.buffer.max
        A read-write flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Per device limits on the total GEM buffer allocation in byte.
        This is a hard limit.  Attempts in allocating beyond the cgroup
        limit will result in ENOMEM.  Shorthand understood by memparse
        (such as k, m, g) can be used.

        Set allocation limit for /dev/dri/card1 to 1GB
        echo "226:1 1g" > drm.buffer.total.max

        Set allocation limit for /dev/dri/card0 to 512MB
        echo "226:0 512m" > drm.buffer.total.max

Change-Id: I4c249d06d45ec709d6481d4cbe87c5168545c5d0
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +
 drivers/gpu/drm/drm_gem.c                  |   8 +
 drivers/gpu/drm/drm_prime.c                |   9 +
 include/drm/drm_cgroup.h                   |  34 ++-
 include/drm/drm_gem.h                      |  11 +
 include/linux/cgroup_drm.h                 |   2 +
 kernel/cgroup/drm.c                        | 321 +++++++++++++++++++++
 7 files changed, 387 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 93b2c5a48a71..b4c078b7ad63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -34,6 +34,7 @@
 #include <drm/drmP.h>
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_cache.h>
+#include <drm/drm_cgroup.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -446,6 +447,9 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	if (!amdgpu_bo_validate_size(adev, size, bp->domain))
 		return -ENOMEM;
 
+	if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
+		return -ENOMEM;
+
 	*bo_ptr = NULL;
 
 	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6a80db077dc6..e20c1034bf2b 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -37,10 +37,12 @@
 #include <linux/shmem_fs.h>
 #include <linux/dma-buf.h>
 #include <linux/mem_encrypt.h>
+#include <linux/cgroup_drm.h>
 #include <drm/drmP.h>
 #include <drm/drm_vma_manager.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_print.h>
+#include <drm/drm_cgroup.h>
 #include "drm_internal.h"
 
 /** @file drm_gem.c
@@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
 	obj->handle_count = 0;
 	obj->size = size;
 	drm_vma_node_reset(&obj->vma_node);
+
+	obj->drmcgrp = get_drmcgrp(current);
+	drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
@@ -804,6 +809,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
 	if (obj->filp)
 		fput(obj->filp);
 
+	drmcgrp_unchg_bo_alloc(obj->drmcgrp, obj->dev, obj->size);
+	put_drmcgrp(obj->drmcgrp);
+
 	drm_gem_free_mmap_offset(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 231e3f6d5f41..eeb612116810 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -32,6 +32,7 @@
 #include <drm/drm_prime.h>
 #include <drm/drm_gem.h>
 #include <drm/drmP.h>
+#include <drm/drm_cgroup.h>
 
 #include "drm_internal.h"
 
@@ -794,6 +795,7 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 {
 	struct dma_buf *dma_buf;
 	struct drm_gem_object *obj;
+	struct drmcgrp *drmcgrp = drmcgrp_from(current);
 	int ret;
 
 	dma_buf = dma_buf_get(prime_fd);
@@ -818,6 +820,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 		goto out_unlock;
 	}
 
+	/* only allow bo from the same cgroup or its ancestor to be imported */
+	if (drmcgrp != NULL &&
+			!drmcgrp_is_self_or_ancestor(drmcgrp, obj->drmcgrp)) {
+		ret = -EACCES;
+		goto out_unlock;
+	}
+
 	if (obj->dma_buf) {
 		WARN_ON(obj->dma_buf != dma_buf);
 	} else {
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index ddb9eab64360..8711b7c5f7bf 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -4,12 +4,20 @@
 #ifndef __DRM_CGROUP_H__
 #define __DRM_CGROUP_H__
 
+#include <linux/cgroup_drm.h>
+
 #ifdef CONFIG_CGROUP_DRM
 
 int drmcgrp_register_device(struct drm_device *device);
-
 int drmcgrp_unregister_device(struct drm_device *device);
-
+bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
+		struct drmcgrp *relative);
+void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
+		size_t size);
+void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
+		size_t size);
+bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
+		size_t size);
 #else
 static inline int drmcgrp_register_device(struct drm_device *device)
 {
@@ -20,5 +28,27 @@ static inline int drmcgrp_unregister_device(struct drm_device *device)
 {
 	return 0;
 }
+
+static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
+		struct drmcgrp *relative)
+{
+	return false;
+}
+
+static inline void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp,
+		struct drm_device *dev,	size_t size)
+{
+}
+
+static inline void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp,
+		struct drm_device *dev,	size_t size)
+{
+}
+
+static inline bool drmcgrp_bo_can_allocate(struct task_struct *task,
+		struct drm_device *dev,	size_t size)
+{
+	return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index c95727425284..09d1c69a3f0c 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -272,6 +272,17 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	/**
+	 * @drmcgrp:
+	 *
+	 * DRM cgroup this GEM object belongs to.
+	 *
+	 * This is used to track and limit the amount of GEM objects a user
+	 * can allocate.  Since GEM objects can be shared, this is also used
+	 * to ensure GEM objects are only shared within the same cgroup.
+	 */
+	struct drmcgrp *drmcgrp;
 };
 
 /**
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 27497f786c93..efa019666f1c 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -15,6 +15,8 @@
 
 struct drmcgrp_device_resource {
 	/* for per device stats */
+	s64			bo_stats_total_allocated;
+	s64			bo_limits_total_allocated;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 7da6e0d93991..cfc1fe74dca3 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -9,6 +9,7 @@
 #include <linux/cgroup_drm.h>
 #include <linux/kernel.h>
 #include <drm/drm_device.h>
+#include <drm/drm_ioctl.h>
 #include <drm/drm_cgroup.h>
 
 static DEFINE_MUTEX(drmcgrp_mutex);
@@ -16,6 +17,26 @@ static DEFINE_MUTEX(drmcgrp_mutex);
 struct drmcgrp_device {
 	struct drm_device	*dev;
 	struct mutex		mutex;
+
+	s64			bo_limits_total_allocated_default;
+};
+
+#define DRMCG_CTF_PRIV_SIZE 3
+#define DRMCG_CTF_PRIV_MASK GENMASK((DRMCG_CTF_PRIV_SIZE - 1), 0)
+#define DRMCG_CTF_PRIV(res_type, f_type)  ((res_type) <<\
+		DRMCG_CTF_PRIV_SIZE | (f_type))
+#define DRMCG_CTF_PRIV2RESTYPE(priv) ((priv) >> DRMCG_CTF_PRIV_SIZE)
+#define DRMCG_CTF_PRIV2FTYPE(priv) ((priv) & DRMCG_CTF_PRIV_MASK)
+
+
+enum drmcgrp_res_type {
+	DRMCGRP_TYPE_BO_TOTAL,
+};
+
+enum drmcgrp_file_type {
+	DRMCGRP_FTYPE_STATS,
+	DRMCGRP_FTYPE_LIMIT,
+	DRMCGRP_FTYPE_DEFAULT,
 };
 
 /* indexed by drm_minor for access speed */
@@ -54,6 +75,10 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
 	}
 
 	/* set defaults here */
+	if (known_drmcgrp_devs[minor] != NULL) {
+		ddr->bo_limits_total_allocated =
+		  known_drmcgrp_devs[minor]->bo_limits_total_allocated_default;
+	}
 
 	return 0;
 }
@@ -100,7 +125,225 @@ drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
 	return &drmcgrp->css;
 }
 
+static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
+		struct seq_file *sf, enum drmcgrp_res_type type)
+{
+	if (ddr == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCGRP_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
+		struct seq_file *sf, enum drmcgrp_res_type type)
+{
+	if (ddr == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCGRP_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+static inline void drmcgrp_print_default(struct drmcgrp_device *ddev,
+		struct seq_file *sf, enum drmcgrp_res_type type)
+{
+	if (ddev == NULL) {
+		seq_puts(sf, "\n");
+		return;
+	}
+
+	switch (type) {
+	case DRMCGRP_TYPE_BO_TOTAL:
+		seq_printf(sf, "%lld\n",
+				ddev->bo_limits_total_allocated_default);
+		break;
+	default:
+		seq_puts(sf, "\n");
+		break;
+	}
+}
+
+int drmcgrp_bo_show(struct seq_file *sf, void *v)
+{
+	struct drmcgrp *drmcgrp = css_drmcgrp(seq_css(sf));
+	struct drmcgrp_device_resource *ddr = NULL;
+	enum drmcgrp_file_type f_type =
+		DRMCG_CTF_PRIV2FTYPE(seq_cft(sf)->private);
+	enum drmcgrp_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(seq_cft(sf)->private);
+	struct drmcgrp_device *ddev;
+	int i;
+
+	for (i = 0; i <= max_minor; i++) {
+		ddr = drmcgrp->dev_resources[i];
+		ddev = known_drmcgrp_devs[i];
+
+		seq_printf(sf, "%d:%d ", DRM_MAJOR, i);
+
+		switch (f_type) {
+		case DRMCGRP_FTYPE_STATS:
+			drmcgrp_print_stats(ddr, sf, type);
+			break;
+		case DRMCGRP_FTYPE_LIMIT:
+			drmcgrp_print_limits(ddr, sf, type);
+			break;
+		case DRMCGRP_FTYPE_DEFAULT:
+			drmcgrp_print_default(ddev, sf, type);
+			break;
+		default:
+			seq_puts(sf, "\n");
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static inline void drmcgrp_pr_cft_err(const struct drmcgrp *drmcgrp,
+		const char *cft_name, int minor)
+{
+	pr_err("drmcgrp: error parsing %s, minor %d ",
+			cft_name, minor);
+	pr_cont_cgroup_name(drmcgrp->css.cgroup);
+	pr_cont("\n");
+}
+
+static inline int drmcgrp_process_limit_val(char *sval, bool is_mem,
+			s64 def_val, s64 max_val, s64 *ret_val)
+{
+	int rc = strcmp("max", sval);
+
+
+	if (!rc)
+		*ret_val = max_val;
+	else {
+		rc = strcmp("default", sval);
+
+		if (!rc)
+			*ret_val = def_val;
+	}
+
+	if (rc) {
+		if (is_mem) {
+			*ret_val = memparse(sval, NULL);
+			rc = 0;
+		} else {
+			rc = kstrtoll(sval, 0, ret_val);
+		}
+	}
+
+	if (*ret_val > max_val)
+		*ret_val = max_val;
+
+	return rc;
+}
+
+ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
+		size_t nbytes, loff_t off)
+{
+	struct drmcgrp *drmcgrp = css_drmcgrp(of_css(of));
+	struct drmcgrp *parent = parent_drmcgrp(drmcgrp);
+	enum drmcgrp_res_type type =
+		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
+	char *cft_name = of_cft(of)->name;
+	char *limits = strstrip(buf);
+	struct drmcgrp_device *ddev;
+	struct drmcgrp_device_resource *ddr;
+	char *line;
+	char sattr[256];
+	s64 val;
+	s64 p_max;
+	int rc;
+	int minor;
+
+	while (limits != NULL) {
+		line =  strsep(&limits, "\n");
+
+		if (sscanf(line,
+			__stringify(DRM_MAJOR)":%u %255[^\t\n]",
+							&minor, sattr) != 2) {
+			pr_err("drmcgrp: error parsing %s ", cft_name);
+			pr_cont_cgroup_name(drmcgrp->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		if (minor < 0 || minor > max_minor) {
+			pr_err("drmcgrp: invalid minor %d for %s ",
+					minor, cft_name);
+			pr_cont_cgroup_name(drmcgrp->css.cgroup);
+			pr_cont("\n");
+
+			continue;
+		}
+
+		ddr = drmcgrp->dev_resources[minor];
+		ddev = known_drmcgrp_devs[minor];
+		switch (type) {
+		case DRMCGRP_TYPE_BO_TOTAL:
+			p_max = parent == NULL ? S64_MAX :
+				parent->dev_resources[minor]->
+				bo_limits_total_allocated;
+
+			rc = drmcgrp_process_limit_val(sattr, true,
+				ddev->bo_limits_total_allocated_default,
+				p_max,
+				&val);
+
+			if (rc || val < 0) {
+				drmcgrp_pr_cft_err(drmcgrp, cft_name, minor);
+				continue;
+			}
+
+			ddr->bo_limits_total_allocated = val;
+			break;
+		default:
+			break;
+		}
+	}
+
+	return nbytes;
+}
+
 struct cftype files[] = {
+	{
+		.name = "buffer.total.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
+						DRMCGRP_FTYPE_STATS),
+	},
+	{
+		.name = "buffer.total.default",
+		.seq_show = drmcgrp_bo_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
+						DRMCGRP_FTYPE_DEFAULT),
+	},
+	{
+		.name = "buffer.total.max",
+		.write = drmcgrp_bo_limit_write,
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
+						DRMCGRP_FTYPE_LIMIT),
+	},
 	{ }	/* terminate */
 };
 
@@ -121,6 +364,8 @@ int drmcgrp_register_device(struct drm_device *dev)
 		return -ENOMEM;
 
 	ddev->dev = dev;
+	ddev->bo_limits_total_allocated_default = S64_MAX;
+
 	mutex_init(&ddev->mutex);
 
 	mutex_lock(&drmcgrp_mutex);
@@ -156,3 +401,79 @@ int drmcgrp_unregister_device(struct drm_device *dev)
 	return 0;
 }
 EXPORT_SYMBOL(drmcgrp_unregister_device);
+
+bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self, struct drmcgrp *relative)
+{
+	for (; self != NULL; self = parent_drmcgrp(self))
+		if (self == relative)
+			return true;
+
+	return false;
+}
+EXPORT_SYMBOL(drmcgrp_is_self_or_ancestor);
+
+bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
+		size_t size)
+{
+	struct drmcgrp *drmcgrp = drmcgrp_from(task);
+	struct drmcgrp_device_resource *ddr;
+	struct drmcgrp_device_resource *d;
+	int devIdx = dev->primary->index;
+	bool result = true;
+	s64 delta = 0;
+
+	if (drmcgrp == NULL || drmcgrp == root_drmcgrp)
+		return true;
+
+	ddr = drmcgrp->dev_resources[devIdx];
+	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
+	for ( ; drmcgrp != root_drmcgrp; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		d = drmcgrp->dev_resources[devIdx];
+		delta = d->bo_limits_total_allocated -
+				d->bo_stats_total_allocated;
+
+		if (delta <= 0 || size > delta) {
+			result = false;
+			break;
+		}
+	}
+	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
+
+	return result;
+}
+EXPORT_SYMBOL(drmcgrp_bo_can_allocate);
+
+void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
+		size_t size)
+{
+	struct drmcgrp_device_resource *ddr;
+	int devIdx = dev->primary->index;
+
+	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
+		return;
+
+	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		ddr = drmcgrp->dev_resources[devIdx];
+
+		ddr->bo_stats_total_allocated += (s64)size;
+	}
+	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
+}
+EXPORT_SYMBOL(drmcgrp_chg_bo_alloc);
+
+void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
+		size_t size)
+{
+	int devIdx = dev->primary->index;
+
+	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
+		return;
+
+	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp))
+		drmcgrp->dev_resources[devIdx]->bo_stats_total_allocated
+			-= (s64)size;
+	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
+}
+EXPORT_SYMBOL(drmcgrp_unchg_bo_alloc);
-- 
2.21.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 05/11] drm, cgroup: Add peak GEM buffer allocation limit
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-06-26 15:05   ` [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
@ 2019-06-26 15:05   ` Kenny Ho
  2019-06-26 15:05   ` [RFC PATCH v3 06/11] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, joseph.greathouse-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc
  Cc: Kenny Ho

drm.buffer.peak.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Largest GEM buffer allocated in bytes.

drm.buffer.peak.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default limits on the largest GEM buffer allocation in bytes.

drm.buffer.peak.max
        A read-write flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Per device limits on the largest GEM buffer allocation in bytes.
        This is a hard limit.  Attempts in allocating beyond the cgroup
        limit will result in ENOMEM.  Shorthand understood by memparse
        (such as k, m, g) can be used.

        Set largest allocation for /dev/dri/card1 to 4MB
        echo "226:1 4m" > drm.buffer.peak.max

Change-Id: I0830d56775568e1cf215b56cc892d5e7945e9f25
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/linux/cgroup_drm.h |  3 ++
 kernel/cgroup/drm.c        | 61 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index efa019666f1c..126c156ffd70 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -17,6 +17,9 @@ struct drmcgrp_device_resource {
 	/* for per device stats */
 	s64			bo_stats_total_allocated;
 	s64			bo_limits_total_allocated;
+
+	size_t			bo_stats_peak_allocated;
+	size_t			bo_limits_peak_allocated;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index cfc1fe74dca3..265008197654 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -19,6 +19,7 @@ struct drmcgrp_device {
 	struct mutex		mutex;
 
 	s64			bo_limits_total_allocated_default;
+	size_t			bo_limits_peak_allocated_default;
 };
 
 #define DRMCG_CTF_PRIV_SIZE 3
@@ -31,6 +32,7 @@ struct drmcgrp_device {
 
 enum drmcgrp_res_type {
 	DRMCGRP_TYPE_BO_TOTAL,
+	DRMCGRP_TYPE_BO_PEAK,
 };
 
 enum drmcgrp_file_type {
@@ -78,6 +80,9 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
 	if (known_drmcgrp_devs[minor] != NULL) {
 		ddr->bo_limits_total_allocated =
 		  known_drmcgrp_devs[minor]->bo_limits_total_allocated_default;
+
+		ddr->bo_limits_peak_allocated =
+		  known_drmcgrp_devs[minor]->bo_limits_peak_allocated_default;
 	}
 
 	return 0;
@@ -137,6 +142,9 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_BO_TOTAL:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
 		break;
+	case DRMCGRP_TYPE_BO_PEAK:
+		seq_printf(sf, "%zu\n", ddr->bo_stats_peak_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -155,6 +163,9 @@ static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_BO_TOTAL:
 		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
 		break;
+	case DRMCGRP_TYPE_BO_PEAK:
+		seq_printf(sf, "%zu\n", ddr->bo_limits_peak_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -174,6 +185,10 @@ static inline void drmcgrp_print_default(struct drmcgrp_device *ddev,
 		seq_printf(sf, "%lld\n",
 				ddev->bo_limits_total_allocated_default);
 		break;
+	case DRMCGRP_TYPE_BO_PEAK:
+		seq_printf(sf, "%zu\n",
+				ddev->bo_limits_peak_allocated_default);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -315,6 +330,23 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
 
 			ddr->bo_limits_total_allocated = val;
 			break;
+		case DRMCGRP_TYPE_BO_PEAK:
+			p_max = parent == NULL ? SIZE_MAX :
+				parent->dev_resources[minor]->
+				bo_limits_peak_allocated;
+
+			rc = drmcgrp_process_limit_val(sattr, true,
+				ddev->bo_limits_peak_allocated_default,
+				p_max,
+				&val);
+
+			if (rc || val < 0) {
+				drmcgrp_pr_cft_err(drmcgrp, cft_name, minor);
+				continue;
+			}
+
+			ddr->bo_limits_peak_allocated = val;
+			break;
 		default:
 			break;
 		}
@@ -344,6 +376,26 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
 						DRMCGRP_FTYPE_LIMIT),
 	},
+	{
+		.name = "buffer.peak.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_PEAK,
+						DRMCGRP_FTYPE_STATS),
+	},
+	{
+		.name = "buffer.peak.default",
+		.seq_show = drmcgrp_bo_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_PEAK,
+						DRMCGRP_FTYPE_DEFAULT),
+	},
+	{
+		.name = "buffer.peak.max",
+		.write = drmcgrp_bo_limit_write,
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_PEAK,
+						DRMCGRP_FTYPE_LIMIT),
+	},
 	{ }	/* terminate */
 };
 
@@ -365,6 +417,7 @@ int drmcgrp_register_device(struct drm_device *dev)
 
 	ddev->dev = dev;
 	ddev->bo_limits_total_allocated_default = S64_MAX;
+	ddev->bo_limits_peak_allocated_default = SIZE_MAX;
 
 	mutex_init(&ddev->mutex);
 
@@ -436,6 +489,11 @@ bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
 			result = false;
 			break;
 		}
+
+		if (d->bo_limits_peak_allocated < size) {
+			result = false;
+			break;
+		}
 	}
 	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 
@@ -457,6 +515,9 @@ void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
 		ddr = drmcgrp->dev_resources[devIdx];
 
 		ddr->bo_stats_total_allocated += (s64)size;
+
+		if (ddr->bo_stats_peak_allocated < (size_t)size)
+			ddr->bo_stats_peak_allocated = (size_t)size;
 	}
 	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
-- 
2.21.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 06/11] drm, cgroup: Add GEM buffer allocation count stats
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-06-26 15:05   ` [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
  2019-06-26 15:05   ` [RFC PATCH v3 05/11] drm, cgroup: Add peak GEM buffer allocation limit Kenny Ho
@ 2019-06-26 15:05   ` Kenny Ho
  2019-06-26 15:05   ` [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control Kenny Ho
  2019-06-26 15:05   ` [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
  4 siblings, 0 replies; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, joseph.greathouse-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc
  Cc: Kenny Ho

drm.buffer.count.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total number of GEM buffer allocated.

Change-Id: Id3e1809d5fee8562e47a7d2b961688956d844ec6
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/linux/cgroup_drm.h |  2 ++
 kernel/cgroup/drm.c        | 23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 126c156ffd70..e4400b21ab8e 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -20,6 +20,8 @@ struct drmcgrp_device_resource {
 
 	size_t			bo_stats_peak_allocated;
 	size_t			bo_limits_peak_allocated;
+
+	s64			bo_stats_count_allocated;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 265008197654..9144f93b851f 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -33,6 +33,7 @@ struct drmcgrp_device {
 enum drmcgrp_res_type {
 	DRMCGRP_TYPE_BO_TOTAL,
 	DRMCGRP_TYPE_BO_PEAK,
+	DRMCGRP_TYPE_BO_COUNT,
 };
 
 enum drmcgrp_file_type {
@@ -145,6 +146,9 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_BO_PEAK:
 		seq_printf(sf, "%zu\n", ddr->bo_stats_peak_allocated);
 		break;
+	case DRMCGRP_TYPE_BO_COUNT:
+		seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -396,6 +400,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_PEAK,
 						DRMCGRP_FTYPE_LIMIT),
 	},
+	{
+		.name = "buffer.count.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_COUNT,
+						DRMCGRP_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -518,6 +528,8 @@ void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
 
 		if (ddr->bo_stats_peak_allocated < (size_t)size)
 			ddr->bo_stats_peak_allocated = (size_t)size;
+
+		ddr->bo_stats_count_allocated++;
 	}
 	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
@@ -526,15 +538,20 @@ EXPORT_SYMBOL(drmcgrp_chg_bo_alloc);
 void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
 		size_t size)
 {
+	struct drmcgrp_device_resource *ddr;
 	int devIdx = dev->primary->index;
 
 	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
 		return;
 
 	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
-	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp))
-		drmcgrp->dev_resources[devIdx]->bo_stats_total_allocated
-			-= (s64)size;
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		ddr = drmcgrp->dev_resources[devIdx];
+
+		ddr->bo_stats_total_allocated -= (s64)size;
+
+		ddr->bo_stats_count_allocated--;
+	}
 	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
 EXPORT_SYMBOL(drmcgrp_unchg_bo_alloc);
-- 
2.21.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (3 preceding siblings ...)
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 15:05 ` Kenny Ho
  2019-06-26 16:12   ` Daniel Vetter
  2019-06-26 15:05 ` [RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, joseph.greathouse, jsparks, lkaplan
  Cc: Kenny Ho

The drm resource being measured is the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, maga200,
nouveau, qxl, virtio, vmwgfx.)

drm.memory.stats
        A read-only nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ======         =============================================
          system         Host/system memory
          tt             Host memory used by the drm device (GTT/GART)
          vram           Video RAM used by the drm device
          priv           Other drm device, vendor specific memory
          ======         =============================================

        Reading returns the following::

        226:0 system=0 tt=0 vram=0 priv=0
        226:1 system=0 tt=9035776 vram=17768448 priv=16809984
        226:2 system=0 tt=9035776 vram=17768448 priv=16809984

drm.memory.evict.stats
        A read-only flat-keyed file which exists on all cgroups.  Each
        entry is keyed by the drm device's major:minor.

        Total number of evictions.

Change-Id: Ice2c4cc845051229549bebeb6aa2d7d6153bdf6a
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c            |  30 +++++++
 drivers/gpu/drm/ttm/ttm_bo_util.c       |   4 +
 include/drm/drm_cgroup.h                |  19 ++++
 include/drm/ttm/ttm_bo_api.h            |   2 +
 include/drm/ttm/ttm_bo_driver.h         |   8 ++
 include/linux/cgroup_drm.h              |   4 +
 kernel/cgroup/drm.c                     | 113 ++++++++++++++++++++++++
 8 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e9ecc3953673..a8dfc78ed45f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1678,8 +1678,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 	mutex_init(&adev->mman.gtt_window_lock);
 
 	/* No others user of address space so set it to 0 */
-	r = ttm_bo_device_init(&adev->mman.bdev,
+	r = ttm_bo_device_init_tmp(&adev->mman.bdev,
 			       &amdgpu_bo_driver,
+			       adev->ddev,
 			       adev->ddev->anon_inode->i_mapping,
 			       adev->need_dma32);
 	if (r) {
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 2845fceb2fbd..e9f70547f0ad 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -34,6 +34,7 @@
 #include <drm/ttm/ttm_module.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
+#include <drm/drm_cgroup.h>
 #include <linux/jiffies.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
@@ -42,6 +43,7 @@
 #include <linux/module.h>
 #include <linux/atomic.h>
 #include <linux/reservation.h>
+#include <linux/cgroup_drm.h>
 
 static void ttm_bo_global_kobj_release(struct kobject *kobj);
 
@@ -151,6 +153,10 @@ static void ttm_bo_release_list(struct kref *list_kref)
 	struct ttm_bo_device *bdev = bo->bdev;
 	size_t acc_size = bo->acc_size;
 
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
+		drmcgrp_unchg_mem(bo);
+	put_drmcgrp(bo->drmcgrp);
+
 	BUG_ON(kref_read(&bo->list_kref));
 	BUG_ON(kref_read(&bo->kref));
 	BUG_ON(atomic_read(&bo->cpu_writers));
@@ -353,6 +359,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 		if (bo->mem.mem_type == TTM_PL_SYSTEM) {
 			if (bdev->driver->move_notify)
 				bdev->driver->move_notify(bo, evict, mem);
+			if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
+				drmcgrp_mem_track_move(bo, evict, mem);
 			bo->mem = *mem;
 			mem->mm_node = NULL;
 			goto moved;
@@ -361,6 +369,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 
 	if (bdev->driver->move_notify)
 		bdev->driver->move_notify(bo, evict, mem);
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
+		drmcgrp_mem_track_move(bo, evict, mem);
 
 	if (!(old_man->flags & TTM_MEMTYPE_FLAG_FIXED) &&
 	    !(new_man->flags & TTM_MEMTYPE_FLAG_FIXED))
@@ -374,6 +384,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 		if (bdev->driver->move_notify) {
 			swap(*mem, bo->mem);
 			bdev->driver->move_notify(bo, false, mem);
+			if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
+				drmcgrp_mem_track_move(bo, evict, mem);
 			swap(*mem, bo->mem);
 		}
 
@@ -1275,6 +1287,10 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
 		WARN_ON(!locked);
 	}
 
+	bo->drmcgrp = get_drmcgrp(current);
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
+		drmcgrp_chg_mem(bo);
+
 	if (likely(!ret))
 		ret = ttm_bo_validate(bo, placement, ctx);
 
@@ -1666,6 +1682,20 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 }
 EXPORT_SYMBOL(ttm_bo_device_init);
 
+/* TODO merge with official function when implementation finalized*/
+int ttm_bo_device_init_tmp(struct ttm_bo_device *bdev,
+		struct ttm_bo_driver *driver,
+		struct drm_device *ddev,
+		struct address_space *mapping,
+		bool need_dma32)
+{
+	int ret = ttm_bo_device_init(bdev, driver, mapping, need_dma32);
+
+	bdev->ddev = ddev;
+	return ret;
+}
+EXPORT_SYMBOL(ttm_bo_device_init_tmp);
+
 /*
  * buffer object vm functions.
  */
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 895d77d799e4..4ed7847c21f4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -32,6 +32,7 @@
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_cgroup.h>
 #include <linux/io.h>
 #include <linux/highmem.h>
 #include <linux/wait.h>
@@ -522,6 +523,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ret = reservation_object_trylock(fbo->base.resv);
 	WARN_ON(!ret);
 
+	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
+		drmcgrp_chg_mem(bo);
+
 	*new_obj = &fbo->base;
 	return 0;
 }
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 8711b7c5f7bf..48ab5450cf17 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -5,6 +5,7 @@
 #define __DRM_CGROUP_H__
 
 #include <linux/cgroup_drm.h>
+#include <drm/ttm/ttm_bo_api.h>
 
 #ifdef CONFIG_CGROUP_DRM
 
@@ -18,6 +19,11 @@ void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
 		size_t size);
 bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
 		size_t size);
+void drmcgrp_chg_mem(struct ttm_buffer_object *tbo);
+void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo);
+void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
+		struct ttm_mem_reg *new_mem);
+
 #else
 static inline int drmcgrp_register_device(struct drm_device *device)
 {
@@ -50,5 +56,18 @@ static inline bool drmcgrp_bo_can_allocate(struct task_struct *task,
 {
 	return true;
 }
+
+static inline void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
+{
+}
+
+static inline void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo)
+{
+}
+
+static inline void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo,
+		bool evict, struct ttm_mem_reg *new_mem)
+{
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 49d9cdfc58f2..ae1bb6daec81 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -128,6 +128,7 @@ struct ttm_tt;
  * struct ttm_buffer_object
  *
  * @bdev: Pointer to the buffer object device structure.
+ * @drmcgrp: DRM cgroup this object belongs to.
  * @type: The bo type.
  * @destroy: Destruction function. If NULL, kfree is used.
  * @num_pages: Actual number of pages.
@@ -174,6 +175,7 @@ struct ttm_buffer_object {
 	 */
 
 	struct ttm_bo_device *bdev;
+	struct drmcgrp *drmcgrp;
 	enum ttm_bo_type type;
 	void (*destroy) (struct ttm_buffer_object *);
 	unsigned long num_pages;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index c008346c2401..4cbcb41e5aa9 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -30,6 +30,7 @@
 #ifndef _TTM_BO_DRIVER_H_
 #define _TTM_BO_DRIVER_H_
 
+#include <drm/drm_device.h>
 #include <drm/drm_mm.h>
 #include <drm/drm_vma_manager.h>
 #include <linux/workqueue.h>
@@ -442,6 +443,7 @@ extern struct ttm_bo_global {
  * @driver: Pointer to a struct ttm_bo_driver struct setup by the driver.
  * @man: An array of mem_type_managers.
  * @vma_manager: Address space manager
+ * @ddev: Pointer to struct drm_device that this ttm_bo_device belongs to
  * lru_lock: Spinlock that protects the buffer+device lru lists and
  * ddestroy lists.
  * @dev_mapping: A pointer to the struct address_space representing the
@@ -460,6 +462,7 @@ struct ttm_bo_device {
 	struct ttm_bo_global *glob;
 	struct ttm_bo_driver *driver;
 	struct ttm_mem_type_manager man[TTM_NUM_MEM_TYPES];
+	struct drm_device *ddev;
 
 	/*
 	 * Protected by internal locks.
@@ -598,6 +601,11 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 		       struct address_space *mapping,
 		       bool need_dma32);
 
+int ttm_bo_device_init_tmp(struct ttm_bo_device *bdev,
+		       struct ttm_bo_driver *driver,
+		       struct drm_device *ddev,
+		       struct address_space *mapping,
+		       bool need_dma32);
 /**
  * ttm_bo_unmap_virtual
  *
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index e4400b21ab8e..141bea06f74c 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -9,6 +9,7 @@
 #include <linux/mutex.h>
 #include <linux/cgroup.h>
 #include <drm/drm_file.h>
+#include <drm/ttm/ttm_placement.h>
 
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
@@ -22,6 +23,9 @@ struct drmcgrp_device_resource {
 	size_t			bo_limits_peak_allocated;
 
 	s64			bo_stats_count_allocated;
+
+	s64			mem_stats[TTM_PL_PRIV+1];
+	s64			mem_stats_evict;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 9144f93b851f..5aee42a628c1 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -8,6 +8,8 @@
 #include <linux/mutex.h>
 #include <linux/cgroup_drm.h>
 #include <linux/kernel.h>
+#include <drm/ttm/ttm_bo_api.h>
+#include <drm/ttm/ttm_bo_driver.h>
 #include <drm/drm_device.h>
 #include <drm/drm_ioctl.h>
 #include <drm/drm_cgroup.h>
@@ -34,6 +36,8 @@ enum drmcgrp_res_type {
 	DRMCGRP_TYPE_BO_TOTAL,
 	DRMCGRP_TYPE_BO_PEAK,
 	DRMCGRP_TYPE_BO_COUNT,
+	DRMCGRP_TYPE_MEM,
+	DRMCGRP_TYPE_MEM_EVICT,
 };
 
 enum drmcgrp_file_type {
@@ -42,6 +46,13 @@ enum drmcgrp_file_type {
 	DRMCGRP_FTYPE_DEFAULT,
 };
 
+static char const *ttm_placement_names[] = {
+	[TTM_PL_SYSTEM] = "system",
+	[TTM_PL_TT]     = "tt",
+	[TTM_PL_VRAM]   = "vram",
+	[TTM_PL_PRIV]   = "priv",
+};
+
 /* indexed by drm_minor for access speed */
 static struct drmcgrp_device	*known_drmcgrp_devs[MAX_DRM_DEV];
 
@@ -134,6 +145,7 @@ drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
 static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 		struct seq_file *sf, enum drmcgrp_res_type type)
 {
+	int i;
 	if (ddr == NULL) {
 		seq_puts(sf, "\n");
 		return;
@@ -149,6 +161,16 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_BO_COUNT:
 		seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
 		break;
+	case DRMCGRP_TYPE_MEM:
+		for (i = 0; i <= TTM_PL_PRIV; i++) {
+			seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
+					ddr->mem_stats[i]);
+		}
+		seq_puts(sf, "\n");
+		break;
+	case DRMCGRP_TYPE_MEM_EVICT:
+		seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -406,6 +428,18 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_COUNT,
 						DRMCGRP_FTYPE_STATS),
 	},
+	{
+		.name = "memory.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM,
+						DRMCGRP_FTYPE_STATS),
+	},
+	{
+		.name = "memory.evict.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_EVICT,
+						DRMCGRP_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -555,3 +589,82 @@ void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
 	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
 EXPORT_SYMBOL(drmcgrp_unchg_bo_alloc);
+
+void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	struct drmcgrp *drmcgrp = tbo->drmcgrp;
+	int devIdx = dev->primary->index;
+	s64 size = (s64)(tbo->mem.size);
+	int mem_type = tbo->mem.mem_type;
+	struct drmcgrp_device_resource *ddr;
+
+	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
+		return;
+
+	mem_type = mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : mem_type;
+
+	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		ddr = drmcgrp->dev_resources[devIdx];
+		ddr->mem_stats[mem_type] += size;
+	}
+	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
+}
+EXPORT_SYMBOL(drmcgrp_chg_mem);
+
+void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	struct drmcgrp *drmcgrp = tbo->drmcgrp;
+	int devIdx = dev->primary->index;
+	s64 size = (s64)(tbo->mem.size);
+	int mem_type = tbo->mem.mem_type;
+	struct drmcgrp_device_resource *ddr;
+
+	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
+		return;
+
+	mem_type = mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : mem_type;
+
+	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		ddr = drmcgrp->dev_resources[devIdx];
+		ddr->mem_stats[mem_type] -= size;
+	}
+	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
+}
+EXPORT_SYMBOL(drmcgrp_unchg_mem);
+
+void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
+		struct ttm_mem_reg *new_mem)
+{
+	struct drm_device *dev = old_bo->bdev->ddev;
+	struct drmcgrp *drmcgrp = old_bo->drmcgrp;
+	s64 move_in_bytes = (s64)(old_bo->mem.size);
+	int devIdx = dev->primary->index;
+	int old_mem_type = old_bo->mem.mem_type;
+	int new_mem_type = new_mem->mem_type;
+	struct drmcgrp_device_resource *ddr;
+	struct drmcgrp_device *known_dev;
+
+	known_dev = known_drmcgrp_devs[devIdx];
+
+	if (drmcgrp == NULL || known_dev == NULL)
+		return;
+
+	old_mem_type = old_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : old_mem_type;
+	new_mem_type = new_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : new_mem_type;
+
+	mutex_lock(&known_dev->mutex);
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		ddr = drmcgrp->dev_resources[devIdx];
+		ddr->mem_stats[old_mem_type] -= move_in_bytes;
+		ddr->mem_stats[new_mem_type] += move_in_bytes;
+
+		if (evict)
+			ddr->mem_stats_evict++;
+	}
+	mutex_unlock(&known_dev->mutex);
+}
+EXPORT_SYMBOL(drmcgrp_mem_track_move);
-- 
2.21.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (4 preceding siblings ...)
  2019-06-26 15:05 ` [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats Kenny Ho
@ 2019-06-26 15:05 ` Kenny Ho
  2019-06-26 16:16   ` Daniel Vetter
  2019-06-26 15:05 ` [RFC PATCH v3 10/11] drm, cgroup: Add soft VRAM limit Kenny Ho
  2019-06-27  7:24 ` [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Daniel Vetter
  7 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, joseph.greathouse, jsparks, lkaplan
  Cc: Kenny Ho

drm.memory.peak.stats
        A read-only nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ======         ==============================================
          system         Peak host memory used
          tt             Peak host memory used by the device (GTT/GART)
          vram           Peak Video RAM used by the drm device
          priv           Other drm device specific memory peak usage
          ======         ==============================================

        Reading returns the following::

        226:0 system=0 tt=0 vram=0 priv=0
        226:1 system=0 tt=9035776 vram=17768448 priv=16809984
        226:2 system=0 tt=9035776 vram=17768448 priv=16809984

Change-Id: I986e44533848f66411465bdd52105e78105a709a
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 include/linux/cgroup_drm.h |  1 +
 kernel/cgroup/drm.c        | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 141bea06f74c..922529641df5 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -25,6 +25,7 @@ struct drmcgrp_device_resource {
 	s64			bo_stats_count_allocated;
 
 	s64			mem_stats[TTM_PL_PRIV+1];
+	s64			mem_peaks[TTM_PL_PRIV+1];
 	s64			mem_stats_evict;
 };
 
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 5aee42a628c1..5f5fa6a2b068 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -38,6 +38,7 @@ enum drmcgrp_res_type {
 	DRMCGRP_TYPE_BO_COUNT,
 	DRMCGRP_TYPE_MEM,
 	DRMCGRP_TYPE_MEM_EVICT,
+	DRMCGRP_TYPE_MEM_PEAK,
 };
 
 enum drmcgrp_file_type {
@@ -171,6 +172,13 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_MEM_EVICT:
 		seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
 		break;
+	case DRMCGRP_TYPE_MEM_PEAK:
+		for (i = 0; i <= TTM_PL_PRIV; i++) {
+			seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
+					ddr->mem_peaks[i]);
+		}
+		seq_puts(sf, "\n");
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -440,6 +448,12 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_EVICT,
 						DRMCGRP_FTYPE_STATS),
 	},
+	{
+		.name = "memory.peaks.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_PEAK,
+						DRMCGRP_FTYPE_STATS),
+	},
 	{ }	/* terminate */
 };
 
@@ -608,6 +622,8 @@ void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
 	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
 		ddr = drmcgrp->dev_resources[devIdx];
 		ddr->mem_stats[mem_type] += size;
+		ddr->mem_peaks[mem_type] = max(ddr->mem_peaks[mem_type],
+				ddr->mem_stats[mem_type]);
 	}
 	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
 }
@@ -662,6 +678,10 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		ddr->mem_stats[old_mem_type] -= move_in_bytes;
 		ddr->mem_stats[new_mem_type] += move_in_bytes;
 
+		ddr->mem_peaks[new_mem_type] = max(
+				ddr->mem_peaks[new_mem_type],
+				ddr->mem_stats[new_mem_type]);
+
 		if (evict)
 			ddr->mem_stats_evict++;
 	}
-- 
2.21.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2019-06-26 15:05   ` [RFC PATCH v3 06/11] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
@ 2019-06-26 15:05   ` Kenny Ho
       [not found]     ` <20190626150522.11618-10-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  2019-06-26 15:05   ` [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
  4 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, joseph.greathouse-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc
  Cc: Kenny Ho

The bandwidth is measured by keeping track of the amount of bytes moved
by ttm within a time period.  We defined two type of bandwidth: burst
and average.  Average bandwidth is calculated by dividing the total
amount of bytes moved within a cgroup by the lifetime of the cgroup.
Burst bandwidth is similar except that the byte and time measurement is
reset after a user configurable period.

The bandwidth control is best effort since it is done on a per move
basis instead of per byte.  The bandwidth is limited by delaying the
move of a buffer.  The bandwidth limit can be exceeded when the next
move is larger than the remaining allowance.

drm.burst_bw_period_in_us
        A read-write flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Length of a period use to measure burst bandwidth in us.
        One period per device.

drm.burst_bw_period_in_us.default
        A read-only flat-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.

        Default length of a period in us (one per device.)

drm.bandwidth.stats
        A read-only nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          =================     ======================================
          burst_byte_per_us     Burst bandwidth
          avg_bytes_per_us      Average bandwidth
          moved_byte            Amount of byte moved within a period
          accum_us              Amount of time accumulated in a period
          total_moved_byte      Byte moved within the cgroup lifetime
          total_accum_us        Cgroup lifetime in us
          byte_credit           Available byte credit to limit avg bw
          =================     ======================================

        Reading returns the following::
        226:1 burst_byte_per_us=23 avg_bytes_per_us=0 moved_byte=2244608
        accum_us=95575 total_moved_byte=45899776 total_accum_us=201634590
        byte_credit=13214278590464
        226:2 burst_byte_per_us=10 avg_bytes_per_us=219 moved_byte=430080
        accum_us=39350 total_moved_byte=65518026752 total_accum_us=298337721
        byte_credit=9223372036854644735

drm.bandwidth.high
        A read-write nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ================  =======================================
          bytes_in_period   Burst limit per period in byte
          avg_bytes_per_us  Average bandwidth limit in bytes per us
          ================  =======================================

        Reading returns the following::

        226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
        226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

drm.bandwidth.default
        A read-only nested-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ================  ========================================
          bytes_in_period   Default burst limit per period in byte
          avg_bytes_per_us  Default average bw limit in bytes per us
          ================  ========================================

        Reading returns the following::

        226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
        226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536

Change-Id: Ie573491325ccc16535bb943e7857f43bd0962add
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 +
 include/drm/drm_cgroup.h     |  13 ++
 include/linux/cgroup_drm.h   |  14 ++
 kernel/cgroup/drm.c          | 309 ++++++++++++++++++++++++++++++++++-
 4 files changed, 340 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index e9f70547f0ad..f06c2b9d8a4a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -36,6 +36,7 @@
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_cgroup.h>
 #include <linux/jiffies.h>
+#include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
@@ -1176,6 +1177,12 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 	 * Check whether we need to move buffer.
 	 */
 	if (!ttm_bo_mem_compat(placement, &bo->mem, &new_flags)) {
+		unsigned int move_delay = drmcgrp_get_mem_bw_period_in_us(bo);
+		move_delay /= 2000; /* check every half period in ms*/
+		while (bo->bdev->ddev != NULL && !drmcgrp_mem_can_move(bo)) {
+			msleep(move_delay);
+		}
+
 		ret = ttm_bo_move_buffer(bo, placement, ctx);
 		if (ret)
 			return ret;
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 48ab5450cf17..9b1dbd6a4eca 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -23,6 +23,8 @@ void drmcgrp_chg_mem(struct ttm_buffer_object *tbo);
 void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo);
 void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		struct ttm_mem_reg *new_mem);
+unsigned int drmcgrp_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
+bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo);
 
 #else
 static inline int drmcgrp_register_device(struct drm_device *device)
@@ -69,5 +71,16 @@ static inline void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo,
 		bool evict, struct ttm_mem_reg *new_mem)
 {
 }
+
+static inline unsigned int drmcgrp_get_mem_bw_period_in_us(
+		struct ttm_buffer_object *tbo)
+{
+	return 0;
+}
+
+static inline bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo)
+{
+	return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 922529641df5..94828da2104a 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -14,6 +14,15 @@
 /* limit defined per the way drm_minor_alloc operates */
 #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
 
+enum drmcgrp_mem_bw_attr {
+    DRMCGRP_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
+    DRMCGRP_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
+    DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED,
+    DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US,
+    DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT,
+    __DRMCGRP_MEM_BW_ATTR_LAST,
+};
+
 struct drmcgrp_device_resource {
 	/* for per device stats */
 	s64			bo_stats_total_allocated;
@@ -27,6 +36,11 @@ struct drmcgrp_device_resource {
 	s64			mem_stats[TTM_PL_PRIV+1];
 	s64			mem_peaks[TTM_PL_PRIV+1];
 	s64			mem_stats_evict;
+
+	s64			mem_bw_stats_last_update_us;
+	s64			mem_bw_stats[__DRMCGRP_MEM_BW_ATTR_LAST];
+	s64			mem_bw_limits_bytes_in_period;
+	s64			mem_bw_limits_avg_bytes_per_us;
 };
 
 struct drmcgrp {
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 5f5fa6a2b068..bbc6612200a4 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -7,6 +7,7 @@
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/cgroup_drm.h>
+#include <linux/ktime.h>
 #include <linux/kernel.h>
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
@@ -22,6 +23,12 @@ struct drmcgrp_device {
 
 	s64			bo_limits_total_allocated_default;
 	size_t			bo_limits_peak_allocated_default;
+
+	s64			mem_bw_limits_period_in_us;
+	s64			mem_bw_limits_period_in_us_default;
+
+	s64			mem_bw_bytes_in_period_default;
+	s64			mem_bw_avg_bytes_per_us_default;
 };
 
 #define DRMCG_CTF_PRIV_SIZE 3
@@ -39,6 +46,8 @@ enum drmcgrp_res_type {
 	DRMCGRP_TYPE_MEM,
 	DRMCGRP_TYPE_MEM_EVICT,
 	DRMCGRP_TYPE_MEM_PEAK,
+	DRMCGRP_TYPE_BANDWIDTH,
+	DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST,
 };
 
 enum drmcgrp_file_type {
@@ -54,6 +63,17 @@ static char const *ttm_placement_names[] = {
 	[TTM_PL_PRIV]   = "priv",
 };
 
+static char const *mem_bw_attr_names[] = {
+	[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED] = "moved_byte",
+	[DRMCGRP_MEM_BW_ATTR_ACCUM_US] = "accum_us",
+	[DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED] = "total_moved_byte",
+	[DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US] = "total_accum_us",
+	[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT] = "byte_credit",
+};
+
+#define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
+#define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
+
 /* indexed by drm_minor for access speed */
 static struct drmcgrp_device	*known_drmcgrp_devs[MAX_DRM_DEV];
 
@@ -86,6 +106,9 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
 		if (!ddr)
 			return -ENOMEM;
 
+		ddr->mem_bw_stats_last_update_us = ktime_to_us(ktime_get());
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] = 1;
+
 		drmcgrp->dev_resources[minor] = ddr;
 	}
 
@@ -96,6 +119,12 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
 
 		ddr->bo_limits_peak_allocated =
 		  known_drmcgrp_devs[minor]->bo_limits_peak_allocated_default;
+
+		ddr->mem_bw_limits_bytes_in_period =
+		  known_drmcgrp_devs[minor]->mem_bw_bytes_in_period_default;
+
+		ddr->mem_bw_limits_avg_bytes_per_us =
+		  known_drmcgrp_devs[minor]->mem_bw_avg_bytes_per_us_default;
 	}
 
 	return 0;
@@ -143,6 +172,26 @@ drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
 	return &drmcgrp->css;
 }
 
+static inline void drmcgrp_mem_burst_bw_stats_reset(struct drm_device *dev)
+{
+	struct cgroup_subsys_state *pos;
+	struct drmcgrp *node;
+	struct drmcgrp_device_resource *ddr;
+	int devIdx;
+
+	devIdx =  dev->primary->index;
+
+	rcu_read_lock();
+	css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
+		node = css_drmcgrp(pos);
+		ddr = node->dev_resources[devIdx];
+
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] = 1;
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED] = 0;
+	}
+	rcu_read_unlock();
+}
+
 static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 		struct seq_file *sf, enum drmcgrp_res_type type)
 {
@@ -179,6 +228,31 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 		}
 		seq_puts(sf, "\n");
 		break;
+	case DRMCGRP_TYPE_BANDWIDTH:
+		if (ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] == 0)
+			seq_puts(sf, "burst_byte_per_us=NaN ");
+		else
+			seq_printf(sf, "burst_byte_per_us=%lld ",
+				ddr->mem_bw_stats[
+				DRMCGRP_MEM_BW_ATTR_BYTE_MOVED]/
+				ddr->mem_bw_stats[
+				DRMCGRP_MEM_BW_ATTR_ACCUM_US]);
+
+		if (ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US] == 0)
+			seq_puts(sf, "avg_bytes_per_us=NaN ");
+		else
+			seq_printf(sf, "avg_bytes_per_us=%lld ",
+				ddr->mem_bw_stats[
+				DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED]/
+				ddr->mem_bw_stats[
+				DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US]);
+
+		for (i = 0; i < __DRMCGRP_MEM_BW_ATTR_LAST; i++) {
+			seq_printf(sf, "%s=%lld ", mem_bw_attr_names[i],
+					ddr->mem_bw_stats[i]);
+		}
+		seq_puts(sf, "\n");
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -186,9 +260,9 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
 }
 
 static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
-		struct seq_file *sf, enum drmcgrp_res_type type)
+		struct seq_file *sf, enum drmcgrp_res_type type, int minor)
 {
-	if (ddr == NULL) {
+	if (ddr == NULL || known_drmcgrp_devs[minor] == NULL) {
 		seq_puts(sf, "\n");
 		return;
 	}
@@ -200,6 +274,17 @@ static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_BO_PEAK:
 		seq_printf(sf, "%zu\n", ddr->bo_limits_peak_allocated);
 		break;
+	case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
+		seq_printf(sf, "%lld\n",
+			known_drmcgrp_devs[minor]->mem_bw_limits_period_in_us);
+		break;
+	case DRMCGRP_TYPE_BANDWIDTH:
+		seq_printf(sf, "%s=%lld %s=%lld\n",
+				MEM_BW_LIMITS_NAME_BURST,
+				ddr->mem_bw_limits_bytes_in_period,
+				MEM_BW_LIMITS_NAME_AVG,
+				ddr->mem_bw_limits_avg_bytes_per_us);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -223,6 +308,17 @@ static inline void drmcgrp_print_default(struct drmcgrp_device *ddev,
 		seq_printf(sf, "%zu\n",
 				ddev->bo_limits_peak_allocated_default);
 		break;
+	case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
+		seq_printf(sf, "%lld\n",
+				ddev->mem_bw_limits_period_in_us_default);
+		break;
+	case DRMCGRP_TYPE_BANDWIDTH:
+		seq_printf(sf, "%s=%lld %s=%lld\n",
+				MEM_BW_LIMITS_NAME_BURST,
+				ddev->mem_bw_bytes_in_period_default,
+				MEM_BW_LIMITS_NAME_AVG,
+				ddev->mem_bw_avg_bytes_per_us_default);
+		break;
 	default:
 		seq_puts(sf, "\n");
 		break;
@@ -251,7 +347,7 @@ int drmcgrp_bo_show(struct seq_file *sf, void *v)
 			drmcgrp_print_stats(ddr, sf, type);
 			break;
 		case DRMCGRP_FTYPE_LIMIT:
-			drmcgrp_print_limits(ddr, sf, type);
+			drmcgrp_print_limits(ddr, sf, type, i);
 			break;
 		case DRMCGRP_FTYPE_DEFAULT:
 			drmcgrp_print_default(ddev, sf, type);
@@ -317,6 +413,9 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
 	struct drmcgrp_device_resource *ddr;
 	char *line;
 	char sattr[256];
+	char sval[256];
+	char *nested;
+	char *attr;
 	s64 val;
 	s64 p_max;
 	int rc;
@@ -381,6 +480,78 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
 
 			ddr->bo_limits_peak_allocated = val;
 			break;
+		case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
+			rc = drmcgrp_process_limit_val(sattr, false,
+				ddev->mem_bw_limits_period_in_us_default,
+				S64_MAX,
+				&val);
+
+			if (rc || val < 2000) {
+				drmcgrp_pr_cft_err(drmcgrp, cft_name, minor);
+				continue;
+			}
+
+			ddev->mem_bw_limits_period_in_us= val;
+			drmcgrp_mem_burst_bw_stats_reset(ddev->dev);
+			break;
+		case DRMCGRP_TYPE_BANDWIDTH:
+			nested = strstrip(sattr);
+
+			while (nested != NULL) {
+				attr = strsep(&nested, " ");
+
+				if (sscanf(attr, MEM_BW_LIMITS_NAME_BURST"=%s",
+							sval) == 1) {
+					p_max = parent == NULL ? S64_MAX :
+						parent->
+						dev_resources[minor]->
+						mem_bw_limits_bytes_in_period;
+
+					rc = drmcgrp_process_limit_val(sval,
+						true,
+						ddev->
+						mem_bw_bytes_in_period_default,
+						p_max,
+						&val);
+
+					if (rc || val < 0) {
+						drmcgrp_pr_cft_err(drmcgrp,
+								cft_name,
+								minor);
+						continue;
+					}
+
+					ddr->mem_bw_limits_bytes_in_period=val;
+					continue;
+				}
+
+				if (sscanf(attr, MEM_BW_LIMITS_NAME_AVG"=%s",
+							sval) == 1) {
+					p_max = parent == NULL ? S64_MAX :
+						parent->
+						dev_resources[minor]->
+						mem_bw_limits_avg_bytes_per_us;
+
+					rc = drmcgrp_process_limit_val(sval,
+						true,
+						ddev->
+					      mem_bw_avg_bytes_per_us_default,
+						p_max,
+						&val);
+
+					if (rc || val < 0) {
+						drmcgrp_pr_cft_err(drmcgrp,
+								cft_name,
+								minor);
+						continue;
+					}
+
+					ddr->
+					mem_bw_limits_avg_bytes_per_us=val;
+					continue;
+				}
+			}
+			break;
 		default:
 			break;
 		}
@@ -454,6 +625,41 @@ struct cftype files[] = {
 		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_PEAK,
 						DRMCGRP_FTYPE_STATS),
 	},
+	{
+		.name = "burst_bw_period_in_us",
+		.write = drmcgrp_bo_limit_write,
+		.seq_show = drmcgrp_bo_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST,
+						DRMCGRP_FTYPE_LIMIT),
+	},
+	{
+		.name = "burst_bw_period_in_us.default",
+		.seq_show = drmcgrp_bo_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST,
+						DRMCGRP_FTYPE_DEFAULT),
+	},
+	{
+		.name = "bandwidth.stats",
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH,
+						DRMCGRP_FTYPE_STATS),
+	},
+	{
+		.name = "bandwidth.high",
+		.write = drmcgrp_bo_limit_write,
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH,
+						DRMCGRP_FTYPE_LIMIT),
+	},
+	{
+		.name = "bandwidth.default",
+		.seq_show = drmcgrp_bo_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH,
+						DRMCGRP_FTYPE_DEFAULT),
+	},
 	{ }	/* terminate */
 };
 
@@ -476,6 +682,10 @@ int drmcgrp_register_device(struct drm_device *dev)
 	ddev->dev = dev;
 	ddev->bo_limits_total_allocated_default = S64_MAX;
 	ddev->bo_limits_peak_allocated_default = SIZE_MAX;
+	ddev->mem_bw_limits_period_in_us_default = 200000;
+	ddev->mem_bw_limits_period_in_us = 200000;
+	ddev->mem_bw_bytes_in_period_default = S64_MAX;
+	ddev->mem_bw_avg_bytes_per_us_default = 65536;
 
 	mutex_init(&ddev->mutex);
 
@@ -652,6 +862,27 @@ void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo)
 }
 EXPORT_SYMBOL(drmcgrp_unchg_mem);
 
+static inline void drmcgrp_mem_bw_accum(s64 time_us,
+		struct drmcgrp_device_resource *ddr)
+{
+	s64 increment_us = time_us - ddr->mem_bw_stats_last_update_us;
+	s64 new_credit = ddr->mem_bw_limits_avg_bytes_per_us * increment_us;
+
+	ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US]
+		+= increment_us;
+	ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US]
+		+= increment_us;
+
+	if ((S64_MAX - new_credit) >
+			ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT])
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
+			+= new_credit;
+	else
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT] = S64_MAX;
+
+	ddr->mem_bw_stats_last_update_us = time_us;
+}
+
 void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		struct ttm_mem_reg *new_mem)
 {
@@ -661,6 +892,7 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 	int devIdx = dev->primary->index;
 	int old_mem_type = old_bo->mem.mem_type;
 	int new_mem_type = new_mem->mem_type;
+	s64 time_us;
 	struct drmcgrp_device_resource *ddr;
 	struct drmcgrp_device *known_dev;
 
@@ -672,6 +904,14 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 	old_mem_type = old_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : old_mem_type;
 	new_mem_type = new_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : new_mem_type;
 
+	if (root_drmcgrp->dev_resources[devIdx] != NULL &&
+			root_drmcgrp->dev_resources[devIdx]->
+			mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] >=
+			known_dev->mem_bw_limits_period_in_us)
+		drmcgrp_mem_burst_bw_stats_reset(dev);
+
+	time_us = ktime_to_us(ktime_get());
+
 	mutex_lock(&known_dev->mutex);
 	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
 		ddr = drmcgrp->dev_resources[devIdx];
@@ -684,7 +924,70 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 
 		if (evict)
 			ddr->mem_stats_evict++;
+
+		drmcgrp_mem_bw_accum(time_us, ddr);
+
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED]
+			+= move_in_bytes;
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED]
+			+= move_in_bytes;
+
+		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
+			-= move_in_bytes;
 	}
 	mutex_unlock(&known_dev->mutex);
 }
 EXPORT_SYMBOL(drmcgrp_mem_track_move);
+
+unsigned int drmcgrp_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo)
+{
+	int devIdx;
+
+	//TODO replace with BUG_ON
+	if (tbo->bdev->ddev == NULL)
+		return 0;
+
+	devIdx = tbo->bdev->ddev->primary->index;
+
+	return (unsigned int) known_drmcgrp_devs[devIdx]->
+		mem_bw_limits_period_in_us;
+}
+EXPORT_SYMBOL(drmcgrp_get_mem_bw_period_in_us);
+
+bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	struct drmcgrp *drmcgrp = tbo->drmcgrp;
+	int devIdx = dev->primary->index;
+	s64 time_us;
+	struct drmcgrp_device_resource *ddr;
+	bool result = true;
+
+	if (root_drmcgrp->dev_resources[devIdx] != NULL &&
+			root_drmcgrp->dev_resources[devIdx]->
+			mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] >=
+			known_drmcgrp_devs[devIdx]->
+			mem_bw_limits_period_in_us)
+		drmcgrp_mem_burst_bw_stats_reset(dev);
+
+	time_us = ktime_to_us(ktime_get());
+
+	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
+	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
+		ddr = drmcgrp->dev_resources[devIdx];
+
+		drmcgrp_mem_bw_accum(time_us, ddr);
+
+		if (result &&
+			(ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED]
+			 >= ddr->mem_bw_limits_bytes_in_period ||
+			ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
+			 <= 0)) {
+			result = false;
+		}
+	}
+	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
+
+	return result;
+}
+EXPORT_SYMBOL(drmcgrp_mem_can_move);
-- 
2.21.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 10/11] drm, cgroup: Add soft VRAM limit
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (5 preceding siblings ...)
  2019-06-26 15:05 ` [RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
@ 2019-06-26 15:05 ` Kenny Ho
  2019-06-27  7:24 ` [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Daniel Vetter
  7 siblings, 0 replies; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny, cgroups, dri-devel, amd-gfx, tj, alexander.deucher,
	christian.koenig, joseph.greathouse, jsparks, lkaplan
  Cc: Kenny Ho

The drm resource being limited is the TTM (Translation Table Manager)
buffers.  TTM manages different types of memory that a GPU might access.
These memory types include dedicated Video RAM (VRAM) and host/system
memory accessible through IOMMU (GART/GTT).  TTM is currently used by
multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, maga200,
nouveau, qxl, virtio, vmwgfx.)

TTM buffers belonging to drm cgroups under memory pressure will be
selected to be evicted first.

drm.memory.high
        A read-write nested-keyed file which exists on all cgroups.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ====         =============================================
          vram         Video RAM soft limit for a drm device in byte
          ====         =============================================

        Reading returns the following::

        226:0 vram=0
        226:1 vram=17768448
        226:2 vram=17768448

drm.memory.default
        A read-only nested-keyed file which exists on the root cgroup.
        Each entry is keyed by the drm device's major:minor.  The
        following nested keys are defined.

          ====         ===============================
          vram         Video RAM default limit in byte
          ====         ===============================

        Reading returns the following::

        226:0 vram=0
        226:1 vram=17768448
        226:2 vram=17768448

Change-Id: I7988e28a453b53140b40a28c176239acbc81d491
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |   7 ++
 include/drm/drm_cgroup.h     |  15 ++++
 include/linux/cgroup_drm.h   |   2 +
 kernel/cgroup/drm.c          | 145 +++++++++++++++++++++++++++++++++++
 4 files changed, 169 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f06c2b9d8a4a..79c530f4a198 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -806,12 +806,19 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
 	struct ttm_buffer_object *bo = NULL;
 	bool locked = false;
+        bool check_drmcgrp;
 	unsigned i;
 	int ret;
 
+	check_drmcgrp = drmcgrp_mem_pressure_scan(bdev, mem_type);
+
 	spin_lock(&glob->lru_lock);
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
 		list_for_each_entry(bo, &man->lru[i], lru) {
+			if (check_drmcgrp &&
+				!drmcgrp_mem_should_evict(bo, mem_type))
+				continue;
+
 			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
 				continue;
 
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 9b1dbd6a4eca..360c1e6c809f 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -6,6 +6,7 @@
 
 #include <linux/cgroup_drm.h>
 #include <drm/ttm/ttm_bo_api.h>
+#include <drm/ttm/ttm_bo_driver.h>
 
 #ifdef CONFIG_CGROUP_DRM
 
@@ -25,6 +26,8 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 		struct ttm_mem_reg *new_mem);
 unsigned int drmcgrp_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
 bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo);
+bool drmcgrp_mem_pressure_scan(struct ttm_bo_device *bdev, unsigned type);
+bool drmcgrp_mem_should_evict(struct ttm_buffer_object *tbo, unsigned type);
 
 #else
 static inline int drmcgrp_register_device(struct drm_device *device)
@@ -82,5 +85,17 @@ static inline bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo)
 {
 	return true;
 }
+
+static inline bool drmcgrp_mem_pressure_scan(struct ttm_bo_device *bdev,
+		unsigned type)
+{
+	return false;
+}
+
+static inline bool drmcgrp_mem_should_evict(struct ttm_buffer_object *tbo,
+		unsigned type)
+{
+	return true;
+}
 #endif /* CONFIG_CGROUP_DRM */
 #endif /* __DRM_CGROUP_H__ */
diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
index 94828da2104a..52ef02eaac70 100644
--- a/include/linux/cgroup_drm.h
+++ b/include/linux/cgroup_drm.h
@@ -35,6 +35,8 @@ struct drmcgrp_device_resource {
 
 	s64			mem_stats[TTM_PL_PRIV+1];
 	s64			mem_peaks[TTM_PL_PRIV+1];
+	s64			mem_highs[TTM_PL_PRIV+1];
+	bool			mem_pressure[TTM_PL_PRIV+1];
 	s64			mem_stats_evict;
 
 	s64			mem_bw_stats_last_update_us;
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index bbc6612200a4..1ce13db36ce9 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -29,6 +29,8 @@ struct drmcgrp_device {
 
 	s64			mem_bw_bytes_in_period_default;
 	s64			mem_bw_avg_bytes_per_us_default;
+
+	s64			mem_highs_default[TTM_PL_PRIV+1];
 };
 
 #define DRMCG_CTF_PRIV_SIZE 3
@@ -114,6 +116,8 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
 
 	/* set defaults here */
 	if (known_drmcgrp_devs[minor] != NULL) {
+		int i;
+
 		ddr->bo_limits_total_allocated =
 		  known_drmcgrp_devs[minor]->bo_limits_total_allocated_default;
 
@@ -125,6 +129,11 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
 
 		ddr->mem_bw_limits_avg_bytes_per_us =
 		  known_drmcgrp_devs[minor]->mem_bw_avg_bytes_per_us_default;
+
+		for (i = 0; i <= TTM_PL_PRIV; i++) {
+			ddr->mem_highs[i] =
+			known_drmcgrp_devs[minor]->mem_highs_default[i];
+		}
 	}
 
 	return 0;
@@ -274,6 +283,11 @@ static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
 	case DRMCGRP_TYPE_BO_PEAK:
 		seq_printf(sf, "%zu\n", ddr->bo_limits_peak_allocated);
 		break;
+	case DRMCGRP_TYPE_MEM:
+		seq_printf(sf, "%s=%lld\n",
+				ttm_placement_names[TTM_PL_VRAM],
+				ddr->mem_highs[TTM_PL_VRAM]);
+		break;
 	case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
 		seq_printf(sf, "%lld\n",
 			known_drmcgrp_devs[minor]->mem_bw_limits_period_in_us);
@@ -308,6 +322,11 @@ static inline void drmcgrp_print_default(struct drmcgrp_device *ddev,
 		seq_printf(sf, "%zu\n",
 				ddev->bo_limits_peak_allocated_default);
 		break;
+	case DRMCGRP_TYPE_MEM:
+		seq_printf(sf, "%s=%lld\n",
+				ttm_placement_names[TTM_PL_VRAM],
+				ddev->mem_highs_default[TTM_PL_VRAM]);
+		break;
 	case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
 		seq_printf(sf, "%lld\n",
 				ddev->mem_bw_limits_period_in_us_default);
@@ -552,6 +571,38 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
 				}
 			}
 			break;
+		case DRMCGRP_TYPE_MEM:
+			nested = strstrip(sattr);
+
+			while (nested != NULL) {
+				attr = strsep(&nested, " ");
+
+				if (sscanf(attr, "vram=%s",
+					 sval) == 1) {
+					p_max = parent == NULL ? S64_MAX :
+						parent->
+						dev_resources[minor]->
+						mem_highs[TTM_PL_VRAM];
+
+					rc = drmcgrp_process_limit_val(sval,
+						true,
+						ddev->
+						mem_highs_default[TTM_PL_VRAM],
+						p_max,
+						&val);
+
+					if (rc || val < 0) {
+						drmcgrp_pr_cft_err(drmcgrp,
+								cft_name,
+								minor);
+						continue;
+					}
+
+					ddr->mem_highs[TTM_PL_VRAM]=val;
+					continue;
+				}
+			}
+			break;
 		default:
 			break;
 		}
@@ -624,6 +675,20 @@ struct cftype files[] = {
 		.seq_show = drmcgrp_bo_show,
 		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_PEAK,
 						DRMCGRP_FTYPE_STATS),
+        },
+	{
+		.name = "memory.default",
+		.seq_show = drmcgrp_bo_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM,
+						DRMCGRP_FTYPE_DEFAULT),
+	},
+	{
+		.name = "memory.high",
+		.write = drmcgrp_bo_limit_write,
+		.seq_show = drmcgrp_bo_show,
+		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM,
+						DRMCGRP_FTYPE_LIMIT),
 	},
 	{
 		.name = "burst_bw_period_in_us",
@@ -674,6 +739,7 @@ struct cgroup_subsys drm_cgrp_subsys = {
 int drmcgrp_register_device(struct drm_device *dev)
 {
 	struct drmcgrp_device *ddev;
+	int i;
 
 	ddev = kzalloc(sizeof(struct drmcgrp_device), GFP_KERNEL);
 	if (!ddev)
@@ -687,6 +753,10 @@ int drmcgrp_register_device(struct drm_device *dev)
 	ddev->mem_bw_bytes_in_period_default = S64_MAX;
 	ddev->mem_bw_avg_bytes_per_us_default = 65536;
 
+	for (i = 0; i <= TTM_PL_PRIV; i++) {
+		ddev->mem_highs_default[i] = S64_MAX;
+	}
+
 	mutex_init(&ddev->mutex);
 
 	mutex_lock(&drmcgrp_mutex);
@@ -991,3 +1061,78 @@ bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo)
 	return result;
 }
 EXPORT_SYMBOL(drmcgrp_mem_can_move);
+
+static inline void drmcgrp_mem_set_pressure(struct drmcgrp *drmcgrp,
+		int devIdx, unsigned mem_type, bool pressure_val)
+{
+	struct drmcgrp_device_resource *ddr;
+	struct cgroup_subsys_state *pos;
+	struct drmcgrp *node;
+
+	css_for_each_descendant_pre(pos, &drmcgrp->css) {
+		node = css_drmcgrp(pos);
+		ddr = node->dev_resources[devIdx];
+		ddr->mem_pressure[mem_type] = pressure_val;
+	}
+}
+
+static inline bool drmcgrp_mem_check(struct drmcgrp *drmcgrp, int devIdx,
+		unsigned mem_type)
+{
+	struct drmcgrp_device_resource *ddr = drmcgrp->dev_resources[devIdx];
+
+	/* already under pressure, no need to check and set */
+	if (ddr->mem_pressure[mem_type])
+		return true;
+
+	if (ddr->mem_stats[mem_type] >= ddr->mem_highs[mem_type]) {
+		drmcgrp_mem_set_pressure(drmcgrp, devIdx, mem_type, true);
+		return true;
+	}
+
+	return false;
+}
+
+bool drmcgrp_mem_pressure_scan(struct ttm_bo_device *bdev, unsigned type)
+{
+	struct drm_device *dev = bdev->ddev;
+	struct cgroup_subsys_state *pos;
+	struct drmcgrp *node;
+	int devIdx;
+	bool result = false;
+
+	//TODO replace with BUG_ON
+	if (dev == NULL || type != TTM_PL_VRAM) /* only vram limit for now */
+		return false;
+
+	devIdx = dev->primary->index;
+
+	type = type > TTM_PL_PRIV ? TTM_PL_PRIV : type;
+
+	rcu_read_lock();
+	drmcgrp_mem_set_pressure(root_drmcgrp, devIdx, type, false);
+
+	css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
+		node = css_drmcgrp(pos);
+		result |= drmcgrp_mem_check(node, devIdx, type);
+	}
+	rcu_read_unlock();
+
+	return result;
+}
+EXPORT_SYMBOL(drmcgrp_mem_pressure_scan);
+
+bool drmcgrp_mem_should_evict(struct ttm_buffer_object *tbo, unsigned type)
+{
+	struct drm_device *dev = tbo->bdev->ddev;
+	int devIdx;
+
+	//TODO replace with BUG_ON
+	if (dev == NULL)
+		return true;
+
+	devIdx = dev->primary->index;
+
+	return tbo->drmcgrp->dev_resources[devIdx]->mem_pressure[type];
+}
+EXPORT_SYMBOL(drmcgrp_mem_should_evict);
-- 
2.21.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim
       [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2019-06-26 15:05   ` [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control Kenny Ho
@ 2019-06-26 15:05   ` Kenny Ho
       [not found]     ` <20190626150522.11618-12-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
  4 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 15:05 UTC (permalink / raw)
  To: y2kenny-Re5JQEeQqe8AvxtiuMwx3w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tj-DgEjT+Ai2ygdnm+yROfE0A, alexander.deucher-5C7GfCeVMHo,
	christian.koenig-5C7GfCeVMHo, joseph.greathouse-5C7GfCeVMHo,
	jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc
  Cc: Kenny Ho

Allow DRM TTM memory manager to register a work_struct, such that, when
a drmcgrp is under memory pressure, memory reclaiming can be triggered
immediately.

Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c    | 47 +++++++++++++++++++++++++++++++++
 include/drm/drm_cgroup.h        | 14 ++++++++++
 include/drm/ttm/ttm_bo_driver.h |  2 ++
 kernel/cgroup/drm.c             | 33 +++++++++++++++++++++++
 4 files changed, 96 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 79c530f4a198..5fc3bc5bd4c5 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_reclaim_wq(struct work_struct *work)
+{
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = false,
+		.flags = TTM_OPT_FLAG_FORCE_ALLOC
+	};
+	struct ttm_mem_type_manager *man =
+	    container_of(work, struct ttm_mem_type_manager, reclaim_wq);
+	struct ttm_bo_device *bdev = man->bdev;
+	struct dma_fence *fence;
+	int mem_type;
+	int ret;
+
+	for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
+		if (&bdev->man[mem_type] == man)
+			break;
+
+	BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
+
+	if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
+		return;
+
+	ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
+	if (ret)
+		return;
+
+	spin_lock(&man->move_lock);
+	fence = dma_fence_get(man->move);
+	spin_unlock(&man->move_lock);
+
+	if (fence) {
+		ret = dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+	}
+
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
 			unsigned long p_size)
 {
@@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
 		INIT_LIST_HEAD(&man->lru[i]);
 	man->move = NULL;
 
+	pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
+
+	if (type <= TTM_PL_VRAM) {
+		INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
+		drmcgrp_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ttm_bo_init_mm);
@@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
 		man = &bdev->man[i];
 		if (man->has_type) {
 			man->use_type = false;
+			drmcgrp_unregister_device_mm(bdev->ddev, i);
+			cancel_work_sync(&man->reclaim_wq);
 			if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
 				ret = -EBUSY;
 				pr_err("DRM memory manager type %d is not clean\n",
diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
index 360c1e6c809f..134d6e5475f3 100644
--- a/include/drm/drm_cgroup.h
+++ b/include/drm/drm_cgroup.h
@@ -5,6 +5,7 @@
 #define __DRM_CGROUP_H__
 
 #include <linux/cgroup_drm.h>
+#include <linux/workqueue.h>
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
 
@@ -12,6 +13,9 @@
 
 int drmcgrp_register_device(struct drm_device *device);
 int drmcgrp_unregister_device(struct drm_device *device);
+void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
+		struct work_struct *wq);
+void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type);
 bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
 		struct drmcgrp *relative);
 void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
@@ -40,6 +44,16 @@ static inline int drmcgrp_unregister_device(struct drm_device *device)
 	return 0;
 }
 
+static inline void drmcgrp_register_device_mm(struct drm_device *dev,
+		unsigned type, struct work_struct *wq)
+{
+}
+
+static inline void drmcgrp_unregister_device_mm(struct drm_device *dev,
+		unsigned type)
+{
+}
+
 static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
 		struct drmcgrp *relative)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 4cbcb41e5aa9..0956ca7888fc 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
 	 * Protected by @move_lock.
 	 */
 	struct dma_fence *move;
+
+	struct work_struct reclaim_wq;
 };
 
 /**
diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
index 1ce13db36ce9..985a89e849d3 100644
--- a/kernel/cgroup/drm.c
+++ b/kernel/cgroup/drm.c
@@ -31,6 +31,8 @@ struct drmcgrp_device {
 	s64			mem_bw_avg_bytes_per_us_default;
 
 	s64			mem_highs_default[TTM_PL_PRIV+1];
+
+	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
 };
 
 #define DRMCG_CTF_PRIV_SIZE 3
@@ -793,6 +795,31 @@ int drmcgrp_unregister_device(struct drm_device *dev)
 }
 EXPORT_SYMBOL(drmcgrp_unregister_device);
 
+void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
+		struct work_struct *wq)
+{
+	if (dev == NULL || dev->primary->index > max_minor
+			|| type >= TTM_PL_PRIV)
+		return;
+
+	mutex_lock(&drmcgrp_mutex);
+	known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = wq;
+	mutex_unlock(&drmcgrp_mutex);
+}
+EXPORT_SYMBOL(drmcgrp_register_device_mm);
+
+void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type)
+{
+	if (dev == NULL || dev->primary->index > max_minor
+			|| type >= TTM_PL_PRIV)
+		return;
+
+	mutex_lock(&drmcgrp_mutex);
+	known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = NULL;
+	mutex_unlock(&drmcgrp_mutex);
+}
+EXPORT_SYMBOL(drmcgrp_unregister_device_mm);
+
 bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self, struct drmcgrp *relative)
 {
 	for (; self != NULL; self = parent_drmcgrp(self))
@@ -1004,6 +1031,12 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
 
 		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
 			-= move_in_bytes;
+
+		if (known_dev->mem_reclaim_wq[new_mem_type] != NULL &&
+                        ddr->mem_stats[new_mem_type] >
+				ddr->mem_highs[new_mem_type])
+			schedule_work(
+				known_dev->mem_reclaim_wq[new_mem_type]);
 	}
 	mutex_unlock(&known_dev->mutex);
 }
-- 
2.21.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem
       [not found]     ` <20190626150522.11618-2-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 15:49       ` Daniel Vetter
  2019-06-26 19:35         ` Kenny Ho
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 15:49 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, alexander.deucher-5C7GfCeVMHo,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Wed, Jun 26, 2019 at 11:05:12AM -0400, Kenny Ho wrote:
Needs a bit more commit message here I htink.

> Change-Id: I6830d3990f63f0c13abeba29b1d330cf28882831
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

Bunch of naming bikesheds
> ---
>  include/linux/cgroup_drm.h    | 76 +++++++++++++++++++++++++++++++++++
>  include/linux/cgroup_subsys.h |  4 ++
>  init/Kconfig                  |  5 +++
>  kernel/cgroup/Makefile        |  1 +
>  kernel/cgroup/drm.c           | 42 +++++++++++++++++++
>  5 files changed, 128 insertions(+)
>  create mode 100644 include/linux/cgroup_drm.h
>  create mode 100644 kernel/cgroup/drm.c

> 
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> new file mode 100644
> index 000000000000..9928e60037a5
> --- /dev/null
> +++ b/include/linux/cgroup_drm.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: MIT
> + * Copyright 2019 Advanced Micro Devices, Inc.
> + */
> +#ifndef _CGROUP_DRM_H
> +#define _CGROUP_DRM_H
> +
> +#ifdef CONFIG_CGROUP_DRM
> +
> +#include <linux/cgroup.h>
> +
> +struct drmcgrp {

drm_cgroup for more consistency how we usually call these things.

> +	struct cgroup_subsys_state	css;
> +};
> +
> +static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)

ccs_to_drm_cgroup

> +{
> +	return css ? container_of(css, struct drmcgrp, css) : NULL;
> +}
> +
> +static inline struct drmcgrp *drmcgrp_from(struct task_struct *task)

task_get_drm_cgroup for consistency with task_get_css?

> +{
> +	return css_drmcgrp(task_get_css(task, drm_cgrp_id));
> +}
> +
> +static inline struct drmcgrp *get_drmcgrp(struct task_struct *task)
> +{
> +	struct cgroup_subsys_state *css = task_get_css(task, drm_cgrp_id);
> +
> +	if (css)
> +		css_get(css);
> +
> +	return css_drmcgrp(css);
> +}
> +
> +static inline void put_drmcgrp(struct drmcgrp *drmcgrp)

In drm we generally put _get/_put at the end, cgroup seems to do the same.

> +{
> +	if (drmcgrp)
> +		css_put(&drmcgrp->css);
> +}
> +
> +static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)

I'd also call this drm_cgroup_parent or so.

Also all the above needs a bit of nice kerneldoc for the final version.
-Daniel

> +{
> +	return css_drmcgrp(cg->css.parent);
> +}
> +
> +#else /* CONFIG_CGROUP_DRM */
> +
> +struct drmcgrp {
> +};
> +
> +static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)
> +{
> +	return NULL;
> +}
> +
> +static inline struct drmcgrp *drmcgrp_from(struct task_struct *task)
> +{
> +	return NULL;
> +}
> +
> +static inline struct drmcgrp *get_drmcgrp(struct task_struct *task)
> +{
> +	return NULL;
> +}
> +
> +static inline void put_drmcgrp(struct drmcgrp *drmcgrp)
> +{
> +}
> +
> +static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)
> +{
> +	return NULL;
> +}
> +
> +#endif	/* CONFIG_CGROUP_DRM */
> +#endif	/* _CGROUP_DRM_H */
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index acb77dcff3b4..ddedad809e8b 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -61,6 +61,10 @@ SUBSYS(pids)
>  SUBSYS(rdma)
>  #endif
>  
> +#if IS_ENABLED(CONFIG_CGROUP_DRM)
> +SUBSYS(drm)
> +#endif
> +
>  /*
>   * The following subsystems are not supported on the default hierarchy.
>   */
> diff --git a/init/Kconfig b/init/Kconfig
> index d47cb77a220e..0b0f112eb23b 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -839,6 +839,11 @@ config CGROUP_RDMA
>  	  Attaching processes with active RDMA resources to the cgroup
>  	  hierarchy is allowed even if can cross the hierarchy's limit.
>  
> +config CGROUP_DRM
> +	bool "DRM controller (EXPERIMENTAL)"
> +	help
> +	  Provides accounting and enforcement of resources in the DRM subsystem.
> +
>  config CGROUP_FREEZER
>  	bool "Freezer controller"
>  	help
> diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
> index bfcdae896122..6af14bd93050 100644
> --- a/kernel/cgroup/Makefile
> +++ b/kernel/cgroup/Makefile
> @@ -4,5 +4,6 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o
>  obj-$(CONFIG_CGROUP_FREEZER) += freezer.o
>  obj-$(CONFIG_CGROUP_PIDS) += pids.o
>  obj-$(CONFIG_CGROUP_RDMA) += rdma.o
> +obj-$(CONFIG_CGROUP_DRM) += drm.o
>  obj-$(CONFIG_CPUSETS) += cpuset.o
>  obj-$(CONFIG_CGROUP_DEBUG) += debug.o
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> new file mode 100644
> index 000000000000..66cb1dda023d
> --- /dev/null
> +++ b/kernel/cgroup/drm.c
> @@ -0,0 +1,42 @@
> +// SPDX-License-Identifier: MIT
> +// Copyright 2019 Advanced Micro Devices, Inc.
> +#include <linux/slab.h>
> +#include <linux/cgroup.h>
> +#include <linux/cgroup_drm.h>
> +
> +static struct drmcgrp *root_drmcgrp __read_mostly;
> +
> +static void drmcgrp_css_free(struct cgroup_subsys_state *css)
> +{
> +	struct drmcgrp *drmcgrp = css_drmcgrp(css);
> +
> +	kfree(drmcgrp);
> +}
> +
> +static struct cgroup_subsys_state *
> +drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
> +{
> +	struct drmcgrp *parent = css_drmcgrp(parent_css);
> +	struct drmcgrp *drmcgrp;
> +
> +	drmcgrp = kzalloc(sizeof(struct drmcgrp), GFP_KERNEL);
> +	if (!drmcgrp)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (!parent)
> +		root_drmcgrp = drmcgrp;
> +
> +	return &drmcgrp->css;
> +}
> +
> +struct cftype files[] = {
> +	{ }	/* terminate */
> +};
> +
> +struct cgroup_subsys drm_cgrp_subsys = {
> +	.css_alloc	= drmcgrp_css_alloc,
> +	.css_free	= drmcgrp_css_free,
> +	.early_init	= false,
> +	.legacy_cftypes	= files,
> +	.dfl_cftypes	= files,
> +};
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices
       [not found]   ` <20190626150522.11618-3-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 15:56     ` Daniel Vetter
  2019-06-26 20:37       ` Kenny Ho
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 15:56 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, alexander.deucher-5C7GfCeVMHo,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Wed, Jun 26, 2019 at 11:05:13AM -0400, Kenny Ho wrote:
> Change-Id: I908ee6975ea0585e4c30eafde4599f87094d8c65
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

Why the separate, explicit registration step? I think a simpler design for
drivers would be that we set up cgroups if there's anything to be
controlled, and then for GEM drivers the basic GEM stuff would be set up
automically (there's really no reason not to I think).

Also tying to the minor is a bit funky, since we have multiple of these.
Need to make sure were at least consistent with whether we use the primary
or render minor - I'd always go with the primary one like you do here.

> ---
>  include/drm/drm_cgroup.h   |  24 ++++++++
>  include/linux/cgroup_drm.h |  10 ++++
>  kernel/cgroup/drm.c        | 116 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 150 insertions(+)
>  create mode 100644 include/drm/drm_cgroup.h
> 
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> new file mode 100644
> index 000000000000..ddb9eab64360
> --- /dev/null
> +++ b/include/drm/drm_cgroup.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: MIT
> + * Copyright 2019 Advanced Micro Devices, Inc.
> + */
> +#ifndef __DRM_CGROUP_H__
> +#define __DRM_CGROUP_H__
> +
> +#ifdef CONFIG_CGROUP_DRM
> +
> +int drmcgrp_register_device(struct drm_device *device);
> +
> +int drmcgrp_unregister_device(struct drm_device *device);
> +
> +#else
> +static inline int drmcgrp_register_device(struct drm_device *device)
> +{
> +	return 0;
> +}
> +
> +static inline int drmcgrp_unregister_device(struct drm_device *device)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_CGROUP_DRM */
> +#endif /* __DRM_CGROUP_H__ */
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index 9928e60037a5..27497f786c93 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -6,10 +6,20 @@
>  
>  #ifdef CONFIG_CGROUP_DRM
>  
> +#include <linux/mutex.h>
>  #include <linux/cgroup.h>
> +#include <drm/drm_file.h>
> +
> +/* limit defined per the way drm_minor_alloc operates */
> +#define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> +
> +struct drmcgrp_device_resource {
> +	/* for per device stats */
> +};
>  
>  struct drmcgrp {
>  	struct cgroup_subsys_state	css;
> +	struct drmcgrp_device_resource	*dev_resources[MAX_DRM_DEV];
>  };
>  
>  static inline struct drmcgrp *css_drmcgrp(struct cgroup_subsys_state *css)
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 66cb1dda023d..7da6e0d93991 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -1,28 +1,99 @@
>  // SPDX-License-Identifier: MIT
>  // Copyright 2019 Advanced Micro Devices, Inc.
> +#include <linux/export.h>
>  #include <linux/slab.h>
>  #include <linux/cgroup.h>
> +#include <linux/fs.h>
> +#include <linux/seq_file.h>
> +#include <linux/mutex.h>
>  #include <linux/cgroup_drm.h>
> +#include <linux/kernel.h>
> +#include <drm/drm_device.h>
> +#include <drm/drm_cgroup.h>
> +
> +static DEFINE_MUTEX(drmcgrp_mutex);
> +
> +struct drmcgrp_device {
> +	struct drm_device	*dev;
> +	struct mutex		mutex;
> +};
> +
> +/* indexed by drm_minor for access speed */
> +static struct drmcgrp_device	*known_drmcgrp_devs[MAX_DRM_DEV];
> +
> +static int max_minor;

Uh no global stuff like this please. Or some explanation in the commit
message why we really cant avoid this.

> +
>  
>  static struct drmcgrp *root_drmcgrp __read_mostly;
>  
>  static void drmcgrp_css_free(struct cgroup_subsys_state *css)
>  {
>  	struct drmcgrp *drmcgrp = css_drmcgrp(css);
> +	int i;
> +
> +	for (i = 0; i <= max_minor; i++) {
> +		if (drmcgrp->dev_resources[i] != NULL)
> +			kfree(drmcgrp->dev_resources[i]);
> +	}
>  
>  	kfree(drmcgrp);
>  }
>  
> +static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
> +{
> +	struct drmcgrp_device_resource *ddr = drmcgrp->dev_resources[minor];
> +
> +	if (ddr == NULL) {
> +		ddr = kzalloc(sizeof(struct drmcgrp_device_resource),
> +			GFP_KERNEL);
> +
> +		if (!ddr)
> +			return -ENOMEM;
> +
> +		drmcgrp->dev_resources[minor] = ddr;
> +	}
> +
> +	/* set defaults here */
> +
> +	return 0;
> +}
> +
> +static inline int init_drmcgrp(struct drmcgrp *drmcgrp, struct drm_device *dev)
> +{
> +	int rc = 0;
> +	int i;
> +
> +	if (dev != NULL) {
> +		rc = init_drmcgrp_single(drmcgrp, dev->primary->index);
> +		return rc;
> +	}
> +
> +	for (i = 0; i <= max_minor; i++) {
> +		rc = init_drmcgrp_single(drmcgrp, i);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
>  static struct cgroup_subsys_state *
>  drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
>  {
>  	struct drmcgrp *parent = css_drmcgrp(parent_css);
>  	struct drmcgrp *drmcgrp;
> +	int rc;
>  
>  	drmcgrp = kzalloc(sizeof(struct drmcgrp), GFP_KERNEL);
>  	if (!drmcgrp)
>  		return ERR_PTR(-ENOMEM);
>  
> +	rc = init_drmcgrp(drmcgrp, NULL);
> +	if (rc) {
> +		drmcgrp_css_free(&drmcgrp->css);
> +		return ERR_PTR(rc);
> +	}
> +
>  	if (!parent)
>  		root_drmcgrp = drmcgrp;
>  
> @@ -40,3 +111,48 @@ struct cgroup_subsys drm_cgrp_subsys = {
>  	.legacy_cftypes	= files,
>  	.dfl_cftypes	= files,
>  };
> +
> +int drmcgrp_register_device(struct drm_device *dev)

Imo this should be done as part of drm_dev_register (maybe only if the
driver has set up a controller or something). Definitely with the
unregister logic below. Also anything used by drivers needs kerneldoc.


> +{
> +	struct drmcgrp_device *ddev;
> +
> +	ddev = kzalloc(sizeof(struct drmcgrp_device), GFP_KERNEL);
> +	if (!ddev)
> +		return -ENOMEM;
> +
> +	ddev->dev = dev;
> +	mutex_init(&ddev->mutex);
> +
> +	mutex_lock(&drmcgrp_mutex);
> +	known_drmcgrp_devs[dev->primary->index] = ddev;
> +	max_minor = max(max_minor, dev->primary->index);
> +	mutex_unlock(&drmcgrp_mutex);
> +
> +	/* init cgroups created before registration (i.e. root cgroup) */
> +	if (root_drmcgrp != NULL) {
> +		struct cgroup_subsys_state *pos;
> +		struct drmcgrp *child;
> +
> +		rcu_read_lock();
> +		css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
> +			child = css_drmcgrp(pos);
> +			init_drmcgrp(child, dev);
> +		}
> +		rcu_read_unlock();

I have no idea, but is this guaranteed to get them all?
-Daniel

> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(drmcgrp_register_device);
> +
> +int drmcgrp_unregister_device(struct drm_device *dev)
> +{
> +	mutex_lock(&drmcgrp_mutex);
> +
> +	kfree(known_drmcgrp_devs[dev->primary->index]);
> +	known_drmcgrp_devs[dev->primary->index] = NULL;
> +
> +	mutex_unlock(&drmcgrp_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL(drmcgrp_unregister_device);
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]   ` <20190626150522.11618-5-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 16:05     ` Daniel Vetter
       [not found]       ` <20190626160553.GR12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 16:05 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, alexander.deucher-5C7GfCeVMHo,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Wed, Jun 26, 2019 at 11:05:15AM -0400, Kenny Ho wrote:
> The drm resource being measured and limited here is the GEM buffer
> objects.  User applications allocate and free these buffers.  In
> addition, a process can allocate a buffer and share it with another
> process.  The consumer of a shared buffer can also outlive the
> allocator of the buffer.
> 
> For the purpose of cgroup accounting and limiting, ownership of the
> buffer is deemed to be the cgroup for which the allocating process
> belongs to.  There is one cgroup limit per drm device.
> 
> In order to prevent the buffer outliving the cgroup that owns it, a
> process is prevented from importing buffers that are not own by the
> process' cgroup or the ancestors of the process' cgroup.  In other
> words, in order for a buffer to be shared between two cgroups, the
> buffer must be created by the common ancestors of the cgroups.
> 
> drm.buffer.stats
>         A read-only flat-keyed file which exists on all cgroups.  Each
>         entry is keyed by the drm device's major:minor.
> 
>         Total GEM buffer allocation in bytes.
> 
> drm.buffer.default
>         A read-only flat-keyed file which exists on the root cgroup.
>         Each entry is keyed by the drm device's major:minor.
> 
>         Default limits on the total GEM buffer allocation in bytes.

Don't we need a "0 means no limit" semantics here?

> drm.buffer.max
>         A read-write flat-keyed file which exists on all cgroups.  Each
>         entry is keyed by the drm device's major:minor.
> 
>         Per device limits on the total GEM buffer allocation in byte.
>         This is a hard limit.  Attempts in allocating beyond the cgroup
>         limit will result in ENOMEM.  Shorthand understood by memparse
>         (such as k, m, g) can be used.
> 
>         Set allocation limit for /dev/dri/card1 to 1GB
>         echo "226:1 1g" > drm.buffer.total.max
> 
>         Set allocation limit for /dev/dri/card0 to 512MB
>         echo "226:0 512m" > drm.buffer.total.max

I think we need a new drm-cgroup.rst which contains all this
documentation.

With multiple GPUs, do we need an overall GEM bo limit, across all gpus?
For other stuff later on like vram/tt/... and all that it needs to be
per-device, but I think one overall limit could be useful.

> 
> Change-Id: I4c249d06d45ec709d6481d4cbe87c5168545c5d0
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +
>  drivers/gpu/drm/drm_gem.c                  |   8 +
>  drivers/gpu/drm/drm_prime.c                |   9 +
>  include/drm/drm_cgroup.h                   |  34 ++-
>  include/drm/drm_gem.h                      |  11 +
>  include/linux/cgroup_drm.h                 |   2 +
>  kernel/cgroup/drm.c                        | 321 +++++++++++++++++++++
>  7 files changed, 387 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 93b2c5a48a71..b4c078b7ad63 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -34,6 +34,7 @@
>  #include <drm/drmP.h>
>  #include <drm/amdgpu_drm.h>
>  #include <drm/drm_cache.h>
> +#include <drm/drm_cgroup.h>
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
>  #include "amdgpu_amdkfd.h"
> @@ -446,6 +447,9 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>  	if (!amdgpu_bo_validate_size(adev, size, bp->domain))
>  		return -ENOMEM;
>  
> +	if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
> +		return -ENOMEM;

So what happens when you start a lot of threads all at the same time,
allocating gem bo? Also would be nice if we could roll out at least the
accounting part of this cgroup to all GEM drivers.

> +
>  	*bo_ptr = NULL;
>  
>  	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 6a80db077dc6..e20c1034bf2b 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -37,10 +37,12 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/dma-buf.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/cgroup_drm.h>
>  #include <drm/drmP.h>
>  #include <drm/drm_vma_manager.h>
>  #include <drm/drm_gem.h>
>  #include <drm/drm_print.h>
> +#include <drm/drm_cgroup.h>
>  #include "drm_internal.h"
>  
>  /** @file drm_gem.c
> @@ -154,6 +156,9 @@ void drm_gem_private_object_init(struct drm_device *dev,
>  	obj->handle_count = 0;
>  	obj->size = size;
>  	drm_vma_node_reset(&obj->vma_node);
> +
> +	obj->drmcgrp = get_drmcgrp(current);
> +	drmcgrp_chg_bo_alloc(obj->drmcgrp, dev, size);
>  }
>  EXPORT_SYMBOL(drm_gem_private_object_init);
>  
> @@ -804,6 +809,9 @@ drm_gem_object_release(struct drm_gem_object *obj)
>  	if (obj->filp)
>  		fput(obj->filp);
>  
> +	drmcgrp_unchg_bo_alloc(obj->drmcgrp, obj->dev, obj->size);
> +	put_drmcgrp(obj->drmcgrp);
> +
>  	drm_gem_free_mmap_offset(obj);
>  }
>  EXPORT_SYMBOL(drm_gem_object_release);
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 231e3f6d5f41..eeb612116810 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -32,6 +32,7 @@
>  #include <drm/drm_prime.h>
>  #include <drm/drm_gem.h>
>  #include <drm/drmP.h>
> +#include <drm/drm_cgroup.h>
>  
>  #include "drm_internal.h"
>  
> @@ -794,6 +795,7 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>  {
>  	struct dma_buf *dma_buf;
>  	struct drm_gem_object *obj;
> +	struct drmcgrp *drmcgrp = drmcgrp_from(current);
>  	int ret;
>  
>  	dma_buf = dma_buf_get(prime_fd);
> @@ -818,6 +820,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>  		goto out_unlock;
>  	}
>  
> +	/* only allow bo from the same cgroup or its ancestor to be imported */
> +	if (drmcgrp != NULL &&

Quite a serious limitation here ...

> +			!drmcgrp_is_self_or_ancestor(drmcgrp, obj->drmcgrp)) {

Also what happens if you actually share across devices? Then importing in
the 2nd group is suddenly possible, and I think will be double-counted.

What's the underlying technical reason for not allowing sharing across
cgroups?

> +		ret = -EACCES;
> +		goto out_unlock;
> +	}
> +
>  	if (obj->dma_buf) {
>  		WARN_ON(obj->dma_buf != dma_buf);
>  	} else {
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index ddb9eab64360..8711b7c5f7bf 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -4,12 +4,20 @@
>  #ifndef __DRM_CGROUP_H__
>  #define __DRM_CGROUP_H__
>  
> +#include <linux/cgroup_drm.h>
> +
>  #ifdef CONFIG_CGROUP_DRM
>  
>  int drmcgrp_register_device(struct drm_device *device);
> -
>  int drmcgrp_unregister_device(struct drm_device *device);
> -
> +bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
> +		struct drmcgrp *relative);
> +void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> +		size_t size);
> +void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> +		size_t size);
> +bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
> +		size_t size);
>  #else
>  static inline int drmcgrp_register_device(struct drm_device *device)
>  {
> @@ -20,5 +28,27 @@ static inline int drmcgrp_unregister_device(struct drm_device *device)
>  {
>  	return 0;
>  }
> +
> +static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
> +		struct drmcgrp *relative)
> +{
> +	return false;
> +}
> +
> +static inline void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp,
> +		struct drm_device *dev,	size_t size)
> +{
> +}
> +
> +static inline void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp,
> +		struct drm_device *dev,	size_t size)
> +{
> +}
> +
> +static inline bool drmcgrp_bo_can_allocate(struct task_struct *task,
> +		struct drm_device *dev,	size_t size)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_CGROUP_DRM */
>  #endif /* __DRM_CGROUP_H__ */
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index c95727425284..09d1c69a3f0c 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -272,6 +272,17 @@ struct drm_gem_object {
>  	 *
>  	 */
>  	const struct drm_gem_object_funcs *funcs;
> +
> +	/**
> +	 * @drmcgrp:
> +	 *
> +	 * DRM cgroup this GEM object belongs to.
> +	 *
> +	 * This is used to track and limit the amount of GEM objects a user
> +	 * can allocate.  Since GEM objects can be shared, this is also used
> +	 * to ensure GEM objects are only shared within the same cgroup.
> +	 */
> +	struct drmcgrp *drmcgrp;
>  };
>  
>  /**
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index 27497f786c93..efa019666f1c 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -15,6 +15,8 @@
>  
>  struct drmcgrp_device_resource {
>  	/* for per device stats */
> +	s64			bo_stats_total_allocated;
> +	s64			bo_limits_total_allocated;
>  };
>  
>  struct drmcgrp {
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 7da6e0d93991..cfc1fe74dca3 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -9,6 +9,7 @@
>  #include <linux/cgroup_drm.h>
>  #include <linux/kernel.h>
>  #include <drm/drm_device.h>
> +#include <drm/drm_ioctl.h>
>  #include <drm/drm_cgroup.h>
>  
>  static DEFINE_MUTEX(drmcgrp_mutex);
> @@ -16,6 +17,26 @@ static DEFINE_MUTEX(drmcgrp_mutex);
>  struct drmcgrp_device {
>  	struct drm_device	*dev;
>  	struct mutex		mutex;
> +
> +	s64			bo_limits_total_allocated_default;
> +};
> +
> +#define DRMCG_CTF_PRIV_SIZE 3
> +#define DRMCG_CTF_PRIV_MASK GENMASK((DRMCG_CTF_PRIV_SIZE - 1), 0)
> +#define DRMCG_CTF_PRIV(res_type, f_type)  ((res_type) <<\
> +		DRMCG_CTF_PRIV_SIZE | (f_type))
> +#define DRMCG_CTF_PRIV2RESTYPE(priv) ((priv) >> DRMCG_CTF_PRIV_SIZE)
> +#define DRMCG_CTF_PRIV2FTYPE(priv) ((priv) & DRMCG_CTF_PRIV_MASK)
> +
> +
> +enum drmcgrp_res_type {
> +	DRMCGRP_TYPE_BO_TOTAL,
> +};
> +
> +enum drmcgrp_file_type {
> +	DRMCGRP_FTYPE_STATS,
> +	DRMCGRP_FTYPE_LIMIT,
> +	DRMCGRP_FTYPE_DEFAULT,
>  };
>  
>  /* indexed by drm_minor for access speed */
> @@ -54,6 +75,10 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
>  	}
>  
>  	/* set defaults here */
> +	if (known_drmcgrp_devs[minor] != NULL) {
> +		ddr->bo_limits_total_allocated =
> +		  known_drmcgrp_devs[minor]->bo_limits_total_allocated_default;
> +	}
>  
>  	return 0;
>  }
> @@ -100,7 +125,225 @@ drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
>  	return &drmcgrp->css;
>  }
>  
> +static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
> +		struct seq_file *sf, enum drmcgrp_res_type type)
> +{
> +	if (ddr == NULL) {
> +		seq_puts(sf, "\n");
> +		return;
> +	}
> +
> +	switch (type) {
> +	case DRMCGRP_TYPE_BO_TOTAL:
> +		seq_printf(sf, "%lld\n", ddr->bo_stats_total_allocated);
> +		break;
> +	default:
> +		seq_puts(sf, "\n");
> +		break;
> +	}
> +}
> +
> +static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
> +		struct seq_file *sf, enum drmcgrp_res_type type)
> +{
> +	if (ddr == NULL) {
> +		seq_puts(sf, "\n");
> +		return;
> +	}
> +
> +	switch (type) {
> +	case DRMCGRP_TYPE_BO_TOTAL:
> +		seq_printf(sf, "%lld\n", ddr->bo_limits_total_allocated);
> +		break;
> +	default:
> +		seq_puts(sf, "\n");
> +		break;
> +	}
> +}
> +
> +static inline void drmcgrp_print_default(struct drmcgrp_device *ddev,
> +		struct seq_file *sf, enum drmcgrp_res_type type)
> +{
> +	if (ddev == NULL) {
> +		seq_puts(sf, "\n");
> +		return;
> +	}
> +
> +	switch (type) {
> +	case DRMCGRP_TYPE_BO_TOTAL:
> +		seq_printf(sf, "%lld\n",
> +				ddev->bo_limits_total_allocated_default);
> +		break;
> +	default:
> +		seq_puts(sf, "\n");
> +		break;
> +	}
> +}
> +
> +int drmcgrp_bo_show(struct seq_file *sf, void *v)
> +{
> +	struct drmcgrp *drmcgrp = css_drmcgrp(seq_css(sf));
> +	struct drmcgrp_device_resource *ddr = NULL;
> +	enum drmcgrp_file_type f_type =
> +		DRMCG_CTF_PRIV2FTYPE(seq_cft(sf)->private);
> +	enum drmcgrp_res_type type =
> +		DRMCG_CTF_PRIV2RESTYPE(seq_cft(sf)->private);
> +	struct drmcgrp_device *ddev;
> +	int i;
> +
> +	for (i = 0; i <= max_minor; i++) {
> +		ddr = drmcgrp->dev_resources[i];
> +		ddev = known_drmcgrp_devs[i];
> +
> +		seq_printf(sf, "%d:%d ", DRM_MAJOR, i);
> +
> +		switch (f_type) {
> +		case DRMCGRP_FTYPE_STATS:
> +			drmcgrp_print_stats(ddr, sf, type);
> +			break;
> +		case DRMCGRP_FTYPE_LIMIT:
> +			drmcgrp_print_limits(ddr, sf, type);
> +			break;
> +		case DRMCGRP_FTYPE_DEFAULT:
> +			drmcgrp_print_default(ddev, sf, type);
> +			break;
> +		default:
> +			seq_puts(sf, "\n");
> +			break;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static inline void drmcgrp_pr_cft_err(const struct drmcgrp *drmcgrp,
> +		const char *cft_name, int minor)
> +{
> +	pr_err("drmcgrp: error parsing %s, minor %d ",
> +			cft_name, minor);
> +	pr_cont_cgroup_name(drmcgrp->css.cgroup);
> +	pr_cont("\n");
> +}
> +
> +static inline int drmcgrp_process_limit_val(char *sval, bool is_mem,
> +			s64 def_val, s64 max_val, s64 *ret_val)
> +{
> +	int rc = strcmp("max", sval);
> +
> +
> +	if (!rc)
> +		*ret_val = max_val;
> +	else {
> +		rc = strcmp("default", sval);
> +
> +		if (!rc)
> +			*ret_val = def_val;
> +	}
> +
> +	if (rc) {
> +		if (is_mem) {
> +			*ret_val = memparse(sval, NULL);
> +			rc = 0;
> +		} else {
> +			rc = kstrtoll(sval, 0, ret_val);
> +		}
> +	}
> +
> +	if (*ret_val > max_val)
> +		*ret_val = max_val;
> +
> +	return rc;
> +}
> +
> +ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
> +		size_t nbytes, loff_t off)
> +{
> +	struct drmcgrp *drmcgrp = css_drmcgrp(of_css(of));
> +	struct drmcgrp *parent = parent_drmcgrp(drmcgrp);
> +	enum drmcgrp_res_type type =
> +		DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> +	char *cft_name = of_cft(of)->name;
> +	char *limits = strstrip(buf);
> +	struct drmcgrp_device *ddev;
> +	struct drmcgrp_device_resource *ddr;
> +	char *line;
> +	char sattr[256];
> +	s64 val;
> +	s64 p_max;
> +	int rc;
> +	int minor;
> +
> +	while (limits != NULL) {
> +		line =  strsep(&limits, "\n");
> +
> +		if (sscanf(line,
> +			__stringify(DRM_MAJOR)":%u %255[^\t\n]",
> +							&minor, sattr) != 2) {
> +			pr_err("drmcgrp: error parsing %s ", cft_name);
> +			pr_cont_cgroup_name(drmcgrp->css.cgroup);
> +			pr_cont("\n");
> +
> +			continue;
> +		}
> +
> +		if (minor < 0 || minor > max_minor) {
> +			pr_err("drmcgrp: invalid minor %d for %s ",
> +					minor, cft_name);
> +			pr_cont_cgroup_name(drmcgrp->css.cgroup);
> +			pr_cont("\n");
> +
> +			continue;
> +		}
> +
> +		ddr = drmcgrp->dev_resources[minor];
> +		ddev = known_drmcgrp_devs[minor];
> +		switch (type) {
> +		case DRMCGRP_TYPE_BO_TOTAL:
> +			p_max = parent == NULL ? S64_MAX :
> +				parent->dev_resources[minor]->
> +				bo_limits_total_allocated;
> +
> +			rc = drmcgrp_process_limit_val(sattr, true,
> +				ddev->bo_limits_total_allocated_default,
> +				p_max,
> +				&val);
> +
> +			if (rc || val < 0) {
> +				drmcgrp_pr_cft_err(drmcgrp, cft_name, minor);
> +				continue;
> +			}
> +
> +			ddr->bo_limits_total_allocated = val;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +
> +	return nbytes;
> +}
> +
>  struct cftype files[] = {
> +	{
> +		.name = "buffer.total.stats",
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
> +						DRMCGRP_FTYPE_STATS),
> +	},
> +	{
> +		.name = "buffer.total.default",
> +		.seq_show = drmcgrp_bo_show,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
> +						DRMCGRP_FTYPE_DEFAULT),
> +	},
> +	{
> +		.name = "buffer.total.max",
> +		.write = drmcgrp_bo_limit_write,
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_TOTAL,
> +						DRMCGRP_FTYPE_LIMIT),
> +	},
>  	{ }	/* terminate */
>  };
>  
> @@ -121,6 +364,8 @@ int drmcgrp_register_device(struct drm_device *dev)
>  		return -ENOMEM;
>  
>  	ddev->dev = dev;
> +	ddev->bo_limits_total_allocated_default = S64_MAX;
> +
>  	mutex_init(&ddev->mutex);
>  
>  	mutex_lock(&drmcgrp_mutex);
> @@ -156,3 +401,79 @@ int drmcgrp_unregister_device(struct drm_device *dev)
>  	return 0;
>  }
>  EXPORT_SYMBOL(drmcgrp_unregister_device);
> +
> +bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self, struct drmcgrp *relative)
> +{
> +	for (; self != NULL; self = parent_drmcgrp(self))
> +		if (self == relative)
> +			return true;
> +
> +	return false;
> +}
> +EXPORT_SYMBOL(drmcgrp_is_self_or_ancestor);
> +
> +bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
> +		size_t size)
> +{
> +	struct drmcgrp *drmcgrp = drmcgrp_from(task);
> +	struct drmcgrp_device_resource *ddr;
> +	struct drmcgrp_device_resource *d;
> +	int devIdx = dev->primary->index;
> +	bool result = true;
> +	s64 delta = 0;
> +
> +	if (drmcgrp == NULL || drmcgrp == root_drmcgrp)
> +		return true;
> +
> +	ddr = drmcgrp->dev_resources[devIdx];
> +	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
> +	for ( ; drmcgrp != root_drmcgrp; drmcgrp = parent_drmcgrp(drmcgrp)) {
> +		d = drmcgrp->dev_resources[devIdx];
> +		delta = d->bo_limits_total_allocated -
> +				d->bo_stats_total_allocated;
> +
> +		if (delta <= 0 || size > delta) {
> +			result = false;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
> +
> +	return result;
> +}
> +EXPORT_SYMBOL(drmcgrp_bo_can_allocate);
> +
> +void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> +		size_t size)
> +{
> +	struct drmcgrp_device_resource *ddr;
> +	int devIdx = dev->primary->index;
> +
> +	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
> +		return;
> +
> +	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
> +	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
> +		ddr = drmcgrp->dev_resources[devIdx];
> +
> +		ddr->bo_stats_total_allocated += (s64)size;
> +	}
> +	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_chg_bo_alloc);
> +
> +void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> +		size_t size)
> +{
> +	int devIdx = dev->primary->index;
> +
> +	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
> +		return;
> +
> +	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
> +	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp))
> +		drmcgrp->dev_resources[devIdx]->bo_stats_total_allocated
> +			-= (s64)size;
> +	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_unchg_bo_alloc);
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
  2019-06-26 15:05 ` [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats Kenny Ho
@ 2019-06-26 16:12   ` Daniel Vetter
       [not found]     ` <20190626161254.GS12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 16:12 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks, amd-gfx, lkaplan, alexander.deucher, y2kenny, dri-devel,
	joseph.greathouse, tj, cgroups, christian.koenig

On Wed, Jun 26, 2019 at 11:05:18AM -0400, Kenny Ho wrote:
> The drm resource being measured is the TTM (Translation Table Manager)
> buffers.  TTM manages different types of memory that a GPU might access.
> These memory types include dedicated Video RAM (VRAM) and host/system
> memory accessible through IOMMU (GART/GTT).  TTM is currently used by
> multiple drm drivers (amd, ast, bochs, cirrus, hisilicon, maga200,
> nouveau, qxl, virtio, vmwgfx.)
> 
> drm.memory.stats
>         A read-only nested-keyed file which exists on all cgroups.
>         Each entry is keyed by the drm device's major:minor.  The
>         following nested keys are defined.
> 
>           ======         =============================================
>           system         Host/system memory

Shouldn't that be covered by gem bo stats already? Also, system memory is
definitely something a lot of non-ttm drivers want to be able to track, so
that needs to be separate from ttm.

>           tt             Host memory used by the drm device (GTT/GART)
>           vram           Video RAM used by the drm device
>           priv           Other drm device, vendor specific memory

So what's "priv". In general I think we need some way to register the
different kinds of memory, e.g. stuff not in your list:

- multiple kinds of vram (like numa-style gpus)
- cma (for all those non-ttm drivers that's a big one, it's like system
  memory but also totally different)
- any carveouts and stuff
>           ======         =============================================
> 
>         Reading returns the following::
> 
>         226:0 system=0 tt=0 vram=0 priv=0
>         226:1 system=0 tt=9035776 vram=17768448 priv=16809984
>         226:2 system=0 tt=9035776 vram=17768448 priv=16809984
> 
> drm.memory.evict.stats
>         A read-only flat-keyed file which exists on all cgroups.  Each
>         entry is keyed by the drm device's major:minor.
> 
>         Total number of evictions.
> 
> Change-Id: Ice2c4cc845051229549bebeb6aa2d7d6153bdf6a
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

I think with all the ttm refactoring going on I think we need to de-ttm
the interface functions here a bit. With Gerd Hoffmans series you can just
use a gem_bo pointer here, so what's left to do is have some extracted
structure for tracking memory types. I think Brian Welty has some ideas
for this, even in patch form. Would be good to keep him on cc at least for
the next version. We'd need to explicitly hand in the ttm_mem_reg (or
whatever the specific thing is going to be).

-Daniel
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |   3 +-
>  drivers/gpu/drm/ttm/ttm_bo.c            |  30 +++++++
>  drivers/gpu/drm/ttm/ttm_bo_util.c       |   4 +
>  include/drm/drm_cgroup.h                |  19 ++++
>  include/drm/ttm/ttm_bo_api.h            |   2 +
>  include/drm/ttm/ttm_bo_driver.h         |   8 ++
>  include/linux/cgroup_drm.h              |   4 +
>  kernel/cgroup/drm.c                     | 113 ++++++++++++++++++++++++
>  8 files changed, 182 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e9ecc3953673..a8dfc78ed45f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1678,8 +1678,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  	mutex_init(&adev->mman.gtt_window_lock);
>  
>  	/* No others user of address space so set it to 0 */
> -	r = ttm_bo_device_init(&adev->mman.bdev,
> +	r = ttm_bo_device_init_tmp(&adev->mman.bdev,
>  			       &amdgpu_bo_driver,
> +			       adev->ddev,
>  			       adev->ddev->anon_inode->i_mapping,
>  			       adev->need_dma32);
>  	if (r) {
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 2845fceb2fbd..e9f70547f0ad 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -34,6 +34,7 @@
>  #include <drm/ttm/ttm_module.h>
>  #include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/ttm/ttm_placement.h>
> +#include <drm/drm_cgroup.h>
>  #include <linux/jiffies.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
> @@ -42,6 +43,7 @@
>  #include <linux/module.h>
>  #include <linux/atomic.h>
>  #include <linux/reservation.h>
> +#include <linux/cgroup_drm.h>
>  
>  static void ttm_bo_global_kobj_release(struct kobject *kobj);
>  
> @@ -151,6 +153,10 @@ static void ttm_bo_release_list(struct kref *list_kref)
>  	struct ttm_bo_device *bdev = bo->bdev;
>  	size_t acc_size = bo->acc_size;
>  
> +	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
> +		drmcgrp_unchg_mem(bo);
> +	put_drmcgrp(bo->drmcgrp);
> +
>  	BUG_ON(kref_read(&bo->list_kref));
>  	BUG_ON(kref_read(&bo->kref));
>  	BUG_ON(atomic_read(&bo->cpu_writers));
> @@ -353,6 +359,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  		if (bo->mem.mem_type == TTM_PL_SYSTEM) {
>  			if (bdev->driver->move_notify)
>  				bdev->driver->move_notify(bo, evict, mem);
> +			if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
> +				drmcgrp_mem_track_move(bo, evict, mem);
>  			bo->mem = *mem;
>  			mem->mm_node = NULL;
>  			goto moved;
> @@ -361,6 +369,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  
>  	if (bdev->driver->move_notify)
>  		bdev->driver->move_notify(bo, evict, mem);
> +	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
> +		drmcgrp_mem_track_move(bo, evict, mem);
>  
>  	if (!(old_man->flags & TTM_MEMTYPE_FLAG_FIXED) &&
>  	    !(new_man->flags & TTM_MEMTYPE_FLAG_FIXED))
> @@ -374,6 +384,8 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  		if (bdev->driver->move_notify) {
>  			swap(*mem, bo->mem);
>  			bdev->driver->move_notify(bo, false, mem);
> +			if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
> +				drmcgrp_mem_track_move(bo, evict, mem);
>  			swap(*mem, bo->mem);
>  		}
>  
> @@ -1275,6 +1287,10 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
>  		WARN_ON(!locked);
>  	}
>  
> +	bo->drmcgrp = get_drmcgrp(current);
> +	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
> +		drmcgrp_chg_mem(bo);
> +
>  	if (likely(!ret))
>  		ret = ttm_bo_validate(bo, placement, ctx);
>  
> @@ -1666,6 +1682,20 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>  }
>  EXPORT_SYMBOL(ttm_bo_device_init);
>  
> +/* TODO merge with official function when implementation finalized*/
> +int ttm_bo_device_init_tmp(struct ttm_bo_device *bdev,
> +		struct ttm_bo_driver *driver,
> +		struct drm_device *ddev,
> +		struct address_space *mapping,
> +		bool need_dma32)
> +{
> +	int ret = ttm_bo_device_init(bdev, driver, mapping, need_dma32);
> +
> +	bdev->ddev = ddev;
> +	return ret;
> +}
> +EXPORT_SYMBOL(ttm_bo_device_init_tmp);
> +
>  /*
>   * buffer object vm functions.
>   */
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index 895d77d799e4..4ed7847c21f4 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -32,6 +32,7 @@
>  #include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/ttm/ttm_placement.h>
>  #include <drm/drm_vma_manager.h>
> +#include <drm/drm_cgroup.h>
>  #include <linux/io.h>
>  #include <linux/highmem.h>
>  #include <linux/wait.h>
> @@ -522,6 +523,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
>  	ret = reservation_object_trylock(fbo->base.resv);
>  	WARN_ON(!ret);
>  
> +	if (bo->bdev->ddev != NULL) // TODO: remove after ddev initiazlied for all
> +		drmcgrp_chg_mem(bo);
> +
>  	*new_obj = &fbo->base;
>  	return 0;
>  }
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index 8711b7c5f7bf..48ab5450cf17 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -5,6 +5,7 @@
>  #define __DRM_CGROUP_H__
>  
>  #include <linux/cgroup_drm.h>
> +#include <drm/ttm/ttm_bo_api.h>
>  
>  #ifdef CONFIG_CGROUP_DRM
>  
> @@ -18,6 +19,11 @@ void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
>  		size_t size);
>  bool drmcgrp_bo_can_allocate(struct task_struct *task, struct drm_device *dev,
>  		size_t size);
> +void drmcgrp_chg_mem(struct ttm_buffer_object *tbo);
> +void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo);
> +void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
> +		struct ttm_mem_reg *new_mem);
> +
>  #else
>  static inline int drmcgrp_register_device(struct drm_device *device)
>  {
> @@ -50,5 +56,18 @@ static inline bool drmcgrp_bo_can_allocate(struct task_struct *task,
>  {
>  	return true;
>  }
> +
> +static inline void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
> +{
> +}
> +
> +static inline void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo)
> +{
> +}
> +
> +static inline void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo,
> +		bool evict, struct ttm_mem_reg *new_mem)
> +{
> +}
>  #endif /* CONFIG_CGROUP_DRM */
>  #endif /* __DRM_CGROUP_H__ */
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index 49d9cdfc58f2..ae1bb6daec81 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -128,6 +128,7 @@ struct ttm_tt;
>   * struct ttm_buffer_object
>   *
>   * @bdev: Pointer to the buffer object device structure.
> + * @drmcgrp: DRM cgroup this object belongs to.
>   * @type: The bo type.
>   * @destroy: Destruction function. If NULL, kfree is used.
>   * @num_pages: Actual number of pages.
> @@ -174,6 +175,7 @@ struct ttm_buffer_object {
>  	 */
>  
>  	struct ttm_bo_device *bdev;
> +	struct drmcgrp *drmcgrp;
>  	enum ttm_bo_type type;
>  	void (*destroy) (struct ttm_buffer_object *);
>  	unsigned long num_pages;
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index c008346c2401..4cbcb41e5aa9 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -30,6 +30,7 @@
>  #ifndef _TTM_BO_DRIVER_H_
>  #define _TTM_BO_DRIVER_H_
>  
> +#include <drm/drm_device.h>
>  #include <drm/drm_mm.h>
>  #include <drm/drm_vma_manager.h>
>  #include <linux/workqueue.h>
> @@ -442,6 +443,7 @@ extern struct ttm_bo_global {
>   * @driver: Pointer to a struct ttm_bo_driver struct setup by the driver.
>   * @man: An array of mem_type_managers.
>   * @vma_manager: Address space manager
> + * @ddev: Pointer to struct drm_device that this ttm_bo_device belongs to
>   * lru_lock: Spinlock that protects the buffer+device lru lists and
>   * ddestroy lists.
>   * @dev_mapping: A pointer to the struct address_space representing the
> @@ -460,6 +462,7 @@ struct ttm_bo_device {
>  	struct ttm_bo_global *glob;
>  	struct ttm_bo_driver *driver;
>  	struct ttm_mem_type_manager man[TTM_NUM_MEM_TYPES];
> +	struct drm_device *ddev;
>  
>  	/*
>  	 * Protected by internal locks.
> @@ -598,6 +601,11 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>  		       struct address_space *mapping,
>  		       bool need_dma32);
>  
> +int ttm_bo_device_init_tmp(struct ttm_bo_device *bdev,
> +		       struct ttm_bo_driver *driver,
> +		       struct drm_device *ddev,
> +		       struct address_space *mapping,
> +		       bool need_dma32);
>  /**
>   * ttm_bo_unmap_virtual
>   *
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index e4400b21ab8e..141bea06f74c 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -9,6 +9,7 @@
>  #include <linux/mutex.h>
>  #include <linux/cgroup.h>
>  #include <drm/drm_file.h>
> +#include <drm/ttm/ttm_placement.h>
>  
>  /* limit defined per the way drm_minor_alloc operates */
>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> @@ -22,6 +23,9 @@ struct drmcgrp_device_resource {
>  	size_t			bo_limits_peak_allocated;
>  
>  	s64			bo_stats_count_allocated;
> +
> +	s64			mem_stats[TTM_PL_PRIV+1];
> +	s64			mem_stats_evict;
>  };
>  
>  struct drmcgrp {
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 9144f93b851f..5aee42a628c1 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -8,6 +8,8 @@
>  #include <linux/mutex.h>
>  #include <linux/cgroup_drm.h>
>  #include <linux/kernel.h>
> +#include <drm/ttm/ttm_bo_api.h>
> +#include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/drm_device.h>
>  #include <drm/drm_ioctl.h>
>  #include <drm/drm_cgroup.h>
> @@ -34,6 +36,8 @@ enum drmcgrp_res_type {
>  	DRMCGRP_TYPE_BO_TOTAL,
>  	DRMCGRP_TYPE_BO_PEAK,
>  	DRMCGRP_TYPE_BO_COUNT,
> +	DRMCGRP_TYPE_MEM,
> +	DRMCGRP_TYPE_MEM_EVICT,
>  };
>  
>  enum drmcgrp_file_type {
> @@ -42,6 +46,13 @@ enum drmcgrp_file_type {
>  	DRMCGRP_FTYPE_DEFAULT,
>  };
>  
> +static char const *ttm_placement_names[] = {
> +	[TTM_PL_SYSTEM] = "system",
> +	[TTM_PL_TT]     = "tt",
> +	[TTM_PL_VRAM]   = "vram",
> +	[TTM_PL_PRIV]   = "priv",
> +};
> +
>  /* indexed by drm_minor for access speed */
>  static struct drmcgrp_device	*known_drmcgrp_devs[MAX_DRM_DEV];
>  
> @@ -134,6 +145,7 @@ drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
>  static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
>  		struct seq_file *sf, enum drmcgrp_res_type type)
>  {
> +	int i;
>  	if (ddr == NULL) {
>  		seq_puts(sf, "\n");
>  		return;
> @@ -149,6 +161,16 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
>  	case DRMCGRP_TYPE_BO_COUNT:
>  		seq_printf(sf, "%lld\n", ddr->bo_stats_count_allocated);
>  		break;
> +	case DRMCGRP_TYPE_MEM:
> +		for (i = 0; i <= TTM_PL_PRIV; i++) {
> +			seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
> +					ddr->mem_stats[i]);
> +		}
> +		seq_puts(sf, "\n");
> +		break;
> +	case DRMCGRP_TYPE_MEM_EVICT:
> +		seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
> +		break;
>  	default:
>  		seq_puts(sf, "\n");
>  		break;
> @@ -406,6 +428,18 @@ struct cftype files[] = {
>  		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BO_COUNT,
>  						DRMCGRP_FTYPE_STATS),
>  	},
> +	{
> +		.name = "memory.stats",
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM,
> +						DRMCGRP_FTYPE_STATS),
> +	},
> +	{
> +		.name = "memory.evict.stats",
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_EVICT,
> +						DRMCGRP_FTYPE_STATS),
> +	},
>  	{ }	/* terminate */
>  };
>  
> @@ -555,3 +589,82 @@ void drmcgrp_unchg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
>  	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
>  }
>  EXPORT_SYMBOL(drmcgrp_unchg_bo_alloc);
> +
> +void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
> +{
> +	struct drm_device *dev = tbo->bdev->ddev;
> +	struct drmcgrp *drmcgrp = tbo->drmcgrp;
> +	int devIdx = dev->primary->index;
> +	s64 size = (s64)(tbo->mem.size);
> +	int mem_type = tbo->mem.mem_type;
> +	struct drmcgrp_device_resource *ddr;
> +
> +	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
> +		return;
> +
> +	mem_type = mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : mem_type;
> +
> +	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
> +	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
> +		ddr = drmcgrp->dev_resources[devIdx];
> +		ddr->mem_stats[mem_type] += size;
> +	}
> +	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_chg_mem);
> +
> +void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo)
> +{
> +	struct drm_device *dev = tbo->bdev->ddev;
> +	struct drmcgrp *drmcgrp = tbo->drmcgrp;
> +	int devIdx = dev->primary->index;
> +	s64 size = (s64)(tbo->mem.size);
> +	int mem_type = tbo->mem.mem_type;
> +	struct drmcgrp_device_resource *ddr;
> +
> +	if (drmcgrp == NULL || known_drmcgrp_devs[devIdx] == NULL)
> +		return;
> +
> +	mem_type = mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : mem_type;
> +
> +	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
> +	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
> +		ddr = drmcgrp->dev_resources[devIdx];
> +		ddr->mem_stats[mem_type] -= size;
> +	}
> +	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_unchg_mem);
> +
> +void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
> +		struct ttm_mem_reg *new_mem)
> +{
> +	struct drm_device *dev = old_bo->bdev->ddev;
> +	struct drmcgrp *drmcgrp = old_bo->drmcgrp;
> +	s64 move_in_bytes = (s64)(old_bo->mem.size);
> +	int devIdx = dev->primary->index;
> +	int old_mem_type = old_bo->mem.mem_type;
> +	int new_mem_type = new_mem->mem_type;
> +	struct drmcgrp_device_resource *ddr;
> +	struct drmcgrp_device *known_dev;
> +
> +	known_dev = known_drmcgrp_devs[devIdx];
> +
> +	if (drmcgrp == NULL || known_dev == NULL)
> +		return;
> +
> +	old_mem_type = old_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : old_mem_type;
> +	new_mem_type = new_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : new_mem_type;
> +
> +	mutex_lock(&known_dev->mutex);
> +	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
> +		ddr = drmcgrp->dev_resources[devIdx];
> +		ddr->mem_stats[old_mem_type] -= move_in_bytes;
> +		ddr->mem_stats[new_mem_type] += move_in_bytes;
> +
> +		if (evict)
> +			ddr->mem_stats_evict++;
> +	}
> +	mutex_unlock(&known_dev->mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_mem_track_move);
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats
  2019-06-26 15:05 ` [RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
@ 2019-06-26 16:16   ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 16:16 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks, amd-gfx, lkaplan, alexander.deucher, y2kenny, dri-devel,
	joseph.greathouse, tj, cgroups, christian.koenig

On Wed, Jun 26, 2019 at 11:05:19AM -0400, Kenny Ho wrote:
> drm.memory.peak.stats
>         A read-only nested-keyed file which exists on all cgroups.
>         Each entry is keyed by the drm device's major:minor.  The
>         following nested keys are defined.
> 
>           ======         ==============================================
>           system         Peak host memory used
>           tt             Peak host memory used by the device (GTT/GART)
>           vram           Peak Video RAM used by the drm device
>           priv           Other drm device specific memory peak usage
>           ======         ==============================================
> 
>         Reading returns the following::
> 
>         226:0 system=0 tt=0 vram=0 priv=0
>         226:1 system=0 tt=9035776 vram=17768448 priv=16809984
>         226:2 system=0 tt=9035776 vram=17768448 priv=16809984
> 
> Change-Id: I986e44533848f66411465bdd52105e78105a709a
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

Same concerns as with the previous patch, a bit too much ttm in here.
Otherwise looks like useful information, and wont need driver changes
anywhere.
-Daniel

> ---
>  include/linux/cgroup_drm.h |  1 +
>  kernel/cgroup/drm.c        | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index 141bea06f74c..922529641df5 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -25,6 +25,7 @@ struct drmcgrp_device_resource {
>  	s64			bo_stats_count_allocated;
>  
>  	s64			mem_stats[TTM_PL_PRIV+1];
> +	s64			mem_peaks[TTM_PL_PRIV+1];
>  	s64			mem_stats_evict;
>  };
>  
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 5aee42a628c1..5f5fa6a2b068 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -38,6 +38,7 @@ enum drmcgrp_res_type {
>  	DRMCGRP_TYPE_BO_COUNT,
>  	DRMCGRP_TYPE_MEM,
>  	DRMCGRP_TYPE_MEM_EVICT,
> +	DRMCGRP_TYPE_MEM_PEAK,
>  };
>  
>  enum drmcgrp_file_type {
> @@ -171,6 +172,13 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
>  	case DRMCGRP_TYPE_MEM_EVICT:
>  		seq_printf(sf, "%lld\n", ddr->mem_stats_evict);
>  		break;
> +	case DRMCGRP_TYPE_MEM_PEAK:
> +		for (i = 0; i <= TTM_PL_PRIV; i++) {
> +			seq_printf(sf, "%s=%lld ", ttm_placement_names[i],
> +					ddr->mem_peaks[i]);
> +		}
> +		seq_puts(sf, "\n");
> +		break;
>  	default:
>  		seq_puts(sf, "\n");
>  		break;
> @@ -440,6 +448,12 @@ struct cftype files[] = {
>  		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_EVICT,
>  						DRMCGRP_FTYPE_STATS),
>  	},
> +	{
> +		.name = "memory.peaks.stats",
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_PEAK,
> +						DRMCGRP_FTYPE_STATS),
> +	},
>  	{ }	/* terminate */
>  };
>  
> @@ -608,6 +622,8 @@ void drmcgrp_chg_mem(struct ttm_buffer_object *tbo)
>  	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
>  		ddr = drmcgrp->dev_resources[devIdx];
>  		ddr->mem_stats[mem_type] += size;
> +		ddr->mem_peaks[mem_type] = max(ddr->mem_peaks[mem_type],
> +				ddr->mem_stats[mem_type]);
>  	}
>  	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
>  }
> @@ -662,6 +678,10 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  		ddr->mem_stats[old_mem_type] -= move_in_bytes;
>  		ddr->mem_stats[new_mem_type] += move_in_bytes;
>  
> +		ddr->mem_peaks[new_mem_type] = max(
> +				ddr->mem_peaks[new_mem_type],
> +				ddr->mem_stats[new_mem_type]);
> +
>  		if (evict)
>  			ddr->mem_stats_evict++;
>  	}
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
       [not found]     ` <20190626150522.11618-10-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 16:25       ` Daniel Vetter
       [not found]         ` <20190626162554.GU12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 16:25 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, alexander.deucher-5C7GfCeVMHo,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Wed, Jun 26, 2019 at 11:05:20AM -0400, Kenny Ho wrote:
> The bandwidth is measured by keeping track of the amount of bytes moved
> by ttm within a time period.  We defined two type of bandwidth: burst
> and average.  Average bandwidth is calculated by dividing the total
> amount of bytes moved within a cgroup by the lifetime of the cgroup.
> Burst bandwidth is similar except that the byte and time measurement is
> reset after a user configurable period.
> 
> The bandwidth control is best effort since it is done on a per move
> basis instead of per byte.  The bandwidth is limited by delaying the
> move of a buffer.  The bandwidth limit can be exceeded when the next
> move is larger than the remaining allowance.
> 
> drm.burst_bw_period_in_us
>         A read-write flat-keyed file which exists on the root cgroup.
>         Each entry is keyed by the drm device's major:minor.
> 
>         Length of a period use to measure burst bandwidth in us.
>         One period per device.
> 
> drm.burst_bw_period_in_us.default
>         A read-only flat-keyed file which exists on the root cgroup.
>         Each entry is keyed by the drm device's major:minor.
> 
>         Default length of a period in us (one per device.)
> 
> drm.bandwidth.stats
>         A read-only nested-keyed file which exists on all cgroups.
>         Each entry is keyed by the drm device's major:minor.  The
>         following nested keys are defined.
> 
>           =================     ======================================
>           burst_byte_per_us     Burst bandwidth
>           avg_bytes_per_us      Average bandwidth
>           moved_byte            Amount of byte moved within a period
>           accum_us              Amount of time accumulated in a period
>           total_moved_byte      Byte moved within the cgroup lifetime
>           total_accum_us        Cgroup lifetime in us
>           byte_credit           Available byte credit to limit avg bw
>           =================     ======================================
> 
>         Reading returns the following::
>         226:1 burst_byte_per_us=23 avg_bytes_per_us=0 moved_byte=2244608
>         accum_us=95575 total_moved_byte=45899776 total_accum_us=201634590
>         byte_credit=13214278590464
>         226:2 burst_byte_per_us=10 avg_bytes_per_us=219 moved_byte=430080
>         accum_us=39350 total_moved_byte=65518026752 total_accum_us=298337721
>         byte_credit=9223372036854644735
> 
> drm.bandwidth.high
>         A read-write nested-keyed file which exists on all cgroups.
>         Each entry is keyed by the drm device's major:minor.  The
>         following nested keys are defined.
> 
>           ================  =======================================
>           bytes_in_period   Burst limit per period in byte
>           avg_bytes_per_us  Average bandwidth limit in bytes per us
>           ================  =======================================
> 
>         Reading returns the following::
> 
>         226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
>         226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
> 
> drm.bandwidth.default
>         A read-only nested-keyed file which exists on the root cgroup.
>         Each entry is keyed by the drm device's major:minor.  The
>         following nested keys are defined.
> 
>           ================  ========================================
>           bytes_in_period   Default burst limit per period in byte
>           avg_bytes_per_us  Default average bw limit in bytes per us
>           ================  ========================================
> 
>         Reading returns the following::
> 
>         226:1 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
>         226:2 bytes_in_period=9223372036854775807 avg_bytes_per_us=65536
> 
> Change-Id: Ie573491325ccc16535bb943e7857f43bd0962add
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>

So I'm not too sure exposing this is a great idea, at least depending upon
what you're trying to do with it. There's a few concerns here:

- I think bo movement stats might be useful, but they're not telling you
  everything. Applications can also copy data themselves and put buffers
  where they want them, especially with more explicit apis like vk.

- which kind of moves are we talking about here? Eviction related bo moves
  seem not counted here, and if you have lots of gpus with funny
  interconnects you might also get other kinds of moves, not just system
  ram <-> vram.

- What happens if we slow down, but someone else needs to evict our
  buffers/move them (ttm is atm not great at this, but Christian König is
  working on patches). I think there's lots of priority inversion
  potential here.

- If the goal is to avoid thrashing the interconnects, then this isn't the
  full picture by far - apps can use copy engines and explicit placement,
  again that's how vulkan at least is supposed to work.

I guess these all boil down to: What do you want to achieve here? The
commit message doesn't explain the intended use-case of this.
-Daniel

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c |   7 +
>  include/drm/drm_cgroup.h     |  13 ++
>  include/linux/cgroup_drm.h   |  14 ++
>  kernel/cgroup/drm.c          | 309 ++++++++++++++++++++++++++++++++++-
>  4 files changed, 340 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index e9f70547f0ad..f06c2b9d8a4a 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -36,6 +36,7 @@
>  #include <drm/ttm/ttm_placement.h>
>  #include <drm/drm_cgroup.h>
>  #include <linux/jiffies.h>
> +#include <linux/delay.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/mm.h>
> @@ -1176,6 +1177,12 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
>  	 * Check whether we need to move buffer.
>  	 */
>  	if (!ttm_bo_mem_compat(placement, &bo->mem, &new_flags)) {
> +		unsigned int move_delay = drmcgrp_get_mem_bw_period_in_us(bo);
> +		move_delay /= 2000; /* check every half period in ms*/
> +		while (bo->bdev->ddev != NULL && !drmcgrp_mem_can_move(bo)) {
> +			msleep(move_delay);
> +		}
> +
>  		ret = ttm_bo_move_buffer(bo, placement, ctx);
>  		if (ret)
>  			return ret;
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index 48ab5450cf17..9b1dbd6a4eca 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -23,6 +23,8 @@ void drmcgrp_chg_mem(struct ttm_buffer_object *tbo);
>  void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo);
>  void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  		struct ttm_mem_reg *new_mem);
> +unsigned int drmcgrp_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo);
> +bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo);
>  
>  #else
>  static inline int drmcgrp_register_device(struct drm_device *device)
> @@ -69,5 +71,16 @@ static inline void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo,
>  		bool evict, struct ttm_mem_reg *new_mem)
>  {
>  }
> +
> +static inline unsigned int drmcgrp_get_mem_bw_period_in_us(
> +		struct ttm_buffer_object *tbo)
> +{
> +	return 0;
> +}
> +
> +static inline bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_CGROUP_DRM */
>  #endif /* __DRM_CGROUP_H__ */
> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> index 922529641df5..94828da2104a 100644
> --- a/include/linux/cgroup_drm.h
> +++ b/include/linux/cgroup_drm.h
> @@ -14,6 +14,15 @@
>  /* limit defined per the way drm_minor_alloc operates */
>  #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
>  
> +enum drmcgrp_mem_bw_attr {
> +    DRMCGRP_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> +    DRMCGRP_MEM_BW_ATTR_ACCUM_US,  /* for calulating 'instantaneous' bw */
> +    DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED,
> +    DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US,
> +    DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT,
> +    __DRMCGRP_MEM_BW_ATTR_LAST,
> +};
> +
>  struct drmcgrp_device_resource {
>  	/* for per device stats */
>  	s64			bo_stats_total_allocated;
> @@ -27,6 +36,11 @@ struct drmcgrp_device_resource {
>  	s64			mem_stats[TTM_PL_PRIV+1];
>  	s64			mem_peaks[TTM_PL_PRIV+1];
>  	s64			mem_stats_evict;
> +
> +	s64			mem_bw_stats_last_update_us;
> +	s64			mem_bw_stats[__DRMCGRP_MEM_BW_ATTR_LAST];
> +	s64			mem_bw_limits_bytes_in_period;
> +	s64			mem_bw_limits_avg_bytes_per_us;
>  };
>  
>  struct drmcgrp {
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 5f5fa6a2b068..bbc6612200a4 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -7,6 +7,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/mutex.h>
>  #include <linux/cgroup_drm.h>
> +#include <linux/ktime.h>
>  #include <linux/kernel.h>
>  #include <drm/ttm/ttm_bo_api.h>
>  #include <drm/ttm/ttm_bo_driver.h>
> @@ -22,6 +23,12 @@ struct drmcgrp_device {
>  
>  	s64			bo_limits_total_allocated_default;
>  	size_t			bo_limits_peak_allocated_default;
> +
> +	s64			mem_bw_limits_period_in_us;
> +	s64			mem_bw_limits_period_in_us_default;
> +
> +	s64			mem_bw_bytes_in_period_default;
> +	s64			mem_bw_avg_bytes_per_us_default;
>  };
>  
>  #define DRMCG_CTF_PRIV_SIZE 3
> @@ -39,6 +46,8 @@ enum drmcgrp_res_type {
>  	DRMCGRP_TYPE_MEM,
>  	DRMCGRP_TYPE_MEM_EVICT,
>  	DRMCGRP_TYPE_MEM_PEAK,
> +	DRMCGRP_TYPE_BANDWIDTH,
> +	DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST,
>  };
>  
>  enum drmcgrp_file_type {
> @@ -54,6 +63,17 @@ static char const *ttm_placement_names[] = {
>  	[TTM_PL_PRIV]   = "priv",
>  };
>  
> +static char const *mem_bw_attr_names[] = {
> +	[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED] = "moved_byte",
> +	[DRMCGRP_MEM_BW_ATTR_ACCUM_US] = "accum_us",
> +	[DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED] = "total_moved_byte",
> +	[DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US] = "total_accum_us",
> +	[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT] = "byte_credit",
> +};
> +
> +#define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> +#define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> +
>  /* indexed by drm_minor for access speed */
>  static struct drmcgrp_device	*known_drmcgrp_devs[MAX_DRM_DEV];
>  
> @@ -86,6 +106,9 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
>  		if (!ddr)
>  			return -ENOMEM;
>  
> +		ddr->mem_bw_stats_last_update_us = ktime_to_us(ktime_get());
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] = 1;
> +
>  		drmcgrp->dev_resources[minor] = ddr;
>  	}
>  
> @@ -96,6 +119,12 @@ static inline int init_drmcgrp_single(struct drmcgrp *drmcgrp, int minor)
>  
>  		ddr->bo_limits_peak_allocated =
>  		  known_drmcgrp_devs[minor]->bo_limits_peak_allocated_default;
> +
> +		ddr->mem_bw_limits_bytes_in_period =
> +		  known_drmcgrp_devs[minor]->mem_bw_bytes_in_period_default;
> +
> +		ddr->mem_bw_limits_avg_bytes_per_us =
> +		  known_drmcgrp_devs[minor]->mem_bw_avg_bytes_per_us_default;
>  	}
>  
>  	return 0;
> @@ -143,6 +172,26 @@ drmcgrp_css_alloc(struct cgroup_subsys_state *parent_css)
>  	return &drmcgrp->css;
>  }
>  
> +static inline void drmcgrp_mem_burst_bw_stats_reset(struct drm_device *dev)
> +{
> +	struct cgroup_subsys_state *pos;
> +	struct drmcgrp *node;
> +	struct drmcgrp_device_resource *ddr;
> +	int devIdx;
> +
> +	devIdx =  dev->primary->index;
> +
> +	rcu_read_lock();
> +	css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
> +		node = css_drmcgrp(pos);
> +		ddr = node->dev_resources[devIdx];
> +
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] = 1;
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED] = 0;
> +	}
> +	rcu_read_unlock();
> +}
> +
>  static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
>  		struct seq_file *sf, enum drmcgrp_res_type type)
>  {
> @@ -179,6 +228,31 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
>  		}
>  		seq_puts(sf, "\n");
>  		break;
> +	case DRMCGRP_TYPE_BANDWIDTH:
> +		if (ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] == 0)
> +			seq_puts(sf, "burst_byte_per_us=NaN ");
> +		else
> +			seq_printf(sf, "burst_byte_per_us=%lld ",
> +				ddr->mem_bw_stats[
> +				DRMCGRP_MEM_BW_ATTR_BYTE_MOVED]/
> +				ddr->mem_bw_stats[
> +				DRMCGRP_MEM_BW_ATTR_ACCUM_US]);
> +
> +		if (ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US] == 0)
> +			seq_puts(sf, "avg_bytes_per_us=NaN ");
> +		else
> +			seq_printf(sf, "avg_bytes_per_us=%lld ",
> +				ddr->mem_bw_stats[
> +				DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED]/
> +				ddr->mem_bw_stats[
> +				DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US]);
> +
> +		for (i = 0; i < __DRMCGRP_MEM_BW_ATTR_LAST; i++) {
> +			seq_printf(sf, "%s=%lld ", mem_bw_attr_names[i],
> +					ddr->mem_bw_stats[i]);
> +		}
> +		seq_puts(sf, "\n");
> +		break;
>  	default:
>  		seq_puts(sf, "\n");
>  		break;
> @@ -186,9 +260,9 @@ static inline void drmcgrp_print_stats(struct drmcgrp_device_resource *ddr,
>  }
>  
>  static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
> -		struct seq_file *sf, enum drmcgrp_res_type type)
> +		struct seq_file *sf, enum drmcgrp_res_type type, int minor)
>  {
> -	if (ddr == NULL) {
> +	if (ddr == NULL || known_drmcgrp_devs[minor] == NULL) {
>  		seq_puts(sf, "\n");
>  		return;
>  	}
> @@ -200,6 +274,17 @@ static inline void drmcgrp_print_limits(struct drmcgrp_device_resource *ddr,
>  	case DRMCGRP_TYPE_BO_PEAK:
>  		seq_printf(sf, "%zu\n", ddr->bo_limits_peak_allocated);
>  		break;
> +	case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
> +		seq_printf(sf, "%lld\n",
> +			known_drmcgrp_devs[minor]->mem_bw_limits_period_in_us);
> +		break;
> +	case DRMCGRP_TYPE_BANDWIDTH:
> +		seq_printf(sf, "%s=%lld %s=%lld\n",
> +				MEM_BW_LIMITS_NAME_BURST,
> +				ddr->mem_bw_limits_bytes_in_period,
> +				MEM_BW_LIMITS_NAME_AVG,
> +				ddr->mem_bw_limits_avg_bytes_per_us);
> +		break;
>  	default:
>  		seq_puts(sf, "\n");
>  		break;
> @@ -223,6 +308,17 @@ static inline void drmcgrp_print_default(struct drmcgrp_device *ddev,
>  		seq_printf(sf, "%zu\n",
>  				ddev->bo_limits_peak_allocated_default);
>  		break;
> +	case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
> +		seq_printf(sf, "%lld\n",
> +				ddev->mem_bw_limits_period_in_us_default);
> +		break;
> +	case DRMCGRP_TYPE_BANDWIDTH:
> +		seq_printf(sf, "%s=%lld %s=%lld\n",
> +				MEM_BW_LIMITS_NAME_BURST,
> +				ddev->mem_bw_bytes_in_period_default,
> +				MEM_BW_LIMITS_NAME_AVG,
> +				ddev->mem_bw_avg_bytes_per_us_default);
> +		break;
>  	default:
>  		seq_puts(sf, "\n");
>  		break;
> @@ -251,7 +347,7 @@ int drmcgrp_bo_show(struct seq_file *sf, void *v)
>  			drmcgrp_print_stats(ddr, sf, type);
>  			break;
>  		case DRMCGRP_FTYPE_LIMIT:
> -			drmcgrp_print_limits(ddr, sf, type);
> +			drmcgrp_print_limits(ddr, sf, type, i);
>  			break;
>  		case DRMCGRP_FTYPE_DEFAULT:
>  			drmcgrp_print_default(ddev, sf, type);
> @@ -317,6 +413,9 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
>  	struct drmcgrp_device_resource *ddr;
>  	char *line;
>  	char sattr[256];
> +	char sval[256];
> +	char *nested;
> +	char *attr;
>  	s64 val;
>  	s64 p_max;
>  	int rc;
> @@ -381,6 +480,78 @@ ssize_t drmcgrp_bo_limit_write(struct kernfs_open_file *of, char *buf,
>  
>  			ddr->bo_limits_peak_allocated = val;
>  			break;
> +		case DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST:
> +			rc = drmcgrp_process_limit_val(sattr, false,
> +				ddev->mem_bw_limits_period_in_us_default,
> +				S64_MAX,
> +				&val);
> +
> +			if (rc || val < 2000) {
> +				drmcgrp_pr_cft_err(drmcgrp, cft_name, minor);
> +				continue;
> +			}
> +
> +			ddev->mem_bw_limits_period_in_us= val;
> +			drmcgrp_mem_burst_bw_stats_reset(ddev->dev);
> +			break;
> +		case DRMCGRP_TYPE_BANDWIDTH:
> +			nested = strstrip(sattr);
> +
> +			while (nested != NULL) {
> +				attr = strsep(&nested, " ");
> +
> +				if (sscanf(attr, MEM_BW_LIMITS_NAME_BURST"=%s",
> +							sval) == 1) {
> +					p_max = parent == NULL ? S64_MAX :
> +						parent->
> +						dev_resources[minor]->
> +						mem_bw_limits_bytes_in_period;
> +
> +					rc = drmcgrp_process_limit_val(sval,
> +						true,
> +						ddev->
> +						mem_bw_bytes_in_period_default,
> +						p_max,
> +						&val);
> +
> +					if (rc || val < 0) {
> +						drmcgrp_pr_cft_err(drmcgrp,
> +								cft_name,
> +								minor);
> +						continue;
> +					}
> +
> +					ddr->mem_bw_limits_bytes_in_period=val;
> +					continue;
> +				}
> +
> +				if (sscanf(attr, MEM_BW_LIMITS_NAME_AVG"=%s",
> +							sval) == 1) {
> +					p_max = parent == NULL ? S64_MAX :
> +						parent->
> +						dev_resources[minor]->
> +						mem_bw_limits_avg_bytes_per_us;
> +
> +					rc = drmcgrp_process_limit_val(sval,
> +						true,
> +						ddev->
> +					      mem_bw_avg_bytes_per_us_default,
> +						p_max,
> +						&val);
> +
> +					if (rc || val < 0) {
> +						drmcgrp_pr_cft_err(drmcgrp,
> +								cft_name,
> +								minor);
> +						continue;
> +					}
> +
> +					ddr->
> +					mem_bw_limits_avg_bytes_per_us=val;
> +					continue;
> +				}
> +			}
> +			break;
>  		default:
>  			break;
>  		}
> @@ -454,6 +625,41 @@ struct cftype files[] = {
>  		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_MEM_PEAK,
>  						DRMCGRP_FTYPE_STATS),
>  	},
> +	{
> +		.name = "burst_bw_period_in_us",
> +		.write = drmcgrp_bo_limit_write,
> +		.seq_show = drmcgrp_bo_show,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST,
> +						DRMCGRP_FTYPE_LIMIT),
> +	},
> +	{
> +		.name = "burst_bw_period_in_us.default",
> +		.seq_show = drmcgrp_bo_show,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH_PERIOD_BURST,
> +						DRMCGRP_FTYPE_DEFAULT),
> +	},
> +	{
> +		.name = "bandwidth.stats",
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH,
> +						DRMCGRP_FTYPE_STATS),
> +	},
> +	{
> +		.name = "bandwidth.high",
> +		.write = drmcgrp_bo_limit_write,
> +		.seq_show = drmcgrp_bo_show,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH,
> +						DRMCGRP_FTYPE_LIMIT),
> +	},
> +	{
> +		.name = "bandwidth.default",
> +		.seq_show = drmcgrp_bo_show,
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +		.private = DRMCG_CTF_PRIV(DRMCGRP_TYPE_BANDWIDTH,
> +						DRMCGRP_FTYPE_DEFAULT),
> +	},
>  	{ }	/* terminate */
>  };
>  
> @@ -476,6 +682,10 @@ int drmcgrp_register_device(struct drm_device *dev)
>  	ddev->dev = dev;
>  	ddev->bo_limits_total_allocated_default = S64_MAX;
>  	ddev->bo_limits_peak_allocated_default = SIZE_MAX;
> +	ddev->mem_bw_limits_period_in_us_default = 200000;
> +	ddev->mem_bw_limits_period_in_us = 200000;
> +	ddev->mem_bw_bytes_in_period_default = S64_MAX;
> +	ddev->mem_bw_avg_bytes_per_us_default = 65536;
>  
>  	mutex_init(&ddev->mutex);
>  
> @@ -652,6 +862,27 @@ void drmcgrp_unchg_mem(struct ttm_buffer_object *tbo)
>  }
>  EXPORT_SYMBOL(drmcgrp_unchg_mem);
>  
> +static inline void drmcgrp_mem_bw_accum(s64 time_us,
> +		struct drmcgrp_device_resource *ddr)
> +{
> +	s64 increment_us = time_us - ddr->mem_bw_stats_last_update_us;
> +	s64 new_credit = ddr->mem_bw_limits_avg_bytes_per_us * increment_us;
> +
> +	ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US]
> +		+= increment_us;
> +	ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_TOTAL_ACCUM_US]
> +		+= increment_us;
> +
> +	if ((S64_MAX - new_credit) >
> +			ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT])
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
> +			+= new_credit;
> +	else
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT] = S64_MAX;
> +
> +	ddr->mem_bw_stats_last_update_us = time_us;
> +}
> +
>  void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  		struct ttm_mem_reg *new_mem)
>  {
> @@ -661,6 +892,7 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  	int devIdx = dev->primary->index;
>  	int old_mem_type = old_bo->mem.mem_type;
>  	int new_mem_type = new_mem->mem_type;
> +	s64 time_us;
>  	struct drmcgrp_device_resource *ddr;
>  	struct drmcgrp_device *known_dev;
>  
> @@ -672,6 +904,14 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  	old_mem_type = old_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : old_mem_type;
>  	new_mem_type = new_mem_type > TTM_PL_PRIV ? TTM_PL_PRIV : new_mem_type;
>  
> +	if (root_drmcgrp->dev_resources[devIdx] != NULL &&
> +			root_drmcgrp->dev_resources[devIdx]->
> +			mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] >=
> +			known_dev->mem_bw_limits_period_in_us)
> +		drmcgrp_mem_burst_bw_stats_reset(dev);
> +
> +	time_us = ktime_to_us(ktime_get());
> +
>  	mutex_lock(&known_dev->mutex);
>  	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
>  		ddr = drmcgrp->dev_resources[devIdx];
> @@ -684,7 +924,70 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  
>  		if (evict)
>  			ddr->mem_stats_evict++;
> +
> +		drmcgrp_mem_bw_accum(time_us, ddr);
> +
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED]
> +			+= move_in_bytes;
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_TOTAL_BYTE_MOVED]
> +			+= move_in_bytes;
> +
> +		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
> +			-= move_in_bytes;
>  	}
>  	mutex_unlock(&known_dev->mutex);
>  }
>  EXPORT_SYMBOL(drmcgrp_mem_track_move);
> +
> +unsigned int drmcgrp_get_mem_bw_period_in_us(struct ttm_buffer_object *tbo)
> +{
> +	int devIdx;
> +
> +	//TODO replace with BUG_ON

Nah, WARN_ON, BUG_ON considered evil in code where it's avoidable.

> +	if (tbo->bdev->ddev == NULL)
> +		return 0;
> +
> +	devIdx = tbo->bdev->ddev->primary->index;
> +
> +	return (unsigned int) known_drmcgrp_devs[devIdx]->
> +		mem_bw_limits_period_in_us;
> +}
> +EXPORT_SYMBOL(drmcgrp_get_mem_bw_period_in_us);
> +
> +bool drmcgrp_mem_can_move(struct ttm_buffer_object *tbo)
> +{
> +	struct drm_device *dev = tbo->bdev->ddev;
> +	struct drmcgrp *drmcgrp = tbo->drmcgrp;
> +	int devIdx = dev->primary->index;
> +	s64 time_us;
> +	struct drmcgrp_device_resource *ddr;
> +	bool result = true;
> +
> +	if (root_drmcgrp->dev_resources[devIdx] != NULL &&
> +			root_drmcgrp->dev_resources[devIdx]->
> +			mem_bw_stats[DRMCGRP_MEM_BW_ATTR_ACCUM_US] >=
> +			known_drmcgrp_devs[devIdx]->
> +			mem_bw_limits_period_in_us)
> +		drmcgrp_mem_burst_bw_stats_reset(dev);
> +
> +	time_us = ktime_to_us(ktime_get());
> +
> +	mutex_lock(&known_drmcgrp_devs[devIdx]->mutex);
> +	for ( ; drmcgrp != NULL; drmcgrp = parent_drmcgrp(drmcgrp)) {
> +		ddr = drmcgrp->dev_resources[devIdx];
> +
> +		drmcgrp_mem_bw_accum(time_us, ddr);
> +
> +		if (result &&
> +			(ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_MOVED]
> +			 >= ddr->mem_bw_limits_bytes_in_period ||
> +			ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
> +			 <= 0)) {
> +			result = false;
> +		}
> +	}
> +	mutex_unlock(&known_drmcgrp_devs[devIdx]->mutex);
> +
> +	return result;
> +}
> +EXPORT_SYMBOL(drmcgrp_mem_can_move);
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim
       [not found]     ` <20190626150522.11618-12-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 16:44       ` Daniel Vetter
  2019-06-26 22:52         ` Kenny Ho
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 16:44 UTC (permalink / raw)
  To: Kenny Ho
  Cc: jsparks-WVYJKLFxKCc, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	lkaplan-WVYJKLFxKCc, alexander.deucher-5C7GfCeVMHo,
	y2kenny-Re5JQEeQqe8AvxtiuMwx3w,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, christian.koenig-5C7GfCeVMHo

On Wed, Jun 26, 2019 at 11:05:22AM -0400, Kenny Ho wrote:
> Allow DRM TTM memory manager to register a work_struct, such that, when
> a drmcgrp is under memory pressure, memory reclaiming can be triggered
> immediately.
> 
> Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c    | 47 +++++++++++++++++++++++++++++++++
>  include/drm/drm_cgroup.h        | 14 ++++++++++
>  include/drm/ttm/ttm_bo_driver.h |  2 ++
>  kernel/cgroup/drm.c             | 33 +++++++++++++++++++++++
>  4 files changed, 96 insertions(+)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 79c530f4a198..5fc3bc5bd4c5 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
>  }
>  EXPORT_SYMBOL(ttm_bo_evict_mm);
>  
> +static void ttm_bo_reclaim_wq(struct work_struct *work)
> +{

I think a design a bit more inspired by memcg aware core shrinkers would
be nice, i.e. explicitly passing:
- which drm_cgroup needs to be shrunk
- which ttm_mem_reg (well the fancy new abstracted out stuff for tracking
  special gpu memory resources like tt or vram or whatever)
- how much it needs to be shrunk

I think with that a lot more the book-keeping could be pushed into the
drm_cgroup code, and the callback just needs to actually shrink enough as
requested.
-Daniel

> +	struct ttm_operation_ctx ctx = {
> +		.interruptible = false,
> +		.no_wait_gpu = false,
> +		.flags = TTM_OPT_FLAG_FORCE_ALLOC
> +	};
> +	struct ttm_mem_type_manager *man =
> +	    container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> +	struct ttm_bo_device *bdev = man->bdev;
> +	struct dma_fence *fence;
> +	int mem_type;
> +	int ret;
> +
> +	for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> +		if (&bdev->man[mem_type] == man)
> +			break;
> +
> +	BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
> +
> +	if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
> +		return;
> +
> +	ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
> +	if (ret)
> +		return;
> +
> +	spin_lock(&man->move_lock);
> +	fence = dma_fence_get(man->move);
> +	spin_unlock(&man->move_lock);
> +
> +	if (fence) {
> +		ret = dma_fence_wait(fence, false);
> +		dma_fence_put(fence);
> +	}
> +
> +}
> +
>  int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>  			unsigned long p_size)
>  {
> @@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>  		INIT_LIST_HEAD(&man->lru[i]);
>  	man->move = NULL;
>  
> +	pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
> +
> +	if (type <= TTM_PL_VRAM) {
> +		INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> +		drmcgrp_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
> +	}
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(ttm_bo_init_mm);
> @@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
>  		man = &bdev->man[i];
>  		if (man->has_type) {
>  			man->use_type = false;
> +			drmcgrp_unregister_device_mm(bdev->ddev, i);
> +			cancel_work_sync(&man->reclaim_wq);
>  			if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
>  				ret = -EBUSY;
>  				pr_err("DRM memory manager type %d is not clean\n",
> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> index 360c1e6c809f..134d6e5475f3 100644
> --- a/include/drm/drm_cgroup.h
> +++ b/include/drm/drm_cgroup.h
> @@ -5,6 +5,7 @@
>  #define __DRM_CGROUP_H__
>  
>  #include <linux/cgroup_drm.h>
> +#include <linux/workqueue.h>
>  #include <drm/ttm/ttm_bo_api.h>
>  #include <drm/ttm/ttm_bo_driver.h>
>  
> @@ -12,6 +13,9 @@
>  
>  int drmcgrp_register_device(struct drm_device *device);
>  int drmcgrp_unregister_device(struct drm_device *device);
> +void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
> +		struct work_struct *wq);
> +void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type);
>  bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
>  		struct drmcgrp *relative);
>  void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> @@ -40,6 +44,16 @@ static inline int drmcgrp_unregister_device(struct drm_device *device)
>  	return 0;
>  }
>  
> +static inline void drmcgrp_register_device_mm(struct drm_device *dev,
> +		unsigned type, struct work_struct *wq)
> +{
> +}
> +
> +static inline void drmcgrp_unregister_device_mm(struct drm_device *dev,
> +		unsigned type)
> +{
> +}
> +
>  static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
>  		struct drmcgrp *relative)
>  {
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 4cbcb41e5aa9..0956ca7888fc 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
>  	 * Protected by @move_lock.
>  	 */
>  	struct dma_fence *move;
> +
> +	struct work_struct reclaim_wq;
>  };
>  
>  /**
> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> index 1ce13db36ce9..985a89e849d3 100644
> --- a/kernel/cgroup/drm.c
> +++ b/kernel/cgroup/drm.c
> @@ -31,6 +31,8 @@ struct drmcgrp_device {
>  	s64			mem_bw_avg_bytes_per_us_default;
>  
>  	s64			mem_highs_default[TTM_PL_PRIV+1];
> +
> +	struct work_struct	*mem_reclaim_wq[TTM_PL_PRIV];
>  };
>  
>  #define DRMCG_CTF_PRIV_SIZE 3
> @@ -793,6 +795,31 @@ int drmcgrp_unregister_device(struct drm_device *dev)
>  }
>  EXPORT_SYMBOL(drmcgrp_unregister_device);
>  
> +void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
> +		struct work_struct *wq)
> +{
> +	if (dev == NULL || dev->primary->index > max_minor
> +			|| type >= TTM_PL_PRIV)
> +		return;
> +
> +	mutex_lock(&drmcgrp_mutex);
> +	known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = wq;
> +	mutex_unlock(&drmcgrp_mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_register_device_mm);
> +
> +void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type)
> +{
> +	if (dev == NULL || dev->primary->index > max_minor
> +			|| type >= TTM_PL_PRIV)
> +		return;
> +
> +	mutex_lock(&drmcgrp_mutex);
> +	known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = NULL;
> +	mutex_unlock(&drmcgrp_mutex);
> +}
> +EXPORT_SYMBOL(drmcgrp_unregister_device_mm);
> +
>  bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self, struct drmcgrp *relative)
>  {
>  	for (; self != NULL; self = parent_drmcgrp(self))
> @@ -1004,6 +1031,12 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>  
>  		ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
>  			-= move_in_bytes;
> +
> +		if (known_dev->mem_reclaim_wq[new_mem_type] != NULL &&
> +                        ddr->mem_stats[new_mem_type] >
> +				ddr->mem_highs[new_mem_type])
> +			schedule_work(
> +				known_dev->mem_reclaim_wq[new_mem_type]);
>  	}
>  	mutex_unlock(&known_dev->mutex);
>  }
> -- 
> 2.21.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem
  2019-06-26 15:49       ` Daniel Vetter
@ 2019-06-26 19:35         ` Kenny Ho
       [not found]           ` <CAOWid-dyGwf=e0ikBEQ=bnVM_bC8-FeTOD8fJVMJKUgPv6vtyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 19:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks, amd-gfx, lkaplan, Alex Deucher, dri-devel,
	joseph.greathouse, Tejun Heo, cgroups, Christian König

On Wed, Jun 26, 2019 at 11:49 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> Bunch of naming bikesheds

I appreciate the suggestions, naming is hard :).

> > +#include <linux/cgroup.h>
> > +
> > +struct drmcgrp {
>
> drm_cgroup for more consistency how we usually call these things.

I was hoping to keep the symbol short if possible.  I started with
drmcg (following blkcg),  but I believe that causes confusion with
other aspect of the drm subsystem.  I don't have too strong of an
opinion on this but I'd prefer not needing to keep refactoring.  So if
there are other opinions on this, please speak up.

> > +
> > +static inline void put_drmcgrp(struct drmcgrp *drmcgrp)
>
> In drm we generally put _get/_put at the end, cgroup seems to do the same.

ok, I will refactor.

> > +{
> > +     if (drmcgrp)
> > +             css_put(&drmcgrp->css);
> > +}
> > +
> > +static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)
>
> I'd also call this drm_cgroup_parent or so.
>
> Also all the above needs a bit of nice kerneldoc for the final version.
> -Daniel

Noted, will do, thanks.

Regards,
Kenny
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem
       [not found]           ` <CAOWid-dyGwf=e0ikBEQ=bnVM_bC8-FeTOD8fJVMJKUgPv6vtyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-26 20:12             ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 20:12 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, joseph.greathouse-5C7GfCeVMHo,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Wed, Jun 26, 2019 at 9:35 PM Kenny Ho <y2kenny@gmail.com> wrote:
>
> On Wed, Jun 26, 2019 at 11:49 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > Bunch of naming bikesheds
>
> I appreciate the suggestions, naming is hard :).
>
> > > +#include <linux/cgroup.h>
> > > +
> > > +struct drmcgrp {
> >
> > drm_cgroup for more consistency how we usually call these things.
>
> I was hoping to keep the symbol short if possible.  I started with
> drmcg (following blkcg),  but I believe that causes confusion with
> other aspect of the drm subsystem.  I don't have too strong of an
> opinion on this but I'd prefer not needing to keep refactoring.  So if
> there are other opinions on this, please speak up.

I think drmcg sounds good to me. That aligns at least with memcg,
blkcg in cgroups, so as good reason as any. drmcgrp just felt kinda
awkward in-between solution, not as easy to read as drm_cgroup, but
also not as short as drmcg and cgrp is just letter jumbo I can never
remember anyway what it means :-)
-Daniel

> > > +
> > > +static inline void put_drmcgrp(struct drmcgrp *drmcgrp)
> >
> > In drm we generally put _get/_put at the end, cgroup seems to do the same.
>
> ok, I will refactor.
>
> > > +{
> > > +     if (drmcgrp)
> > > +             css_put(&drmcgrp->css);
> > > +}
> > > +
> > > +static inline struct drmcgrp *parent_drmcgrp(struct drmcgrp *cg)
> >
> > I'd also call this drm_cgroup_parent or so.
> >
> > Also all the above needs a bit of nice kerneldoc for the final version.
> > -Daniel
>
> Noted, will do, thanks.
>
> Regards,
> Kenny



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices
  2019-06-26 15:56     ` Daniel Vetter
@ 2019-06-26 20:37       ` Kenny Ho
  2019-06-26 21:03         ` Daniel Vetter
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 20:37 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks, amd-gfx, lkaplan, Alex Deucher, dri-devel,
	joseph.greathouse, Tejun Heo, cgroups, Christian König

(sending again, I keep missing the reply-all in gmail.)

On Wed, Jun 26, 2019 at 11:56 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> Why the separate, explicit registration step? I think a simpler design for
> drivers would be that we set up cgroups if there's anything to be
> controlled, and then for GEM drivers the basic GEM stuff would be set up
> automically (there's really no reason not to I think).

Is this what you mean with the comment about drm_dev_register below?
I think I understand what you are saying but not super clear.  Are you
suggesting the use of driver feature bits (drm_core_check_feature,
etc.) similar to the way Brian Welty did in his proposal in May?

> Also tying to the minor is a bit funky, since we have multiple of these.
> Need to make sure were at least consistent with whether we use the primary
> or render minor - I'd always go with the primary one like you do here.

Um... come to think of it, I can probably embed struct drmcgrp_device
into drm_device and that way I don't really need to keep a separate
array of
known_drmcgrp_devs and get rid of that max_minor thing.  Not sure why
I didn't think of this before.

> > +
> > +int drmcgrp_register_device(struct drm_device *dev)
>
> Imo this should be done as part of drm_dev_register (maybe only if the
> driver has set up a controller or something). Definitely with the
> unregister logic below. Also anything used by drivers needs kerneldoc.
>
>
> > +     /* init cgroups created before registration (i.e. root cgroup) */
> > +     if (root_drmcgrp != NULL) {
> > +             struct cgroup_subsys_state *pos;
> > +             struct drmcgrp *child;
> > +
> > +             rcu_read_lock();
> > +             css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
> > +                     child = css_drmcgrp(pos);
> > +                     init_drmcgrp(child, dev);
> > +             }
> > +             rcu_read_unlock();
>
> I have no idea, but is this guaranteed to get them all?

I believe so, base on my understanding about
css_for_each_descendant_pre and how I am starting from the root
cgroup.  Hopefully I didn't miss anything.

Regards,
Kenny
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices
  2019-06-26 20:37       ` Kenny Ho
@ 2019-06-26 21:03         ` Daniel Vetter
       [not found]           ` <CAKMK7uERvn7Ed2trGQShM94Ozp6+x8bsULFyGj9CYWstuzb56A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 21:03 UTC (permalink / raw)
  To: Kenny Ho
  Cc: Kenny Ho, jsparks, amd-gfx list, lkaplan, Alex Deucher,
	dri-devel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

On Wed, Jun 26, 2019 at 10:37 PM Kenny Ho <y2kenny@gmail.com> wrote:
> (sending again, I keep missing the reply-all in gmail.)

You can make it the default somewhere in the gmail options.

(also resending, I missed that you didn't group-replied).

On Wed, Jun 26, 2019 at 10:25 PM Kenny Ho <y2kenny@gmail.com> wrote:
>
> On Wed, Jun 26, 2019 at 11:56 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > Why the separate, explicit registration step? I think a simpler design for
> > drivers would be that we set up cgroups if there's anything to be
> > controlled, and then for GEM drivers the basic GEM stuff would be set up
> > automically (there's really no reason not to I think).
>
> Is this what you mean with the comment about drm_dev_register below?
> I think I understand what you are saying but not super clear.  Are you
> suggesting the use of driver feature bits (drm_core_check_feature,
> etc.) similar to the way Brian Welty did in his proposal in May?

Also not exactly a fan of driver feature bits tbh. What I had in mind was:

- For stuff like the GEM accounting which we can do for all drivers
easily (we can't do the enforcment, that needs a few changes), just
roll it out for everyone. I.e. if you enable the DRMCG Kconfig, all
DRIVER_GEM would get that basic gem cgroup accounting.

- for other bits the driver just registers certain things, like "I can
enforce gem limits" or "I have gpu memory regions vram, tt, and system
and can enforce them" in their normal driver setup. Then at
drm_dev_register time we register all these additional cgroups, like
we today register all the other interafaces and pieces of a drm_device
(drm_minor, drm_connectors, debugfs files, sysfs stuff, all these
things).

Since the concepts are still a bit in flux, let's take an example from
the modeset side:
- driver call drm_connector_init() to create connector object
- drm_dev_register() also sets up all the public interfaces for that
connector (debugfs, sysfs, ...)

I think a similar setup would be good for cgroups here, you just
register your special ttm_mem_reg or whatever, and the magic happens
automatically.

> > Also tying to the minor is a bit funky, since we have multiple of these.
> > Need to make sure were at least consistent with whether we use the primary
> > or render minor - I'd always go with the primary one like you do here.
>
> Um... come to think of it, I can probably embed struct drmcgrp_device
> into drm_device and that way I don't really need to keep a separate
> array of
> known_drmcgrp_devs and get rid of that max_minor thing.  Not sure why
> I didn't think of this before.

Yeah if that's possible, embedding is definitely the preferred way.
drm_device is huge already, and the per-device overhead really doesn't
matter.

> > > +
> > > +int drmcgrp_register_device(struct drm_device *dev)
> >
> > Imo this should be done as part of drm_dev_register (maybe only if the
> > driver has set up a controller or something). Definitely with the
> > unregister logic below. Also anything used by drivers needs kerneldoc.
> >
> >
> > > +     /* init cgroups created before registration (i.e. root cgroup) */
> > > +     if (root_drmcgrp != NULL) {
> > > +             struct cgroup_subsys_state *pos;
> > > +             struct drmcgrp *child;
> > > +
> > > +             rcu_read_lock();
> > > +             css_for_each_descendant_pre(pos, &root_drmcgrp->css) {
> > > +                     child = css_drmcgrp(pos);
> > > +                     init_drmcgrp(child, dev);
> > > +             }
> > > +             rcu_read_unlock();
> >
> > I have no idea, but is this guaranteed to get them all?
>
> I believe so, base on my understanding about
> css_for_each_descendant_pre and how I am starting from the root
> cgroup.  Hopefully I didn't miss anything.

Well it's rcu, so I expect it'll race with concurrent
addition/removal. And the kerneldoc has some complicated sounding
comments about how to synchronize that with some locks that I don't
fully understand, but I think you're also not having any additional
locking so not sure this all works correctly ...

Do we still need the init_dmcgrp stuff if we'd just embedd? That would
probably be the simplest way to solve this all :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]       ` <20190626160553.GR12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-06-26 21:27         ` Kenny Ho
       [not found]           ` <CAOWid-eurCMx1F7ciUwx0e+p=s=NP8=UxQUhhF-hdK-iAna+fA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 21:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	joseph.greathouse-5C7GfCeVMHo, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > drm.buffer.default
> >         A read-only flat-keyed file which exists on the root cgroup.
> >         Each entry is keyed by the drm device's major:minor.
> >
> >         Default limits on the total GEM buffer allocation in bytes.
>
> Don't we need a "0 means no limit" semantics here?

I believe the convention is to use the 'max' keyword.

>
> I think we need a new drm-cgroup.rst which contains all this
> documentation.

Yes I planned to do that when things are more finalized.  I am
actually writing the commit message following the current doc format
so I can reuse it in the rst.

>
> With multiple GPUs, do we need an overall GEM bo limit, across all gpus?
> For other stuff later on like vram/tt/... and all that it needs to be
> per-device, but I think one overall limit could be useful.

This one I am not sure but should be fairly straightforward to add.
I'd love to hear more feedbacks on this as well.

> >       if (!amdgpu_bo_validate_size(adev, size, bp->domain))
> >               return -ENOMEM;
> >
> > +     if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
> > +             return -ENOMEM;
>
> So what happens when you start a lot of threads all at the same time,
> allocating gem bo? Also would be nice if we could roll out at least the
> accounting part of this cgroup to all GEM drivers.

When there is a large number of allocation, the allocation will be
checked in sequence within a device (since I used a per device mutex
in the check.)  Are you suggesting the overhead here is significant
enough to be a bottleneck?  The accounting part should be available to
all GEM drivers (unless I missed something) since the chg and unchg
function is called via the generic drm_gem_private_object_init and
drm_gem_object_release.

> > +     /* only allow bo from the same cgroup or its ancestor to be imported */
> > +     if (drmcgrp != NULL &&
>
> Quite a serious limitation here ...
>
> > +                     !drmcgrp_is_self_or_ancestor(drmcgrp, obj->drmcgrp)) {
>
> Also what happens if you actually share across devices? Then importing in
> the 2nd group is suddenly possible, and I think will be double-counted.
>
> What's the underlying technical reason for not allowing sharing across
> cgroups?

With the current implementation, there shouldn't be double counting as
the counting is done during the buffer init.

To be clear, sharing across cgroup is allowed, the buffer just needs
to be allocated by a process that is parent to the cgroup.  So in the
case of xorg allocating buffer for client, the xorg would be in the
root cgroup and the buffer can be passed around by different clients
(in root or other cgroup.)  The idea here is to establish some form of
ownership, otherwise there wouldn't be a way to account for or limit
the usage.

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]           ` <CAOWid-eurCMx1F7ciUwx0e+p=s=NP8=UxQUhhF-hdK-iAna+fA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-26 21:41             ` Daniel Vetter
       [not found]               ` <20190626214113.GA12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-26 21:41 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse-5C7GfCeVMHo, Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Daniel Vetter, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Wed, Jun 26, 2019 at 05:27:48PM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > drm.buffer.default
> > >         A read-only flat-keyed file which exists on the root cgroup.
> > >         Each entry is keyed by the drm device's major:minor.
> > >
> > >         Default limits on the total GEM buffer allocation in bytes.
> >
> > Don't we need a "0 means no limit" semantics here?
> 
> I believe the convention is to use the 'max' keyword.
> 
> >
> > I think we need a new drm-cgroup.rst which contains all this
> > documentation.
> 
> Yes I planned to do that when things are more finalized.  I am
> actually writing the commit message following the current doc format
> so I can reuse it in the rst.

Awesome.

> > With multiple GPUs, do we need an overall GEM bo limit, across all gpus?
> > For other stuff later on like vram/tt/... and all that it needs to be
> > per-device, but I think one overall limit could be useful.
> 
> This one I am not sure but should be fairly straightforward to add.
> I'd love to hear more feedbacks on this as well.
> 
> > >       if (!amdgpu_bo_validate_size(adev, size, bp->domain))
> > >               return -ENOMEM;
> > >
> > > +     if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
> > > +             return -ENOMEM;
> >
> > So what happens when you start a lot of threads all at the same time,
> > allocating gem bo? Also would be nice if we could roll out at least the
> > accounting part of this cgroup to all GEM drivers.
> 
> When there is a large number of allocation, the allocation will be
> checked in sequence within a device (since I used a per device mutex
> in the check.)  Are you suggesting the overhead here is significant
> enough to be a bottleneck?  The accounting part should be available to
> all GEM drivers (unless I missed something) since the chg and unchg
> function is called via the generic drm_gem_private_object_init and
> drm_gem_object_release.

thread 1: checks limits, still under the total

thread 2: checks limits, still under the total

thread 1: allocates, still under

thread 2: allocates, now over the limit

I think the check and chg need to be one step, or this wont work. Or I'm
missing something somewhere.

Wrt rolling out the accounting for all drivers: Since you also roll out
enforcement in this patch I'm not sure whether the accounting part is
fully stand-alone. And as discussed a bit on an earlier patch, I think for
DRIVER_GEM we should set up the accounting cgroup automatically.

> > > +     /* only allow bo from the same cgroup or its ancestor to be imported */
> > > +     if (drmcgrp != NULL &&
> >
> > Quite a serious limitation here ...
> >
> > > +                     !drmcgrp_is_self_or_ancestor(drmcgrp, obj->drmcgrp)) {
> >
> > Also what happens if you actually share across devices? Then importing in
> > the 2nd group is suddenly possible, and I think will be double-counted.
> >
> > What's the underlying technical reason for not allowing sharing across
> > cgroups?
> 
> With the current implementation, there shouldn't be double counting as
> the counting is done during the buffer init.

If you share across devices there will be two drm_gem_obect structures, on
on each device. But only one underlying bo.

Now the bo limit is per-device too, so that's all fine, but for a global
bo limit we'd need to make sure we count these only once.

> To be clear, sharing across cgroup is allowed, the buffer just needs
> to be allocated by a process that is parent to the cgroup.  So in the
> case of xorg allocating buffer for client, the xorg would be in the
> root cgroup and the buffer can be passed around by different clients
> (in root or other cgroup.)  The idea here is to establish some form of
> ownership, otherwise there wouldn't be a way to account for or limit
> the usage.

But why? What's the problem if I allocate something and then hand it to
someone else. E.g. one popular use of cgroups is to isolate clients, so
maybe you'd do a cgroup + namespace for each X11 client (ok wayland, with
X11 this is probably pointless).

But with your current limitation those clients can't pass buffers to the
compositor anymore, making cgroups useless. Your example here only works
if Xorg is in the root and allocates all the buffers. That's not even true
for DRI3 anymore.

So pretty serious limitation on cgroups, and I'm not really understanding
why we need this. I think if we want to prevent buffer sharing, what we
need are some selinux hooks and stuff so you can prevent an import/access
by someone who's not allowed to touch a buffer. But that kind of access
right management should be separate from resource control imo.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices
       [not found]           ` <CAKMK7uERvn7Ed2trGQShM94Ozp6+x8bsULFyGj9CYWstuzb56A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-26 21:58             ` Kenny Ho
  0 siblings, 0 replies; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 21:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, joseph.greathouse-5C7GfCeVMHo,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Wed, Jun 26, 2019 at 5:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jun 26, 2019 at 10:37 PM Kenny Ho <y2kenny@gmail.com> wrote:
> > (sending again, I keep missing the reply-all in gmail.)
> You can make it the default somewhere in the gmail options.
Um... interesting, my option was actually not set (neither reply or reply-all.)

> > On Wed, Jun 26, 2019 at 11:56 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > Why the separate, explicit registration step? I think a simpler design for
> > > drivers would be that we set up cgroups if there's anything to be
> > > controlled, and then for GEM drivers the basic GEM stuff would be set up
> > > automically (there's really no reason not to I think).
> >
> > Is this what you mean with the comment about drm_dev_register below?
> > I think I understand what you are saying but not super clear.  Are you
> > suggesting the use of driver feature bits (drm_core_check_feature,
> > etc.) similar to the way Brian Welty did in his proposal in May?
>
> Also not exactly a fan of driver feature bits tbh. What I had in mind was:
>
> - For stuff like the GEM accounting which we can do for all drivers
> easily (we can't do the enforcment, that needs a few changes), just
> roll it out for everyone. I.e. if you enable the DRMCG Kconfig, all
> DRIVER_GEM would get that basic gem cgroup accounting.
>
> - for other bits the driver just registers certain things, like "I can
> enforce gem limits" or "I have gpu memory regions vram, tt, and system
> and can enforce them" in their normal driver setup. Then at
> drm_dev_register time we register all these additional cgroups, like
> we today register all the other interafaces and pieces of a drm_device
> (drm_minor, drm_connectors, debugfs files, sysfs stuff, all these
> things).
>
> Since the concepts are still a bit in flux, let's take an example from
> the modeset side:
> - driver call drm_connector_init() to create connector object
> - drm_dev_register() also sets up all the public interfaces for that
> connector (debugfs, sysfs, ...)
>
> I think a similar setup would be good for cgroups here, you just
> register your special ttm_mem_reg or whatever, and the magic happens
> automatically.

Ok, I will look into those (I am not too familiar about those at this point.)

> > > I have no idea, but is this guaranteed to get them all?
> >
> > I believe so, base on my understanding about
> > css_for_each_descendant_pre and how I am starting from the root
> > cgroup.  Hopefully I didn't miss anything.
>
> Well it's rcu, so I expect it'll race with concurrent
> addition/removal. And the kerneldoc has some complicated sounding
> comments about how to synchronize that with some locks that I don't
> fully understand, but I think you're also not having any additional
> locking so not sure this all works correctly ...
>
> Do we still need the init_dmcgrp stuff if we'd just embedd? That would
> probably be the simplest way to solve this all :-)

I will need to dig into it a bit more to know for sure.  I think I
still need the init_drmcgrp stuff. I implemented it like this because
the cgroup subsystem appear to be initialized before the drm subsystem
so the root cgroup does not know any drm devices and the per device
default limits are not set.  In theory, I should only need to set the
root cgroup (so I don't need to use css_for_each_descendant_pre, which
requires the rcu_lock.)  But I am not 100% confident there won't be
any additional cgroup being added to the hierarchy between cgroup
subsystem init and drm subsystem init.

Alternatively I can protect it with an additional mutex but I am not
sure if that's needed.

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]               ` <20190626214113.GA12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-06-26 22:41                 ` Kenny Ho
       [not found]                   ` <CAOWid-egYGijS0a6uuG4mPUmOWaPwF-EKokR=LFNJ=5M+akVZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 22:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, joseph.greathouse-5C7GfCeVMHo,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Wed, Jun 26, 2019 at 5:41 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jun 26, 2019 at 05:27:48PM -0400, Kenny Ho wrote:
> > On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > So what happens when you start a lot of threads all at the same time,
> > > allocating gem bo? Also would be nice if we could roll out at least the
> > > accounting part of this cgroup to all GEM drivers.
> >
> > When there is a large number of allocation, the allocation will be
> > checked in sequence within a device (since I used a per device mutex
> > in the check.)  Are you suggesting the overhead here is significant
> > enough to be a bottleneck?  The accounting part should be available to
> > all GEM drivers (unless I missed something) since the chg and unchg
> > function is called via the generic drm_gem_private_object_init and
> > drm_gem_object_release.
>
> thread 1: checks limits, still under the total
>
> thread 2: checks limits, still under the total
>
> thread 1: allocates, still under
>
> thread 2: allocates, now over the limit
>
> I think the check and chg need to be one step, or this wont work. Or I'm
> missing something somewhere.

Ok, I see what you are saying.

> Wrt rolling out the accounting for all drivers: Since you also roll out
> enforcement in this patch I'm not sure whether the accounting part is
> fully stand-alone. And as discussed a bit on an earlier patch, I think for
> DRIVER_GEM we should set up the accounting cgroup automatically.
I think I should be able to split the commit and restructure things a bit.

> > > What's the underlying technical reason for not allowing sharing across
> > > cgroups?
> > To be clear, sharing across cgroup is allowed, the buffer just needs
> > to be allocated by a process that is parent to the cgroup.  So in the
> > case of xorg allocating buffer for client, the xorg would be in the
> > root cgroup and the buffer can be passed around by different clients
> > (in root or other cgroup.)  The idea here is to establish some form of
> > ownership, otherwise there wouldn't be a way to account for or limit
> > the usage.
>
> But why? What's the problem if I allocate something and then hand it to
> someone else. E.g. one popular use of cgroups is to isolate clients, so
> maybe you'd do a cgroup + namespace for each X11 client (ok wayland, with
> X11 this is probably pointless).
>
> But with your current limitation those clients can't pass buffers to the
> compositor anymore, making cgroups useless. Your example here only works
> if Xorg is in the root and allocates all the buffers. That's not even true
> for DRI3 anymore.
>
> So pretty serious limitation on cgroups, and I'm not really understanding
> why we need this. I think if we want to prevent buffer sharing, what we
> need are some selinux hooks and stuff so you can prevent an import/access
> by someone who's not allowed to touch a buffer. But that kind of access
> right management should be separate from resource control imo.
So without the sharing restriction and some kind of ownership
structure, we will have to migrate/change the owner of the buffer when
the cgroup that created the buffer die before the receiving cgroup(s)
and I am not sure how to do that properly at the moment.  1) Should
each cgroup keep track of all the buffers that belongs to it and
migrate?  (Is that efficient?)  2) which cgroup should be the new
owner (and therefore have the limit applied?)  Having the creator
being the owner is kind of natural, but let say the buffer is shared
with 5 other cgroups, which of these 5 cgroups should be the new owner
of the buffer?

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim
  2019-06-26 16:44       ` Daniel Vetter
@ 2019-06-26 22:52         ` Kenny Ho
  2019-06-27  6:15           ` Daniel Vetter
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-26 22:52 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks, amd-gfx list, lkaplan, Alex Deucher,
	dri-devel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

Ok.  I am not too familiar with shrinker but I will dig into it.  Just
so that I am looking into the right things, you are referring to
things like struct shrinker and struct shrink_control?

Regards,
Kenny

On Wed, Jun 26, 2019 at 12:44 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jun 26, 2019 at 11:05:22AM -0400, Kenny Ho wrote:
> > Allow DRM TTM memory manager to register a work_struct, such that, when
> > a drmcgrp is under memory pressure, memory reclaiming can be triggered
> > immediately.
> >
> > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > ---
> >  drivers/gpu/drm/ttm/ttm_bo.c    | 47 +++++++++++++++++++++++++++++++++
> >  include/drm/drm_cgroup.h        | 14 ++++++++++
> >  include/drm/ttm/ttm_bo_driver.h |  2 ++
> >  kernel/cgroup/drm.c             | 33 +++++++++++++++++++++++
> >  4 files changed, 96 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 79c530f4a198..5fc3bc5bd4c5 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
> >  }
> >  EXPORT_SYMBOL(ttm_bo_evict_mm);
> >
> > +static void ttm_bo_reclaim_wq(struct work_struct *work)
> > +{
>
> I think a design a bit more inspired by memcg aware core shrinkers would
> be nice, i.e. explicitly passing:
> - which drm_cgroup needs to be shrunk
> - which ttm_mem_reg (well the fancy new abstracted out stuff for tracking
>   special gpu memory resources like tt or vram or whatever)
> - how much it needs to be shrunk
>
> I think with that a lot more the book-keeping could be pushed into the
> drm_cgroup code, and the callback just needs to actually shrink enough as
> requested.
> -Daniel
>
> > +     struct ttm_operation_ctx ctx = {
> > +             .interruptible = false,
> > +             .no_wait_gpu = false,
> > +             .flags = TTM_OPT_FLAG_FORCE_ALLOC
> > +     };
> > +     struct ttm_mem_type_manager *man =
> > +         container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> > +     struct ttm_bo_device *bdev = man->bdev;
> > +     struct dma_fence *fence;
> > +     int mem_type;
> > +     int ret;
> > +
> > +     for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> > +             if (&bdev->man[mem_type] == man)
> > +                     break;
> > +
> > +     BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
> > +
> > +     if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
> > +             return;
> > +
> > +     ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
> > +     if (ret)
> > +             return;
> > +
> > +     spin_lock(&man->move_lock);
> > +     fence = dma_fence_get(man->move);
> > +     spin_unlock(&man->move_lock);
> > +
> > +     if (fence) {
> > +             ret = dma_fence_wait(fence, false);
> > +             dma_fence_put(fence);
> > +     }
> > +
> > +}
> > +
> >  int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> >                       unsigned long p_size)
> >  {
> > @@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> >               INIT_LIST_HEAD(&man->lru[i]);
> >       man->move = NULL;
> >
> > +     pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
> > +
> > +     if (type <= TTM_PL_VRAM) {
> > +             INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> > +             drmcgrp_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
> > +     }
> > +
> >       return 0;
> >  }
> >  EXPORT_SYMBOL(ttm_bo_init_mm);
> > @@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
> >               man = &bdev->man[i];
> >               if (man->has_type) {
> >                       man->use_type = false;
> > +                     drmcgrp_unregister_device_mm(bdev->ddev, i);
> > +                     cancel_work_sync(&man->reclaim_wq);
> >                       if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
> >                               ret = -EBUSY;
> >                               pr_err("DRM memory manager type %d is not clean\n",
> > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > index 360c1e6c809f..134d6e5475f3 100644
> > --- a/include/drm/drm_cgroup.h
> > +++ b/include/drm/drm_cgroup.h
> > @@ -5,6 +5,7 @@
> >  #define __DRM_CGROUP_H__
> >
> >  #include <linux/cgroup_drm.h>
> > +#include <linux/workqueue.h>
> >  #include <drm/ttm/ttm_bo_api.h>
> >  #include <drm/ttm/ttm_bo_driver.h>
> >
> > @@ -12,6 +13,9 @@
> >
> >  int drmcgrp_register_device(struct drm_device *device);
> >  int drmcgrp_unregister_device(struct drm_device *device);
> > +void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
> > +             struct work_struct *wq);
> > +void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type);
> >  bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
> >               struct drmcgrp *relative);
> >  void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> > @@ -40,6 +44,16 @@ static inline int drmcgrp_unregister_device(struct drm_device *device)
> >       return 0;
> >  }
> >
> > +static inline void drmcgrp_register_device_mm(struct drm_device *dev,
> > +             unsigned type, struct work_struct *wq)
> > +{
> > +}
> > +
> > +static inline void drmcgrp_unregister_device_mm(struct drm_device *dev,
> > +             unsigned type)
> > +{
> > +}
> > +
> >  static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
> >               struct drmcgrp *relative)
> >  {
> > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> > index 4cbcb41e5aa9..0956ca7888fc 100644
> > --- a/include/drm/ttm/ttm_bo_driver.h
> > +++ b/include/drm/ttm/ttm_bo_driver.h
> > @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
> >        * Protected by @move_lock.
> >        */
> >       struct dma_fence *move;
> > +
> > +     struct work_struct reclaim_wq;
> >  };
> >
> >  /**
> > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > index 1ce13db36ce9..985a89e849d3 100644
> > --- a/kernel/cgroup/drm.c
> > +++ b/kernel/cgroup/drm.c
> > @@ -31,6 +31,8 @@ struct drmcgrp_device {
> >       s64                     mem_bw_avg_bytes_per_us_default;
> >
> >       s64                     mem_highs_default[TTM_PL_PRIV+1];
> > +
> > +     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
> >  };
> >
> >  #define DRMCG_CTF_PRIV_SIZE 3
> > @@ -793,6 +795,31 @@ int drmcgrp_unregister_device(struct drm_device *dev)
> >  }
> >  EXPORT_SYMBOL(drmcgrp_unregister_device);
> >
> > +void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
> > +             struct work_struct *wq)
> > +{
> > +     if (dev == NULL || dev->primary->index > max_minor
> > +                     || type >= TTM_PL_PRIV)
> > +             return;
> > +
> > +     mutex_lock(&drmcgrp_mutex);
> > +     known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = wq;
> > +     mutex_unlock(&drmcgrp_mutex);
> > +}
> > +EXPORT_SYMBOL(drmcgrp_register_device_mm);
> > +
> > +void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type)
> > +{
> > +     if (dev == NULL || dev->primary->index > max_minor
> > +                     || type >= TTM_PL_PRIV)
> > +             return;
> > +
> > +     mutex_lock(&drmcgrp_mutex);
> > +     known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = NULL;
> > +     mutex_unlock(&drmcgrp_mutex);
> > +}
> > +EXPORT_SYMBOL(drmcgrp_unregister_device_mm);
> > +
> >  bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self, struct drmcgrp *relative)
> >  {
> >       for (; self != NULL; self = parent_drmcgrp(self))
> > @@ -1004,6 +1031,12 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
> >
> >               ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
> >                       -= move_in_bytes;
> > +
> > +             if (known_dev->mem_reclaim_wq[new_mem_type] != NULL &&
> > +                        ddr->mem_stats[new_mem_type] >
> > +                             ddr->mem_highs[new_mem_type])
> > +                     schedule_work(
> > +                             known_dev->mem_reclaim_wq[new_mem_type]);
> >       }
> >       mutex_unlock(&known_dev->mutex);
> >  }
> > --
> > 2.21.0
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
       [not found]     ` <20190626161254.GS12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-06-27  4:06       ` Kenny Ho
       [not found]         ` <CAOWid-f3kKnM=4oC5Bba5WW5WNV2MH5PvVamrhO6LBr5ydPJQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-27  4:06 UTC (permalink / raw)
  To: Daniel Vetter, Brian Welty, kraxel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, joseph.greathouse-5C7GfCeVMHo,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jun 26, 2019 at 11:05:18AM -0400, Kenny Ho wrote:
> > drm.memory.stats
> >         A read-only nested-keyed file which exists on all cgroups.
> >         Each entry is keyed by the drm device's major:minor.  The
> >         following nested keys are defined.
> >
> >           ======         =============================================
> >           system         Host/system memory
>
> Shouldn't that be covered by gem bo stats already? Also, system memory is
> definitely something a lot of non-ttm drivers want to be able to track, so
> that needs to be separate from ttm.
The gem bo stats covers all of these type.  I am treat the gem stats
as more of the front end and a hard limit and this set of stats as the
backing store which can be of various type.  How does non-ttm drivers
identify various memory types?

> >           tt             Host memory used by the drm device (GTT/GART)
> >           vram           Video RAM used by the drm device
> >           priv           Other drm device, vendor specific memory
>
> So what's "priv". In general I think we need some way to register the
> different kinds of memory, e.g. stuff not in your list:
>
> - multiple kinds of vram (like numa-style gpus)
> - cma (for all those non-ttm drivers that's a big one, it's like system
>   memory but also totally different)
> - any carveouts and stuff
privs are vendor specific, which is why I have truncated it.  For
example, AMD has AMDGPU_PL_GDS, GWS, OA
https://elixir.bootlin.com/linux/v5.2-rc6/source/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h#L30

Since we are using keyed file type, we should be able to support
vendor specific memory type but I am not sure if this is acceptable to
cgroup upstream.  This is why I stick to the 3 memory type that is
common across all ttm drivers.

> I think with all the ttm refactoring going on I think we need to de-ttm
> the interface functions here a bit. With Gerd Hoffmans series you can just
> use a gem_bo pointer here, so what's left to do is have some extracted
> structure for tracking memory types. I think Brian Welty has some ideas
> for this, even in patch form. Would be good to keep him on cc at least for
> the next version. We'd need to explicitly hand in the ttm_mem_reg (or
> whatever the specific thing is going to be).

I assume Gerd Hoffman's series you are referring to is this one?
https://www.spinics.net/lists/dri-devel/msg215056.html

I can certainly keep an eye out for Gerd's refactoring while
refactoring other parts of this RFC.

I have added Brian and Gerd to the thread for awareness.

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
       [not found]         ` <20190626162554.GU12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-06-27  4:34           ` Kenny Ho
       [not found]             ` <CAOWid-dO5QH4wLyN_ztMaoZtLM9yzw-FEMgk3ufbh1ahHJ2vVg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-27  4:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, joseph.greathouse-5C7GfCeVMHo,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Wed, Jun 26, 2019 at 12:25 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jun 26, 2019 at 11:05:20AM -0400, Kenny Ho wrote:
> > The bandwidth is measured by keeping track of the amount of bytes moved
> > by ttm within a time period.  We defined two type of bandwidth: burst
> > and average.  Average bandwidth is calculated by dividing the total
> > amount of bytes moved within a cgroup by the lifetime of the cgroup.
> > Burst bandwidth is similar except that the byte and time measurement is
> > reset after a user configurable period.
>
> So I'm not too sure exposing this is a great idea, at least depending upon
> what you're trying to do with it. There's a few concerns here:
>
> - I think bo movement stats might be useful, but they're not telling you
>   everything. Applications can also copy data themselves and put buffers
>   where they want them, especially with more explicit apis like vk.
>
> - which kind of moves are we talking about here? Eviction related bo moves
>   seem not counted here, and if you have lots of gpus with funny
>   interconnects you might also get other kinds of moves, not just system
>   ram <-> vram.
Eviction move is counted but I think I placed the delay in the wrong
place (the tracking of byte moved is in previous patch in
ttm_bo_handle_move_mem, which is common to all move as far as I can
tell.)

> - What happens if we slow down, but someone else needs to evict our
>   buffers/move them (ttm is atm not great at this, but Christian König is
>   working on patches). I think there's lots of priority inversion
>   potential here.
>
> - If the goal is to avoid thrashing the interconnects, then this isn't the
>   full picture by far - apps can use copy engines and explicit placement,
>   again that's how vulkan at least is supposed to work.
>
> I guess these all boil down to: What do you want to achieve here? The
> commit message doesn't explain the intended use-case of this.
Thrashing prevention is the intent.  I am not familiar with Vulkan so
I will have to get back to you on that.  I don't know how those
explicit placement translate into the kernel.  At this stage, I think
it's still worth while to have this as a resource even if some
applications bypass the kernel.  I certainly welcome more feedback on
this topic.

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]                   ` <CAOWid-egYGijS0a6uuG4mPUmOWaPwF-EKokR=LFNJ=5M+akVZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-27  5:43                     ` Daniel Vetter
  2019-06-27 18:42                       ` Kenny Ho
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27  5:43 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse-5C7GfCeVMHo, Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel,
	Daniel Vetter, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Wed, Jun 26, 2019 at 06:41:32PM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 5:41 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Wed, Jun 26, 2019 at 05:27:48PM -0400, Kenny Ho wrote:
> > > On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > So what happens when you start a lot of threads all at the same time,
> > > > allocating gem bo? Also would be nice if we could roll out at least the
> > > > accounting part of this cgroup to all GEM drivers.
> > >
> > > When there is a large number of allocation, the allocation will be
> > > checked in sequence within a device (since I used a per device mutex
> > > in the check.)  Are you suggesting the overhead here is significant
> > > enough to be a bottleneck?  The accounting part should be available to
> > > all GEM drivers (unless I missed something) since the chg and unchg
> > > function is called via the generic drm_gem_private_object_init and
> > > drm_gem_object_release.
> >
> > thread 1: checks limits, still under the total
> >
> > thread 2: checks limits, still under the total
> >
> > thread 1: allocates, still under
> >
> > thread 2: allocates, now over the limit
> >
> > I think the check and chg need to be one step, or this wont work. Or I'm
> > missing something somewhere.
> 
> Ok, I see what you are saying.
> 
> > Wrt rolling out the accounting for all drivers: Since you also roll out
> > enforcement in this patch I'm not sure whether the accounting part is
> > fully stand-alone. And as discussed a bit on an earlier patch, I think for
> > DRIVER_GEM we should set up the accounting cgroup automatically.
> I think I should be able to split the commit and restructure things a bit.
> 
> > > > What's the underlying technical reason for not allowing sharing across
> > > > cgroups?
> > > To be clear, sharing across cgroup is allowed, the buffer just needs
> > > to be allocated by a process that is parent to the cgroup.  So in the
> > > case of xorg allocating buffer for client, the xorg would be in the
> > > root cgroup and the buffer can be passed around by different clients
> > > (in root or other cgroup.)  The idea here is to establish some form of
> > > ownership, otherwise there wouldn't be a way to account for or limit
> > > the usage.
> >
> > But why? What's the problem if I allocate something and then hand it to
> > someone else. E.g. one popular use of cgroups is to isolate clients, so
> > maybe you'd do a cgroup + namespace for each X11 client (ok wayland, with
> > X11 this is probably pointless).
> >
> > But with your current limitation those clients can't pass buffers to the
> > compositor anymore, making cgroups useless. Your example here only works
> > if Xorg is in the root and allocates all the buffers. That's not even true
> > for DRI3 anymore.
> >
> > So pretty serious limitation on cgroups, and I'm not really understanding
> > why we need this. I think if we want to prevent buffer sharing, what we
> > need are some selinux hooks and stuff so you can prevent an import/access
> > by someone who's not allowed to touch a buffer. But that kind of access
> > right management should be separate from resource control imo.
> So without the sharing restriction and some kind of ownership
> structure, we will have to migrate/change the owner of the buffer when
> the cgroup that created the buffer die before the receiving cgroup(s)
> and I am not sure how to do that properly at the moment.  1) Should
> each cgroup keep track of all the buffers that belongs to it and
> migrate?  (Is that efficient?)  2) which cgroup should be the new
> owner (and therefore have the limit applied?)  Having the creator
> being the owner is kind of natural, but let say the buffer is shared
> with 5 other cgroups, which of these 5 cgroups should be the new owner
> of the buffer?

Different answers:

- Do we care if we leak bos like this in a cgroup, if the cgroup
  disappears before all the bo are cleaned up?

- Just charge the bo to each cgroup it's in? Will be quite a bit more
  tracking needed to get that done ...

- Also, there's the legacy way of sharing a bo, with the FLINK and
  GEM_OPEN ioctls. We need to plug these holes too.

Just feels like your current solution is technically well-justified, but
it completely defeats the point of cgroups/containers and buffer sharing
...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
       [not found]         ` <CAOWid-f3kKnM=4oC5Bba5WW5WNV2MH5PvVamrhO6LBr5ydPJQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-27  6:01           ` Daniel Vetter
  2019-06-27 20:17             ` Kenny Ho
  2019-06-28  1:16             ` Welty, Brian
  0 siblings, 2 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27  6:01 UTC (permalink / raw)
  To: Kenny Ho
  Cc: amd-gfx list, joseph.greathouse-5C7GfCeVMHo, Kenny Ho,
	Brian Welty, jsparks-WVYJKLFxKCc, dri-devel, lkaplan-WVYJKLFxKCc,
	Alex Deucher, kraxel-H+wXaHxf7aLQT0dZR+AlfA, Daniel Vetter,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Thu, Jun 27, 2019 at 12:06:13AM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Jun 26, 2019 at 11:05:18AM -0400, Kenny Ho wrote:
> > > drm.memory.stats
> > >         A read-only nested-keyed file which exists on all cgroups.
> > >         Each entry is keyed by the drm device's major:minor.  The
> > >         following nested keys are defined.
> > >
> > >           ======         =============================================
> > >           system         Host/system memory
> >
> > Shouldn't that be covered by gem bo stats already? Also, system memory is
> > definitely something a lot of non-ttm drivers want to be able to track, so
> > that needs to be separate from ttm.
> The gem bo stats covers all of these type.  I am treat the gem stats
> as more of the front end and a hard limit and this set of stats as the
> backing store which can be of various type.  How does non-ttm drivers
> identify various memory types?

Not explicitly, they generally just have one. I think i915 currently has
two, system and carveout (with vram getting added).

> > >           tt             Host memory used by the drm device (GTT/GART)
> > >           vram           Video RAM used by the drm device
> > >           priv           Other drm device, vendor specific memory
> >
> > So what's "priv". In general I think we need some way to register the
> > different kinds of memory, e.g. stuff not in your list:
> >
> > - multiple kinds of vram (like numa-style gpus)
> > - cma (for all those non-ttm drivers that's a big one, it's like system
> >   memory but also totally different)
> > - any carveouts and stuff
> privs are vendor specific, which is why I have truncated it.  For
> example, AMD has AMDGPU_PL_GDS, GWS, OA
> https://elixir.bootlin.com/linux/v5.2-rc6/source/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h#L30
> 
> Since we are using keyed file type, we should be able to support
> vendor specific memory type but I am not sure if this is acceptable to
> cgroup upstream.  This is why I stick to the 3 memory type that is
> common across all ttm drivers.

I think we'll need custom memory pools, not just priv, and I guess some
naming scheme for them. I think just exposing them as amd-gws, amd-oa,
amd-gds would make sense.

Another thing I wonder about is multi-gpu cards, with multiple gpus and
each their own vram and other device-specific resources. For those we'd
have node0.vram and node1.vram too (on top of maybe an overall vram node,
not sure).

> > I think with all the ttm refactoring going on I think we need to de-ttm
> > the interface functions here a bit. With Gerd Hoffmans series you can just
> > use a gem_bo pointer here, so what's left to do is have some extracted
> > structure for tracking memory types. I think Brian Welty has some ideas
> > for this, even in patch form. Would be good to keep him on cc at least for
> > the next version. We'd need to explicitly hand in the ttm_mem_reg (or
> > whatever the specific thing is going to be).
> 
> I assume Gerd Hoffman's series you are referring to is this one?
> https://www.spinics.net/lists/dri-devel/msg215056.html

There's a newer one, much more complete, but yes that's the work.

> I can certainly keep an eye out for Gerd's refactoring while
> refactoring other parts of this RFC.
> 
> I have added Brian and Gerd to the thread for awareness.

btw just realized that maybe building the interfaces on top of ttm_mem_reg
is maybe not the best. That's what you're using right now, but in a way
that's just the ttm internal detail of how the backing storage is
allocated. I think the structure we need to abstract away is
ttm_mem_type_manager, without any of the actual management details.

btw reminds me: I guess it would be good to have a per-type .total
read-only exposed, so that userspace has an idea of how much there is?
ttm is trying to be agnostic to the allocator that's used to manage a
memory type/resource, so doesn't even know that. But I think something we
need to expose to admins, otherwise they can't meaningfully set limits.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
       [not found]             ` <CAOWid-dO5QH4wLyN_ztMaoZtLM9yzw-FEMgk3ufbh1ahHJ2vVg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-27  6:11               ` Daniel Vetter
       [not found]                 ` <20190627061153.GD12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27  6:11 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse-5C7GfCeVMHo, Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel,
	Daniel Vetter, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Thu, Jun 27, 2019 at 12:34:05AM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:25 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Jun 26, 2019 at 11:05:20AM -0400, Kenny Ho wrote:
> > > The bandwidth is measured by keeping track of the amount of bytes moved
> > > by ttm within a time period.  We defined two type of bandwidth: burst
> > > and average.  Average bandwidth is calculated by dividing the total
> > > amount of bytes moved within a cgroup by the lifetime of the cgroup.
> > > Burst bandwidth is similar except that the byte and time measurement is
> > > reset after a user configurable period.
> >
> > So I'm not too sure exposing this is a great idea, at least depending upon
> > what you're trying to do with it. There's a few concerns here:
> >
> > - I think bo movement stats might be useful, but they're not telling you
> >   everything. Applications can also copy data themselves and put buffers
> >   where they want them, especially with more explicit apis like vk.
> >
> > - which kind of moves are we talking about here? Eviction related bo moves
> >   seem not counted here, and if you have lots of gpus with funny
> >   interconnects you might also get other kinds of moves, not just system
> >   ram <-> vram.
> Eviction move is counted but I think I placed the delay in the wrong
> place (the tracking of byte moved is in previous patch in
> ttm_bo_handle_move_mem, which is common to all move as far as I can
> tell.)
> 
> > - What happens if we slow down, but someone else needs to evict our
> >   buffers/move them (ttm is atm not great at this, but Christian König is
> >   working on patches). I think there's lots of priority inversion
> >   potential here.
> >
> > - If the goal is to avoid thrashing the interconnects, then this isn't the
> >   full picture by far - apps can use copy engines and explicit placement,
> >   again that's how vulkan at least is supposed to work.
> >
> > I guess these all boil down to: What do you want to achieve here? The
> > commit message doesn't explain the intended use-case of this.
> Thrashing prevention is the intent.  I am not familiar with Vulkan so
> I will have to get back to you on that.  I don't know how those
> explicit placement translate into the kernel.  At this stage, I think
> it's still worth while to have this as a resource even if some
> applications bypass the kernel.  I certainly welcome more feedback on
> this topic.

The trouble with thrashing prevention like this is that either you don't
limit all the bo moves, and then you don't count everything. Or you limit
them all, and then you create priority inversions in the ttm eviction
handler, essentially rate-limiting everyone who's thrashing. Or at least
you run the risk of that happening.

Not what you want I think :-)

I also think that the blkcg people are still trying to figure out how to
make this work fully reliable (it's the same problem really), and a
critical piece is knowing/estimating the overall bandwidth. Without that
the admin can't really do something meaningful. The problem with that is
you don't know, not just because of vk, but any userspace that has buffers
in the pci gart uses the same interconnect just as part of its rendering
job. So if your goal is to guaranteed some minimal amount of bo move
bandwidth, then this wont work, because you have no idea how much bandwith
there even is for bo moves.

Getting thrashing limited is very hard.

I feel like a better approach would by to add a cgroup for the various
engines on the gpu, and then also account all the sdma (or whatever the
name of the amd copy engines is again) usage by ttm_bo moves to the right
cgroup. I think that's a more meaningful limitation. For direct thrashing
control I think there's both not enough information available in the
kernel (you'd need some performance counters to watch how much bandwidth
userspace batches/CS are wasting), and I don't think the ttm eviction
logic is ready to step over all the priority inversion issues this will
bring up. Managing sdma usage otoh will be a lot more straightforward (but
still has all the priority inversion problems, but in the scheduler that
might be easier to fix perhaps with the explicit dependency graph - in the
i915 scheduler we already have priority boosting afaiui).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim
  2019-06-26 22:52         ` Kenny Ho
@ 2019-06-27  6:15           ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27  6:15 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse, Kenny Ho, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, dri-devel, Tejun Heo, cgroups,
	Christian König

On Wed, Jun 26, 2019 at 06:52:50PM -0400, Kenny Ho wrote:
> Ok.  I am not too familiar with shrinker but I will dig into it.  Just
> so that I am looking into the right things, you are referring to
> things like struct shrinker and struct shrink_control?

Yeah. Reason I'm asking for this is this is how system memory is shrunk
right now, so at least having some conceptual similarities might be useful
here. And a lot of people have thought quite hard about system memory
shrinking and all that, so hopefully that gives us good design
inspiration.
-Daniel

> 
> Regards,
> Kenny
> 
> On Wed, Jun 26, 2019 at 12:44 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Jun 26, 2019 at 11:05:22AM -0400, Kenny Ho wrote:
> > > Allow DRM TTM memory manager to register a work_struct, such that, when
> > > a drmcgrp is under memory pressure, memory reclaiming can be triggered
> > > immediately.
> > >
> > > Change-Id: I25ac04e2db9c19ff12652b88ebff18b44b2706d8
> > > Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> > > ---
> > >  drivers/gpu/drm/ttm/ttm_bo.c    | 47 +++++++++++++++++++++++++++++++++
> > >  include/drm/drm_cgroup.h        | 14 ++++++++++
> > >  include/drm/ttm/ttm_bo_driver.h |  2 ++
> > >  kernel/cgroup/drm.c             | 33 +++++++++++++++++++++++
> > >  4 files changed, 96 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > index 79c530f4a198..5fc3bc5bd4c5 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > @@ -1509,6 +1509,44 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned mem_type)
> > >  }
> > >  EXPORT_SYMBOL(ttm_bo_evict_mm);
> > >
> > > +static void ttm_bo_reclaim_wq(struct work_struct *work)
> > > +{
> >
> > I think a design a bit more inspired by memcg aware core shrinkers would
> > be nice, i.e. explicitly passing:
> > - which drm_cgroup needs to be shrunk
> > - which ttm_mem_reg (well the fancy new abstracted out stuff for tracking
> >   special gpu memory resources like tt or vram or whatever)
> > - how much it needs to be shrunk
> >
> > I think with that a lot more the book-keeping could be pushed into the
> > drm_cgroup code, and the callback just needs to actually shrink enough as
> > requested.
> > -Daniel
> >
> > > +     struct ttm_operation_ctx ctx = {
> > > +             .interruptible = false,
> > > +             .no_wait_gpu = false,
> > > +             .flags = TTM_OPT_FLAG_FORCE_ALLOC
> > > +     };
> > > +     struct ttm_mem_type_manager *man =
> > > +         container_of(work, struct ttm_mem_type_manager, reclaim_wq);
> > > +     struct ttm_bo_device *bdev = man->bdev;
> > > +     struct dma_fence *fence;
> > > +     int mem_type;
> > > +     int ret;
> > > +
> > > +     for (mem_type = 0; mem_type < TTM_NUM_MEM_TYPES; mem_type++)
> > > +             if (&bdev->man[mem_type] == man)
> > > +                     break;
> > > +
> > > +     BUG_ON(mem_type >= TTM_NUM_MEM_TYPES);
> > > +
> > > +     if (!drmcgrp_mem_pressure_scan(bdev, mem_type))
> > > +             return;
> > > +
> > > +     ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
> > > +     if (ret)
> > > +             return;
> > > +
> > > +     spin_lock(&man->move_lock);
> > > +     fence = dma_fence_get(man->move);
> > > +     spin_unlock(&man->move_lock);
> > > +
> > > +     if (fence) {
> > > +             ret = dma_fence_wait(fence, false);
> > > +             dma_fence_put(fence);
> > > +     }
> > > +
> > > +}
> > > +
> > >  int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> > >                       unsigned long p_size)
> > >  {
> > > @@ -1543,6 +1581,13 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> > >               INIT_LIST_HEAD(&man->lru[i]);
> > >       man->move = NULL;
> > >
> > > +     pr_err("drmcgrp %p type %d\n", bdev->ddev, type);
> > > +
> > > +     if (type <= TTM_PL_VRAM) {
> > > +             INIT_WORK(&man->reclaim_wq, ttm_bo_reclaim_wq);
> > > +             drmcgrp_register_device_mm(bdev->ddev, type, &man->reclaim_wq);
> > > +     }
> > > +
> > >       return 0;
> > >  }
> > >  EXPORT_SYMBOL(ttm_bo_init_mm);
> > > @@ -1620,6 +1665,8 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
> > >               man = &bdev->man[i];
> > >               if (man->has_type) {
> > >                       man->use_type = false;
> > > +                     drmcgrp_unregister_device_mm(bdev->ddev, i);
> > > +                     cancel_work_sync(&man->reclaim_wq);
> > >                       if ((i != TTM_PL_SYSTEM) && ttm_bo_clean_mm(bdev, i)) {
> > >                               ret = -EBUSY;
> > >                               pr_err("DRM memory manager type %d is not clean\n",
> > > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > > index 360c1e6c809f..134d6e5475f3 100644
> > > --- a/include/drm/drm_cgroup.h
> > > +++ b/include/drm/drm_cgroup.h
> > > @@ -5,6 +5,7 @@
> > >  #define __DRM_CGROUP_H__
> > >
> > >  #include <linux/cgroup_drm.h>
> > > +#include <linux/workqueue.h>
> > >  #include <drm/ttm/ttm_bo_api.h>
> > >  #include <drm/ttm/ttm_bo_driver.h>
> > >
> > > @@ -12,6 +13,9 @@
> > >
> > >  int drmcgrp_register_device(struct drm_device *device);
> > >  int drmcgrp_unregister_device(struct drm_device *device);
> > > +void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
> > > +             struct work_struct *wq);
> > > +void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type);
> > >  bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
> > >               struct drmcgrp *relative);
> > >  void drmcgrp_chg_bo_alloc(struct drmcgrp *drmcgrp, struct drm_device *dev,
> > > @@ -40,6 +44,16 @@ static inline int drmcgrp_unregister_device(struct drm_device *device)
> > >       return 0;
> > >  }
> > >
> > > +static inline void drmcgrp_register_device_mm(struct drm_device *dev,
> > > +             unsigned type, struct work_struct *wq)
> > > +{
> > > +}
> > > +
> > > +static inline void drmcgrp_unregister_device_mm(struct drm_device *dev,
> > > +             unsigned type)
> > > +{
> > > +}
> > > +
> > >  static inline bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self,
> > >               struct drmcgrp *relative)
> > >  {
> > > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> > > index 4cbcb41e5aa9..0956ca7888fc 100644
> > > --- a/include/drm/ttm/ttm_bo_driver.h
> > > +++ b/include/drm/ttm/ttm_bo_driver.h
> > > @@ -205,6 +205,8 @@ struct ttm_mem_type_manager {
> > >        * Protected by @move_lock.
> > >        */
> > >       struct dma_fence *move;
> > > +
> > > +     struct work_struct reclaim_wq;
> > >  };
> > >
> > >  /**
> > > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > > index 1ce13db36ce9..985a89e849d3 100644
> > > --- a/kernel/cgroup/drm.c
> > > +++ b/kernel/cgroup/drm.c
> > > @@ -31,6 +31,8 @@ struct drmcgrp_device {
> > >       s64                     mem_bw_avg_bytes_per_us_default;
> > >
> > >       s64                     mem_highs_default[TTM_PL_PRIV+1];
> > > +
> > > +     struct work_struct      *mem_reclaim_wq[TTM_PL_PRIV];
> > >  };
> > >
> > >  #define DRMCG_CTF_PRIV_SIZE 3
> > > @@ -793,6 +795,31 @@ int drmcgrp_unregister_device(struct drm_device *dev)
> > >  }
> > >  EXPORT_SYMBOL(drmcgrp_unregister_device);
> > >
> > > +void drmcgrp_register_device_mm(struct drm_device *dev, unsigned type,
> > > +             struct work_struct *wq)
> > > +{
> > > +     if (dev == NULL || dev->primary->index > max_minor
> > > +                     || type >= TTM_PL_PRIV)
> > > +             return;
> > > +
> > > +     mutex_lock(&drmcgrp_mutex);
> > > +     known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = wq;
> > > +     mutex_unlock(&drmcgrp_mutex);
> > > +}
> > > +EXPORT_SYMBOL(drmcgrp_register_device_mm);
> > > +
> > > +void drmcgrp_unregister_device_mm(struct drm_device *dev, unsigned type)
> > > +{
> > > +     if (dev == NULL || dev->primary->index > max_minor
> > > +                     || type >= TTM_PL_PRIV)
> > > +             return;
> > > +
> > > +     mutex_lock(&drmcgrp_mutex);
> > > +     known_drmcgrp_devs[dev->primary->index]->mem_reclaim_wq[type] = NULL;
> > > +     mutex_unlock(&drmcgrp_mutex);
> > > +}
> > > +EXPORT_SYMBOL(drmcgrp_unregister_device_mm);
> > > +
> > >  bool drmcgrp_is_self_or_ancestor(struct drmcgrp *self, struct drmcgrp *relative)
> > >  {
> > >       for (; self != NULL; self = parent_drmcgrp(self))
> > > @@ -1004,6 +1031,12 @@ void drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
> > >
> > >               ddr->mem_bw_stats[DRMCGRP_MEM_BW_ATTR_BYTE_CREDIT]
> > >                       -= move_in_bytes;
> > > +
> > > +             if (known_dev->mem_reclaim_wq[new_mem_type] != NULL &&
> > > +                        ddr->mem_stats[new_mem_type] >
> > > +                             ddr->mem_highs[new_mem_type])
> > > +                     schedule_work(
> > > +                             known_dev->mem_reclaim_wq[new_mem_type]);
> > >       }
> > >       mutex_unlock(&known_dev->mutex);
> > >  }
> > > --
> > > 2.21.0
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem
  2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
                   ` (6 preceding siblings ...)
  2019-06-26 15:05 ` [RFC PATCH v3 10/11] drm, cgroup: Add soft VRAM limit Kenny Ho
@ 2019-06-27  7:24 ` Daniel Vetter
  2019-06-30  5:10   ` Kenny Ho
  7 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27  7:24 UTC (permalink / raw)
  To: Kenny Ho, Jerome Glisse
  Cc: jsparks, amd-gfx list, lkaplan, Alex Deucher, Kenny Ho,
	dri-devel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

On Wed, Jun 26, 2019 at 5:05 PM Kenny Ho <Kenny.Ho@amd.com> wrote:
> This is a follow up to the RFC I made previously to introduce a cgroup
> controller for the GPU/DRM subsystem [v1,v2].  The goal is to be able to
> provide resource management to GPU resources using things like container.
> The cover letter from v1 is copied below for reference.
>
> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
>
> v3:
> Base on feedbacks on v2:
> * removed .help type file from v2
> * conform to cgroup convention for default and max handling
> * conform to cgroup convention for addressing device specific limits (with major:minor)
> New function:
> * adopted memparse for memory size related attributes
> * added macro to marshall drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> * added ttm buffer usage limit (per cgroup, for vram.)
> * added per cgroup bandwidth stats and limiting (burst and average bandwidth)
>
> v2:
> * Removed the vendoring concepts
> * Add limit to total buffer allocation
> * Add limit to the maximum size of a buffer allocation
>
> v1: cover letter
>
> The purpose of this patch series is to start a discussion for a generic cgroup
> controller for the drm subsystem.  The design proposed here is a very early one.
> We are hoping to engage the community as we develop the idea.
>
>
> Backgrounds
> ==========
> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
> tasks, and all their future children, into hierarchical groups with specialized
> behaviour, such as accounting/limiting the resources which processes in a cgroup
> can access[1].  Weights, limits, protections, allocations are the main resource
> distribution models.  Existing cgroup controllers includes cpu, memory, io,
> rdma, and more.  cgroup is one of the foundational technologies that enables the
> popular container application deployment and management method.
>
> Direct Rendering Manager/drm contains code intended to support the needs of
> complex graphics devices. Graphics drivers in the kernel may make use of DRM
> functions to make tasks like memory management, interrupt handling and DMA
> easier, and provide a uniform interface to applications.  The DRM has also
> developed beyond traditional graphics applications to support compute/GPGPU
> applications.
>
>
> Motivations
> =========
> As GPU grow beyond the realm of desktop/workstation graphics into areas like
> data center clusters and IoT, there are increasing needs to monitor and regulate
> GPU as a resource like cpu, memory and io.
>
> Matt Roper from Intel began working on similar idea in early 2018 [2] for the
> purpose of managing GPU priority using the cgroup hierarchy.  While that
> particular use case may not warrant a standalone drm cgroup controller, there
> are other use cases where having one can be useful [3].  Monitoring GPU
> resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
> (execution unit [Intel's nomenclature]), GPU job scheduling [4] can help
> sysadmins get a better understanding of the applications usage profile.  Further
> usage regulations of the aforementioned resources can also help sysadmins
> optimize workload deployment on limited GPU resources.
>
> With the increased importance of machine learning, data science and other
> cloud-based applications, GPUs are already in production use in data centers
> today [5,6,7].  Existing GPU resource management is very course grain, however,
> as sysadmins are only able to distribute workload on a per-GPU basis [8].  An
> alternative is to use GPU virtualization (with or without SRIOV) but it
> generally acts on the entire GPU instead of the specific resources in a GPU.
> With a drm cgroup controller, we can enable alternate, fine-grain, sub-GPU
> resource management (in addition to what may be available via GPU
> virtualization.)
>
> In addition to production use, the DRM cgroup can also help with testing
> graphics application robustness by providing a mean to artificially limit DRM
> resources availble to the applications.
>
>
> Challenges
> ========
> While there are common infrastructure in DRM that is shared across many vendors
> (the scheduler [4] for example), there are also aspects of DRM that are vendor
> specific.  To accommodate this, we borrowed the mechanism used by the cgroup to
> handle different kinds of cgroup controller.
>
> Resources for DRM are also often device (GPU) specific instead of system
> specific and a system may contain more than one GPU.  For this, we borrowed some
> of the ideas from RDMA cgroup controller.

Another question I have: What about HMM? With the device memory zone
the core mm will be a lot more involved in managing that, but I also
expect that we'll have classic buffer-based management for a long time
still. So these need to work together, and I fear slightly that we'll
have memcg and drmcg fighting over the same pieces a bit perhaps?

Adding Jerome, maybe he has some thoughts on this.
-Daniel

> Approach
> =======
> To experiment with the idea of a DRM cgroup, we would like to start with basic
> accounting and statistics, then continue to iterate and add regulating
> mechanisms into the driver.
>
> [1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
> [2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
> [3] https://www.spinics.net/lists/cgroups/msg20720.html
> [4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
> [5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
> [6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
> [7] https://github.com/RadeonOpenCompute/k8s-device-plugin
> [8] https://github.com/kubernetes/kubernetes/issues/52757
>
> Kenny Ho (11):
>   cgroup: Introduce cgroup for drm subsystem
>   cgroup: Add mechanism to register DRM devices
>   drm/amdgpu: Register AMD devices for DRM cgroup
>   drm, cgroup: Add total GEM buffer allocation limit
>   drm, cgroup: Add peak GEM buffer allocation limit
>   drm, cgroup: Add GEM buffer allocation count stats
>   drm, cgroup: Add TTM buffer allocation stats
>   drm, cgroup: Add TTM buffer peak usage stats
>   drm, cgroup: Add per cgroup bw measure and control
>   drm, cgroup: Add soft VRAM limit
>   drm, cgroup: Allow more aggressive memory reclaim
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |    4 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |    4 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |    3 +-
>  drivers/gpu/drm/drm_gem.c                  |    8 +
>  drivers/gpu/drm/drm_prime.c                |    9 +
>  drivers/gpu/drm/ttm/ttm_bo.c               |   91 ++
>  drivers/gpu/drm/ttm/ttm_bo_util.c          |    4 +
>  include/drm/drm_cgroup.h                   |  115 ++
>  include/drm/drm_gem.h                      |   11 +
>  include/drm/ttm/ttm_bo_api.h               |    2 +
>  include/drm/ttm/ttm_bo_driver.h            |   10 +
>  include/linux/cgroup_drm.h                 |  114 ++
>  include/linux/cgroup_subsys.h              |    4 +
>  init/Kconfig                               |    5 +
>  kernel/cgroup/Makefile                     |    1 +
>  kernel/cgroup/drm.c                        | 1171 ++++++++++++++++++++
>  16 files changed, 1555 insertions(+), 1 deletion(-)
>  create mode 100644 include/drm/drm_cgroup.h
>  create mode 100644 include/linux/cgroup_drm.h
>  create mode 100644 kernel/cgroup/drm.c
>
> --
> 2.21.0
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
  2019-06-27  5:43                     ` Daniel Vetter
@ 2019-06-27 18:42                       ` Kenny Ho
       [not found]                         ` <CAOWid-cT4TQ7HGzcSWjmLGjAW_D1hRrkNguEiV8N+baNiKQm_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-27 18:42 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks, amd-gfx list, lkaplan, Alex Deucher,
	dri-devel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

On Thu, Jun 27, 2019 at 1:43 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jun 26, 2019 at 06:41:32PM -0400, Kenny Ho wrote:
> > So without the sharing restriction and some kind of ownership
> > structure, we will have to migrate/change the owner of the buffer when
> > the cgroup that created the buffer die before the receiving cgroup(s)
> > and I am not sure how to do that properly at the moment.  1) Should
> > each cgroup keep track of all the buffers that belongs to it and
> > migrate?  (Is that efficient?)  2) which cgroup should be the new
> > owner (and therefore have the limit applied?)  Having the creator
> > being the owner is kind of natural, but let say the buffer is shared
> > with 5 other cgroups, which of these 5 cgroups should be the new owner
> > of the buffer?
>
> Different answers:
>
> - Do we care if we leak bos like this in a cgroup, if the cgroup
>   disappears before all the bo are cleaned up?
>
> - Just charge the bo to each cgroup it's in? Will be quite a bit more
>   tracking needed to get that done ...
That seems to be the approach memcg takes, but as shown by the lwn
link you sent me from the last rfc (talk from Roman Gushchin), that
approach is not problem free either.  And wouldn't this approach
disconnect resource management from the underlying resource one would
like to control?  For example, if you have 5 MB of memory, you can
have 5 users using 1 MB each.  But in the charge-everybody approach, a
1 MB usage shared 4 times will make it looks like 5MB is used.  So the
resource being control is no longer 'real' since the amount of
resource you have is now dynamic and depends on the amount of sharing
one does.

> - Also, there's the legacy way of sharing a bo, with the FLINK and
>   GEM_OPEN ioctls. We need to plug these holes too.
>
> Just feels like your current solution is technically well-justified, but
> it completely defeats the point of cgroups/containers and buffer sharing
> ...
Um... I am going to get a bit philosophical here and suggest that the
idea of sharing (especially uncontrolled sharing) is inherently at odd
with containment.  It's like, if everybody is special, no one is
special.  Perhaps an alternative is to make this configurable so that
people can allow sharing knowing the caveat?  And just to be clear,
the current solution allows for sharing, even between cgroup.

Regards,
Kenny
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
  2019-06-27  6:01           ` Daniel Vetter
@ 2019-06-27 20:17             ` Kenny Ho
  2019-06-27 21:33               ` Daniel Vetter
  2019-06-28  1:16             ` Welty, Brian
  1 sibling, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-27 20:17 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, dri-devel, Brian Welty, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, kraxel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

On Thu, Jun 27, 2019 at 2:01 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> btw reminds me: I guess it would be good to have a per-type .total
> read-only exposed, so that userspace has an idea of how much there is?
> ttm is trying to be agnostic to the allocator that's used to manage a
> memory type/resource, so doesn't even know that. But I think something we
> need to expose to admins, otherwise they can't meaningfully set limits.

I don't think I understand this bit, do you mean total across multiple
GPU of the same mem type?  Or do you mean the total available per GPU
(or something else?)

Regards,
Kenny
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]                         ` <CAOWid-cT4TQ7HGzcSWjmLGjAW_D1hRrkNguEiV8N+baNiKQm_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-06-27 21:24                           ` Daniel Vetter
  2019-06-28 18:43                             ` Kenny Ho
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27 21:24 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse-5C7GfCeVMHo, Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel,
	Daniel Vetter, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Thu, Jun 27, 2019 at 02:42:43PM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 1:43 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Jun 26, 2019 at 06:41:32PM -0400, Kenny Ho wrote:
> > > So without the sharing restriction and some kind of ownership
> > > structure, we will have to migrate/change the owner of the buffer when
> > > the cgroup that created the buffer die before the receiving cgroup(s)
> > > and I am not sure how to do that properly at the moment.  1) Should
> > > each cgroup keep track of all the buffers that belongs to it and
> > > migrate?  (Is that efficient?)  2) which cgroup should be the new
> > > owner (and therefore have the limit applied?)  Having the creator
> > > being the owner is kind of natural, but let say the buffer is shared
> > > with 5 other cgroups, which of these 5 cgroups should be the new owner
> > > of the buffer?
> >
> > Different answers:
> >
> > - Do we care if we leak bos like this in a cgroup, if the cgroup
> >   disappears before all the bo are cleaned up?
> >
> > - Just charge the bo to each cgroup it's in? Will be quite a bit more
> >   tracking needed to get that done ...
> That seems to be the approach memcg takes, but as shown by the lwn
> link you sent me from the last rfc (talk from Roman Gushchin), that
> approach is not problem free either.  And wouldn't this approach
> disconnect resource management from the underlying resource one would
> like to control?  For example, if you have 5 MB of memory, you can
> have 5 users using 1 MB each.  But in the charge-everybody approach, a
> 1 MB usage shared 4 times will make it looks like 5MB is used.  So the
> resource being control is no longer 'real' since the amount of
> resource you have is now dynamic and depends on the amount of sharing
> one does.

The problem with memcg is that it's not just the allocation, but a ton of
memory allocated to track these allocations. At least that's my
understanding of the nature of the memcg leak. Made a lot worse by pages
being small and plentiful and shared extremely widely (e.g. it's really
hard to control who gets charged for pagecache allocations, so those
pagecache entries might outlive the memcg forever if you're unlucky).

For us it's just a counter, plus bo sharing is a lot more controlled: On
any reasonable system if you do kill the compositor, then all the clients
go down. And when you do kill a client, the compositor will release all
the shared buffers (and any other resources).

So I think for drmcg we won't have anything near the same resource leak
problem even in theory, and in practice I think the issue is none.

> > - Also, there's the legacy way of sharing a bo, with the FLINK and
> >   GEM_OPEN ioctls. We need to plug these holes too.
> >
> > Just feels like your current solution is technically well-justified, but
> > it completely defeats the point of cgroups/containers and buffer sharing
> > ...
> Um... I am going to get a bit philosophical here and suggest that the
> idea of sharing (especially uncontrolled sharing) is inherently at odd
> with containment.  It's like, if everybody is special, no one is
> special.  Perhaps an alternative is to make this configurable so that
> people can allow sharing knowing the caveat?  And just to be clear,
> the current solution allows for sharing, even between cgroup.

The thing is, why shouldn't we just allow it (with some documented
caveat)?

I mean if all people do is share it as your current patches allow, then
there's nothing funny going on (at least if we go with just leaking the
allocations). If we allow additional sharing, then that's a plus.

And if you want additional containment, that's a different thing: The
entire linux architecture for containers is that a container doesn't
exist. Instead you get a pile of building blocks that all solve different
aspects of what a container needs to do:
- cgroups for resource limits
- namespaces for resource visibility
- selinux/secomp/lsm for resource isolation and access rights

Let's not try to build a drm cgroup control that tries to do more than
what cgroups are meant to solve. If you have a need to restrict the
sharing, imo that should be done with an lsm security hook.

btw for bo sharing, I've found a 3rd sharing path (besides dma-buf and gem
flink): GETCRTC ioctl can also be used (it's the itended goal actually) to
share buffers across processes.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
  2019-06-27 20:17             ` Kenny Ho
@ 2019-06-27 21:33               ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-06-27 21:33 UTC (permalink / raw)
  To: Kenny Ho
  Cc: amd-gfx list, joseph.greathouse, Kenny Ho, Brian Welty, jsparks,
	dri-devel, lkaplan, Alex Deucher, kraxel, Tejun Heo, cgroups,
	Christian König

On Thu, Jun 27, 2019 at 04:17:09PM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 2:01 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > btw reminds me: I guess it would be good to have a per-type .total
> > read-only exposed, so that userspace has an idea of how much there is?
> > ttm is trying to be agnostic to the allocator that's used to manage a
> > memory type/resource, so doesn't even know that. But I think something we
> > need to expose to admins, otherwise they can't meaningfully set limits.
> 
> I don't think I understand this bit, do you mean total across multiple
> GPU of the same mem type?  Or do you mean the total available per GPU
> (or something else?)

Total for a given type on a given cpu. E.g. maybe you want to give 50% of
your vram to one cgroup, and the other 50% to the other cgroup. For that
you need to know how much vram you have. And expecting people to lspci and
then look at wikipedia for how much vram that chip should have (or
something like that) isn't great. Hence 0.vram.total, 0.tt.total, and so
on (also for all the other gpu minors ofc).  For system memory we probably
don't want to provide a total, since that's already a value that's easy to
obtain from various sources.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
  2019-06-27  6:01           ` Daniel Vetter
  2019-06-27 20:17             ` Kenny Ho
@ 2019-06-28  1:16             ` Welty, Brian
       [not found]               ` <01a6efa8-802c-b8b1-931e-4f0c1c63beca-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 47+ messages in thread
From: Welty, Brian @ 2019-06-28  1:16 UTC (permalink / raw)
  To: Daniel Vetter, Kenny Ho
  Cc: Kenny Ho, dri-devel, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, kraxel, joseph.greathouse, Tejun Heo,
	Christian König


On 6/26/2019 11:01 PM, Daniel Vetter wrote:
> On Thu, Jun 27, 2019 at 12:06:13AM -0400, Kenny Ho wrote:
>> On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>> I think with all the ttm refactoring going on I think we need to de-ttm
>>> the interface functions here a bit. With Gerd Hoffmans series you can just
>>> use a gem_bo pointer here, so what's left to do is have some extracted
>>> structure for tracking memory types. I think Brian Welty has some ideas
>>> for this, even in patch form. Would be good to keep him on cc at least for
>>> the next version. We'd need to explicitly hand in the ttm_mem_reg (or
>>> whatever the specific thing is going to be).
>>
>> I assume Gerd Hoffman's series you are referring to is this one?
>> https://www.spinics.net/lists/dri-devel/msg215056.html
> 
> There's a newer one, much more complete, but yes that's the work.
> 
>> I can certainly keep an eye out for Gerd's refactoring while
>> refactoring other parts of this RFC.
>>
>> I have added Brian and Gerd to the thread for awareness.
> 
> btw just realized that maybe building the interfaces on top of ttm_mem_reg
> is maybe not the best. That's what you're using right now, but in a way
> that's just the ttm internal detail of how the backing storage is
> allocated. I think the structure we need to abstract away is
> ttm_mem_type_manager, without any of the actual management details.
> 

Any de-ttm refactoring should probably not spam all the cgroups folks.
So I removed cgroups list.

As Daniel mentioned, some of us are looking at possible refactoring of TTM
for reuse in i915 driver.
Here is a brief summary of some ideas to be considered:

 1) refactor part of ttm_mem_type_manager into a new drm_mem_type_region.
    Really, should then move the array from ttm_bo_device.man[] into drm_device.

    Relevant to drm_cgroup, you could then perhaps access these stats through
    drm_device and don't need the mem_stats array in drmcgrp_device_resource.

  1a)  doing this right means replacing TTM_PL_XXX memory types with new DRM
     defines.  But could keep the TTM ones as redefinition of (new) DRM ones.
     Probably those private ones (TTM_PL_PRIV) make this difficult.

  All of the above could be eventually leveraged by the vram support being
  implemented now in i915 driver.
  
  2) refactor ttm_mem_reg + ttm_bus_placement into something generic for
     any GEM object,  maybe call it drm_gem_object_placement.
     ttm_mem_reg could remain as a wrapper for TTM drivers.
     This hasn't been broadly discussed with intel-gfx folks, so not sure
     this fits well into i915 or not.

     Relevant to drm_cgroup, maybe this function:
	drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
		struct ttm_mem_reg *new_mem)
     could potentially become:
        drmcgrp_mem_track_move(struct drm_gem_object *old_bo, bool evict,
		struct drm_gem_object_placement *new_place)

     Though from ttm_mem_reg, you look to only be using mem_type and size.
     I think Daniel is noting that ttm_mem_reg wasn't truly needed here, so
     you could just pass in the mem_type and size instead.

Would appreciate any feedback (positive or negative) on above....
Perhaps this should move to a new thread?   I could send out basic RFC
patches for (1) if helpful but as it touches all the TTM drivers, nice to
hear some feedback first.
Anyway, this doesn't necessarily need to block forward progress on drm_cgroup,
as refactoring into common base structures could happen incrementally.

Thanks,
-Brian
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
       [not found]               ` <01a6efa8-802c-b8b1-931e-4f0c1c63beca-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2019-06-28  6:53                 ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-06-28  6:53 UTC (permalink / raw)
  To: Welty, Brian
  Cc: Kenny Ho, dri-devel, jsparks-WVYJKLFxKCc, amd-gfx list,
	lkaplan-WVYJKLFxKCc, Alex Deucher, Kenny Ho, Gerd Hoffmann,
	joseph.greathouse-5C7GfCeVMHo, Tejun Heo, Christian König

On Fri, Jun 28, 2019 at 3:16 AM Welty, Brian <brian.welty@intel.com> wrote:
> On 6/26/2019 11:01 PM, Daniel Vetter wrote:
> > On Thu, Jun 27, 2019 at 12:06:13AM -0400, Kenny Ho wrote:
> >> On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>
> >>> I think with all the ttm refactoring going on I think we need to de-ttm
> >>> the interface functions here a bit. With Gerd Hoffmans series you can just
> >>> use a gem_bo pointer here, so what's left to do is have some extracted
> >>> structure for tracking memory types. I think Brian Welty has some ideas
> >>> for this, even in patch form. Would be good to keep him on cc at least for
> >>> the next version. We'd need to explicitly hand in the ttm_mem_reg (or
> >>> whatever the specific thing is going to be).
> >>
> >> I assume Gerd Hoffman's series you are referring to is this one?
> >> https://www.spinics.net/lists/dri-devel/msg215056.html
> >
> > There's a newer one, much more complete, but yes that's the work.
> >
> >> I can certainly keep an eye out for Gerd's refactoring while
> >> refactoring other parts of this RFC.
> >>
> >> I have added Brian and Gerd to the thread for awareness.
> >
> > btw just realized that maybe building the interfaces on top of ttm_mem_reg
> > is maybe not the best. That's what you're using right now, but in a way
> > that's just the ttm internal detail of how the backing storage is
> > allocated. I think the structure we need to abstract away is
> > ttm_mem_type_manager, without any of the actual management details.
> >
>
> Any de-ttm refactoring should probably not spam all the cgroups folks.
> So I removed cgroups list.
>
> As Daniel mentioned, some of us are looking at possible refactoring of TTM
> for reuse in i915 driver.
> Here is a brief summary of some ideas to be considered:
>
>  1) refactor part of ttm_mem_type_manager into a new drm_mem_type_region.
>     Really, should then move the array from ttm_bo_device.man[] into drm_device.
>
>     Relevant to drm_cgroup, you could then perhaps access these stats through
>     drm_device and don't need the mem_stats array in drmcgrp_device_resource.
>
>   1a)  doing this right means replacing TTM_PL_XXX memory types with new DRM
>      defines.  But could keep the TTM ones as redefinition of (new) DRM ones.
>      Probably those private ones (TTM_PL_PRIV) make this difficult.
>
>   All of the above could be eventually leveraged by the vram support being
>   implemented now in i915 driver.
>
>   2) refactor ttm_mem_reg + ttm_bus_placement into something generic for
>      any GEM object,  maybe call it drm_gem_object_placement.
>      ttm_mem_reg could remain as a wrapper for TTM drivers.
>      This hasn't been broadly discussed with intel-gfx folks, so not sure
>      this fits well into i915 or not.
>
>      Relevant to drm_cgroup, maybe this function:
>         drmcgrp_mem_track_move(struct ttm_buffer_object *old_bo, bool evict,
>                 struct ttm_mem_reg *new_mem)
>      could potentially become:
>         drmcgrp_mem_track_move(struct drm_gem_object *old_bo, bool evict,
>                 struct drm_gem_object_placement *new_place)
>
>      Though from ttm_mem_reg, you look to only be using mem_type and size.
>      I think Daniel is noting that ttm_mem_reg wasn't truly needed here, so
>      you could just pass in the mem_type and size instead.

Yeah I think the relevant part of your refactoring is creating a more
abstract memory type/resource thing (not the individual allocations
from it which ttm calls regions and I get confused about that every
time). I think that abstraction should also have a field for the total
(which I think cgroups also needs, both as read-only information and
starting value). ttm would that put somewhere into the
ttm_mem_type_manager, i915 would put it somewhere else, cma based
drivers could perhaps expose the cma heap like that (if it's exclusive
to the gpu at least).

> Would appreciate any feedback (positive or negative) on above....
> Perhaps this should move to a new thread?   I could send out basic RFC
> patches for (1) if helpful but as it touches all the TTM drivers, nice to
> hear some feedback first.
> Anyway, this doesn't necessarily need to block forward progress on drm_cgroup,
> as refactoring into common base structures could happen incrementally.

Yeah I think new dri-devel thread with a totally unpolished rfc as
draft plus this as the intro would be good. That way we can ground
this a bit better in actual code.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
  2019-06-27 21:24                           ` Daniel Vetter
@ 2019-06-28 18:43                             ` Kenny Ho
       [not found]                               ` <CAOWid-dZQhpKHxYEFn+X+WSep+B66M_LtN6v0=4-uO3ecZ0pcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-28 18:43 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks, amd-gfx list, lkaplan, Alex Deucher,
	dri-devel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

On Thu, Jun 27, 2019 at 5:24 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jun 27, 2019 at 02:42:43PM -0400, Kenny Ho wrote:
> > Um... I am going to get a bit philosophical here and suggest that the
> > idea of sharing (especially uncontrolled sharing) is inherently at odd
> > with containment.  It's like, if everybody is special, no one is
> > special.  Perhaps an alternative is to make this configurable so that
> > people can allow sharing knowing the caveat?  And just to be clear,
> > the current solution allows for sharing, even between cgroup.
>
> The thing is, why shouldn't we just allow it (with some documented
> caveat)?
>
> I mean if all people do is share it as your current patches allow, then
> there's nothing funny going on (at least if we go with just leaking the
> allocations). If we allow additional sharing, then that's a plus.
Um... perhaps I was being overly conservative :).  So let me
illustrate with an example to add more clarity and get more comments
on it.

Let say we have the following cgroup hierarchy (The letters are
cgroups with R being the root cgroup.  The numbers in brackets are
processes.  The processes are placed with the 'No Internal Process
Constraint' in mind.)
R (4, 5) ------ A (6)
  \
    B ---- C (7,8)
     \
       D (9)

Here is a list of operation and the associated effect on the size
track by the cgroups (for simplicity, each buffer is 1 unit in size.)
With current implementation (charge on buffer creation with
restriction on sharing.)
R   A   B   C   D   |Ops
================
1   0   0   0   0   |4 allocated a buffer
1   0   0   0   0   |4 shared a buffer with 5
1   0   0   0   0   |4 shared a buffer with 9
2   0   1   0   1   |9 allocated a buffer
3   0   2   1   1   |7 allocated a buffer
3   0   2   1   1   |7 shared a buffer with 8
3   0   2   1   1   |7 sharing with 9 (not allowed)
3   0   2   1   1   |7 sharing with 4 (not allowed)
3   0   2   1   1   |7 release a buffer
2   0   1   0   1   |8 release a buffer from 7

The suggestion as I understand it (charge per buffer reference with
unrestricted sharing.)
R   A   B   C   D   |Ops
================
1   0   0   0   0   |4 allocated a buffer
2   0   0   0   0   |4 shared a buffer with 5
3   0   0   0   1   |4 shared a buffer with 9
4   0   1   0   2   |9 allocated a buffer
5   0   2   1   1   |7 allocated a buffer
6   0   3   2   1   |7 shared a buffer with 8
7   0   4   2   2   |7 sharing with 9
8   0   4   2   2   |7 sharing with 4
7   0   3   1   2   |7 release a buffer
6   0   2   0   2   |8 release a buffer from 7

Is this a correct understanding of the suggestion?

Regards,
Kenny
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
       [not found]                 ` <20190627061153.GD12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-06-28 19:49                   ` Kenny Ho
       [not found]                     ` <CAOWid-dCkevUiN27pkwfPketdqS8O+ZGYu8vRMPY2GhXGaVARA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-28 19:49 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks-WVYJKLFxKCc, amd-gfx list, lkaplan-WVYJKLFxKCc,
	Alex Deucher, dri-devel, joseph.greathouse-5C7GfCeVMHo,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian König

On Thu, Jun 27, 2019 at 2:11 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> I feel like a better approach would by to add a cgroup for the various
> engines on the gpu, and then also account all the sdma (or whatever the
> name of the amd copy engines is again) usage by ttm_bo moves to the right
> cgroup.  I think that's a more meaningful limitation. For direct thrashing
> control I think there's both not enough information available in the
> kernel (you'd need some performance counters to watch how much bandwidth
> userspace batches/CS are wasting), and I don't think the ttm eviction
> logic is ready to step over all the priority inversion issues this will
> bring up. Managing sdma usage otoh will be a lot more straightforward (but
> still has all the priority inversion problems, but in the scheduler that
> might be easier to fix perhaps with the explicit dependency graph - in the
> i915 scheduler we already have priority boosting afaiui).
My concern with hooking into the engine/ lower level is that the
engine may not be process/cgroup aware.  So the bandwidth tracking is
per device.  I am also wondering if this is also potentially be a case
of perfect getting in the way of good.  While ttm_bo_handle_move_mem
may not track everything, it is still a key function for a lot of the
memory operation.  Also, if the programming model is designed to
bypass the kernel then I am not sure if there are anything the kernel
can do.  (Things like kernel-bypass network stack comes to mind.)  All
that said, I will certainly dig deeper into the topic.

Regards,
Kenny
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem
  2019-06-27  7:24 ` [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Daniel Vetter
@ 2019-06-30  5:10   ` Kenny Ho
  2019-07-02 13:21     ` Daniel Vetter
  0 siblings, 1 reply; 47+ messages in thread
From: Kenny Ho @ 2019-06-30  5:10 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Kenny Ho, jsparks, amd-gfx list, lkaplan, Alex Deucher,
	Jerome Glisse, dri-devel, joseph.greathouse, Tejun Heo, cgroups,
	Christian König

On Thu, Jun 27, 2019 at 3:24 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> Another question I have: What about HMM? With the device memory zone
> the core mm will be a lot more involved in managing that, but I also
> expect that we'll have classic buffer-based management for a long time
> still. So these need to work together, and I fear slightly that we'll
> have memcg and drmcg fighting over the same pieces a bit perhaps?
>
> Adding Jerome, maybe he has some thoughts on this.

I just did a bit of digging and this looks like the current behaviour:
https://www.kernel.org/doc/html/v5.1/vm/hmm.html#memory-cgroup-memcg-and-rss-accounting

"For now device memory is accounted as any regular page in rss
counters (either anonymous if device page is used for anonymous, file
if device page is used for file backed page or shmem if device page is
used for shared memory). This is a deliberate choice to keep existing
applications, that might start using device memory without knowing
about it, running unimpacted.

A drawback is that the OOM killer might kill an application using a
lot of device memory and not a lot of regular system memory and thus
not freeing much system memory. We want to gather more real world
experience on how applications and system react under memory pressure
in the presence of device memory before deciding to account device
memory differently."

Regards,
Kenny
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit
       [not found]                               ` <CAOWid-dZQhpKHxYEFn+X+WSep+B66M_LtN6v0=4-uO3ecZ0pcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-07-02 13:16                                 ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-07-02 13:16 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse-5C7GfCeVMHo, Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel,
	Daniel Vetter, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Fri, Jun 28, 2019 at 02:43:18PM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 5:24 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Thu, Jun 27, 2019 at 02:42:43PM -0400, Kenny Ho wrote:
> > > Um... I am going to get a bit philosophical here and suggest that the
> > > idea of sharing (especially uncontrolled sharing) is inherently at odd
> > > with containment.  It's like, if everybody is special, no one is
> > > special.  Perhaps an alternative is to make this configurable so that
> > > people can allow sharing knowing the caveat?  And just to be clear,
> > > the current solution allows for sharing, even between cgroup.
> >
> > The thing is, why shouldn't we just allow it (with some documented
> > caveat)?
> >
> > I mean if all people do is share it as your current patches allow, then
> > there's nothing funny going on (at least if we go with just leaking the
> > allocations). If we allow additional sharing, then that's a plus.
> Um... perhaps I was being overly conservative :).  So let me
> illustrate with an example to add more clarity and get more comments
> on it.
> 
> Let say we have the following cgroup hierarchy (The letters are
> cgroups with R being the root cgroup.  The numbers in brackets are
> processes.  The processes are placed with the 'No Internal Process
> Constraint' in mind.)
> R (4, 5) ------ A (6)
>   \
>     B ---- C (7,8)
>      \
>        D (9)
> 
> Here is a list of operation and the associated effect on the size
> track by the cgroups (for simplicity, each buffer is 1 unit in size.)
> With current implementation (charge on buffer creation with
> restriction on sharing.)
> R   A   B   C   D   |Ops
> ================
> 1   0   0   0   0   |4 allocated a buffer
> 1   0   0   0   0   |4 shared a buffer with 5
> 1   0   0   0   0   |4 shared a buffer with 9
> 2   0   1   0   1   |9 allocated a buffer
> 3   0   2   1   1   |7 allocated a buffer
> 3   0   2   1   1   |7 shared a buffer with 8
> 3   0   2   1   1   |7 sharing with 9 (not allowed)
> 3   0   2   1   1   |7 sharing with 4 (not allowed)
> 3   0   2   1   1   |7 release a buffer
> 2   0   1   0   1   |8 release a buffer from 7

This is your current implementation, right? Let's call it A.

> The suggestion as I understand it (charge per buffer reference with
> unrestricted sharing.)
> R   A   B   C   D   |Ops
> ================
> 1   0   0   0   0   |4 allocated a buffer
> 2   0   0   0   0   |4 shared a buffer with 5
> 3   0   0   0   1   |4 shared a buffer with 9
> 4   0   1   0   2   |9 allocated a buffer
> 5   0   2   1   1   |7 allocated a buffer
> 6   0   3   2   1   |7 shared a buffer with 8
> 7   0   4   2   2   |7 sharing with 9
> 8   0   4   2   2   |7 sharing with 4
> 7   0   3   1   2   |7 release a buffer
> 6   0   2   0   2   |8 release a buffer from 7
> 
> Is this a correct understanding of the suggestion?

Yup that's one option I think. The other option (and it's probably
simpler), is to go with your current accounting, but drop the sharing
restriction. I.e. buffers are accounting to whomever allocates them first,
not who's all using them. For memcg this has some serious trouble with
cgroups not getting cleaned up due to leaked refrences. But for gem bo we
spread the references in a lot more controlled manner, and all the
long-lived references are under control of userspace.

E.g. if Xorg fails to clean up bo references of clients that dead, that's
clearly an Xorg bug and needs to be fixed there. But not something we need
to allow as a valid use-cases. For page references/accounting in memcg
this is totally different, since pages can survive in the pagecache
forever. No such bo-cache or anything similar exists for gem_bo.

Personally I prefer option A, but on sharing restriction. If you want that
sharing restriction, we need to figure out how to implement it using
something else. Plus we need to make sure all possible ways to share a bo
are covered (and there are many).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
       [not found]                     ` <CAOWid-dCkevUiN27pkwfPketdqS8O+ZGYu8vRMPY2GhXGaVARA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-07-02 13:20                       ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-07-02 13:20 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse-5C7GfCeVMHo, Kenny Ho, jsparks-WVYJKLFxKCc,
	amd-gfx list, lkaplan-WVYJKLFxKCc, Alex Deucher, dri-devel,
	Daniel Vetter, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Christian König

On Fri, Jun 28, 2019 at 03:49:28PM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 2:11 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > I feel like a better approach would by to add a cgroup for the various
> > engines on the gpu, and then also account all the sdma (or whatever the
> > name of the amd copy engines is again) usage by ttm_bo moves to the right
> > cgroup.  I think that's a more meaningful limitation. For direct thrashing
> > control I think there's both not enough information available in the
> > kernel (you'd need some performance counters to watch how much bandwidth
> > userspace batches/CS are wasting), and I don't think the ttm eviction
> > logic is ready to step over all the priority inversion issues this will
> > bring up. Managing sdma usage otoh will be a lot more straightforward (but
> > still has all the priority inversion problems, but in the scheduler that
> > might be easier to fix perhaps with the explicit dependency graph - in the
> > i915 scheduler we already have priority boosting afaiui).
> My concern with hooking into the engine/ lower level is that the
> engine may not be process/cgroup aware.  So the bandwidth tracking is

Why is the engine not process aware? Thus far all command submission I'm
aware of is done by a real process from userspace ... we should be able to
track these with cgroups perfectly.

> per device.  I am also wondering if this is also potentially be a case
> of perfect getting in the way of good.  While ttm_bo_handle_move_mem
> may not track everything, it is still a key function for a lot of the
> memory operation.  Also, if the programming model is designed to
> bypass the kernel then I am not sure if there are anything the kernel
> can do.  (Things like kernel-bypass network stack comes to mind.)  All
> that said, I will certainly dig deeper into the topic.

The problem is there's not a full bypass of the kernel, any reasonable
workload will need both. But if you only control one side of the bandwidth
usuage, you're not really controlling anything.

Also, this is uapi: Perfect is pretty much the bar we need to clear, any
mistake will hurt us for the next 10 years at least :-)

btw if you haven't read it yet: The lwn article about the new block io
controller is pretty interesting. I think you're trying to solve a similar
problem here:

https://lwn.net/SubscriberLink/792256/e66982524fa9477b/

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem
  2019-06-30  5:10   ` Kenny Ho
@ 2019-07-02 13:21     ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2019-07-02 13:21 UTC (permalink / raw)
  To: Kenny Ho
  Cc: joseph.greathouse, Kenny Ho, jsparks, amd-gfx list, lkaplan,
	Alex Deucher, Jerome Glisse, dri-devel, Tejun Heo, cgroups,
	Christian König

On Sun, Jun 30, 2019 at 01:10:28AM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 3:24 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > Another question I have: What about HMM? With the device memory zone
> > the core mm will be a lot more involved in managing that, but I also
> > expect that we'll have classic buffer-based management for a long time
> > still. So these need to work together, and I fear slightly that we'll
> > have memcg and drmcg fighting over the same pieces a bit perhaps?
> >
> > Adding Jerome, maybe he has some thoughts on this.
> 
> I just did a bit of digging and this looks like the current behaviour:
> https://www.kernel.org/doc/html/v5.1/vm/hmm.html#memory-cgroup-memcg-and-rss-accounting
> 
> "For now device memory is accounted as any regular page in rss
> counters (either anonymous if device page is used for anonymous, file
> if device page is used for file backed page or shmem if device page is
> used for shared memory). This is a deliberate choice to keep existing
> applications, that might start using device memory without knowing
> about it, running unimpacted.
> 
> A drawback is that the OOM killer might kill an application using a
> lot of device memory and not a lot of regular system memory and thus
> not freeing much system memory. We want to gather more real world
> experience on how applications and system react under memory pressure
> in the presence of device memory before deciding to account device
> memory differently."

Hm ... I also just learned that the device memory stuff, at least the hmm
part, is probably getting removed again, and only the hmm_mirror part of
hmm will be kept. So maybe this doesn't matter to us. But really no idea.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2019-07-02 13:21 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-26 15:05 [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Kenny Ho
2019-06-26 15:05 ` [RFC PATCH v3 02/11] cgroup: Add mechanism to register DRM devices Kenny Ho
     [not found]   ` <20190626150522.11618-3-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-06-26 15:56     ` Daniel Vetter
2019-06-26 20:37       ` Kenny Ho
2019-06-26 21:03         ` Daniel Vetter
     [not found]           ` <CAKMK7uERvn7Ed2trGQShM94Ozp6+x8bsULFyGj9CYWstuzb56A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-26 21:58             ` Kenny Ho
2019-06-26 15:05 ` [RFC PATCH v3 03/11] drm/amdgpu: Register AMD devices for DRM cgroup Kenny Ho
2019-06-26 15:05 ` [RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit Kenny Ho
     [not found]   ` <20190626150522.11618-5-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-06-26 16:05     ` Daniel Vetter
     [not found]       ` <20190626160553.GR12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-06-26 21:27         ` Kenny Ho
     [not found]           ` <CAOWid-eurCMx1F7ciUwx0e+p=s=NP8=UxQUhhF-hdK-iAna+fA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-26 21:41             ` Daniel Vetter
     [not found]               ` <20190626214113.GA12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-06-26 22:41                 ` Kenny Ho
     [not found]                   ` <CAOWid-egYGijS0a6uuG4mPUmOWaPwF-EKokR=LFNJ=5M+akVZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-27  5:43                     ` Daniel Vetter
2019-06-27 18:42                       ` Kenny Ho
     [not found]                         ` <CAOWid-cT4TQ7HGzcSWjmLGjAW_D1hRrkNguEiV8N+baNiKQm_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-27 21:24                           ` Daniel Vetter
2019-06-28 18:43                             ` Kenny Ho
     [not found]                               ` <CAOWid-dZQhpKHxYEFn+X+WSep+B66M_LtN6v0=4-uO3ecZ0pcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-07-02 13:16                                 ` Daniel Vetter
     [not found] ` <20190626150522.11618-1-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-06-26 15:05   ` [RFC PATCH v3 01/11] cgroup: Introduce cgroup for drm subsystem Kenny Ho
     [not found]     ` <20190626150522.11618-2-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-06-26 15:49       ` Daniel Vetter
2019-06-26 19:35         ` Kenny Ho
     [not found]           ` <CAOWid-dyGwf=e0ikBEQ=bnVM_bC8-FeTOD8fJVMJKUgPv6vtyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-26 20:12             ` Daniel Vetter
2019-06-26 15:05   ` [RFC PATCH v3 05/11] drm, cgroup: Add peak GEM buffer allocation limit Kenny Ho
2019-06-26 15:05   ` [RFC PATCH v3 06/11] drm, cgroup: Add GEM buffer allocation count stats Kenny Ho
2019-06-26 15:05   ` [RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control Kenny Ho
     [not found]     ` <20190626150522.11618-10-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-06-26 16:25       ` Daniel Vetter
     [not found]         ` <20190626162554.GU12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-06-27  4:34           ` Kenny Ho
     [not found]             ` <CAOWid-dO5QH4wLyN_ztMaoZtLM9yzw-FEMgk3ufbh1ahHJ2vVg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-27  6:11               ` Daniel Vetter
     [not found]                 ` <20190627061153.GD12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-06-28 19:49                   ` Kenny Ho
     [not found]                     ` <CAOWid-dCkevUiN27pkwfPketdqS8O+ZGYu8vRMPY2GhXGaVARA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-07-02 13:20                       ` Daniel Vetter
2019-06-26 15:05   ` [RFC PATCH v3 11/11] drm, cgroup: Allow more aggressive memory reclaim Kenny Ho
     [not found]     ` <20190626150522.11618-12-Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
2019-06-26 16:44       ` Daniel Vetter
2019-06-26 22:52         ` Kenny Ho
2019-06-27  6:15           ` Daniel Vetter
2019-06-26 15:05 ` [RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats Kenny Ho
2019-06-26 16:12   ` Daniel Vetter
     [not found]     ` <20190626161254.GS12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-06-27  4:06       ` Kenny Ho
     [not found]         ` <CAOWid-f3kKnM=4oC5Bba5WW5WNV2MH5PvVamrhO6LBr5ydPJQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-06-27  6:01           ` Daniel Vetter
2019-06-27 20:17             ` Kenny Ho
2019-06-27 21:33               ` Daniel Vetter
2019-06-28  1:16             ` Welty, Brian
     [not found]               ` <01a6efa8-802c-b8b1-931e-4f0c1c63beca-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2019-06-28  6:53                 ` Daniel Vetter
2019-06-26 15:05 ` [RFC PATCH v3 08/11] drm, cgroup: Add TTM buffer peak usage stats Kenny Ho
2019-06-26 16:16   ` Daniel Vetter
2019-06-26 15:05 ` [RFC PATCH v3 10/11] drm, cgroup: Add soft VRAM limit Kenny Ho
2019-06-27  7:24 ` [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem Daniel Vetter
2019-06-30  5:10   ` Kenny Ho
2019-07-02 13:21     ` Daniel Vetter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).