linux-kernel.vger.kernel.org archive mirror
* [RFC v4 0/8] Proposal for a GPU cgroup controller
@ 2022-03-28  3:59 T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 1/8] gpu: rfc: " T.J. Mercier
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
enforce resource limits.

Changelog:
v4:
Skip test if not run as root per Shuah Khan

Add better test logging for abnormal child termination per Shuah Khan

Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný

Adjust gpucg_try_charge critical section for charge transfer functionality

Fix uninitialized return code error for dmabuf_try_charge error case

v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz

Use more common dual author commit message format per John Stultz

Remove android from binder changes title per Todd Kjos

Add a kselftest for this new behavior per Greg Kroah-Hartman

Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.

Fix pid and uid types in binder UAPI header

v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/

Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.

Fix incorrect Kconfig help section indentation per Randy Dunlap.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. To help establish a unified memory accounting model for
GPU and all related subsystems, Daniel Vetter suggested moving it out of
the DRM subsystem so that it can also be used by other DMA-BUF
exporters[3]. This RFC proposes an interface that does so.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

Hridya Valsaraju (5):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  dmabuf: Add gpu cgroup charge transfer function
  binder: Add a buffer flag to relinquish ownership of fds

T.J. Mercier (3):
  dmabuf: Use the GPU cgroup charge/uncharge APIs
  binder: use __kernel_pid_t and __kernel_uid_t for userspace
  selftests: Add binder cgroup gpu memory transfer test

 Documentation/gpu/rfc/gpu-cgroup.rst          | 183 +++++++
 Documentation/gpu/rfc/index.rst               |   4 +
 drivers/android/binder.c                      |  26 +
 drivers/dma-buf/dma-buf.c                     | 107 ++++
 drivers/dma-buf/dma-heap.c                    |  27 +
 drivers/dma-buf/heaps/system_heap.c           |   3 +
 include/linux/cgroup_gpu.h                    | 139 +++++
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/dma-buf.h                       |  22 +-
 include/linux/dma-heap.h                      |  11 +
 include/uapi/linux/android/binder.h           |   5 +-
 init/Kconfig                                  |   7 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/gpu.c                           | 362 +++++++++++++
 .../selftests/drivers/android/binder/Makefile |   8 +
 .../drivers/android/binder/binder_util.c      | 254 +++++++++
 .../drivers/android/binder/binder_util.h      |  32 ++
 .../selftests/drivers/android/binder/config   |   4 +
 .../binder/test_dmabuf_cgroup_transfer.c      | 484 ++++++++++++++++++
 19 files changed, 1679 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c
 create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
 create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
 create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
 create mode 100644 tools/testing/selftests/drivers/android/binder/config
 create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c

-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 1/8] gpu: rfc: Proposal for a GPU cgroup controller
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory T.J. Mercier
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

From: Hridya Valsaraju <hridya@google.com>

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by a device/allocator within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches is only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
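
As an illustration of how userspace might consume the per-device breakdown,
here is a minimal sketch that parses one "<name> <bytes>" line in the format
that gpu-cgroup.rst in this patch proposes for gpu.memory.current. The helper
name, the struct, and the 64-byte name bound are assumptions made for this
example and are not part of the series.

```c
#include <stdio.h>

/* Hypothetical bound on the device name; keep in sync with "%63s" below. */
#define GPUCG_EXAMPLE_NAME_MAX 64

/* One parsed line of gpu.memory.current (illustrative type). */
struct gpucg_example_usage {
	char name[GPUCG_EXAMPLE_NAME_MAX];	/* device/allocator key */
	unsigned long long bytes;		/* memory charged, in bytes */
};

/*
 * Parse a single "<name> <bytes>" line, with or without a trailing newline.
 * Returns 0 on success and -1 if the line is malformed. *out is only
 * written on success.
 */
static int gpucg_example_parse_line(const char *line,
				    struct gpucg_example_usage *out)
{
	char name[GPUCG_EXAMPLE_NAME_MAX];
	unsigned long long bytes;
	char trailing;

	/*
	 * Exactly two conversions must succeed. The third conversion only
	 * matches if something other than whitespace follows the value.
	 */
	if (sscanf(line, "%63s %llu %c", name, &bytes, &trailing) != 2)
		return -1;

	snprintf(out->name, sizeof(out->name), "%s", name);
	out->bytes = bytes;
	return 0;
}
```

A caller would typically read the file line by line with fgets() and pass
each line to the helper.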

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v3 changes
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz.

Use more common dual author commit message format per John Stultz.
---
 Documentation/gpu/rfc/gpu-cgroup.rst | 183 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 187 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..5b40d5518a5e
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,183 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document outlines a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer (hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where the key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting (see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo "total 41943040" > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo "dev1 4194304" > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem, where each cgroup maintains a list of resource pools.
+Each resource pool contains a pointer to a struct gpucg_device and a counter that
+tracks the current total and the maximum limit set for the device.
+
+The code block below is a preliminary sketch of what the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @gpucg_dev: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to uncharge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 1/8] gpu: rfc: " T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-29 16:59   ` Tejun Heo
  2022-03-28  3:59 ` [RFC v4 3/8] dmabuf: Use the GPU cgroup charge/uncharge APIs T.J. Mercier
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

From: Hridya Valsaraju <hridya@google.com>

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
- allow a device to register for memory accounting using the GPU cgroup
  controller.
- charge allocated memory to, and uncharge it from, a cgroup.
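
For reference, gpucg_try_charge() and gpucg_uncharge() in this patch take a
size in bytes and account it in whole pages, rounding up. The sketch below
shows that conversion in plain userspace C. The 4 KiB page size and all names
are assumptions for illustration; the kernel code uses PAGE_ALIGN() and
PAGE_SHIFT instead.

```c
#include <stdint.h>

/* Assumed page size for this example only (4 KiB). */
#define EXAMPLE_PAGE_SHIFT 12
#define EXAMPLE_PAGE_SIZE  (UINT64_C(1) << EXAMPLE_PAGE_SHIFT)

/*
 * Round a byte count up to a whole number of pages, mirroring
 * "PAGE_ALIGN(usage) >> PAGE_SHIFT". Like that expression, this wraps
 * for values within one page of UINT64_MAX.
 */
static uint64_t example_bytes_to_pages(uint64_t usage)
{
	uint64_t aligned = (usage + EXAMPLE_PAGE_SIZE - 1) &
			   ~(EXAMPLE_PAGE_SIZE - 1);

	return aligned >> EXAMPLE_PAGE_SHIFT;
}
```

Because a partial page is charged as a full page, a value read back from
gpu.memory.current can be slightly larger than the sum of the requested
sizes.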

When the cgroup controller is enabled, it exposes information about
the memory allocated by each device (registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.
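
To make the intended limit behavior concrete, below is a simplified
userspace model of hierarchical charging: a charge is applied to a cgroup's
counter and to every ancestor, and is rolled back if any level would exceed
its limit. This is a sketch of the general technique that the kernel's
page_counter provides, not the code in this patch. All names are
hypothetical, and there is no locking or atomicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a per-cgroup, per-device counter. */
struct example_counter {
	uint64_t usage;			/* pages currently charged */
	uint64_t max;			/* limit in pages */
	struct example_counter *parent;	/* NULL for the root */
};

/*
 * Charge nr_pages to counter and all of its ancestors. Returns true on
 * success. On failure no counter is left modified.
 */
static bool example_try_charge(struct example_counter *counter,
			       uint64_t nr_pages)
{
	struct example_counter *c, *failed = NULL;

	for (c = counter; c; c = c->parent) {
		if (c->usage > c->max || nr_pages > c->max - c->usage) {
			failed = c;
			break;
		}
		c->usage += nr_pages;
	}
	if (!failed)
		return true;

	/* Undo the charges applied below the level that refused. */
	for (c = counter; c != failed; c = c->parent)
		c->usage -= nr_pages;
	return false;
}

/* Uncharge nr_pages from counter and all of its ancestors. */
static void example_uncharge(struct example_counter *counter,
			     uint64_t nr_pages)
{
	struct example_counter *c;

	for (c = counter; c; c = c->parent)
		c->usage -= nr_pages;
}
```

With a parent limited to 100 pages and a child limited to 1000, charging 60
pages to the child succeeds, and a further 50-page charge fails at the
parent without changing either counter.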

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v4 changes
Adjust gpucg_try_charge critical section for future charge transfer
functionality.

v3 changes
Use more common dual author commit message format per John Stultz.

v2 changes
Fix incorrect Kconfig help section indentation per Randy Dunlap.
---
 include/linux/cgroup_gpu.h    | 127 ++++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 303 ++++++++++++++++++++++++++++++++++
 5 files changed, 442 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..c90069719022
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list of various resource pools in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+
+static inline int gpucg_try_charge(struct gpucg *gpucg,
+				   struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg,
+				  struct gpucg_device *device,
+				  u64 usage) {}
+
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif /* CONFIG_CGROUP_GPU */
+#endif /* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..43568472930a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -980,6 +980,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+	bool "gpu cgroup controller (EXPERIMENTAL)"
+	select PAGE_COUNTER
+	help
+	  Provides accounting and limit setting for memory allocations by the GPU and
+	  GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..ac4c470914b5
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,303 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/mm.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	/* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/* list maintained by the gpucg_device to keep track of its resource pools */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	/* delete all resource pools */
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(
+	struct gpucg *cg,
+	struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+							GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is
+ * created in a hierarchical manner in the cgroup and its ancestor cgroups who
+ * do not already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation
+	 * to enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * Continue to hold gpucg_mutex because we use it to block charges
+	 * while transfers are in progress.
+	 */
+	if (IS_ERR(rp)) {
+		mutex_unlock(&gpucg_mutex);
+		return PTR_ERR(rp);
+	}
+
+	if (page_counter_try_charge(&rp->total, nr_pages, &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+	mutex_unlock(&gpucg_mutex);
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here: rp stays valid until gpucg is freed, which cannot
+	 * happen while refs on gpucg are held. Uncharges are fine while transfers are in progress.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device and @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+static struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }     /* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc      = gpucg_css_alloc,
+	.css_free       = gpucg_css_free,
+	.early_init     = false,
+	.legacy_cftypes = files,
+	.dfl_cftypes    = files,
+};
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 3/8] dmabuf: Use the GPU cgroup charge/uncharge APIs
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 1/8] gpu: rfc: " T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging T.J. Mercier
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by any DMA-BUF exporter that exports a buffer with a GPU cgroup
device association.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.
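
The lifetime described above can be sketched as a small userspace model.
This is not kernel code: model_cgroup, model_buf and the model_* helpers
are hypothetical stand-ins for struct gpucg, struct dma_buf,
dmabuf_try_charge() and dmabuf_uncharge(). The point it illustrates is that
the charge is tied to the buffer, not to the exporting task, so it stays
attributed until the buffer itself is released.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct gpucg (one device's pool only). */
struct model_cgroup {
	size_t charged_bytes;	/* roughly what memory.current would report */
	int refs;		/* references pinning the cgroup */
};

/* Hypothetical stand-in for the accounting fields of struct dma_buf. */
struct model_buf {
	size_t size;
	struct model_cgroup *cg;	/* cgroup charged at export, or NULL */
};

/* Mirrors dmabuf_try_charge(): pin the exporter's cgroup and charge it. */
static void model_export(struct model_buf *buf,
			 struct model_cgroup *exporter_cg, size_t size)
{
	buf->size = size;
	buf->cg = exporter_cg;
	exporter_cg->refs++;
	exporter_cg->charged_bytes += size;
}

/* Mirrors dmabuf_uncharge(): runs only when the buffer is released. */
static void model_release(struct model_buf *buf)
{
	if (!buf->cg)
		return;
	buf->cg->charged_bytes -= buf->size;
	buf->cg->refs--;
	buf->cg = NULL;
}
```

In this model nothing the exporting process does after model_export()
changes charged_bytes; only model_release() does, which is the behaviour
the patch relies on for attribution.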

Originally-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v4 changes
Fix uninitialized return code error for dmabuf_try_charge error case.

v3 changes
Use more common dual author commit message format per John Stultz.

v2 changes
Move dma-buf cgroup charging/uncharging from a dma_buf_op defined by
every heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.
---
 drivers/dma-buf/dma-buf.c | 58 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   | 20 ++++++++++++--
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 602b12d7470d..1ee5c60d3d6d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -56,6 +56,53 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
 			     dentry->d_name.name, ret > 0 ? name : "");
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static inline struct gpucg_device *
+exp_info_gpucg_dev(const struct dma_buf_export_info *exp_info)
+{
+	return exp_info->gpucg_dev;
+}
+
+static int dmabuf_try_charge(struct dma_buf *dmabuf,
+			     struct gpucg_device *gpucg_dev)
+{
+	int ret;
+
+	dmabuf->gpucg = gpucg_get(current);
+	dmabuf->gpucg_dev = gpucg_dev;
+
+	ret = gpucg_try_charge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size);
+	if (ret) {
+		gpucg_put(dmabuf->gpucg);
+		dmabuf->gpucg = NULL;
+		dmabuf->gpucg_dev = NULL;
+	}
+	return ret;
+}
+
+static void dmabuf_uncharge(struct dma_buf *dmabuf)
+{
+	if (dmabuf->gpucg && dmabuf->gpucg_dev) {
+		gpucg_uncharge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size);
+		gpucg_put(dmabuf->gpucg);
+	}
+}
+#else /* CONFIG_CGROUP_GPU */
+static inline struct gpucg_device *
+exp_info_gpucg_dev(const struct dma_buf_export_info *exp_info)
+{
+	return NULL;
+}
+
+static inline int dmabuf_try_charge(struct dma_buf *dmabuf,
+				     struct gpucg_device *gpucg_dev)
+{
+	return 0;
+}
+
+static inline void dmabuf_uncharge(struct dma_buf *dmabuf) {}
+#endif /* CONFIG_CGROUP_GPU */
+
 static void dma_buf_release(struct dentry *dentry)
 {
 	struct dma_buf *dmabuf;
@@ -79,6 +126,8 @@ static void dma_buf_release(struct dentry *dentry)
 	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
 		dma_resv_fini(dmabuf->resv);
 
+	dmabuf_uncharge(dmabuf);
+
 	WARN_ON(!list_empty(&dmabuf->attachments));
 	module_put(dmabuf->owner);
 	kfree(dmabuf->name);
@@ -484,6 +533,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 {
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv = exp_info->resv;
+	struct gpucg_device *gpucg_dev = exp_info_gpucg_dev(exp_info);
 	struct file *file;
 	size_t alloc_size = sizeof(struct dma_buf);
 	int ret;
@@ -534,6 +584,12 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	}
 	dmabuf->resv = resv;
 
+	if (gpucg_dev) {
+		ret = dmabuf_try_charge(dmabuf, gpucg_dev);
+		if (ret)
+			goto err_charge;
+	}
+
 	file = dma_buf_getfile(dmabuf, exp_info->flags);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
@@ -565,6 +621,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	file->f_path.dentry->d_fsdata = NULL;
 	fput(file);
 err_dmabuf:
+	dmabuf_uncharge(dmabuf);
+err_charge:
 	kfree(dmabuf);
 err_module:
 	module_put(exp_info->owner);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..742f29c3daaf 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -303,7 +304,7 @@ struct dma_buf {
 	/**
 	 * @size:
 	 *
-	 * Size of the buffer; invariant over the lifetime of the buffer.
+	 * Size of the buffer in bytes; invariant over the lifetime of the buffer.
 	 */
 	size_t size;
 
@@ -453,6 +454,17 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+
+#ifdef CONFIG_CGROUP_GPU
+	/** @gpucg: Pointer to the cgroup this buffer currently belongs to. */
+	struct gpucg *gpucg;
+
+	/** @gpucg_dev:
+	 *
+	 * Pointer to the cgroup GPU device whence this buffer originates.
+	 */
+	struct gpucg_device *gpucg_dev;
+#endif
 };
 
 /**
@@ -529,9 +541,10 @@ struct dma_buf_attachment {
  * @exp_name:	name of the exporter - useful for debugging.
  * @owner:	pointer to exporter module - used for refcounting kernel module
  * @ops:	Attach allocator-defined dma buf ops to the new buffer
- * @size:	Size of the buffer - invariant over the lifetime of the buffer
+ * @size:	Size of the buffer in bytes - invariant over the lifetime of the buffer
  * @flags:	mode flags for the file
  * @resv:	reservation-object, NULL to allocate default one
+ * @gpucg_dev:	pointer to the gpu cgroup device this buffer belongs to
  * @priv:	Attach private data of allocator to this buffer
  *
  * This structure holds the information required to export the buffer. Used
@@ -544,6 +557,9 @@ struct dma_buf_export_info {
 	size_t size;
 	int flags;
 	struct dma_resv *resv;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device *gpucg_dev;
+#endif
 	void *priv;
 };
 
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
                   ` (2 preceding siblings ...)
  2022-03-28  3:59 ` [RFC v4 3/8] dmabuf: Use the GPU cgroup charge/uncharge APIs T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-28 14:36   ` Daniel Vetter
  2022-03-28  3:59 ` [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

From: Hridya Valsaraju <hridya@google.com>

All DMA heaps now register a new GPU cgroup device upon creation, and the
system_heap now exports buffers associated with its GPU cgroup device for
tracking purposes.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v3 changes
Use more common dual author commit message format per John Stultz.

v2 changes
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.
---
 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c |  3 +++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 8f5848aa144f..885072427775 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/dma-buf.h>
@@ -31,6 +32,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev		gpu cgroup device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -41,6 +43,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else /* CONFIG_CGROUP_GPU */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif /* CONFIG_CGROUP_GPU */
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..752a05c3cfe2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	exp_info.ops = &system_heap_buf_ops;
 	exp_info.size = buffer->len;
 	exp_info.flags = fd_flags;
+#ifdef CONFIG_CGROUP_GPU
+	exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap);
+#endif
 	exp_info.priv = buffer;
 	dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(dmabuf)) {
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
                   ` (3 preceding siblings ...)
  2022-03-28  3:59 ` [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-29 15:21   ` Michal Koutný
  2022-03-28  3:59 ` [RFC v4 6/8] binder: Add a buffer flag to relinquish ownership of fds T.J. Mercier
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

From: Hridya Valsaraju <hridya@google.com>

The dma_buf_transfer_charge function provides a way for processes to
transfer charge of a buffer to a different process. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory and drops all references to the allocated memory.
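
The uncharge-then-charge ordering used for the transfer can be sketched in
a userspace model. This is only an illustration under stated assumptions:
model_pool and the model_* helpers are hypothetical, each pool simply
forwards charges to its parent (loosely imitating a hierarchical counter),
and -ENOMEM is an assumed error code for a failed charge.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hierarchical counter; invariant: usage <= limit. */
struct model_pool {
	size_t usage;
	size_t limit;
	struct model_pool *parent;
};

/* Charge n bytes to pool and every ancestor, or to none of them. */
static bool model_try_charge(struct model_pool *pool, size_t n)
{
	struct model_pool *p, *q;

	for (p = pool; p; p = p->parent) {
		if (n > p->limit - p->usage)
			goto undo;
		p->usage += n;
	}
	return true;
undo:
	/* Roll back the levels charged before the failing one. */
	for (q = pool; q != p; q = q->parent)
		q->usage -= n;
	return false;
}

static void model_uncharge(struct model_pool *pool, size_t n)
{
	for (; pool; pool = pool->parent)
		pool->usage -= n;
}

/* Uncharge the source first so a shared ancestor never counts n twice. */
static int model_transfer(struct model_pool *src, struct model_pool *dst,
			  size_t n)
{
	model_uncharge(src, n);
	if (model_try_charge(dst, n))
		return 0;
	/* Restore the source; nothing else can charge it in this model. */
	model_try_charge(src, n);
	return -ENOMEM;
}
```

With a shared parent that is already full, charging the destination first
would fail even though the transfer does not change the parent's total.
Uncharging first avoids that, at the cost of needing the rollback path.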

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v4 changes
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný.

v3 changes
Use more common dual author commit message format per John Stultz.

v2 changes
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.
---
 drivers/dma-buf/dma-buf.c  | 49 +++++++++++++++++++++++++++++++
 include/linux/cgroup_gpu.h | 12 ++++++++
 include/linux/dma-buf.h    |  2 ++
 kernel/cgroup/gpu.c        | 59 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 122 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 1ee5c60d3d6d..7748c3453b91 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1380,6 +1380,55 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+/**
+ * dma_buf_transfer_charge - Change the GPU cgroup to which the provided dma_buf is charged.
+ * @dmabuf:	[in]	buffer whose charge will be migrated to a different GPU cgroup
+ * @gpucg:	[in]	the destination GPU cgroup for dmabuf's charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg *current_gpucg;
+	int ret;
+
+	/* If the source and destination cgroups are the same, don't do anything. */
+	current_gpucg = gpucg_get(current);
+	if (current_gpucg == gpucg) {
+		ret = 0;
+		goto skip_transfer;
+	}
+
+	/*
+	 * Verify that the cgroup of the process requesting the transfer is the
+	 * same as the one the buffer is currently charged to. current_gpucg
+	 * already holds the reference taken above; taking another would leak it.
+	 */
+	mutex_lock(&dmabuf->lock);
+	if (current_gpucg != dmabuf->gpucg) {
+		ret = -EPERM;
+		goto err;
+	}
+
+	ret = gpucg_transfer_charge(current_gpucg, gpucg, dmabuf->gpucg_dev, dmabuf->size);
+	if (ret)
+		goto err;
+	dmabuf->gpucg = gpucg;
+err:
+	mutex_unlock(&dmabuf->lock);
+skip_transfer:
+	gpucg_put(current_gpucg);
+	return ret;
+#else
+	return 0;
+#endif /* CONFIG_CGROUP_GPU */
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_transfer_charge, DMA_BUF);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
index c90069719022..e30f15d5e9be 100644
--- a/include/linux/cgroup_gpu.h
+++ b/include/linux/cgroup_gpu.h
@@ -87,6 +87,10 @@ static inline struct gpucg *gpucg_parent(struct gpucg *cg)
 
 int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
 void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+int gpucg_transfer_charge(struct gpucg *source,
+			  struct gpucg *dest,
+			  struct gpucg_device *device,
+			  u64 usage);
 void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
 #else /* CONFIG_CGROUP_GPU */
 
@@ -121,6 +125,14 @@ static inline void gpucg_uncharge(struct gpucg *gpucg,
 				  struct gpucg_device *device,
 				  u64 usage) {}
 
+static inline int gpucg_transfer_charge(struct gpucg *source,
+					struct gpucg *dest,
+					struct gpucg_device *device,
+					u64 usage)
+{
+	return 0;
+}
+
 static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
 					 const char *name) {}
 #endif /* CONFIG_CGROUP_GPU */
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 742f29c3daaf..646827156213 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -646,4 +646,6 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 		 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct gpucg *gpucg);
 #endif /* __DMA_BUF_H__ */
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
index ac4c470914b5..40531323d6da 100644
--- a/kernel/cgroup/gpu.c
+++ b/kernel/cgroup/gpu.c
@@ -247,6 +247,65 @@ void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
 	css_put_many(&gpucg->css, nr_pages);
 }
 
+/**
+ * gpucg_transfer_charge - Transfer a GPU charge from one cgroup to another.
+ * @source:	[in]	The GPU cgroup the charge will be transferred from.
+ * @dest:	[in]	The GPU cgroup the charge will be transferred to.
+ * @device:	[in]	The GPU cgroup device corresponding to the charge.
+ * @usage:	[in]	The size of the memory in bytes.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int gpucg_transfer_charge(struct gpucg *source,
+			  struct gpucg *dest,
+			  struct gpucg_device *device,
+			  u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp_source, *rp_dest;
+	int ret = 0;
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+
+	mutex_lock(&gpucg_mutex);
+	rp_source = find_cg_rpool_locked(source, device);
+	if (unlikely(!rp_source)) {
+		ret = -ENOENT;
+		goto exit_early;
+	}
+
+	rp_dest = get_cg_rpool_locked(dest, device);
+	if (IS_ERR(rp_dest)) {
+		ret = PTR_ERR(rp_dest);
+		goto exit_early;
+	}
+
+	/*
+	 * First uncharge from the pool it's currently charged to. This ordering avoids double
+	 * charging while the transfer is in progress, which could cause us to hit a limit.
+	 * If the try_charge fails for this transfer, we need to be able to reverse this uncharge,
+	 * so we continue to hold the gpucg_mutex here.
+	 */
+	page_counter_uncharge(&rp_source->total, nr_pages);
+	css_put_many(&source->css, nr_pages);
+
+	/* Now attempt the new charge */
+	if (page_counter_try_charge(&rp_dest->total, nr_pages, &counter)) {
+		css_get_many(&dest->css, nr_pages);
+	} else {
+		/*
+		 * Charge failed: report it and undo the uncharge (safe while gpucg_mutex is held).
+		 */
+		ret = -ENOMEM;
+		WARN_ON(!page_counter_try_charge(&rp_source->total, nr_pages, &counter));
+		css_get_many(&source->css, nr_pages);
+	}
+exit_early:
+	mutex_unlock(&gpucg_mutex);
+	return ret;
+}
+
 /**
  * gpucg_register_device - Registers a device for memory accounting using the
  * GPU cgroup controller.
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 6/8] binder: Add a buffer flag to relinquish ownership of fds
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
                   ` (4 preceding siblings ...)
  2022-03-28  3:59 ` [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 7/8] binder: use __kernel_pid_t and __kernel_uid_t for userspace T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 8/8] selftests: Add binder cgroup gpu memory transfer test T.J. Mercier
  7 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

From: Hridya Valsaraju <hridya@google.com>

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is up to the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

Since this is a new feature exposed to userspace, the kernel and userspace
must be compatible for the accounting to work for transfers. In all cases
the allocation and transport of DMA buffers via binder will succeed, but
transfer accounting works only when the kernel supports this feature and
userspace uses it. The possible scenarios are detailed below:

1. new kernel + old userspace
The kernel supports the feature but userspace does not use it. The old
userspace won't mount the new cgroup controller, so accounting is not
performed and charge is not transferred.

2. old kernel + new userspace
The new cgroup controller is not supported by the kernel, so accounting is
not performed and charge is not transferred.

3. old kernel + old userspace
Same as #2

4. new kernel + new userspace
Cgroup is mounted, feature is supported and used.
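
The four scenarios above reduce to one predicate, sketched here as a
userspace model. The MODEL_* constants copy the flag values from the enum
this patch touches, but the names and the model_transfer_requested() helper
are hypothetical, and "old userspace" is assumed to mean a sender that
never sets the new flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the binder buffer flag values. */
#define MODEL_FLAG_HAS_PARENT		0x01u
#define MODEL_FLAG_SENDER_NO_NEED	0x02u

/*
 * Mirrors the check in binder_translate_fd_array(): a transfer is attempted
 * only if the kernel was built with the controller and the sender opted in.
 */
static bool model_transfer_requested(bool kernel_has_gpucg,
				     uint32_t parent_flags)
{
	return kernel_has_gpucg &&
	       (parent_flags & MODEL_FLAG_SENDER_NO_NEED);
}
```

In every other combination the fd array is still translated and delivered;
only the accounting step is skipped.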

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>

---
v3 changes
Remove android from title per Todd Kjos.

Use more common dual author commit message format per John Stultz.

Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.

v2 changes
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.
---
 drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 8351c5638880..4357d2efc8e1 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+	    (parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED))
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		gpucg = gpucg_get(target_proc->tsk);
+		ret = dma_buf_transfer_charge(dmabuf, gpucg);
+		if (ret) {
+			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d\n",
+				proc->pid, thread->pid, target_proc->pid);
+			gpucg_put(gpucg);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index 3246f2c74696..169fd5069a1a 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 7/8] binder: use __kernel_pid_t and __kernel_uid_t for userspace
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
                   ` (5 preceding siblings ...)
  2022-03-28  3:59 ` [RFC v4 6/8] binder: Add a buffer flag to relinquish ownership of fds T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  2022-03-28  3:59 ` [RFC v4 8/8] selftests: Add binder cgroup gpu memory transfer test T.J. Mercier
  7 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

The kernel interface should use types that the kernel defines instead of
pid_t and uid_t, whose definition is owned by libc. This fixes the header
so that it can be included without first including sys/types.h.
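
A minimal sketch of the effect, assuming a Linux build environment with the
kernel UAPI headers installed: a struct that uses the kernel-owned types
compiles with only <linux/types.h>. The struct model_sender_ids and
model_make_ids() names are hypothetical and are not part of the binder
UAPI.

```c
#include <linux/types.h>	/* kernel-owned types; no <sys/types.h> needed */

/* Hypothetical UAPI-style struct using kernel-defined id types. */
struct model_sender_ids {
	__u32		flags;
	__kernel_pid_t	sender_pid;
	__kernel_uid_t	sender_euid;
};

/* Build a zero-initialized record for the given pid and euid. */
static struct model_sender_ids model_make_ids(int pid, unsigned int euid)
{
	struct model_sender_ids ids = { 0 };

	ids.sender_pid = pid;
	ids.sender_euid = euid;
	return ids;
}
```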

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 include/uapi/linux/android/binder.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index 169fd5069a1a..aa28454dbca3 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -289,8 +289,8 @@ struct binder_transaction_data {
 
 	/* General information about the transaction. */
 	__u32	        flags;
-	pid_t		sender_pid;
-	uid_t		sender_euid;
+	__kernel_pid_t	sender_pid;
+	__kernel_uid_t	sender_euid;
 	binder_size_t	data_size;	/* number of bytes of data */
 	binder_size_t	offsets_size;	/* number of bytes of offsets */
 
-- 
2.35.1.1021.g381101b075-goog



* [RFC v4 8/8] selftests: Add binder cgroup gpu memory transfer test
  2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
                   ` (6 preceding siblings ...)
  2022-03-28  3:59 ` [RFC v4 7/8] binder: use __kernel_pid_t and __kernel_uid_t for userspace T.J. Mercier
@ 2022-03-28  3:59 ` T.J. Mercier
  7 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28  3:59 UTC (permalink / raw)
  To: tjmercier, David Airlie, Daniel Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan
  Cc: kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

This test verifies that the cgroup GPU memory charge is transferred
correctly when a dmabuf is passed between processes in two different
cgroups and the sender specifies BINDER_BUFFER_FLAG_SENDER_NO_NEED in the
binder transaction data containing the dmabuf file descriptor.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
v4 changes
Skip test if not run as root per Shuah Khan.

Add better logging for abnormal child termination per Shuah Khan.
---
 .../selftests/drivers/android/binder/Makefile |   8 +
 .../drivers/android/binder/binder_util.c      | 254 +++++++++
 .../drivers/android/binder/binder_util.h      |  32 ++
 .../selftests/drivers/android/binder/config   |   4 +
 .../binder/test_dmabuf_cgroup_transfer.c      | 484 ++++++++++++++++++
 5 files changed, 782 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
 create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
 create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
 create mode 100644 tools/testing/selftests/drivers/android/binder/config
 create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c

diff --git a/tools/testing/selftests/drivers/android/binder/Makefile b/tools/testing/selftests/drivers/android/binder/Makefile
new file mode 100644
index 000000000000..726439d10675
--- /dev/null
+++ b/tools/testing/selftests/drivers/android/binder/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -Wall
+
+TEST_GEN_PROGS = test_dmabuf_cgroup_transfer
+
+include ../../../lib.mk
+
+$(OUTPUT)/test_dmabuf_cgroup_transfer: ../../../cgroup/cgroup_util.c binder_util.c
diff --git a/tools/testing/selftests/drivers/android/binder/binder_util.c b/tools/testing/selftests/drivers/android/binder/binder_util.c
new file mode 100644
index 000000000000..c9dcf5b9d42b
--- /dev/null
+++ b/tools/testing/selftests/drivers/android/binder/binder_util.c
@@ -0,0 +1,254 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "binder_util.h"
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/mount.h>
+
+#include <linux/limits.h>
+#include <linux/android/binder.h>
+#include <linux/android/binderfs.h>
+
+static const size_t BINDER_MMAP_SIZE = 64 * 1024;
+
+static void binderfs_unmount(const char *mountpoint)
+{
+	if (umount2(mountpoint, MNT_DETACH))
+		fprintf(stderr, "Failed to unmount binderfs at %s: %s\n",
+			mountpoint, strerror(errno));
+	else
+		fprintf(stderr, "Binderfs unmounted: %s\n", mountpoint);
+
+	if (rmdir(mountpoint))
+		fprintf(stderr, "Failed to remove binderfs mount %s: %s\n",
+			mountpoint, strerror(errno));
+	else
+		fprintf(stderr, "Binderfs mountpoint destroyed: %s\n", mountpoint);
+}
+
+struct binderfs_ctx create_binderfs(const char *name)
+{
+	int fd, ret, saved_errno;
+	struct binderfs_device device = { 0 };
+	struct binderfs_ctx ctx = { 0 };
+
+	/*
+	 * P_tmpdir is set to "/tmp/" on Android platforms where Binder is most
+	 * commonly used, but this path does not actually exist on Android. We
+	 * will first try using "/data/local/tmp" and fall back to P_tmpdir if
+	 * that fails for non-Android platforms.
+	 */
+	static const char tmpdir[] = "/data/local/tmp";
+	static const size_t MAX_TMPDIR_SIZE =
+		sizeof(tmpdir) > sizeof(P_tmpdir) ?
+		sizeof(tmpdir) : sizeof(P_tmpdir);
+	static const char template[] = "/binderfs_XXXXXX";
+
+	char *mkdtemp_result;
+	char binderfs_mntpt[MAX_TMPDIR_SIZE + sizeof(template)];
+	char device_path[MAX_TMPDIR_SIZE + sizeof(template) + BINDERFS_MAX_NAME];
+
+	snprintf(binderfs_mntpt, sizeof(binderfs_mntpt), "%s%s", tmpdir, template);
+
+	mkdtemp_result = mkdtemp(binderfs_mntpt);
+	if (mkdtemp_result == NULL) {
+		fprintf(stderr, "Failed to create binderfs mountpoint at %s: %s.\n",
+			binderfs_mntpt, strerror(errno));
+		fprintf(stderr, "Trying fallback mountpoint...\n");
+		snprintf(binderfs_mntpt, sizeof(binderfs_mntpt), "%s%s", P_tmpdir, template);
+		if (mkdtemp(binderfs_mntpt) == NULL) {
+			fprintf(stderr, "Failed to create binderfs mountpoint at %s: %s\n",
+				binderfs_mntpt, strerror(errno));
+			return ctx;
+		}
+	}
+	fprintf(stderr, "Binderfs mountpoint created at %s\n", binderfs_mntpt);
+
+	if (mount(NULL, binderfs_mntpt, "binder", 0, 0)) {
+		perror("Could not mount binderfs");
+		rmdir(binderfs_mntpt);
+		return ctx;
+	}
+	fprintf(stderr, "Binderfs mounted at %s\n", binderfs_mntpt);
+
+	strncpy(device.name, name, sizeof(device.name) - 1);
+	snprintf(device_path, sizeof(device_path), "%s/binder-control", binderfs_mntpt);
+	fd = open(device_path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0) {
+		perror("Failed to open binder-control device");
+		binderfs_unmount(binderfs_mntpt);
+		return ctx;
+	}
+
+	ret = ioctl(fd, BINDER_CTL_ADD, &device);
+	saved_errno = errno;
+	close(fd);
+	errno = saved_errno;
+	if (ret) {
+		perror("Failed to allocate new binder device");
+		binderfs_unmount(binderfs_mntpt);
+		return ctx;
+	}
+
+	fprintf(stderr, "Allocated new binder device with major %d, minor %d, and name %s at %s\n",
+		device.major, device.minor, device.name, binderfs_mntpt);
+
+	ctx.name = strdup(name);
+	ctx.mountpoint = strdup(binderfs_mntpt);
+	return ctx;
+}
+
+void destroy_binderfs(struct binderfs_ctx *ctx)
+{
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "%s/%s", ctx->mountpoint, ctx->name);
+
+	if (unlink(path))
+		fprintf(stderr, "Failed to unlink binder device %s: %s\n", path, strerror(errno));
+	else
+		fprintf(stderr, "Destroyed binder %s at %s\n", ctx->name, ctx->mountpoint);
+
+	binderfs_unmount(ctx->mountpoint);
+
+	free(ctx->name);
+	free(ctx->mountpoint);
+}
+
+struct binder_ctx open_binder(struct binderfs_ctx *bfs_ctx)
+{
+	struct binder_ctx ctx = {.fd = -1, .memory = NULL};
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "%s/%s", bfs_ctx->mountpoint, bfs_ctx->name);
+	ctx.fd = open(path, O_RDWR | O_NONBLOCK | O_CLOEXEC);
+	if (ctx.fd < 0) {
+		fprintf(stderr, "Error opening binder device %s: %s\n", path, strerror(errno));
+		return ctx;
+	}
+
+	ctx.memory = mmap(NULL, BINDER_MMAP_SIZE, PROT_READ, MAP_SHARED, ctx.fd, 0);
+	if (ctx.memory == MAP_FAILED) {
+		perror("Error mapping binder memory");
+		close(ctx.fd);
+		ctx.fd = -1;
+	}
+
+	return ctx;
+}
+
+void close_binder(struct binder_ctx *ctx)
+{
+	if (munmap(ctx->memory, BINDER_MMAP_SIZE))
+		perror("Failed to unmap binder memory");
+	ctx->memory = NULL;
+
+	if (close(ctx->fd))
+		perror("Failed to close binder");
+	ctx->fd = -1;
+}
+
+int become_binder_context_manager(int binder_fd)
+{
+	return ioctl(binder_fd, BINDER_SET_CONTEXT_MGR, 0);
+}
+
+int do_binder_write_read(int binder_fd, void *writebuf, binder_size_t writesize,
+			 void *readbuf, binder_size_t readsize)
+{
+	int err;
+	struct binder_write_read bwr = {
+		.write_buffer = (binder_uintptr_t)writebuf,
+		.write_size = writesize,
+		.read_buffer = (binder_uintptr_t)readbuf,
+		.read_size = readsize
+	};
+
+	do {
+		if (ioctl(binder_fd, BINDER_WRITE_READ, &bwr) >= 0)
+			err = 0;
+		else
+			err = -errno;
+	} while (err == -EINTR);
+
+	if (err < 0) {
+		perror("BINDER_WRITE_READ");
+		return -1;
+	}
+
+	if (bwr.write_consumed < writesize) {
+		fprintf(stderr, "Binder did not consume full write buffer %llu %llu\n",
+			bwr.write_consumed, writesize);
+		return -1;
+	}
+
+	return bwr.read_consumed;
+}
+
+static const char *reply_string(int cmd)
+{
+	switch (cmd) {
+	case BR_ERROR:
+		return "BR_ERROR";
+	case BR_OK:
+		return "BR_OK";
+	case BR_TRANSACTION_SEC_CTX:
+		return "BR_TRANSACTION_SEC_CTX";
+	case BR_TRANSACTION:
+		return "BR_TRANSACTION";
+	case BR_REPLY:
+		return "BR_REPLY";
+	case BR_ACQUIRE_RESULT:
+		return "BR_ACQUIRE_RESULT";
+	case BR_DEAD_REPLY:
+		return "BR_DEAD_REPLY";
+	case BR_TRANSACTION_COMPLETE:
+		return "BR_TRANSACTION_COMPLETE";
+	case BR_INCREFS:
+		return "BR_INCREFS";
+	case BR_ACQUIRE:
+		return "BR_ACQUIRE";
+	case BR_RELEASE:
+		return "BR_RELEASE";
+	case BR_DECREFS:
+		return "BR_DECREFS";
+	case BR_ATTEMPT_ACQUIRE:
+		return "BR_ATTEMPT_ACQUIRE";
+	case BR_NOOP:
+		return "BR_NOOP";
+	case BR_SPAWN_LOOPER:
+		return "BR_SPAWN_LOOPER";
+	case BR_FINISHED:
+		return "BR_FINISHED";
+	case BR_DEAD_BINDER:
+		return "BR_DEAD_BINDER";
+	case BR_CLEAR_DEATH_NOTIFICATION_DONE:
+		return "BR_CLEAR_DEATH_NOTIFICATION_DONE";
+	case BR_FAILED_REPLY:
+		return "BR_FAILED_REPLY";
+	case BR_FROZEN_REPLY:
+		return "BR_FROZEN_REPLY";
+	case BR_ONEWAY_SPAM_SUSPECT:
+		return "BR_ONEWAY_SPAM_SUSPECT";
+	default:
+		return "Unknown";
+	}
+}
+
+int expect_binder_reply(int32_t actual, int32_t expected)
+{
+	if (actual != expected) {
+		fprintf(stderr, "Expected %s but received %s\n",
+			reply_string(expected), reply_string(actual));
+		return -1;
+	}
+	return 0;
+}
+
diff --git a/tools/testing/selftests/drivers/android/binder/binder_util.h b/tools/testing/selftests/drivers/android/binder/binder_util.h
new file mode 100644
index 000000000000..807f5abe987e
--- /dev/null
+++ b/tools/testing/selftests/drivers/android/binder/binder_util.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef SELFTEST_BINDER_UTIL_H
+#define SELFTEST_BINDER_UTIL_H
+
+#include <stdint.h>
+
+#include <linux/android/binder.h>
+
+struct binderfs_ctx {
+	char *name;
+	char *mountpoint;
+};
+
+struct binder_ctx {
+	int fd;
+	void *memory;
+};
+
+struct binderfs_ctx create_binderfs(const char *name);
+void destroy_binderfs(struct binderfs_ctx *ctx);
+
+struct binder_ctx open_binder(struct binderfs_ctx *bfs_ctx);
+void close_binder(struct binder_ctx *ctx);
+
+int become_binder_context_manager(int binder_fd);
+
+int do_binder_write_read(int binder_fd, void *writebuf, binder_size_t writesize,
+			 void *readbuf, binder_size_t readsize);
+
+int expect_binder_reply(int32_t actual, int32_t expected);
+#endif
diff --git a/tools/testing/selftests/drivers/android/binder/config b/tools/testing/selftests/drivers/android/binder/config
new file mode 100644
index 000000000000..fcc5f8f693b3
--- /dev/null
+++ b/tools/testing/selftests/drivers/android/binder/config
@@ -0,0 +1,4 @@
+CONFIG_CGROUP_GPU=y
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDERFS=y
+CONFIG_ANDROID_BINDER_IPC=y
diff --git a/tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c b/tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
new file mode 100644
index 000000000000..586465d7bf6d
--- /dev/null
+++ b/tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
@@ -0,0 +1,484 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * This test verifies that the cgroup GPU memory charge is transferred correctly
+ * when a dmabuf is passed between processes in two different cgroups and the
+ * sender specifies BINDER_BUFFER_FLAG_SENDER_NO_NEED in the binder transaction
+ * data containing the dmabuf file descriptor.
+ *
+ * The gpu_cgroup_dmabuf_transfer test function becomes the binder context
+ * manager, then forks a child who initiates a transaction with the context
+ * manager by specifying a target of 0. The context manager reply contains a
+ * dmabuf file descriptor which was allocated by the gpu_cgroup_dmabuf_transfer
+ * test function, but should be charged to the child cgroup after the binder
+ * transaction.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/epoll.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "binder_util.h"
+#include "../../../cgroup/cgroup_util.h"
+#include "../../../kselftest.h"
+#include "../../../kselftest_harness.h"
+
+#include <linux/limits.h>
+#include <linux/dma-heap.h>
+#include <linux/android/binder.h>
+
+#define UNUSED(x) ((void)(x))
+
+static const unsigned int BINDER_CODE = 8675309; /* Any number will work here */
+
+struct cgroup_ctx {
+	char *root;
+	char *source;
+	char *dest;
+};
+
+void destroy_cgroups(struct __test_metadata *_metadata, struct cgroup_ctx *ctx)
+{
+	if (ctx->source != NULL) {
+		TH_LOG("Destroying cgroup: %s", ctx->source);
+		rmdir(ctx->source);
+		free(ctx->source);
+	}
+
+	if (ctx->dest != NULL) {
+		TH_LOG("Destroying cgroup: %s", ctx->dest);
+		rmdir(ctx->dest);
+		free(ctx->dest);
+	}
+
+	free(ctx->root);
+	ctx->root = ctx->source = ctx->dest = NULL;
+}
+
+struct cgroup_ctx create_cgroups(struct __test_metadata *_metadata)
+{
+	struct cgroup_ctx ctx = {0};
+	char root[PATH_MAX], *tmp;
+	static const char template[] = "/gpucg_XXXXXX";
+
+	if (cg_find_unified_root(root, sizeof(root))) {
+		TH_LOG("Could not find cgroups root");
+		return ctx;
+	}
+
+	if (cg_read_strstr(root, "cgroup.controllers", "gpu")) {
+		TH_LOG("Could not find GPU controller");
+		return ctx;
+	}
+
+	if (cg_write(root, "cgroup.subtree_control", "+gpu")) {
+		TH_LOG("Could not enable GPU controller");
+		return ctx;
+	}
+
+	ctx.root = strdup(root);
+
+	snprintf(root, sizeof(root), "%s/%s", ctx.root, template);
+	tmp = mkdtemp(root);
+	if (tmp == NULL) {
+		TH_LOG("%s - Could not create source cgroup", strerror(errno));
+		destroy_cgroups(_metadata, &ctx);
+		return ctx;
+	}
+	ctx.source = strdup(tmp);
+
+	snprintf(root, sizeof(root), "%s/%s", ctx.root, template);
+	tmp = mkdtemp(root);
+	if (tmp == NULL) {
+		TH_LOG("%s - Could not create destination cgroup", strerror(errno));
+		destroy_cgroups(_metadata, &ctx);
+		return ctx;
+	}
+	ctx.dest = strdup(tmp);
+
+	TH_LOG("Created cgroups: %s %s", ctx.source, ctx.dest);
+
+	return ctx;
+}
+
+int dmabuf_heap_alloc(int fd, size_t len, int *dmabuf_fd)
+{
+	struct dma_heap_allocation_data data = {
+		.len = len,
+		.fd = 0,
+		.fd_flags = O_RDONLY | O_CLOEXEC,
+		.heap_flags = 0,
+	};
+	int ret;
+
+	if (!dmabuf_fd)
+		return -EINVAL;
+
+	ret = ioctl(fd, DMA_HEAP_IOCTL_ALLOC, &data);
+	if (ret < 0)
+		return ret;
+	*dmabuf_fd = (int)data.fd;
+	return ret;
+}
+
+/* The system heap is known to export dmabufs with support for cgroup tracking */
+int alloc_dmabuf_from_system_heap(struct __test_metadata *_metadata, size_t bytes)
+{
+	int heap_fd = -1, dmabuf_fd = -1;
+	static const char * const heap_path = "/dev/dma_heap/system";
+
+	heap_fd = open(heap_path, O_RDONLY);
+	if (heap_fd < 0) {
+		TH_LOG("%s - open %s failed!", strerror(errno), heap_path);
+		return -1;
+	}
+
+	if (dmabuf_heap_alloc(heap_fd, bytes, &dmabuf_fd))
+		TH_LOG("dmabuf allocation failed! - %s", strerror(errno));
+	close(heap_fd);
+
+	return dmabuf_fd;
+}
+
+int binder_request_dmabuf(int binder_fd)
+{
+	int ret;
+
+	/*
+	 * We just send an empty binder_buffer_object to initiate a transaction
+	 * with the context manager, who should respond with a single dmabuf
+	 * inside a binder_fd_array_object.
+	 */
+
+	struct binder_buffer_object bbo = {
+		.hdr.type = BINDER_TYPE_PTR,
+		.flags = 0,
+		.buffer = 0,
+		.length = 0,
+		.parent = 0, /* No parent */
+		.parent_offset = 0 /* No parent */
+	};
+
+	binder_size_t offsets[] = {0};
+
+	struct {
+		int32_t cmd;
+		struct binder_transaction_data btd;
+	} __attribute__((packed)) bc = {
+		.cmd = BC_TRANSACTION,
+		.btd = {
+			.target = { 0 },
+			.cookie = 0,
+			.code = BINDER_CODE,
+			.flags = TF_ACCEPT_FDS, /* We expect a FDA in the reply */
+			.data_size = sizeof(bbo),
+			.offsets_size = sizeof(offsets),
+			.data.ptr = {
+				(binder_uintptr_t)&bbo,
+				(binder_uintptr_t)offsets
+			}
+		},
+	};
+
+	struct {
+		int32_t reply_noop;
+	} __attribute__((packed)) br;
+
+	ret = do_binder_write_read(binder_fd, &bc, sizeof(bc), &br, sizeof(br));
+	if (ret >= (int)sizeof(br) && expect_binder_reply(br.reply_noop, BR_NOOP)) {
+		return -1;
+	} else if (ret < (int)sizeof(br)) {
+		fprintf(stderr, "Not enough bytes in binder reply %d\n", ret);
+		return -1;
+	}
+	return 0;
+}
+
+int send_dmabuf_reply(int binder_fd, struct binder_transaction_data *tr, int dmabuf_fd)
+{
+	int ret;
+	/*
+	 * The trailing 0 is to achieve the necessary alignment for the binder
+	 * buffer_size.
+	 */
+	int fdarray[] = { dmabuf_fd, 0 };
+
+	struct binder_buffer_object bbo = {
+		.hdr.type = BINDER_TYPE_PTR,
+		.flags = BINDER_BUFFER_FLAG_SENDER_NO_NEED,
+		.buffer = (binder_uintptr_t)fdarray,
+		.length = sizeof(fdarray),
+		.parent = 0, /* No parent */
+		.parent_offset = 0 /* No parent */
+	};
+
+	struct binder_fd_array_object bfdao = {
+		.hdr.type = BINDER_TYPE_FDA,
+		.num_fds = 1,
+		.parent = 0, /* The binder_buffer_object */
+		.parent_offset = 0 /* FDs follow immediately */
+	};
+
+	uint64_t sz = sizeof(fdarray);
+	uint8_t data[sizeof(sz) + sizeof(bbo) + sizeof(bfdao)];
+	binder_size_t offsets[] = {sizeof(sz), sizeof(sz)+sizeof(bbo)};
+
+	memcpy(data,                            &sz, sizeof(sz));
+	memcpy(data + sizeof(sz),               &bbo, sizeof(bbo));
+	memcpy(data + sizeof(sz) + sizeof(bbo), &bfdao, sizeof(bfdao));
+
+	struct {
+		int32_t cmd;
+		struct binder_transaction_data_sg btd;
+	} __attribute__((packed)) bc = {
+		.cmd = BC_REPLY_SG,
+		.btd.transaction_data = {
+			.target = { tr->target.handle },
+			.cookie = tr->cookie,
+			.code = BINDER_CODE,
+			.flags = 0,
+			.data_size = sizeof(data),
+			.offsets_size = sizeof(offsets),
+			.data.ptr = {
+				(binder_uintptr_t)data,
+				(binder_uintptr_t)offsets
+			}
+		},
+		.btd.buffers_size = sizeof(fdarray)
+	};
+
+	struct {
+		int32_t reply_noop;
+	} __attribute__((packed)) br;
+
+	ret = do_binder_write_read(binder_fd, &bc, sizeof(bc), &br, sizeof(br));
+	if (ret >= (int)sizeof(br) && expect_binder_reply(br.reply_noop, BR_NOOP)) {
+		return -1;
+	} else if (ret < (int)sizeof(br)) {
+		fprintf(stderr, "Not enough bytes in binder reply %d\n", ret);
+		return -1;
+	}
+	return 0;
+}
+
+struct binder_transaction_data *binder_wait_for_transaction(int binder_fd,
+							    uint32_t *readbuf,
+							    size_t readsize)
+{
+	static const int MAX_EVENTS = 1, EPOLL_WAIT_TIME_MS = 3 * 1000;
+	struct binder_reply {
+		int32_t reply0;
+		int32_t reply1;
+		struct binder_transaction_data btd;
+	} *br;
+	struct binder_transaction_data *ret = NULL;
+	struct epoll_event events[MAX_EVENTS];
+	int epoll_fd, num_events, readcount;
+	uint32_t bc[] = { BC_ENTER_LOOPER };
+
+	do_binder_write_read(binder_fd, &bc, sizeof(bc), NULL, 0);
+
+	epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+	if (epoll_fd == -1) {
+		perror("epoll_create");
+		return NULL;
+	}
+
+	events[0].events = EPOLLIN;
+	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, binder_fd, &events[0])) {
+		perror("epoll_ctl add");
+		goto err_close;
+	}
+
+	num_events = epoll_wait(epoll_fd, events, MAX_EVENTS, EPOLL_WAIT_TIME_MS);
+	if (num_events < 0) {
+		perror("epoll_wait");
+		goto err_ctl;
+	} else if (num_events == 0) {
+		fprintf(stderr, "No events\n");
+		goto err_ctl;
+	}
+
+	readcount = do_binder_write_read(binder_fd, NULL, 0, readbuf, readsize);
+	fprintf(stderr, "Read %d bytes from binder\n", readcount);
+
+	if (readcount < (int)sizeof(struct binder_reply)) {
+		fprintf(stderr, "read_consumed not large enough\n");
+		goto err_ctl;
+	}
+
+	br = (struct binder_reply *)readbuf;
+	if (expect_binder_reply(br->reply0, BR_NOOP))
+		goto err_ctl;
+
+	if (br->reply1 == BR_TRANSACTION) {
+		if (br->btd.code == BINDER_CODE)
+			ret = &br->btd;
+		else
+			fprintf(stderr, "Received transaction with unexpected code: %u\n",
+				br->btd.code);
+	} else {
+		expect_binder_reply(br->reply1, BR_TRANSACTION_COMPLETE);
+	}
+
+err_ctl:
+	if (epoll_ctl(epoll_fd, EPOLL_CTL_DEL, binder_fd, NULL))
+		perror("epoll_ctl del");
+err_close:
+	close(epoll_fd);
+	return ret;
+}
+
+static int child_request_dmabuf_transfer(const char *cgroup, void *arg)
+{
+	UNUSED(cgroup);
+	int ret = -1;
+	uint32_t readbuf[32];
+	struct binderfs_ctx bfs_ctx = *(struct binderfs_ctx *)arg;
+	struct binder_ctx b_ctx;
+
+	fprintf(stderr, "Child PID: %d\n", getpid());
+
+	b_ctx = open_binder(&bfs_ctx);
+	if (b_ctx.fd < 0) {
+		fprintf(stderr, "Child unable to open binder\n");
+		return -1;
+	}
+
+	if (binder_request_dmabuf(b_ctx.fd))
+		goto err;
+
+	/* The child must stay alive until the binder reply is received */
+	if (binder_wait_for_transaction(b_ctx.fd, readbuf, sizeof(readbuf)) == NULL)
+		ret = 0;
+
+	/*
+	 * We don't close the received dmabuf here so that the parent can
+	 * inspect the cgroup gpu memory charges to verify the charge transfer
+	 * completed successfully.
+	 */
+err:
+	close_binder(&b_ctx);
+	fprintf(stderr, "Child done\n");
+	return ret;
+}
+
+TEST(gpu_cgroup_dmabuf_transfer)
+{
+	static const char * const GPUMEM_FILENAME = "gpu.memory.current";
+	static const size_t ONE_MiB = 1024 * 1024;
+
+	int ret, dmabuf_fd;
+	uint32_t readbuf[32];
+	long memsize;
+	pid_t child_pid;
+	struct binderfs_ctx bfs_ctx;
+	struct binder_ctx b_ctx;
+	struct cgroup_ctx cg_ctx;
+	struct binder_transaction_data *tr;
+	struct flat_binder_object *fbo;
+	struct binder_buffer_object *bbo;
+
+	if (geteuid() != 0)
+		ksft_exit_skip("Need to be root to mount binderfs\n");
+
+	bfs_ctx = create_binderfs("testbinder");
+	if (bfs_ctx.name == NULL)
+		ksft_exit_skip("The Android binderfs filesystem is not available\n");
+
+	cg_ctx = create_cgroups(_metadata);
+	if (cg_ctx.root == NULL) {
+		destroy_binderfs(&bfs_ctx);
+		ksft_exit_skip("cgroup v2 or the GPU cgroup controller is not available\n");
+	}
+
+	ASSERT_EQ(cg_enter_current(cg_ctx.source), 0) {
+		TH_LOG("Could not move parent to cgroup: %s", cg_ctx.source);
+		goto err_cg;
+	}
+
+	dmabuf_fd = alloc_dmabuf_from_system_heap(_metadata, ONE_MiB);
+	ASSERT_GE(dmabuf_fd, 0) {
+		goto err_cg;
+	}
+	TH_LOG("Allocated dmabuf");
+
+	memsize = cg_read_key_long(cg_ctx.source, GPUMEM_FILENAME, "system");
+	ASSERT_EQ(memsize, ONE_MiB) {
+		TH_LOG("GPU memory used after allocation: %ld but it should be %lu",
+		       memsize, (unsigned long)ONE_MiB);
+		goto err_dmabuf;
+	}
+
+	b_ctx = open_binder(&bfs_ctx);
+	ASSERT_GE(b_ctx.fd, 0) {
+		TH_LOG("Parent unable to open binder");
+		goto err_dmabuf;
+	}
+	TH_LOG("Opened binder at %s/%s", bfs_ctx.mountpoint, bfs_ctx.name);
+
+	ASSERT_EQ(become_binder_context_manager(b_ctx.fd), 0) {
+		TH_LOG("Cannot become context manager: %s", strerror(errno));
+		goto err_binder;
+	}
+
+	child_pid = cg_run_nowait(cg_ctx.dest, child_request_dmabuf_transfer, &bfs_ctx);
+	ASSERT_GT(child_pid, 0) {
+		TH_LOG("Error forking: %s", strerror(errno));
+		goto err_binder;
+	}
+
+	tr = binder_wait_for_transaction(b_ctx.fd, readbuf, sizeof(readbuf));
+	ASSERT_NE(tr, NULL) {
+		TH_LOG("Error receiving transaction request from child");
+		goto err_child;
+	}
+	fbo = (struct flat_binder_object *)tr->data.ptr.buffer;
+	ASSERT_EQ(fbo->hdr.type, BINDER_TYPE_PTR) {
+		TH_LOG("Did not receive a buffer object from child");
+		goto err_child;
+	}
+	bbo = (struct binder_buffer_object *)fbo;
+	ASSERT_EQ(bbo->length, 0) {
+		TH_LOG("Did not receive an empty buffer object from child");
+		goto err_child;
+	}
+
+	TH_LOG("Received transaction from child");
+	send_dmabuf_reply(b_ctx.fd, tr, dmabuf_fd);
+
+	ASSERT_EQ(cg_read_key_long(cg_ctx.dest, GPUMEM_FILENAME, "system"), ONE_MiB) {
+		TH_LOG("Destination cgroup does not have system charge!");
+		goto err_child;
+	}
+	ASSERT_EQ(cg_read_key_long(cg_ctx.source, GPUMEM_FILENAME, "system"), 0) {
+		TH_LOG("Source cgroup still has system charge!");
+		goto err_child;
+	}
+	TH_LOG("Charge transfer succeeded!");
+
+err_child:
+	waitpid(child_pid, &ret, 0);
+	if (WIFEXITED(ret))
+		TH_LOG("Child %d terminated normally with %d", child_pid, WEXITSTATUS(ret));
+	else if (WIFSIGNALED(ret))
+		TH_LOG("Child %d terminated with signal %d", child_pid, WTERMSIG(ret));
+err_binder:
+	close_binder(&b_ctx);
+err_dmabuf:
+	close(dmabuf_fd);
+err_cg:
+	destroy_cgroups(_metadata, &cg_ctx);
+	destroy_binderfs(&bfs_ctx);
+}
+
+TEST_HARNESS_MAIN
-- 
2.35.1.1021.g381101b075-goog
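
Illustration (not part of the patch): the test above reads the charge
with cg_read_key_long(cgroup, "gpu.memory.current", "system"). Below is
a minimal sketch of that kind of lookup, assuming the file holds one
"<device name> <bytes>" pair per line. keyed_value() is a hypothetical
helper written for this note; it is not the cgroup_util.c
implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not selftest code):
 * return the value stored under `key` in flat-keyed text such as
 * "system 1048576\ncma 0\n", or -1 if the key is absent.
 */
long keyed_value(const char *contents, const char *key)
{
	size_t keylen = strlen(key);
	const char *line = contents;

	while (line && *line) {
		/* A match needs the whole key followed by a single space. */
		if (!strncmp(line, key, keylen) && line[keylen] == ' ')
			return strtol(line + keylen + 1, NULL, 10);

		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

With contents "system 1048576\n", keyed_value(contents, "system") yields
1048576, the value the test expects after a 1 MiB allocation. Requiring
a space after the key keeps "system" from matching a longer device name
that merely starts with the same prefix.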



* Re: [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-03-28  3:59 ` [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging T.J. Mercier
@ 2022-03-28 14:36   ` Daniel Vetter
  2022-03-28 18:28     ` T.J. Mercier
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Vetter @ 2022-03-28 14:36 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	kaleshsingh, Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

On Mon, Mar 28, 2022 at 03:59:43AM +0000, T.J. Mercier wrote:
> From: Hridya Valsaraju <hridya@google.com>
> 
> All DMA heaps now register a new GPU cgroup device upon creation, and the
> system_heap now exports buffers associated with its GPU cgroup device for
> tracking purposes.
> 
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> 
> ---
> v3 changes
> Use more common dual author commit message format per John Stultz.
> 
> v2 changes
> Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.

Apologies for being out of the loop quite a bit. I scrolled through this
all and I think it looks good to get going.

The only thing I have is whether we should move the cgroup controllers out
of dma-buf heaps, since that's rather android centric. E.g.
- a system gpucg_device which is used by all the various single page
  allocators (dma-buf heap but also shmem helpers and really anything
  else)
- same for cma, again both for dma-buf heaps and also for the gem cma
  helpers in drm

Otherwise this will only work on non-upstream android where gpu drivers
allocate everything from dma-buf heap. If you use something like the x86
android project with mesa drivers, then driver-internal buffers will be
allocated through gem and not through dma-buf heaps. Or at least I think
that's how it works.

But also meh, we can fix this fairly easily later on by adding these
standard gpucg_dev somewhere with a bit of kerneldoc.

Anyway, this has my ack, but don't count this as my in-depth review :-)
-Daniel

> ---
>  drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
>  drivers/dma-buf/heaps/system_heap.c |  3 +++
>  include/linux/dma-heap.h            | 11 +++++++++++
>  3 files changed, 41 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index 8f5848aa144f..885072427775 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -7,6 +7,7 @@
>   */
>  
>  #include <linux/cdev.h>
> +#include <linux/cgroup_gpu.h>
>  #include <linux/debugfs.h>
>  #include <linux/device.h>
>  #include <linux/dma-buf.h>
> @@ -31,6 +32,7 @@
>   * @heap_devt		heap device node
>   * @list		list head connecting to list of heaps
>   * @heap_cdev		heap char device
> + * @gpucg_dev		gpu cgroup device for memory accounting
>   *
>   * Represents a heap of memory from which buffers can be made.
>   */
> @@ -41,6 +43,9 @@ struct dma_heap {
>  	dev_t heap_devt;
>  	struct list_head list;
>  	struct cdev heap_cdev;
> +#ifdef CONFIG_CGROUP_GPU
> +	struct gpucg_device gpucg_dev;
> +#endif
>  };
>  
>  static LIST_HEAD(heap_list);
> @@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
>  	return heap->name;
>  }
>  
> +#ifdef CONFIG_CGROUP_GPU
> +/**
> + * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
> + * @heap: DMA-Heap to get the gpucg_device struct for.
> + *
> + * Returns:
> + * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
> + * not enabled.
> + */
> +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
> +{
> +	return &heap->gpucg_dev;
> +}
> +#else /* CONFIG_CGROUP_GPU */
> +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
> +{
> +	return NULL;
> +}
> +#endif /* CONFIG_CGROUP_GPU */
> +
>  struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
>  {
>  	struct dma_heap *heap, *h, *err_ret;
> @@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
>  	list_add(&heap->list, &heap_list);
>  	mutex_unlock(&heap_list_lock);
>  
> +	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
> +
>  	return heap;
>  
>  err2:
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index ab7fd896d2c4..752a05c3cfe2 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>  	exp_info.ops = &system_heap_buf_ops;
>  	exp_info.size = buffer->len;
>  	exp_info.flags = fd_flags;
> +#ifdef CONFIG_CGROUP_GPU
> +	exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap);
> +#endif
>  	exp_info.priv = buffer;
>  	dmabuf = dma_buf_export(&exp_info);
>  	if (IS_ERR(dmabuf)) {
> diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
> index 0c05561cad6e..e447a61d054e 100644
> --- a/include/linux/dma-heap.h
> +++ b/include/linux/dma-heap.h
> @@ -10,6 +10,7 @@
>  #define _DMA_HEAPS_H
>  
>  #include <linux/cdev.h>
> +#include <linux/cgroup_gpu.h>
>  #include <linux/types.h>
>  
>  struct dma_heap;
> @@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
>   */
>  const char *dma_heap_get_name(struct dma_heap *heap);
>  
> +/**
> + * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
> + * heap.
> + * @heap: DMA-Heap to retrieve gpucg_device for.
> + *
> + * Returns:
> + * The gpucg_device struct for the heap.
> + */
> +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
> +
>  /**
>   * dma_heap_add - adds a heap to dmabuf heaps
>   * @exp_info:		information needed to register this heap
> -- 
> 2.35.1.1021.g381101b075-goog
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-03-28 14:36   ` Daniel Vetter
@ 2022-03-28 18:28     ` T.J. Mercier
  2022-03-29  8:42       ` Daniel Vetter
  0 siblings, 1 reply; 22+ messages in thread
From: T.J. Mercier @ 2022-03-28 18:28 UTC (permalink / raw)
  To: T.J. Mercier, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Kalesh Singh, Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest
  Cc: Daniel Vetter

On Mon, Mar 28, 2022 at 7:36 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Mar 28, 2022 at 03:59:43AM +0000, T.J. Mercier wrote:
> > From: Hridya Valsaraju <hridya@google.com>
> >
> > All DMA heaps now register a new GPU cgroup device upon creation, and the
> > system_heap now exports buffers associated with its GPU cgroup device for
> > tracking purposes.
> >
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> >
> > ---
> > v3 changes
> > Use more common dual author commit message format per John Stultz.
> >
> > v2 changes
> > Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
>
> Apologies for being out of the loop quite a bit. I scrolled through this
> all and I think it looks good to get going.
>
> The only thing I have is whether we should move the cgroup controllers out
> of dma-buf heaps, since that's rather android centric. E.g.
> - a system gpucg_device which is used by all the various single page
>   allocators (dma-buf heap but also shmem helpers and really anything
>   else)
> - same for cma, again both for dma-buf heaps and also for the gem cma
>   helpers in drm

Thanks Daniel, in general that makes sense to me as an approach to
making this more universal. However for the Android case I'm not sure
if the part about a single system gpucg_device would be sufficient,
because there are at least 12 different graphics related heaps that
could potentially be accounted/limited differently. [1]  So that
raises the question of how fine grained we want this to be... I tend
towards separating them all, but I haven't formed a strong opinion
about this at the moment. It sounds like you are in favor of a
smaller, more rigidly defined set of them? Either way, we need to add
code for accounting at points where we know memory is specifically for
graphics use and not something else, right? (I.e., whether it is a
dma-buf heap or somewhere like drm_gem_object_init.) So IIUC the only
question is what to use for the gpucg_device(s) at these locations.

[1] https://cs.android.com/android/platform/superproject/+/master:hardware/google/graphics/common/libion/ion.cpp;l=39-50
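
To make the granularity question above concrete, here is a toy
userspace model. It is not the gpucg API: toy_cgroup, toy_charge() and
toy_usage() are hypothetical names, and "heap-a"/"heap-b" are made-up
device names. It only shows how the reported counters differ when
several allocators share one device versus when each registers its own.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEVICES 16

/* Toy stand-in for one cgroup's per-device byte counters. */
struct toy_cgroup {
	const char *name[MAX_DEVICES];
	long bytes[MAX_DEVICES];
	int count;
};

/* Charge `bytes` to the counter named `device`, creating it on first use. */
int toy_charge(struct toy_cgroup *cg, const char *device, long bytes)
{
	int i;

	for (i = 0; i < cg->count; i++) {
		if (!strcmp(cg->name[i], device)) {
			cg->bytes[i] += bytes;
			return 0;
		}
	}

	/* No counter yet for this device: add one if there is room. */
	if (cg->count == MAX_DEVICES)
		return -1;
	cg->name[cg->count] = device;
	cg->bytes[cg->count] = bytes;
	cg->count++;
	return 0;
}

/* Return the current charge for `device`, or 0 if it was never charged. */
long toy_usage(const struct toy_cgroup *cg, const char *device)
{
	int i;

	for (i = 0; i < cg->count; i++)
		if (!strcmp(cg->name[i], device))
			return cg->bytes[i];
	return 0;
}
```

Charging 4096 bytes on behalf of a dma-buf heap and 8192 bytes on behalf
of a shmem helper to one shared "system" device leaves a single
12288-byte counter. Charging the hypothetical "heap-a" and "heap-b"
separately leaves two counters that could be reported and limited
independently.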

>
> Otherwise this will only work on non-upstream android where gpu drivers
> allocate everything from dma-buf heap. If you use something like the x86
> android project with mesa drivers, then driver-internal buffers will be
> allocated through gem and not through dma-buf heaps. Or at least I think
> that's how it works.
>
> But also meh, we can fix this fairly easily later on by adding these
> standard gpucg_dev somewhere with a bit of kerneldoc.

This is what I was thinking would happen next, but IDK if anyone sees
a more central place to do this type of use-specific accounting.

>
> Anyway, this has my ack, but don't count this as my in-depth review :-)
> -Daniel

Thanks again for taking a look!
>
> > ---
> >  drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
> >  drivers/dma-buf/heaps/system_heap.c |  3 +++
> >  include/linux/dma-heap.h            | 11 +++++++++++
> >  3 files changed, 41 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index 8f5848aa144f..885072427775 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -7,6 +7,7 @@
> >   */
> >
> >  #include <linux/cdev.h>
> > +#include <linux/cgroup_gpu.h>
> >  #include <linux/debugfs.h>
> >  #include <linux/device.h>
> >  #include <linux/dma-buf.h>
> > @@ -31,6 +32,7 @@
> >   * @heap_devt                heap device node
> >   * @list             list head connecting to list of heaps
> >   * @heap_cdev                heap char device
> > + * @gpucg_dev                gpu cgroup device for memory accounting
> >   *
> >   * Represents a heap of memory from which buffers can be made.
> >   */
> > @@ -41,6 +43,9 @@ struct dma_heap {
> >       dev_t heap_devt;
> >       struct list_head list;
> >       struct cdev heap_cdev;
> > +#ifdef CONFIG_CGROUP_GPU
> > +     struct gpucg_device gpucg_dev;
> > +#endif
> >  };
> >
> >  static LIST_HEAD(heap_list);
> > @@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> >       return heap->name;
> >  }
> >
> > +#ifdef CONFIG_CGROUP_GPU
> > +/**
> > + * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
> > + * @heap: DMA-Heap to get the gpucg_device struct for.
> > + *
> > + * Returns:
> > + * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
> > + * not enabled.
> > + */
> > +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
> > +{
> > +     return &heap->gpucg_dev;
> > +}
> > +#else /* CONFIG_CGROUP_GPU */
> > +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
> > +{
> > +     return NULL;
> > +}
> > +#endif /* CONFIG_CGROUP_GPU */
> > +
> >  struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> >  {
> >       struct dma_heap *heap, *h, *err_ret;
> > @@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> >       list_add(&heap->list, &heap_list);
> >       mutex_unlock(&heap_list_lock);
> >
> > +     gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
> > +
> >       return heap;
> >
> >  err2:
> > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > index ab7fd896d2c4..752a05c3cfe2 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> >       exp_info.ops = &system_heap_buf_ops;
> >       exp_info.size = buffer->len;
> >       exp_info.flags = fd_flags;
> > +#ifdef CONFIG_CGROUP_GPU
> > +     exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap);
> > +#endif
> >       exp_info.priv = buffer;
> >       dmabuf = dma_buf_export(&exp_info);
> >       if (IS_ERR(dmabuf)) {
> > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
> > index 0c05561cad6e..e447a61d054e 100644
> > --- a/include/linux/dma-heap.h
> > +++ b/include/linux/dma-heap.h
> > @@ -10,6 +10,7 @@
> >  #define _DMA_HEAPS_H
> >
> >  #include <linux/cdev.h>
> > +#include <linux/cgroup_gpu.h>
> >  #include <linux/types.h>
> >
> >  struct dma_heap;
> > @@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
> >   */
> >  const char *dma_heap_get_name(struct dma_heap *heap);
> >
> > +/**
> > + * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
> > + * heap.
> > + * @heap: DMA-Heap to retrieve gpucg_device for.
> > + *
> > + * Returns:
> > + * The gpucg_device struct for the heap.
> > + */
> > +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
> > +
> >  /**
> >   * dma_heap_add - adds a heap to dmabuf heaps
> >   * @exp_info:                information needed to register this heap
> > --
> > 2.35.1.1021.g381101b075-goog
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-03-28 18:28     ` T.J. Mercier
@ 2022-03-29  8:42       ` Daniel Vetter
  2022-03-29 16:50         ` Tejun Heo
  2022-03-29 17:52         ` T.J. Mercier
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Vetter @ 2022-03-29  8:42 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: David Airlie, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Kalesh Singh, Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest, Daniel Vetter

On Mon, Mar 28, 2022 at 11:28:24AM -0700, T.J. Mercier wrote:
> On Mon, Mar 28, 2022 at 7:36 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Mar 28, 2022 at 03:59:43AM +0000, T.J. Mercier wrote:
> > > From: Hridya Valsaraju <hridya@google.com>
> > >
> > > All DMA heaps now register a new GPU cgroup device upon creation, and the
> > > system_heap now exports buffers associated with its GPU cgroup device for
> > > tracking purposes.
> > >
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > >
> > > ---
> > > v3 changes
> > > Use more common dual author commit message format per John Stultz.
> > >
> > > v2 changes
> > > Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > Christian König.
> >
> > Apologies for being out of the loop quite a bit. I scrolled through this
> > all and I think it looks good to get going.
> >
> > The only thing I have is whether we should move the cgroup controllers out
> > of dma-buf heaps, since that's rather android centric. E.g.
> > - a system gpucg_device which is used by all the various single page
> >   allocators (dma-buf heap but also shmem helpers and really anything
> >   else)
> > - same for cma, again both for dma-buf heaps and also for the gem cma
> >   helpers in drm
> 
> Thanks Daniel, in general that makes sense to me as an approach to
> making this more universal. However for the Android case I'm not sure
> if the part about a single system gpucg_device would be sufficient,
> because there are at least 12 different graphics related heaps that
> could potentially be accounted/limited differently. [1]  So that
> raises the question of how fine grained we want this to be... I tend
> towards separating them all, but I haven't formed a strong opinion
> about this at the moment. It sounds like you are in favor of a
> smaller, more rigidly defined set of them? Either way, we need to add
> code for accounting at points where we know memory is specifically for
> graphics use and not something else, right? (I.e., whether it is a
> dma-buf heap or somewhere like drm_gem_object_init.) So IIUC the only
> question is what to use for the gpucg_device(s) at these locations.

We don't have 12 in upstream, so this is a lot easier here :-)

I'm not exactly sure why you have such a huge pile of them.

For gem buffers it would be fairly similar to what you've done for dma-buf
heaps I think, with the various helper libraries (drivers stopped
hand-rolling their gem buffer) setting the right accounting group. And
yeah for system memory I think we'd need to have standard ones, for driver
specific ones it's kinda different.

> [1] https://cs.android.com/android/platform/superproject/+/master:hardware/google/graphics/common/libion/ion.cpp;l=39-50
> 
> >
> > Otherwise this will only work on non-upstream android where gpu drivers
> > allocate everything from dma-buf heap. If you use something like the x86
> > android project with mesa drivers, then driver-internal buffers will be
> > allocated through gem and not through dma-buf heaps. Or at least I think
> > that's how it works.
> >
> > But also meh, we can fix this fairly easily later on by adding these
> > standard gpucg_dev somewhere with a bit of kerneldoc.
> 
> This is what I was thinking would happen next, but IDK if anyone sees
> a more central place to do this type of use-specific accounting.

Hm I just realized ... are the names part of the cgroups ABI? If yes then I
think we need to fix this before we merge anything.
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function
  2022-03-28  3:59 ` [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
@ 2022-03-29 15:21   ` Michal Koutný
  2022-04-01 18:41     ` T.J. Mercier
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Koutný @ 2022-03-29 15:21 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	kaleshsingh, Kenny.Ho, skhan, dri-devel, linux-doc, linux-kernel,
	linux-media, linaro-mm-sig, cgroups, linux-kselftest

Hi.

On Mon, Mar 28, 2022 at 03:59:44AM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> From: Hridya Valsaraju <hridya@google.com>
> 
> The dma_buf_charge_transfer function provides a way for processes to

(s/dma_buf_charge_transfer/dma_buf_transfer_charge/)

> transfer charge of a buffer to a different process. This is essential
> for the cases where a central allocator process does allocations for
> various subsystems, hands over the fd to the client who requested the
> memory and drops all references to the allocated memory.

I understood from [1] some buffers are backed by regular RAM. How are
these charges going to be transferred (if so)?


Thanks,
Michal

[1]
https://lore.kernel.org/r/CABdmKX2NSAKMC6rReMYfo2SSVNxEXcS466hk3qF6YFt-j-+_NQ@mail.gmail.com

* Re: [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-03-29  8:42       ` Daniel Vetter
@ 2022-03-29 16:50         ` Tejun Heo
  2022-03-29 17:52         ` T.J. Mercier
  1 sibling, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2022-03-29 16:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: T.J. Mercier, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Zefan Li, Johannes Weiner, Shuah Khan, Kalesh Singh,
	Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest

On Tue, Mar 29, 2022 at 10:42:20AM +0200, Daniel Vetter wrote:
> Hm I just realized ... are the names part of the cgroups ABI? If yes then I
> think we need to fix this before we merge anything.

Yes.

Thanks.

-- 
tejun

* Re: [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-03-28  3:59 ` [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory T.J. Mercier
@ 2022-03-29 16:59   ` Tejun Heo
  2022-03-30 20:56     ` T.J. Mercier
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2022-03-29 16:59 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Zefan Li, Johannes Weiner, Shuah Khan, kaleshsingh,
	Kenny.Ho, mkoutny, skhan, dri-devel, linux-doc, linux-kernel,
	linux-media, linaro-mm-sig, cgroups, linux-kselftest

Hello,

On Mon, Mar 28, 2022 at 03:59:41AM +0000, T.J. Mercier wrote:
> The API/UAPI can be extended to set per-device/total allocation limits
> in the future.

This total thing kinda bothers me. Can you please provide some concrete
examples of how this and per-device limits would be used?

Thanks.

-- 
tejun

* Re: [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-03-29  8:42       ` Daniel Vetter
  2022-03-29 16:50         ` Tejun Heo
@ 2022-03-29 17:52         ` T.J. Mercier
  1 sibling, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-03-29 17:52 UTC (permalink / raw)
  To: T.J. Mercier, David Airlie, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Kalesh Singh, Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest
  Cc: Daniel Vetter

On Tue, Mar 29, 2022 at 1:42 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Mar 28, 2022 at 11:28:24AM -0700, T.J. Mercier wrote:
> > On Mon, Mar 28, 2022 at 7:36 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Mon, Mar 28, 2022 at 03:59:43AM +0000, T.J. Mercier wrote:
> > > > From: Hridya Valsaraju <hridya@google.com>
> > > >
> > > > All DMA heaps now register a new GPU cgroup device upon creation, and the
> > > > system_heap now exports buffers associated with its GPU cgroup device for
> > > > tracking purposes.
> > > >
> > > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > >
> > > > ---
> > > > v3 changes
> > > > Use more common dual author commit message format per John Stultz.
> > > >
> > > > v2 changes
> > > > Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > > Christian König.
> > >
> > > Apologies for being out of the loop quite a bit. I scrolled through this
> > > all and I think it looks good to get going.
> > >
> > > The only thing I have is whether we should move the cgroup controllers out
> > > of dma-buf heaps, since that's rather android centric. E.g.
> > > - a system gpucg_device which is used by all the various single page
> > >   allocators (dma-buf heap but also shmem helpers and really anything
> > >   else)
> > > - same for cma, again both for dma-buf heaps and also for the gem cma
> > >   helpers in drm
> >
> > Thanks Daniel, in general that makes sense to me as an approach to
> > making this more universal. However for the Android case I'm not sure
> > if the part about a single system gpucg_device would be sufficient,
> > because there are at least 12 different graphics related heaps that
> > could potentially be accounted/limited differently. [1]  So that
> > raises the question of how fine grained we want this to be... I tend
> > towards separating them all, but I haven't formed a strong opinion
> > about this at the moment. It sounds like you are in favor of a
> > smaller, more rigidly defined set of them? Either way, we need to add
> > code for accounting at points where we know memory is specifically for
> > > graphics use and not something else, right? (I.e., whether it is a
> > dma-buf heap or somewhere like drm_gem_object_init.) So IIUC the only
> > question is what to use for the gpucg_device(s) at these locations.
>
> We don't have 12 in upstream, so this is a lot easier here :-)
>
> I'm not exactly sure why you have such a huge pile of them.
>
> For gem buffers it would be fairly similar to what you've done for dma-buf
> heaps I think, with the various helper libraries (drivers stopped
> hand-rolling their gem buffer) setting the right accounting group. And
> yeah for system memory I think we'd need to have standard ones, for driver
> specific ones it's kinda different.
>
> > [1] https://cs.android.com/android/platform/superproject/+/master:hardware/google/graphics/common/libion/ion.cpp;l=39-50
> >
> > >
> > > Otherwise this will only work on non-upstream android where gpu drivers
> > > allocate everything from dma-buf heap. If you use something like the x86
> > > android project with mesa drivers, then driver-internal buffers will be
> > > allocated through gem and not through dma-buf heaps. Or at least I think
> > > that's how it works.
> > >
> > > But also meh, we can fix this fairly easily later on by adding these
> > > standard gpucg_dev somewhere with a bit of kerneldoc.
> >
> > This is what I was thinking would happen next, but IDK if anyone sees
> > a more central place to do this type of use-specific accounting.
>
> Hm I just realized ... are the names part of the cgroups ABI? If yes then I
> think we need to fix this before we merge anything.
> -Daniel

Do you mean the set of possible names being part of the ABI for GPU
cgroups? I'm not exactly sure what you mean here.

The name is a settable string inside the gpucg_device struct, and
right now the docs say it must be from a string literal but this can
be changed. The only one this patchset adds is "system", which comes
from the name field in its dma_heap_export_info struct when it's first
created (and that string is hardcoded). The heap gpucg_devices are all
registered from one spot in dma-heap.c, so maybe we should append
"-heap" to the gpucg_device names there so "system" is available for a
standardized name. Let me make those two changes, and I will also
rebase this series before resending.
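
To make the naming idea concrete, here is a rough userspace sketch of
deriving a "<heap name>-heap" string so that the bare name "system" stays
free for a standardized device. The helper name gpucg_heap_name() and the
GPUCG_NAME_MAX bound are made up for illustration and are not part of this
series; the real change would live next to the gpucg_register_device()
call in dma-heap.c and may look quite different.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical upper bound on a gpucg device name; not from the series. */
#define GPUCG_NAME_MAX 64

/*
 * Hypothetical helper: write "<heap_name>-heap" into out.
 * Returns 0 on success, -1 if the result would not fit in out_len bytes
 * (including the terminating NUL) or if formatting failed.
 */
static int gpucg_heap_name(char *out, size_t out_len, const char *heap_name)
{
	int n = snprintf(out, out_len, "%s-heap", heap_name);

	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return 0;
}
```

With this, the system heap would show up as "system-heap". Rejecting
truncated names rather than silently shortening them seems safer if the
names end up being ABI, since two long heap names could otherwise collide.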




* Re: [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-03-29 16:59   ` Tejun Heo
@ 2022-03-30 20:56     ` T.J. Mercier
  2022-04-04 17:41       ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: T.J. Mercier @ 2022-03-30 20:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Zefan Li, Johannes Weiner, Shuah Khan, Kalesh Singh,
	Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest

On Tue, Mar 29, 2022 at 9:59 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,

Sorry for the delay, Tejun. My test device stopped working and my
attention has been occupied with that.

>
> On Mon, Mar 28, 2022 at 03:59:41AM +0000, T.J. Mercier wrote:
> > The API/UAPI can be extended to set per-device/total allocation limits
> > in the future.
>
> This total thing kinda bothers me. Can you please provide some concrete
> examples of how this and per-device limits would be used?

The use case we have for accounting the total (separate from the
individual devices) is to include the value as part of bugreports, for
understanding the system-wide amount of dmabuf allocations. I'm not
aware of an existing need to limit the total. Admittedly this is just
the sum over the devices, but we currently maintain out-of-tree code
to do this sort of thing today. [1]
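
Since the total is just the sum over the devices, it could in principle be
computed in userspace from the per-device values. Below is a minimal
sketch, assuming the per-cgroup file prints one "<device name> <bytes>"
pair per line (the "system 104857600" style seen in gpu.memory.current
output elsewhere in this thread) and that device names contain no
whitespace. The function name gpucg_sum_devices() is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the byte counts found in text that holds one
 * "<device name> <bytes>" pair per line. Lines that do not match this
 * shape, or that are longer than the local buffer, are skipped.
 */
static uint64_t gpucg_sum_devices(const char *text)
{
	uint64_t total = 0;
	const char *line = text;

	while (*line) {
		size_t len = strcspn(line, "\n");
		char buf[128], name[64];
		uint64_t bytes;

		if (len < sizeof(buf)) {
			/* Parse a private copy so sscanf cannot run into the next line. */
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (sscanf(buf, "%63s %" SCNu64, name, &bytes) == 2)
				total += bytes;
		}

		line += len;
		if (*line == '\n')
			line++;
	}
	return total;
}
```

A bugreport tool could read the file contents into a buffer and pass them
to this function instead of relying on a separate kernel-side total.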

The per-device limits would be used to restrict the amount of each
type of allocation charged to an individual application to prevent
hogging or to completely prevent access. This limitation is not
something we have implemented today, but it is on our roadmap.

[1] https://android-review.googlesource.com/c/kernel/common/+/1566704/3/drivers/dma-buf/dma-heap.c

>
> Thanks.
>
> --
> tejun

* Re: [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function
  2022-03-29 15:21   ` Michal Koutný
@ 2022-04-01 18:41     ` T.J. Mercier
  2022-04-05 12:12       ` Michal Koutný
  0 siblings, 1 reply; 22+ messages in thread
From: T.J. Mercier @ 2022-04-01 18:41 UTC (permalink / raw)
  To: Michal Koutný
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Kalesh Singh, Kenny.Ho, Shuah Khan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

On Tue, Mar 29, 2022 at 8:21 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hi.
>
> On Mon, Mar 28, 2022 at 03:59:44AM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> > From: Hridya Valsaraju <hridya@google.com>
> >
> > The dma_buf_charge_transfer function provides a way for processes to
>
> (s/dma_buf_charge_transfer/dma_buf_transfer_charge/)
>
Doh! Thanks.

> > transfer charge of a buffer to a different process. This is essential
> > for the cases where a central allocator process does allocations for
> > various subsystems, hands over the fd to the client who requested the
> > memory and drops all references to the allocated memory.
>
> I understood from [1] some buffers are backed by regular RAM. How are
> these charges going to be transferred (if so)?
>
This link doesn't work for me, but I think you're referring to the
discussion about your "RAM_backed_buffers" comment from March 23rd. I
wanted to do a simple test to confirm my own understanding here, but
that got delayed due to some problems on my end. Anyway the test I did
goes like this: enable memcg and gpu cgroup tracking and run a process
that allocates 100MiB of dmabufs. Observe memcg and gpu accounting
values before and after the allocation.

Before
# cat memory.current gpu.memory.current
14909440
system 0

<Test program does the allocation of 100MiB of dmabufs>

After
# cat memory.current gpu.memory.current
48025600
system 104857600

So the memcg value increases by about 30 MiB while the gpu value
increases by 100 MiB. This is with kmem enabled, and the /proc/maps
file for this process indicates that the majority of that 30 MiB is
kernel memory. I think this result shows that neither the kernel nor
process memory overlap with the gpu cgroup tracking of these
allocations. So despite the fact that these buffers are in main
memory, they are allocated in a way that does not result in memcg
attribution. (It looks to me like __GFP_ACCOUNT is not set for these.)
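
For anyone checking the arithmetic, the deltas can be reproduced from the
four counter values printed above. This is only a small sketch; the helper
name delta_mib() is made up for illustration.

```c
#include <stdint.h>

/* Bytes per MiB, as a double so the division below is floating point. */
#define BYTES_PER_MIB (1024.0 * 1024.0)

/*
 * Hypothetical helper: difference between two counter samples in MiB.
 * Signed types are used so a counter that shrank yields a negative delta
 * instead of wrapping around.
 */
static double delta_mib(int64_t before, int64_t after)
{
	return (double)(after - before) / BYTES_PER_MIB;
}
```

Called as delta_mib(14909440, 48025600) for memory.current this gives
roughly 31.6 MiB, and delta_mib(0, 104857600) for the "system" entry gives
exactly 100 MiB, which matches the "about 30 MiB" and "100 MiB" figures
above.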

>
> Thanks,
> Michal
>
> [1]
> https://lore.kernel.org/r/CABdmKX2NSAKMC6rReMYfo2SSVNxEXcS466hk3qF6YFt-j-+_NQ@mail.gmail.com

* Re: [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-03-30 20:56     ` T.J. Mercier
@ 2022-04-04 17:41       ` Tejun Heo
  2022-04-04 17:49         ` T.J. Mercier
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2022-04-04 17:41 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Zefan Li, Johannes Weiner, Shuah Khan, Kalesh Singh,
	Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest

Hello,

On Wed, Mar 30, 2022 at 01:56:09PM -0700, T.J. Mercier wrote:
> The use case we have for accounting the total (separate from the
> individual devices) is to include the value as part of bugreports, for
> understanding the system-wide amount of dmabuf allocations. I'm not
> aware of an existing need to limit the total. Admittedly this is just
> the sum over the devices, but we currently maintain out of tree code
> to do this sort of thing today. [1]

So, drop this part?

Thanks.

-- 
tejun


* Re: [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-04-04 17:41       ` Tejun Heo
@ 2022-04-04 17:49         ` T.J. Mercier
  0 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-04-04 17:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Zefan Li, Johannes Weiner, Shuah Khan, Kalesh Singh,
	Kenny.Ho, Michal Koutný,
	Shuah Khan, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, linux-kselftest

On Mon, Apr 4, 2022 at 10:42 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Wed, Mar 30, 2022 at 01:56:09PM -0700, T.J. Mercier wrote:
> > The use case we have for accounting the total (separate from the
> > individual devices) is to include the value as part of bugreports, for
> > understanding the system-wide amount of dmabuf allocations. I'm not
> > aware of an existing need to limit the total. Admittedly this is just
> > the sum over the devices, but we currently maintain out of tree code
> > to do this sort of thing today. [1]
>
> So, drop this part?

Ok, will do - I'll drop the "total" limitation text from the series.
>
> Thanks.
>
> --
> tejun


* Re: [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function
  2022-04-01 18:41     ` T.J. Mercier
@ 2022-04-05 12:12       ` Michal Koutný
  2022-04-05 17:48         ` T.J. Mercier
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Koutný @ 2022-04-05 12:12 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Kalesh Singh, Kenny.Ho, Shuah Khan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

On Fri, Apr 01, 2022 at 11:41:36AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> This link doesn't work for me, but I think you're referring to the
> discussion about your "RAM_backed_buffers" comment from March 23rd.

(Oops, it's a non-public message. But yes, you guessed it right ;-))

> Anyway the test I did goes like this: enable memcg and gpu cgroup
> tracking and run a process that allocates 100MiB of dmabufs. Observe
> memcg and gpu accounting values before and after the allocation.

Thanks for this measurement/demo.

> Before
> # cat memory.current gpu.memory.current
> 14909440
> system 0
> 
> <Test program does the allocation of 100MiB of dmabufs>
> 
> After
> # cat memory.current gpu.memory.current
> 48025600
> system 104857600
> 
> So the memcg value increases by about 30 MiB while the gpu value
> increases by 100 MiB.

> This is with kmem enabled, and the /proc/maps
> file for this process indicates that the majority of that 30 MiB is
> kernel memory.

> I think this result shows that neither the kernel nor process memory
> overlap with the gpu cgroup tracking of these allocations.

It depends on how the semantics of the 'system' entry are defined, no?
As I understood from the other thread, the 'total' is going to be
removed, so 'system' represents exclusively device memory?


> So despite the fact that these buffers are in main memory, they are
> allocated in a way that does not result in memcg attribution. (It
> looks to me like __GFP_ACCOUNT is not set for these.)

(I thought you knew what dmabufs your program used :-p)

So, the goal is to do the tracking and migrations only via the gpu cg
layer, regardless of how memcg charges it (or not).

(I have no opinion on that, I'm just summing it up so that we're on the
same page.)

Michal


* Re: [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function
  2022-04-05 12:12       ` Michal Koutný
@ 2022-04-05 17:48         ` T.J. Mercier
  0 siblings, 0 replies; 22+ messages in thread
From: T.J. Mercier @ 2022-04-05 17:48 UTC (permalink / raw)
  To: Michal Koutný
  Cc: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Kalesh Singh, Kenny.Ho, Shuah Khan, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups,
	linux-kselftest

On Tue, Apr 5, 2022 at 5:12 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Fri, Apr 01, 2022 at 11:41:36AM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> > This link doesn't work for me, but I think you're referring to the
> > discussion about your "RAM_backed_buffers" comment from March 23rd.
>
> (Oops, it's a non-public message. But yes, you guessed it right ;-))
>
> > Anyway the test I did goes like this: enable memcg and gpu cgroup
> > tracking and run a process that allocates 100MiB of dmabufs. Observe
> > memcg and gpu accounting values before and after the allocation.
>
> Thanks for this measurement/demo.
>
> > Before
> > # cat memory.current gpu.memory.current
> > 14909440
> > system 0
> >
> > <Test program does the allocation of 100MiB of dmabufs>
> >
> > After
> > # cat memory.current gpu.memory.current
> > 48025600
> > system 104857600
> >
> > So the memcg value increases by about 30 MiB while the gpu value
> > increases by 100 MiB.
>
> > This is with kmem enabled, and the /proc/maps
> > file for this process indicates that the majority of that 30 MiB is
> > kernel memory.
>
> > I think this result shows that neither the kernel nor process memory
> > overlap with the gpu cgroup tracking of these allocations.
>
> It depends on how the semantics of the 'system' entry are defined, no?
> As I understood from the other thread, the 'total' is going to be
> removed, so 'system' represents exclusively device memory?
>
That's right. The system charges (soon to be renamed "system-heap")
result only from an allocator (in this case the system heap) deciding
to call gpucg_try_charge for the buffer, which is entirely device
memory.
>
> > So despite the fact that these buffers are in main memory, they are
> > allocated in a way that does not result in memcg attribution. (It
> > looks to me like __GFP_ACCOUNT is not set for these.)
>
> (I thought you knew what dmabufs your program used :-p)
>
I'm coming up to speed on a lot of new-to-me code here. :)
Just for completeness, these buffers were allocated with
libdmabufheap's AllocSystem.

> So, the goal is to do the tracking and migrations only via the gpu cg
> layer, regardless of how memcg charges it (or not).
>
> (I have no opinion on that, I'm just summing it up so that we're on the
> same page.)
>
Yes, this reflects my intention and the current state of the code in
this series.

> Michal

Thanks,
T.J.



Thread overview: 22+ messages
2022-03-28  3:59 [RFC v4 0/8] Proposal for a GPU cgroup controller T.J. Mercier
2022-03-28  3:59 ` [RFC v4 1/8] gpu: rfc: " T.J. Mercier
2022-03-28  3:59 ` [RFC v4 2/8] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory T.J. Mercier
2022-03-29 16:59   ` Tejun Heo
2022-03-30 20:56     ` T.J. Mercier
2022-04-04 17:41       ` Tejun Heo
2022-04-04 17:49         ` T.J. Mercier
2022-03-28  3:59 ` [RFC v4 3/8] dmabuf: Use the GPU cgroup charge/uncharge APIs T.J. Mercier
2022-03-28  3:59 ` [RFC v4 4/8] dmabuf: heaps: export system_heap buffers with GPU cgroup charging T.J. Mercier
2022-03-28 14:36   ` Daniel Vetter
2022-03-28 18:28     ` T.J. Mercier
2022-03-29  8:42       ` Daniel Vetter
2022-03-29 16:50         ` Tejun Heo
2022-03-29 17:52         ` T.J. Mercier
2022-03-28  3:59 ` [RFC v4 5/8] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
2022-03-29 15:21   ` Michal Koutný
2022-04-01 18:41     ` T.J. Mercier
2022-04-05 12:12       ` Michal Koutný
2022-04-05 17:48         ` T.J. Mercier
2022-03-28  3:59 ` [RFC v4 6/8] binder: Add a buffer flag to relinquish ownership of fds T.J. Mercier
2022-03-28  3:59 ` [RFC v4 7/8] binder: use __kernel_pid_t and __kernel_uid_t for userspace T.J. Mercier
2022-03-28  3:59 ` [RFC v4 8/8] selftests: Add binder cgroup gpu memory transfer test T.J. Mercier
