All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-11 16:18 ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.

Changelog:

v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/

Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.

Fix incorrect Kconfig help section indentation per Randy Dunlap.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

T.J. Mercier (6):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: Use the GPU cgroup charge/uncharge APIs
  dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  dmabuf: Add gpu cgroup charge transfer function
  android: binder: Add a buffer flag to relinquish ownership of fds

 Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 drivers/android/binder.c             |  26 +++
 drivers/dma-buf/dma-buf.c            | 100 +++++++++
 drivers/dma-buf/dma-heap.c           |  27 +++
 drivers/dma-buf/heaps/system_heap.c  |   3 +
 include/linux/cgroup_gpu.h           | 127 +++++++++++
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/dma-buf.h              |  22 +-
 include/linux/dma-heap.h             |  11 +
 include/uapi/linux/android/binder.h  |   1 +
 init/Kconfig                         |   7 +
 kernel/cgroup/Makefile               |   1 +
 kernel/cgroup/gpu.c                  | 304 +++++++++++++++++++++++++++
 14 files changed, 830 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-11 16:18 ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.

Changelog:

v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/

Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.

Fix incorrect Kconfig help section indentation per Randy Dunlap.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

T.J. Mercier (6):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: Use the GPU cgroup charge/uncharge APIs
  dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  dmabuf: Add gpu cgroup charge transfer function
  android: binder: Add a buffer flag to relinquish ownership of fds

 Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 drivers/android/binder.c             |  26 +++
 drivers/dma-buf/dma-buf.c            | 100 +++++++++
 drivers/dma-buf/dma-heap.c           |  27 +++
 drivers/dma-buf/heaps/system_heap.c  |   3 +
 include/linux/cgroup_gpu.h           | 127 +++++++++++
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/dma-buf.h              |  22 +-
 include/linux/dma-heap.h             |  11 +
 include/uapi/linux/android/binder.h  |   1 +
 init/Kconfig                         |   7 +
 kernel/cgroup/Makefile               |   1 +
 kernel/cgroup/gpu.c                  | 304 +++++++++++++++++++++++++++
 14 files changed, 830 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-11 16:18 ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA, Kenny.Ho-5C7GfCeVMHo,
	T.J. Mercier, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	cgroups-u79uwXL29TY76Z2rM5mHXA

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.

Changelog:

v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org/

Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.

Fix incorrect Kconfig help section indentation per Randy Dunlap.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org/

T.J. Mercier (6):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: Use the GPU cgroup charge/uncharge APIs
  dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  dmabuf: Add gpu cgroup charge transfer function
  android: binder: Add a buffer flag to relinquish ownership of fds

 Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 drivers/android/binder.c             |  26 +++
 drivers/dma-buf/dma-buf.c            | 100 +++++++++
 drivers/dma-buf/dma-heap.c           |  27 +++
 drivers/dma-buf/heaps/system_heap.c  |   3 +
 include/linux/cgroup_gpu.h           | 127 +++++++++++
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/dma-buf.h              |  22 +-
 include/linux/dma-heap.h             |  11 +
 include/uapi/linux/android/binder.h  |   1 +
 init/Kconfig                         |   7 +
 kernel/cgroup/Makefile               |   1 +
 kernel/cgroup/gpu.c                  | 304 +++++++++++++++++++++++++++
 14 files changed, 830 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [RFC v2 1/6] gpu: rfc: Proposal for a GPU cgroup controller
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-11 16:18   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches is only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 199 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..0bb761223b97
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,195 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document intends to outline a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer(hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting(see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo “dev1 4194304” >  /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem where each cgroup maintains a list of resource pools.
+Each resource pool contains a struct device and the counter to track current total,
+and the maximum limit set for the device.
+
+The below code block is a preliminary estimation on how the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list  of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @device: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to charge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
+
+Upstreaming Plan
+================
+* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem
+  as well as for Android.
+
+* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF
+  system heap.
+
+* Demonstrate its usage from userspace in the Android Open Space Project.
+
+* Send out RFCs to LKML for the GPU cgroup controller and iterate.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 1/6] gpu: rfc: Proposal for a GPU cgroup controller
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches is only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 199 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..0bb761223b97
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,195 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document intends to outline a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer(hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting(see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo “dev1 4194304” >  /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem where each cgroup maintains a list of resource pools.
+Each resource pool contains a struct device and the counter to track current total,
+and the maximum limit set for the device.
+
+The below code block is a preliminary estimation on how the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list  of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @device: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to charge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
+
+Upstreaming Plan
+================
+* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem
+  as well as for Android.
+
+* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF
+  system heap.
+
+* Demonstrate its usage from userspace in the Android Open Space Project.
+
+* Send out RFCs to LKML for the GPU cgroup controller and iterate.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 1/6] gpu: rfc: Proposal for a GPU cgroup controller
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA, Kenny.Ho-5C7GfCeVMHo,
	T.J. Mercier, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	cgroups-u79uwXL29TY76Z2rM5mHXA

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches is only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

From: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Co-developed-by: T.J. Mercier <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: T.J. Mercier <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 199 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..0bb761223b97
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,195 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document intends to outline a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer(hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting(see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo “dev1 4194304” >  /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem where each cgroup maintains a list of resource pools.
+Each resource pool contains a struct device and the counter to track current total,
+and the maximum limit set for the device.
+
+The below code block is a preliminary estimation on how the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list  of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @device: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to charge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
+
+Upstreaming Plan
+================
+* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem
+  as well as for Android.
+
+* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF
+  system heap.
+
+* Demonstrate its usage from userspace in the Android Open Space Project.
+
+* Send out RFCs to LKML for the GPU cgroup controller and iterate.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-11 16:18   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
-allow a device to register for memory accounting using the GPU cgroup
controller.
-charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it would expose information about
the memory allocated by each device(registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Fix incorrect Kconfig help section indentation per Randy Dunlap.

 include/linux/cgroup_gpu.h    | 127 ++++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 304 ++++++++++++++++++++++++++++++++++
 5 files changed, 443 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..c5bc2b882783
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list  of various resource pools in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+
+static inline int gpucg_try_charge(struct gpucg *gpucg,
+				   struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg,
+				  struct gpucg_device *device,
+				  u64 usage) {}
+
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif /* CONFIG_CGROUP_GPU */
+#endif /* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..43568472930a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -980,6 +980,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+	bool "gpu cgroup controller (EXPERIMENTAL)"
+	select PAGE_COUNTER
+	help
+	  Provides accounting and limit setting for memory allocations by the GPU and
+	  GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..3e9bfb45c6af
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,304 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/mm.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	/* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/*
+	 * list maintained by the gpucg_device to keep track of its
+	 * resource pools
+	 */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(
+	struct gpucg *cg,
+	struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+							GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is
+ * created in a hierarchical manner in the cgroup and its ancestor cgroups who
+ * do not already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation
+	 * to enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and the caller is holding a reference to the gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (IS_ERR(rp))
+		return PTR_ERR(rp);
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	if (page_counter_try_charge(&rp->total, nr_pages, &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and there are active refs on gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device andd @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }     /* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc      = gpucg_css_alloc,
+	.css_free       = gpucg_css_free,
+	.early_init     = false,
+	.legacy_cftypes = files,
+	.dfl_cftypes    = files,
+};
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
-allow a device to register for memory accounting using the GPU cgroup
controller.
-charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it would expose information about
the memory allocated by each device(registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Fix incorrect Kconfig help section indentation per Randy Dunlap.

 include/linux/cgroup_gpu.h    | 127 ++++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 304 ++++++++++++++++++++++++++++++++++
 5 files changed, 443 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..c5bc2b882783
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list  of various resource pools in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+
+static inline int gpucg_try_charge(struct gpucg *gpucg,
+				   struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg,
+				  struct gpucg_device *device,
+				  u64 usage) {}
+
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif /* CONFIG_CGROUP_GPU */
+#endif /* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..43568472930a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -980,6 +980,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+	bool "gpu cgroup controller (EXPERIMENTAL)"
+	select PAGE_COUNTER
+	help
+	  Provides accounting and limit setting for memory allocations by the GPU and
+	  GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..3e9bfb45c6af
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,304 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/mm.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	/* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/*
+	 * list maintained by the gpucg_device to keep track of its
+	 * resource pools
+	 */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(
+	struct gpucg *cg,
+	struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+							GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is
+ * created in a hierarchical manner in the cgroup and its ancestor cgroups who
+ * do not already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation
+	 * to enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and the caller is holding a reference to the gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (IS_ERR(rp))
+		return PTR_ERR(rp);
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	if (page_counter_try_charge(&rp->total, nr_pages, &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and there are active refs on gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device andd @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }     /* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc      = gpucg_css_alloc,
+	.css_free       = gpucg_css_free,
+	.early_init     = false,
+	.legacy_cftypes = files,
+	.dfl_cftypes    = files,
+};
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
-allow a device to register for memory accounting using the GPU cgroup
controller.
-charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it would expose information about
the memory allocated by each device(registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Fix incorrect Kconfig help section indentation per Randy Dunlap.

 include/linux/cgroup_gpu.h    | 127 ++++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 304 ++++++++++++++++++++++++++++++++++
 5 files changed, 443 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..c5bc2b882783
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list  of various resource pools in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+
+static inline int gpucg_try_charge(struct gpucg *gpucg,
+				   struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg,
+				  struct gpucg_device *device,
+				  u64 usage) {}
+
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif /* CONFIG_CGROUP_GPU */
+#endif /* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..43568472930a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -980,6 +980,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+	bool "gpu cgroup controller (EXPERIMENTAL)"
+	select PAGE_COUNTER
+	help
+	  Provides accounting and limit setting for memory allocations by the GPU and
+	  GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..3e9bfb45c6af
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,304 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/mm.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	/* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/*
+	 * list maintained by the gpucg_device to keep track of its
+	 * resource pools
+	 */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(
+	struct gpucg *cg,
+	struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+							GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is
+ * created in a hierarchical manner in the cgroup and its ancestor cgroups who
+ * do not already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation
+	 * to enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and the caller is holding a reference to the gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (IS_ERR(rp))
+		return PTR_ERR(rp);
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	if (page_counter_try_charge(&rp->total, nr_pages, &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and there are active refs on gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device andd @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }     /* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc      = gpucg_css_alloc,
+	.css_free       = gpucg_css_free,
+	.early_init     = false,
+	.legacy_cftypes = files,
+	.dfl_cftypes    = files,
+};
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 3/6] dmabuf: Use the GPU cgroup charge/uncharge APIs
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-11 16:18   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by any DMA-BUF exporter that exports a buffer with a GPU cgroup
device association.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charging/uncharging from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-buf.c | 52 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   | 20 +++++++++++++--
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 602b12d7470d..83d0d1b91547 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -56,6 +56,50 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
 			     dentry->d_name.name, ret > 0 ? name : "");
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static inline struct gpucg_device *
+exp_info_gpucg_dev(const struct dma_buf_export_info *exp_info)
+{
+	return exp_info->gpucg_dev;
+}
+
+static bool dmabuf_try_charge(struct dma_buf *dmabuf,
+			      struct gpucg_device *gpucg_dev)
+{
+	dmabuf->gpucg = gpucg_get(current);
+	dmabuf->gpucg_dev = gpucg_dev;
+	if (gpucg_try_charge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size)) {
+		gpucg_put(dmabuf->gpucg);
+		dmabuf->gpucg = NULL;
+		dmabuf->gpucg_dev = NULL;
+		return false;
+	}
+	return true;
+}
+
+static void dmabuf_uncharge(struct dma_buf *dmabuf)
+{
+	if (dmabuf->gpucg && dmabuf->gpucg_dev) {
+		gpucg_uncharge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size);
+		gpucg_put(dmabuf->gpucg);
+	}
+}
+#else /* CONFIG_CGROUP_GPU */
+static inline struct gpucg_device *exp_info_gpucg_dev(
+const struct dma_buf_export_info *exp_info)
+{
+	return NULL;
+}
+
+static inline bool dmabuf_try_charge(struct dma_buf *dmabuf,
+				     struct gpucg_device *gpucg_dev))
+{
+	return false;
+}
+
+static inline void dmabuf_uncharge(struct dma_buf *dmabuf) {}
+#endif /* CONFIG_CGROUP_GPU */
+
 static void dma_buf_release(struct dentry *dentry)
 {
 	struct dma_buf *dmabuf;
@@ -79,6 +123,8 @@ static void dma_buf_release(struct dentry *dentry)
 	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
 		dma_resv_fini(dmabuf->resv);
 
+	dmabuf_uncharge(dmabuf);
+
 	WARN_ON(!list_empty(&dmabuf->attachments));
 	module_put(dmabuf->owner);
 	kfree(dmabuf->name);
@@ -484,6 +530,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 {
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv = exp_info->resv;
+	struct gpucg_device *gpucg_dev = exp_info_gpucg_dev(exp_info);
 	struct file *file;
 	size_t alloc_size = sizeof(struct dma_buf);
 	int ret;
@@ -534,6 +581,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	}
 	dmabuf->resv = resv;
 
+	if (gpucg_dev && !dmabuf_try_charge(dmabuf, gpucg_dev))
+		goto err_charge;
+
 	file = dma_buf_getfile(dmabuf, exp_info->flags);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
@@ -565,6 +615,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	file->f_path.dentry->d_fsdata = NULL;
 	fput(file);
 err_dmabuf:
+	dmabuf_uncharge(dmabuf);
+err_charge:
 	kfree(dmabuf);
 err_module:
 	module_put(exp_info->owner);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..742f29c3daaf 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -303,7 +304,7 @@ struct dma_buf {
 	/**
 	 * @size:
 	 *
-	 * Size of the buffer; invariant over the lifetime of the buffer.
+	 * Size of the buffer in bytes; invariant over the lifetime of the buffer.
 	 */
 	size_t size;
 
@@ -453,6 +454,17 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+
+#ifdef CONFIG_CGROUP_GPU
+	/** @gpucg: Pointer to the cgroup this buffer currently belongs to. */
+	struct gpucg *gpucg;
+
+	/** @gpucg_dev:
+	 *
+	 * Pointer to the cgroup GPU device whence this buffer originates.
+	 */
+	struct gpucg_device *gpucg_dev;
+#endif
 };
 
 /**
@@ -529,9 +541,10 @@ struct dma_buf_attachment {
  * @exp_name:	name of the exporter - useful for debugging.
  * @owner:	pointer to exporter module - used for refcounting kernel module
  * @ops:	Attach allocator-defined dma buf ops to the new buffer
- * @size:	Size of the buffer - invariant over the lifetime of the buffer
+ * @size:	Size of the buffer in bytes - invariant over the lifetime of the buffer
  * @flags:	mode flags for the file
  * @resv:	reservation-object, NULL to allocate default one
+ * @gpucg_dev:	pointer to the gpu cgroup device this buffer belongs to
  * @priv:	Attach private data of allocator to this buffer
  *
  * This structure holds the information required to export the buffer. Used
@@ -544,6 +557,9 @@ struct dma_buf_export_info {
 	size_t size;
 	int flags;
 	struct dma_resv *resv;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device *gpucg_dev;
+#endif
 	void *priv;
 };
 
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 3/6] dmabuf: Use the GPU cgroup charge/uncharge APIs
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by any DMA-BUF exporter that exports a buffer with a GPU cgroup
device association.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charging/uncharging from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-buf.c | 52 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   | 20 +++++++++++++--
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 602b12d7470d..83d0d1b91547 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -56,6 +56,50 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
 			     dentry->d_name.name, ret > 0 ? name : "");
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static inline struct gpucg_device *
+exp_info_gpucg_dev(const struct dma_buf_export_info *exp_info)
+{
+	return exp_info->gpucg_dev;
+}
+
+static bool dmabuf_try_charge(struct dma_buf *dmabuf,
+			      struct gpucg_device *gpucg_dev)
+{
+	dmabuf->gpucg = gpucg_get(current);
+	dmabuf->gpucg_dev = gpucg_dev;
+	if (gpucg_try_charge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size)) {
+		gpucg_put(dmabuf->gpucg);
+		dmabuf->gpucg = NULL;
+		dmabuf->gpucg_dev = NULL;
+		return false;
+	}
+	return true;
+}
+
+static void dmabuf_uncharge(struct dma_buf *dmabuf)
+{
+	if (dmabuf->gpucg && dmabuf->gpucg_dev) {
+		gpucg_uncharge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size);
+		gpucg_put(dmabuf->gpucg);
+	}
+}
+#else /* CONFIG_CGROUP_GPU */
+static inline struct gpucg_device *exp_info_gpucg_dev(
+const struct dma_buf_export_info *exp_info)
+{
+	return NULL;
+}
+
+static inline bool dmabuf_try_charge(struct dma_buf *dmabuf,
+				     struct gpucg_device *gpucg_dev))
+{
+	return false;
+}
+
+static inline void dmabuf_uncharge(struct dma_buf *dmabuf) {}
+#endif /* CONFIG_CGROUP_GPU */
+
 static void dma_buf_release(struct dentry *dentry)
 {
 	struct dma_buf *dmabuf;
@@ -79,6 +123,8 @@ static void dma_buf_release(struct dentry *dentry)
 	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
 		dma_resv_fini(dmabuf->resv);
 
+	dmabuf_uncharge(dmabuf);
+
 	WARN_ON(!list_empty(&dmabuf->attachments));
 	module_put(dmabuf->owner);
 	kfree(dmabuf->name);
@@ -484,6 +530,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 {
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv = exp_info->resv;
+	struct gpucg_device *gpucg_dev = exp_info_gpucg_dev(exp_info);
 	struct file *file;
 	size_t alloc_size = sizeof(struct dma_buf);
 	int ret;
@@ -534,6 +581,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	}
 	dmabuf->resv = resv;
 
+	if (gpucg_dev && !dmabuf_try_charge(dmabuf, gpucg_dev))
+		goto err_charge;
+
 	file = dma_buf_getfile(dmabuf, exp_info->flags);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
@@ -565,6 +615,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	file->f_path.dentry->d_fsdata = NULL;
 	fput(file);
 err_dmabuf:
+	dmabuf_uncharge(dmabuf);
+err_charge:
 	kfree(dmabuf);
 err_module:
 	module_put(exp_info->owner);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..742f29c3daaf 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -303,7 +304,7 @@ struct dma_buf {
 	/**
 	 * @size:
 	 *
-	 * Size of the buffer; invariant over the lifetime of the buffer.
+	 * Size of the buffer in bytes; invariant over the lifetime of the buffer.
 	 */
 	size_t size;
 
@@ -453,6 +454,17 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+
+#ifdef CONFIG_CGROUP_GPU
+	/** @gpucg: Pointer to the cgroup this buffer currently belongs to. */
+	struct gpucg *gpucg;
+
+	/** @gpucg_dev:
+	 *
+	 * Pointer to the cgroup GPU device whence this buffer originates.
+	 */
+	struct gpucg_device *gpucg_dev;
+#endif
 };
 
 /**
@@ -529,9 +541,10 @@ struct dma_buf_attachment {
  * @exp_name:	name of the exporter - useful for debugging.
  * @owner:	pointer to exporter module - used for refcounting kernel module
  * @ops:	Attach allocator-defined dma buf ops to the new buffer
- * @size:	Size of the buffer - invariant over the lifetime of the buffer
+ * @size:	Size of the buffer in bytes - invariant over the lifetime of the buffer
  * @flags:	mode flags for the file
  * @resv:	reservation-object, NULL to allocate default one
+ * @gpucg_dev:	pointer to the gpu cgroup device this buffer belongs to
  * @priv:	Attach private data of allocator to this buffer
  *
  * This structure holds the information required to export the buffer. Used
@@ -544,6 +557,9 @@ struct dma_buf_export_info {
 	size_t size;
 	int flags;
 	struct dma_resv *resv;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device *gpucg_dev;
+#endif
 	void *priv;
 };
 
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 3/6] dmabuf: Use the GPU cgroup charge/uncharge APIs
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by any DMA-BUF exporter that exports a buffer with a GPU cgroup
device association.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charging/uncharging from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-buf.c | 52 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   | 20 +++++++++++++--
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 602b12d7470d..83d0d1b91547 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -56,6 +56,50 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
 			     dentry->d_name.name, ret > 0 ? name : "");
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static inline struct gpucg_device *
+exp_info_gpucg_dev(const struct dma_buf_export_info *exp_info)
+{
+	return exp_info->gpucg_dev;
+}
+
+static bool dmabuf_try_charge(struct dma_buf *dmabuf,
+			      struct gpucg_device *gpucg_dev)
+{
+	dmabuf->gpucg = gpucg_get(current);
+	dmabuf->gpucg_dev = gpucg_dev;
+	if (gpucg_try_charge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size)) {
+		gpucg_put(dmabuf->gpucg);
+		dmabuf->gpucg = NULL;
+		dmabuf->gpucg_dev = NULL;
+		return false;
+	}
+	return true;
+}
+
+static void dmabuf_uncharge(struct dma_buf *dmabuf)
+{
+	if (dmabuf->gpucg && dmabuf->gpucg_dev) {
+		gpucg_uncharge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size);
+		gpucg_put(dmabuf->gpucg);
+	}
+}
+#else /* CONFIG_CGROUP_GPU */
+static inline struct gpucg_device *exp_info_gpucg_dev(
+const struct dma_buf_export_info *exp_info)
+{
+	return NULL;
+}
+
+static inline bool dmabuf_try_charge(struct dma_buf *dmabuf,
+				     struct gpucg_device *gpucg_dev))
+{
+	return false;
+}
+
+static inline void dmabuf_uncharge(struct dma_buf *dmabuf) {}
+#endif /* CONFIG_CGROUP_GPU */
+
 static void dma_buf_release(struct dentry *dentry)
 {
 	struct dma_buf *dmabuf;
@@ -79,6 +123,8 @@ static void dma_buf_release(struct dentry *dentry)
 	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
 		dma_resv_fini(dmabuf->resv);
 
+	dmabuf_uncharge(dmabuf);
+
 	WARN_ON(!list_empty(&dmabuf->attachments));
 	module_put(dmabuf->owner);
 	kfree(dmabuf->name);
@@ -484,6 +530,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 {
 	struct dma_buf *dmabuf;
 	struct dma_resv *resv = exp_info->resv;
+	struct gpucg_device *gpucg_dev = exp_info_gpucg_dev(exp_info);
 	struct file *file;
 	size_t alloc_size = sizeof(struct dma_buf);
 	int ret;
@@ -534,6 +581,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	}
 	dmabuf->resv = resv;
 
+	if (gpucg_dev && !dmabuf_try_charge(dmabuf, gpucg_dev))
+		goto err_charge;
+
 	file = dma_buf_getfile(dmabuf, exp_info->flags);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
@@ -565,6 +615,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	file->f_path.dentry->d_fsdata = NULL;
 	fput(file);
 err_dmabuf:
+	dmabuf_uncharge(dmabuf);
+err_charge:
 	kfree(dmabuf);
 err_module:
 	module_put(exp_info->owner);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..742f29c3daaf 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -303,7 +304,7 @@ struct dma_buf {
 	/**
 	 * @size:
 	 *
-	 * Size of the buffer; invariant over the lifetime of the buffer.
+	 * Size of the buffer in bytes; invariant over the lifetime of the buffer.
 	 */
 	size_t size;
 
@@ -453,6 +454,17 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+
+#ifdef CONFIG_CGROUP_GPU
+	/** @gpucg: Pointer to the cgroup this buffer currently belongs to. */
+	struct gpucg *gpucg;
+
+	/** @gpucg_dev:
+	 *
+	 * Pointer to the cgroup GPU device whence this buffer originates.
+	 */
+	struct gpucg_device *gpucg_dev;
+#endif
 };
 
 /**
@@ -529,9 +541,10 @@ struct dma_buf_attachment {
  * @exp_name:	name of the exporter - useful for debugging.
  * @owner:	pointer to exporter module - used for refcounting kernel module
  * @ops:	Attach allocator-defined dma buf ops to the new buffer
- * @size:	Size of the buffer - invariant over the lifetime of the buffer
+ * @size:	Size of the buffer in bytes - invariant over the lifetime of the buffer
  * @flags:	mode flags for the file
  * @resv:	reservation-object, NULL to allocate default one
+ * @gpucg_dev:	pointer to the gpu cgroup device this buffer belongs to
  * @priv:	Attach private data of allocator to this buffer
  *
  * This structure holds the information required to export the buffer. Used
@@ -544,6 +557,9 @@ struct dma_buf_export_info {
 	size_t size;
 	int flags;
 	struct dma_resv *resv;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device *gpucg_dev;
+#endif
 	void *priv;
 };
 
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 4/6] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-11 16:18   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

All DMA heaps now register a new GPU cgroup device upon creation, and the
system_heap now exports buffers associated with its GPU cgroup device for
tracking purposes.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c |  3 +++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 8f5848aa144f..885072427775 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/dma-buf.h>
@@ -31,6 +32,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev		gpu cgroup device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -41,6 +43,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else /* CONFIG_CGROUP_GPU */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif /* CONFIG_CGROUP_GPU */
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..752a05c3cfe2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	exp_info.ops = &system_heap_buf_ops;
 	exp_info.size = buffer->len;
 	exp_info.flags = fd_flags;
+#ifdef CONFIG_CGROUP_GPU
+	exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap);
+#endif
 	exp_info.priv = buffer;
 	dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(dmabuf)) {
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 4/6] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

All DMA heaps now register a new GPU cgroup device upon creation, and the
system_heap now exports buffers associated with its GPU cgroup device for
tracking purposes.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c |  3 +++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 8f5848aa144f..885072427775 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/dma-buf.h>
@@ -31,6 +32,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev		gpu cgroup device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -41,6 +43,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else /* CONFIG_CGROUP_GPU */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif /* CONFIG_CGROUP_GPU */
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..752a05c3cfe2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	exp_info.ops = &system_heap_buf_ops;
 	exp_info.size = buffer->len;
 	exp_info.flags = fd_flags;
+#ifdef CONFIG_CGROUP_GPU
+	exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap);
+#endif
 	exp_info.priv = buffer;
 	dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(dmabuf)) {
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 4/6] dmabuf: heaps: export system_heap buffers with GPU cgroup charging
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

All DMA heaps now register a new GPU cgroup device upon creation, and the
system_heap now exports buffers associated with its GPU cgroup device for
tracking purposes.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c |  3 +++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 8f5848aa144f..885072427775 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/dma-buf.h>
@@ -31,6 +32,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev		gpu cgroup device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -41,6 +43,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else /* CONFIG_CGROUP_GPU */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif /* CONFIG_CGROUP_GPU */
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..752a05c3cfe2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	exp_info.ops = &system_heap_buf_ops;
 	exp_info.size = buffer->len;
 	exp_info.flags = fd_flags;
+#ifdef CONFIG_CGROUP_GPU
+	exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap);
+#endif
 	exp_info.priv = buffer;
 	dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(dmabuf)) {
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 5/6] dmabuf: Add gpu cgroup charge transfer function
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-11 16:18   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

The dma_buf_charge_transfer function provides a way for processes to
transfer charge of a buffer to a different process. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory and drops all references to the allocated memory.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-buf.c | 48 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   |  2 ++
 2 files changed, 50 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 83d0d1b91547..55e1b982f840 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1374,6 +1374,54 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+/**
+ * dma_buf_charge_transfer - Change the GPU cgroup to which the provided dma_buf
+ * is charged.
+ * @dmabuf:	[in]	buffer whose charge will be migrated to a different GPU
+ *			cgroup
+ * @gpucg:	[in]	the destination GPU cgroup for dmabuf's charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg *current_gpucg;
+	int ret = 0;
+
+	/*
+	 * Verify that the cgroup of the process requesting the transfer is the
+	 * same as the one the buffer is currently charged to.
+	 */
+	current_gpucg = gpucg_get(current);
+	mutex_lock(&dmabuf->lock);
+	if (current_gpucg != dmabuf->gpucg) {
+		ret = -EPERM;
+		goto err;
+	}
+
+	ret = gpucg_try_charge(gpucg, dmabuf->gpucg_dev, dmabuf->size);
+	if (ret)
+		goto err;
+
+	dmabuf->gpucg = gpucg;
+
+	/* uncharge the buffer from the cgroup it's currently charged to. */
+	gpucg_uncharge(current_gpucg, dmabuf->gpucg_dev, dmabuf->size);
+
+err:
+	mutex_unlock(&dmabuf->lock);
+	gpucg_put(current_gpucg);
+	return ret;
+#else
+	return 0;
+#endif /* CONFIG_CGROUP_GPU */
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_charge_transfer, DMA_BUF);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 742f29c3daaf..85c940c08867 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -646,4 +646,6 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 		 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg);
 #endif /* __DMA_BUF_H__ */
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 5/6] dmabuf: Add gpu cgroup charge transfer function
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

The dma_buf_charge_transfer function provides a way for processes to
transfer charge of a buffer to a different process. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory and drops all references to the allocated memory.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-buf.c | 48 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   |  2 ++
 2 files changed, 50 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 83d0d1b91547..55e1b982f840 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1374,6 +1374,54 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+/**
+ * dma_buf_charge_transfer - Change the GPU cgroup to which the provided dma_buf
+ * is charged.
+ * @dmabuf:	[in]	buffer whose charge will be migrated to a different GPU
+ *			cgroup
+ * @gpucg:	[in]	the destination GPU cgroup for dmabuf's charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg *current_gpucg;
+	int ret = 0;
+
+	/*
+	 * Verify that the cgroup of the process requesting the transfer is the
+	 * same as the one the buffer is currently charged to.
+	 */
+	current_gpucg = gpucg_get(current);
+	mutex_lock(&dmabuf->lock);
+	if (current_gpucg != dmabuf->gpucg) {
+		ret = -EPERM;
+		goto err;
+	}
+
+	ret = gpucg_try_charge(gpucg, dmabuf->gpucg_dev, dmabuf->size);
+	if (ret)
+		goto err;
+
+	dmabuf->gpucg = gpucg;
+
+	/* uncharge the buffer from the cgroup it's currently charged to. */
+	gpucg_uncharge(current_gpucg, dmabuf->gpucg_dev, dmabuf->size);
+
+err:
+	mutex_unlock(&dmabuf->lock);
+	gpucg_put(current_gpucg);
+	return ret;
+#else
+	return 0;
+#endif /* CONFIG_CGROUP_GPU */
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_charge_transfer, DMA_BUF);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 742f29c3daaf..85c940c08867 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -646,4 +646,6 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 		 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg);
 #endif /* __DMA_BUF_H__ */
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 5/6] dmabuf: Add gpu cgroup charge transfer function
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

The dma_buf_charge_transfer function provides a way for processes to
transfer charge of a buffer to a different process. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory and drops all references to the allocated memory.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/dma-buf/dma-buf.c | 48 +++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   |  2 ++
 2 files changed, 50 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 83d0d1b91547..55e1b982f840 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1374,6 +1374,54 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+/**
+ * dma_buf_charge_transfer - Change the GPU cgroup to which the provided dma_buf
+ * is charged.
+ * @dmabuf:	[in]	buffer whose charge will be migrated to a different GPU
+ *			cgroup
+ * @gpucg:	[in]	the destination GPU cgroup for dmabuf's charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg *current_gpucg;
+	int ret = 0;
+
+	/*
+	 * Verify that the cgroup of the process requesting the transfer is the
+	 * same as the one the buffer is currently charged to.
+	 */
+	current_gpucg = gpucg_get(current);
+	mutex_lock(&dmabuf->lock);
+	if (current_gpucg != dmabuf->gpucg) {
+		ret = -EPERM;
+		goto err;
+	}
+
+	ret = gpucg_try_charge(gpucg, dmabuf->gpucg_dev, dmabuf->size);
+	if (ret)
+		goto err;
+
+	dmabuf->gpucg = gpucg;
+
+	/* uncharge the buffer from the cgroup it's currently charged to. */
+	gpucg_uncharge(current_gpucg, dmabuf->gpucg_dev, dmabuf->size);
+
+err:
+	mutex_unlock(&dmabuf->lock);
+	gpucg_put(current_gpucg);
+	return ret;
+#else
+	return 0;
+#endif /* CONFIG_CGROUP_GPU */
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_charge_transfer, DMA_BUF);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 742f29c3daaf..85c940c08867 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -646,4 +646,6 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 		 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg);
 #endif /* __DMA_BUF_H__ */
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-11 16:18   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is up to the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 8351c5638880..f50d88ded188 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+		parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		gpucg = gpucg_get(target_proc->tsk);
+		ret = dma_buf_charge_transfer(dmabuf, gpucg);
+		if (ret) {
+			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
+				proc->pid, thread->pid, target_proc->pid);
+			gpucg_put(gpucg);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index 3246f2c74696..169fd5069a1a 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	kaleshsingh, cgroups, T.J. Mercier, linux-media

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is up to the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 8351c5638880..f50d88ded188 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+		parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		gpucg = gpucg_get(target_proc->tsk);
+		ret = dma_buf_charge_transfer(dmabuf, gpucg);
+		if (ret) {
+			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
+				proc->pid, thread->pid, target_proc->pid);
+			gpucg_put(gpucg);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index 3246f2c74696..169fd5069a1a 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-11 16:18   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-11 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: kaleshsingh, Kenny.Ho, T.J. Mercier, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is up to the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

From: Hridya Valsaraju <hridya@google.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Co-developed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
changes in v2
- Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König.

 drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 8351c5638880..f50d88ded188 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+		parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		gpucg = gpucg_get(target_proc->tsk);
+		ret = dma_buf_charge_transfer(dmabuf, gpucg);
+		if (ret) {
+			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
+				proc->pid, thread->pid, target_proc->pid);
+			gpucg_put(gpucg);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index 3246f2c74696..169fd5069a1a 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-11 16:18   ` T.J. Mercier
  (?)
@ 2022-02-12  7:19     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-12  7:19 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, kaleshsingh,
	Kenny.Ho, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups

On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> that a process sending an fd array to another process over binder IPC
> can set to relinquish ownership of the fds being sent for memory
> accounting purposes. If the flag is found to be set during the fd array
> translation and the fd is for a DMA-BUF, the buffer is uncharged from
> the sender's cgroup and charged to the receiving process's cgroup
> instead.
> 
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
> 
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL process
> allocates memory and sends the fds to the clients over binder IPC.
> The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> correctly charge the buffers to the client processes instead of the
> graphics allocator HAL.
> 
> From: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Co-developed-by: T.J. Mercier <tjmercier@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> changes in v2
> - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.
> 
>  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/android/binder.h |  1 +
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 8351c5638880..f50d88ded188 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  {
>  	binder_size_t fdi, fd_buf_size;
>  	binder_size_t fda_offset;
> +	bool transfer_gpu_charge = false;
>  	const void __user *sender_ufda_base;
>  	struct binder_proc *proc = thread->proc;
> +	struct binder_proc *target_proc = t->to_proc;
>  	int ret;
>  
>  	fd_buf_size = sizeof(u32) * fda->num_fds;
> @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  	if (ret)
>  		return ret;
>  
> +	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> +		parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> +		transfer_gpu_charge = true;
> +
>  	for (fdi = 0; fdi < fda->num_fds; fdi++) {
>  		u32 fd;
> +		struct dma_buf *dmabuf;
> +		struct gpucg *gpucg;
> +
>  		binder_size_t offset = fda_offset + fdi * sizeof(fd);
>  		binder_size_t sender_uoffset = fdi * sizeof(fd);
>  
> @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  						  in_reply_to);
>  		if (ret)
>  			return ret > 0 ? -EINVAL : ret;
> +
> +		if (!transfer_gpu_charge)
> +			continue;
> +
> +		dmabuf = dma_buf_get(fd);
> +		if (IS_ERR(dmabuf))
> +			continue;
> +
> +		gpucg = gpucg_get(target_proc->tsk);
> +		ret = dma_buf_charge_transfer(dmabuf, gpucg);
> +		if (ret) {
> +			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> +				proc->pid, thread->pid, target_proc->pid);
> +			gpucg_put(gpucg);
> +		}
> +		dma_buf_put(dmabuf);
>  	}
>  	return 0;
>  }
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index 3246f2c74696..169fd5069a1a 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -137,6 +137,7 @@ struct binder_buffer_object {
>  
>  enum {
>  	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> +	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
>  };
>  
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> -- 
> 2.35.1.265.g69c8d7142f-goog
> 

How does userspace know that binder supports this new flag?  And where
is the userspace test for this new feature?  Isn't there a binder test
framework somewhere?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-12  7:19     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-12  7:19 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, Sumit Semwal, Jonathan Corbet,
	Martijn Coenen, Laura Abbott, linux-media, Todd Kjos,
	linaro-mm-sig, Tejun Heo, cgroups, Suren Baghdasaryan,
	Christian Brauner, Kenny.Ho, linux-kernel, Liam Mark,
	Christian König, Arve Hjønnevåg,
	Thomas Zimmermann, Johannes Weiner, Hridya Valsaraju

On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> that a process sending an fd array to another process over binder IPC
> can set to relinquish ownership of the fds being sent for memory
> accounting purposes. If the flag is found to be set during the fd array
> translation and the fd is for a DMA-BUF, the buffer is uncharged from
> the sender's cgroup and charged to the receiving process's cgroup
> instead.
> 
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
> 
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL process
> allocates memory and sends the fds to the clients over binder IPC.
> The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> correctly charge the buffers to the client processes instead of the
> graphics allocator HAL.
> 
> From: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Co-developed-by: T.J. Mercier <tjmercier@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> changes in v2
> - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.
> 
>  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/android/binder.h |  1 +
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 8351c5638880..f50d88ded188 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  {
>  	binder_size_t fdi, fd_buf_size;
>  	binder_size_t fda_offset;
> +	bool transfer_gpu_charge = false;
>  	const void __user *sender_ufda_base;
>  	struct binder_proc *proc = thread->proc;
> +	struct binder_proc *target_proc = t->to_proc;
>  	int ret;
>  
>  	fd_buf_size = sizeof(u32) * fda->num_fds;
> @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  	if (ret)
>  		return ret;
>  
> +	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> +		parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> +		transfer_gpu_charge = true;
> +
>  	for (fdi = 0; fdi < fda->num_fds; fdi++) {
>  		u32 fd;
> +		struct dma_buf *dmabuf;
> +		struct gpucg *gpucg;
> +
>  		binder_size_t offset = fda_offset + fdi * sizeof(fd);
>  		binder_size_t sender_uoffset = fdi * sizeof(fd);
>  
> @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  						  in_reply_to);
>  		if (ret)
>  			return ret > 0 ? -EINVAL : ret;
> +
> +		if (!transfer_gpu_charge)
> +			continue;
> +
> +		dmabuf = dma_buf_get(fd);
> +		if (IS_ERR(dmabuf))
> +			continue;
> +
> +		gpucg = gpucg_get(target_proc->tsk);
> +		ret = dma_buf_charge_transfer(dmabuf, gpucg);
> +		if (ret) {
> +			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> +				proc->pid, thread->pid, target_proc->pid);
> +			gpucg_put(gpucg);
> +		}
> +		dma_buf_put(dmabuf);
>  	}
>  	return 0;
>  }
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index 3246f2c74696..169fd5069a1a 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -137,6 +137,7 @@ struct binder_buffer_object {
>  
>  enum {
>  	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> +	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
>  };
>  
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> -- 
> 2.35.1.265.g69c8d7142f-goog
> 

How does userspace know that binder supports this new flag?  And where
is the userspace test for this new feature?  Isn't there a binder test
framework somewhere?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-12  7:19     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-12  7:19 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott

On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> that a process sending an fd array to another process over binder IPC
> can set to relinquish ownership of the fds being sent for memory
> accounting purposes. If the flag is found to be set during the fd array
> translation and the fd is for a DMA-BUF, the buffer is uncharged from
> the sender's cgroup and charged to the receiving process's cgroup
> instead.
> 
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
> 
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL process
> allocates memory and sends the fds to the clients over binder IPC.
> The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> correctly charge the buffers to the client processes instead of the
> graphics allocator HAL.
> 
> From: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Co-developed-by: T.J. Mercier <tjmercier@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> changes in v2
> - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.
> 
>  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/android/binder.h |  1 +
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 8351c5638880..f50d88ded188 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  {
>  	binder_size_t fdi, fd_buf_size;
>  	binder_size_t fda_offset;
> +	bool transfer_gpu_charge = false;
>  	const void __user *sender_ufda_base;
>  	struct binder_proc *proc = thread->proc;
> +	struct binder_proc *target_proc = t->to_proc;
>  	int ret;
>  
>  	fd_buf_size = sizeof(u32) * fda->num_fds;
> @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  	if (ret)
>  		return ret;
>  
> +	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> +		parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> +		transfer_gpu_charge = true;
> +
>  	for (fdi = 0; fdi < fda->num_fds; fdi++) {
>  		u32 fd;
> +		struct dma_buf *dmabuf;
> +		struct gpucg *gpucg;
> +
>  		binder_size_t offset = fda_offset + fdi * sizeof(fd);
>  		binder_size_t sender_uoffset = fdi * sizeof(fd);
>  
> @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  						  in_reply_to);
>  		if (ret)
>  			return ret > 0 ? -EINVAL : ret;
> +
> +		if (!transfer_gpu_charge)
> +			continue;
> +
> +		dmabuf = dma_buf_get(fd);
> +		if (IS_ERR(dmabuf))
> +			continue;
> +
> +		gpucg = gpucg_get(target_proc->tsk);
> +		ret = dma_buf_charge_transfer(dmabuf, gpucg);
> +		if (ret) {
> +			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> +				proc->pid, thread->pid, target_proc->pid);
> +			gpucg_put(gpucg);
> +		}
> +		dma_buf_put(dmabuf);
>  	}
>  	return 0;
>  }
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index 3246f2c74696..169fd5069a1a 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -137,6 +137,7 @@ struct binder_buffer_object {
>  
>  enum {
>  	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> +	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
>  };
>  
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> -- 
> 2.35.1.265.g69c8d7142f-goog
> 

How does userspace know that binder supports this new flag?  And where
is the userspace test for this new feature?  Isn't there a binder test
framework somewhere?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-12  7:19     ` Greg Kroah-Hartman
  (?)
@ 2022-02-14 18:33       ` Todd Kjos
  -1 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 18:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: T.J. Mercier, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, kaleshsingh,
	Kenny.Ho, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups

On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:

Title: "android: binder: Add a buffer flag to relinquish ownership of fds"

Please drop the "android:" from the title.

> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >  {
> >       binder_size_t fdi, fd_buf_size;
> >       binder_size_t fda_offset;
> > +     bool transfer_gpu_charge = false;
> >       const void __user *sender_ufda_base;
> >       struct binder_proc *proc = thread->proc;
> > +     struct binder_proc *target_proc = t->to_proc;
> >       int ret;
> >
> >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >       if (ret)
> >               return ret;
> >
> > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +             transfer_gpu_charge = true;
> > +
> >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >               u32 fd;
> > +             struct dma_buf *dmabuf;
> > +             struct gpucg *gpucg;
> > +
> >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > +
> > +             if (!transfer_gpu_charge)
> > +                     continue;
> > +
> > +             dmabuf = dma_buf_get(fd);
> > +             if (IS_ERR(dmabuf))
> > +                     continue;
> > +
> > +             gpucg = gpucg_get(target_proc->tsk);
> > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     gpucg_put(gpucg);
> > +             }
> > +             dma_buf_put(dmabuf);

Since we are creating a new gpu cgroup abstraction, couldn't this
"transfer" be done in userspace by the target instead of in the kernel
driver? Then this patch would reduce to just a flag on the buffer
object. This also solves the issue that Greg brought up about
userspace needing to know whether the kernel implements this feature
(older kernel running with newer userspace). I think we could just
reserve some flags for userspace to use (and since those flags are
"reserved" for older kernels, this would enable this feature even for
old kernels)

> >       }
> >       return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >
>
> How does userspace know that binder supports this new flag?  And where
> is the userspace test for this new feature?  Isn't there a binder test
> framework somewhere?
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 18:33       ` Todd Kjos
  0 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 18:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, Sumit Semwal, Jonathan Corbet,
	Martijn Coenen, Laura Abbott, linux-media, Todd Kjos,
	Thomas Zimmermann, linaro-mm-sig, Tejun Heo, cgroups,
	Suren Baghdasaryan, T.J. Mercier, Christian Brauner, Kenny.Ho,
	linux-kernel, Liam Mark, Christian König,
	Arve Hjønnevåg, Johannes Weiner, Hridya Valsaraju

On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:

Title: "android: binder: Add a buffer flag to relinquish ownership of fds"

Please drop the "android:" from the title.

> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >  {
> >       binder_size_t fdi, fd_buf_size;
> >       binder_size_t fda_offset;
> > +     bool transfer_gpu_charge = false;
> >       const void __user *sender_ufda_base;
> >       struct binder_proc *proc = thread->proc;
> > +     struct binder_proc *target_proc = t->to_proc;
> >       int ret;
> >
> >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >       if (ret)
> >               return ret;
> >
> > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +             transfer_gpu_charge = true;
> > +
> >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >               u32 fd;
> > +             struct dma_buf *dmabuf;
> > +             struct gpucg *gpucg;
> > +
> >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > +
> > +             if (!transfer_gpu_charge)
> > +                     continue;
> > +
> > +             dmabuf = dma_buf_get(fd);
> > +             if (IS_ERR(dmabuf))
> > +                     continue;
> > +
> > +             gpucg = gpucg_get(target_proc->tsk);
> > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     gpucg_put(gpucg);
> > +             }
> > +             dma_buf_put(dmabuf);

Since we are creating a new gpu cgroup abstraction, couldn't this
"transfer" be done in userspace by the target instead of in the kernel
driver? Then this patch would reduce to just a flag on the buffer
object. This also solves the issue that Greg brought up about
userspace needing to know whether the kernel implements this feature
(older kernel running with newer userspace). I think we could just
reserve some flags for userspace to use (and since those flags are
"reserved" for older kernels, this would enable this feature even for
old kernels)

> >       }
> >       return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >
>
> How does userspace know that binder supports this new flag?  And where
> is the userspace test for this new feature?  Isn't there a binder test
> framework somewhere?
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 18:33       ` Todd Kjos
  0 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 18:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: T.J. Mercier, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark

On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:

Title: "android: binder: Add a buffer flag to relinquish ownership of fds"

Please drop the "android:" from the title.

> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >  {
> >       binder_size_t fdi, fd_buf_size;
> >       binder_size_t fda_offset;
> > +     bool transfer_gpu_charge = false;
> >       const void __user *sender_ufda_base;
> >       struct binder_proc *proc = thread->proc;
> > +     struct binder_proc *target_proc = t->to_proc;
> >       int ret;
> >
> >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >       if (ret)
> >               return ret;
> >
> > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +             transfer_gpu_charge = true;
> > +
> >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >               u32 fd;
> > +             struct dma_buf *dmabuf;
> > +             struct gpucg *gpucg;
> > +
> >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > +
> > +             if (!transfer_gpu_charge)
> > +                     continue;
> > +
> > +             dmabuf = dma_buf_get(fd);
> > +             if (IS_ERR(dmabuf))
> > +                     continue;
> > +
> > +             gpucg = gpucg_get(target_proc->tsk);
> > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     gpucg_put(gpucg);
> > +             }
> > +             dma_buf_put(dmabuf);

Since we are creating a new gpu cgroup abstraction, couldn't this
"transfer" be done in userspace by the target instead of in the kernel
driver? Then this patch would reduce to just a flag on the buffer
object. This also solves the issue that Greg brought up about
userspace needing to know whether the kernel implements this feature
(older kernel running with newer userspace). I think we could just
reserve some flags for userspace to use (and since those flags are
"reserved" for older kernels, this would enable this feature even for
old kernels)

> >       }
> >       return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >
>
> How does userspace know that binder supports this new flag?  And where
> is the userspace test for this new feature?  Isn't there a binder test
> framework somewhere?
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 0/6] Proposal for a GPU cgroup controller
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-14 19:23   ` Tejun Heo
  -1 siblings, 0 replies; 63+ messages in thread
From: Tejun Heo @ 2022-02-14 19:23 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, Sumit Semwal, Kenny.Ho,
	Jonathan Corbet, Martijn Coenen, Laura Abbott, linux-media,
	Todd Kjos, linaro-mm-sig, cgroups, Suren Baghdasaryan,
	Christian Brauner, Greg Kroah-Hartman, linux-kernel, Liam Mark,
	Christian König, Arve Hjønnevåg,
	Thomas Zimmermann, Johannes Weiner, Hridya Valsaraju

Hello,

On Fri, Feb 11, 2022 at 04:18:23PM +0000, T.J. Mercier wrote:
> The GPU/DRM cgroup controller came into being when a consensus[1]
> was reached that the resources it tracked were unsuitable to be integrated
> into memcg. Originally, the proposed controller was specific to the DRM
> subsystem and was intended to track GEM buffers and GPU-specific
> resources[2]. In order to help establish a unified memory accounting model
> for all GPU and all related subsystems, Daniel Vetter put forth a
> suggestion to move it out of the DRM subsystem so that it can be used by
> other DMA-BUF exporters as well[3]. This RFC proposes an interface that
> does the same.
> 
> [1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
> [2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
> [3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

IIRC, the only consensus was that it needs to be a separate controller and
folks had trouble agreeing on resource types, control mechanism and
interface. Imma keep an eye on how the discussion develops among GPU folks.
Please feel free to ping if there's an area my input may be useful.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-14 19:23   ` Tejun Heo
  0 siblings, 0 replies; 63+ messages in thread
From: Tejun Heo @ 2022-02-14 19:23 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Zefan Li, Johannes Weiner, kaleshsingh, Kenny.Ho,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups

Hello,

On Fri, Feb 11, 2022 at 04:18:23PM +0000, T.J. Mercier wrote:
> The GPU/DRM cgroup controller came into being when a consensus[1]
> was reached that the resources it tracked were unsuitable to be integrated
> into memcg. Originally, the proposed controller was specific to the DRM
> subsystem and was intended to track GEM buffers and GPU-specific
> resources[2]. In order to help establish a unified memory accounting model
> for all GPU and all related subsystems, Daniel Vetter put forth a
> suggestion to move it out of the DRM subsystem so that it can be used by
> other DMA-BUF exporters as well[3]. This RFC proposes an interface that
> does the same.
> 
> [1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
> [2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
> [3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

IIRC, the only consensus was that it needs to be a separate controller and
folks had trouble agreeing on resource types, control mechanism and
interface. Imma keep an eye on how the discussion develops among GPU folks.
Please feel free to ping if there's an area my input may be useful.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-14 19:23   ` Tejun Heo
  0 siblings, 0 replies; 63+ messages in thread
From: Tejun Heo @ 2022-02-14 19:23 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark

Hello,

On Fri, Feb 11, 2022 at 04:18:23PM +0000, T.J. Mercier wrote:
> The GPU/DRM cgroup controller came into being when a consensus[1]
> was reached that the resources it tracked were unsuitable to be integrated
> into memcg. Originally, the proposed controller was specific to the DRM
> subsystem and was intended to track GEM buffers and GPU-specific
> resources[2]. In order to help establish a unified memory accounting model
> for all GPU and all related subsystems, Daniel Vetter put forth a
> suggestion to move it out of the DRM subsystem so that it can be used by
> other DMA-BUF exporters as well[3]. This RFC proposes an interface that
> does the same.
> 
> [1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
> [2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/
> [3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org/

IIRC, the only consensus was that it needs to be a separate controller and
folks had trouble agreeing on resource types, control mechanism and
interface. Imma keep an eye on how the discussion develops among GPU folks.
Please feel free to ping if there's an area my input may be useful.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-14 18:33       ` Todd Kjos
  (?)
@ 2022-02-14 19:29         ` Suren Baghdasaryan
  -1 siblings, 0 replies; 63+ messages in thread
From: Suren Baghdasaryan @ 2022-02-14 19:29 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Zefan Li, open list:DOCUMENTATION, David Airlie,
	DRI mailing list, Benjamin Gaignard, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Kenny.Ho, Jonathan Corbet,
	Laura Abbott, linux-media, Todd Kjos, Thomas Zimmermann,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Tejun Heo,
	cgroups mailinglist, Martijn Coenen, T.J. Mercier,
	Christian Brauner, Greg Kroah-Hartman, LKML, Liam Mark,
	Christian König, Arve Hjønnevåg, Johannes Weiner,
	Hridya Valsaraju

On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
>
> On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
>
> Title: "android: binder: Add a buffer flag to relinquish ownership of fds"
>
> Please drop the "android:" from the title.
>
> > > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > > that a process sending an fd array to another process over binder IPC
> > > can set to relinquish ownership of the fds being sent for memory
> > > accounting purposes. If the flag is found to be set during the fd array
> > > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > > the sender's cgroup and charged to the receiving process's cgroup
> > > instead.
> > >
> > > It is up to the sending process to ensure that it closes the fds
> > > regardless of whether the transfer failed or succeeded.
> > >
> > > Most graphics shared memory allocations in Android are done by the
> > > graphics allocator HAL process. On requests from clients, the HAL process
> > > allocates memory and sends the fds to the clients over binder IPC.
> > > The graphics allocator HAL will not retain any references to the
> > > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > > correctly charge the buffers to the client processes instead of the
> > > graphics allocator HAL.
> > >
> > > From: Hridya Valsaraju <hridya@google.com>
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > ---
> > > changes in v2
> > > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > Christian König.
> > >
> > >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> > >  include/uapi/linux/android/binder.h |  1 +
> > >  2 files changed, 27 insertions(+)
> > >
> > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > index 8351c5638880..f50d88ded188 100644
> > > --- a/drivers/android/binder.c
> > > +++ b/drivers/android/binder.c
> > > @@ -42,6 +42,7 @@
> > >
> > >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > >
> > > +#include <linux/dma-buf.h>
> > >  #include <linux/fdtable.h>
> > >  #include <linux/file.h>
> > >  #include <linux/freezer.h>
> > > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >  {
> > >       binder_size_t fdi, fd_buf_size;
> > >       binder_size_t fda_offset;
> > > +     bool transfer_gpu_charge = false;
> > >       const void __user *sender_ufda_base;
> > >       struct binder_proc *proc = thread->proc;
> > > +     struct binder_proc *target_proc = t->to_proc;
> > >       int ret;
> > >
> > >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >       if (ret)
> > >               return ret;
> > >
> > > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > > +             transfer_gpu_charge = true;
> > > +
> > >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> > >               u32 fd;
> > > +             struct dma_buf *dmabuf;
> > > +             struct gpucg *gpucg;
> > > +
> > >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> > >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> > >
> > > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >                                                 in_reply_to);
> > >               if (ret)
> > >                       return ret > 0 ? -EINVAL : ret;
> > > +
> > > +             if (!transfer_gpu_charge)
> > > +                     continue;
> > > +
> > > +             dmabuf = dma_buf_get(fd);
> > > +             if (IS_ERR(dmabuf))
> > > +                     continue;
> > > +
> > > +             gpucg = gpucg_get(target_proc->tsk);
> > > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > > +             if (ret) {
> > > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > > +                             proc->pid, thread->pid, target_proc->pid);
> > > +                     gpucg_put(gpucg);
> > > +             }
> > > +             dma_buf_put(dmabuf);
>
> Since we are creating a new gpu cgroup abstraction, couldn't this
> "transfer" be done in userspace by the target instead of in the kernel
> driver? Then this patch would reduce to just a flag on the buffer
> object.

Are you suggesting to have a userspace accessible cgroup interface for
transferring buffer charges and the target process to use that
interface for requesting the buffer to be charged to its cgroup?
I'm worried about the case when the target process does not request
the transfer after receiving the buffer with this flag set. The charge
would stay with the wrong process and accounting will be invalid.

Technically, since the proposed cgroup supports charge transfer from
the very beginning, the userspace can check if the cgroup is mounted
and if so then it knows this feature is supported.

> This also solves the issue that Greg brought up about
> userspace needing to know whether the kernel implements this feature
> (older kernel running with newer userspace). I think we could just
> reserve some flags for userspace to use (and since those flags are
> "reserved" for older kernels, this would enable this feature even for
> old kernels)
>
> > >       }
> > >       return 0;
> > >  }
> > > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > > index 3246f2c74696..169fd5069a1a 100644
> > > --- a/include/uapi/linux/android/binder.h
> > > +++ b/include/uapi/linux/android/binder.h
> > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > >
> > >  enum {
> > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > >  };
> > >
> > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >
> >
> > How does userspace know that binder supports this new flag?  And where
> > is the userspace test for this new feature?  Isn't there a binder test
> > framework somewhere?
> >
> > thanks,
> >
> > greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 19:29         ` Suren Baghdasaryan
  0 siblings, 0 replies; 63+ messages in thread
From: Suren Baghdasaryan @ 2022-02-14 19:29 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Greg Kroah-Hartman, T.J. Mercier, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	Jonathan Corbet, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Kalesh Singh,
	Kenny.Ho, DRI mailing list, open list:DOCUMENTATION, LKML,
	linux-media, moderated list:DMA BUFFER SHARING FRAMEWORK,
	cgroups mailinglist

On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
>
> On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
>
> Title: "android: binder: Add a buffer flag to relinquish ownership of fds"
>
> Please drop the "android:" from the title.
>
> > > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > > that a process sending an fd array to another process over binder IPC
> > > can set to relinquish ownership of the fds being sent for memory
> > > accounting purposes. If the flag is found to be set during the fd array
> > > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > > the sender's cgroup and charged to the receiving process's cgroup
> > > instead.
> > >
> > > It is up to the sending process to ensure that it closes the fds
> > > regardless of whether the transfer failed or succeeded.
> > >
> > > Most graphics shared memory allocations in Android are done by the
> > > graphics allocator HAL process. On requests from clients, the HAL process
> > > allocates memory and sends the fds to the clients over binder IPC.
> > > The graphics allocator HAL will not retain any references to the
> > > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > > correctly charge the buffers to the client processes instead of the
> > > graphics allocator HAL.
> > >
> > > From: Hridya Valsaraju <hridya@google.com>
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > ---
> > > changes in v2
> > > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > Christian König.
> > >
> > >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> > >  include/uapi/linux/android/binder.h |  1 +
> > >  2 files changed, 27 insertions(+)
> > >
> > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > index 8351c5638880..f50d88ded188 100644
> > > --- a/drivers/android/binder.c
> > > +++ b/drivers/android/binder.c
> > > @@ -42,6 +42,7 @@
> > >
> > >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > >
> > > +#include <linux/dma-buf.h>
> > >  #include <linux/fdtable.h>
> > >  #include <linux/file.h>
> > >  #include <linux/freezer.h>
> > > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >  {
> > >       binder_size_t fdi, fd_buf_size;
> > >       binder_size_t fda_offset;
> > > +     bool transfer_gpu_charge = false;
> > >       const void __user *sender_ufda_base;
> > >       struct binder_proc *proc = thread->proc;
> > > +     struct binder_proc *target_proc = t->to_proc;
> > >       int ret;
> > >
> > >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >       if (ret)
> > >               return ret;
> > >
> > > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > > +             transfer_gpu_charge = true;
> > > +
> > >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> > >               u32 fd;
> > > +             struct dma_buf *dmabuf;
> > > +             struct gpucg *gpucg;
> > > +
> > >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> > >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> > >
> > > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >                                                 in_reply_to);
> > >               if (ret)
> > >                       return ret > 0 ? -EINVAL : ret;
> > > +
> > > +             if (!transfer_gpu_charge)
> > > +                     continue;
> > > +
> > > +             dmabuf = dma_buf_get(fd);
> > > +             if (IS_ERR(dmabuf))
> > > +                     continue;
> > > +
> > > +             gpucg = gpucg_get(target_proc->tsk);
> > > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > > +             if (ret) {
> > > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > > +                             proc->pid, thread->pid, target_proc->pid);
> > > +                     gpucg_put(gpucg);
> > > +             }
> > > +             dma_buf_put(dmabuf);
>
> Since we are creating a new gpu cgroup abstraction, couldn't this
> "transfer" be done in userspace by the target instead of in the kernel
> driver? Then this patch would reduce to just a flag on the buffer
> object.

Are you suggesting to have a userspace accessible cgroup interface for
transferring buffer charges and the target process to use that
interface for requesting the buffer to be charged to its cgroup?
I'm worried about the case when the target process does not request
the transfer after receiving the buffer with this flag set. The charge
would stay with the wrong process and accounting will be invalid.

Technically, since the proposed cgroup supports charge transfer from
the very beginning, the userspace can check if the cgroup is mounted
and if so then it knows this feature is supported.

> This also solves the issue that Greg brought up about
> userspace needing to know whether the kernel implements this feature
> (older kernel running with newer userspace). I think we could just
> reserve some flags for userspace to use (and since those flags are
> "reserved" for older kernels, this would enable this feature even for
> old kernels)
>
> > >       }
> > >       return 0;
> > >  }
> > > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > > index 3246f2c74696..169fd5069a1a 100644
> > > --- a/include/uapi/linux/android/binder.h
> > > +++ b/include/uapi/linux/android/binder.h
> > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > >
> > >  enum {
> > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > >  };
> > >
> > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >
> >
> > How does userspace know that binder supports this new flag?  And where
> > is the userspace test for this new feature?  Isn't there a binder test
> > framework somewhere?
> >
> > thanks,
> >
> > greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 19:29         ` Suren Baghdasaryan
  0 siblings, 0 replies; 63+ messages in thread
From: Suren Baghdasaryan @ 2022-02-14 19:29 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Greg Kroah-Hartman, T.J. Mercier, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	Jonathan Corbet, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark

On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
>
> On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
>
> Title: "android: binder: Add a buffer flag to relinquish ownership of fds"
>
> Please drop the "android:" from the title.
>
> > > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > > that a process sending an fd array to another process over binder IPC
> > > can set to relinquish ownership of the fds being sent for memory
> > > accounting purposes. If the flag is found to be set during the fd array
> > > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > > the sender's cgroup and charged to the receiving process's cgroup
> > > instead.
> > >
> > > It is up to the sending process to ensure that it closes the fds
> > > regardless of whether the transfer failed or succeeded.
> > >
> > > Most graphics shared memory allocations in Android are done by the
> > > graphics allocator HAL process. On requests from clients, the HAL process
> > > allocates memory and sends the fds to the clients over binder IPC.
> > > The graphics allocator HAL will not retain any references to the
> > > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > > correctly charge the buffers to the client processes instead of the
> > > graphics allocator HAL.
> > >
> > > From: Hridya Valsaraju <hridya@google.com>
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > ---
> > > changes in v2
> > > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > Christian König.
> > >
> > >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> > >  include/uapi/linux/android/binder.h |  1 +
> > >  2 files changed, 27 insertions(+)
> > >
> > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > index 8351c5638880..f50d88ded188 100644
> > > --- a/drivers/android/binder.c
> > > +++ b/drivers/android/binder.c
> > > @@ -42,6 +42,7 @@
> > >
> > >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > >
> > > +#include <linux/dma-buf.h>
> > >  #include <linux/fdtable.h>
> > >  #include <linux/file.h>
> > >  #include <linux/freezer.h>
> > > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >  {
> > >       binder_size_t fdi, fd_buf_size;
> > >       binder_size_t fda_offset;
> > > +     bool transfer_gpu_charge = false;
> > >       const void __user *sender_ufda_base;
> > >       struct binder_proc *proc = thread->proc;
> > > +     struct binder_proc *target_proc = t->to_proc;
> > >       int ret;
> > >
> > >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >       if (ret)
> > >               return ret;
> > >
> > > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > > +             transfer_gpu_charge = true;
> > > +
> > >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> > >               u32 fd;
> > > +             struct dma_buf *dmabuf;
> > > +             struct gpucg *gpucg;
> > > +
> > >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> > >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> > >
> > > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > >                                                 in_reply_to);
> > >               if (ret)
> > >                       return ret > 0 ? -EINVAL : ret;
> > > +
> > > +             if (!transfer_gpu_charge)
> > > +                     continue;
> > > +
> > > +             dmabuf = dma_buf_get(fd);
> > > +             if (IS_ERR(dmabuf))
> > > +                     continue;
> > > +
> > > +             gpucg = gpucg_get(target_proc->tsk);
> > > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > > +             if (ret) {
> > > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > > +                             proc->pid, thread->pid, target_proc->pid);
> > > +                     gpucg_put(gpucg);
> > > +             }
> > > +             dma_buf_put(dmabuf);
>
> Since we are creating a new gpu cgroup abstraction, couldn't this
> "transfer" be done in userspace by the target instead of in the kernel
> driver? Then this patch would reduce to just a flag on the buffer
> object.

Are you suggesting to have a userspace accessible cgroup interface for
transferring buffer charges and the target process to use that
interface for requesting the buffer to be charged to its cgroup?
I'm worried about the case when the target process does not request
the transfer after receiving the buffer with this flag set. The charge
would stay with the wrong process and accounting will be invalid.

Technically, since the proposed cgroup supports charge transfer from
the very beginning, the userspace can check if the cgroup is mounted
and if so then it knows this feature is supported.

> This also solves the issue that Greg brought up about
> userspace needing to know whether the kernel implements this feature
> (older kernel running with newer userspace). I think we could just
> reserve some flags for userspace to use (and since those flags are
> "reserved" for older kernels, this would enable this feature even for
> old kernels)
>
> > >       }
> > >       return 0;
> > >  }
> > > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > > index 3246f2c74696..169fd5069a1a 100644
> > > --- a/include/uapi/linux/android/binder.h
> > > +++ b/include/uapi/linux/android/binder.h
> > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > >
> > >  enum {
> > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > >  };
> > >
> > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >
> >
> > How does userspace know that binder supports this new flag?  And where
> > is the userspace test for this new feature?  Isn't there a binder test
> > framework somewhere?
> >
> > thanks,
> >
> > greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-14 19:29         ` Suren Baghdasaryan
  (?)
@ 2022-02-14 20:19           ` Todd Kjos
  -1 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 20:19 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Greg Kroah-Hartman, T.J. Mercier, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	Jonathan Corbet, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Kalesh Singh,
	Kenny.Ho, DRI mailing list, open list:DOCUMENTATION, LKML,
	linux-media, moderated list:DMA BUFFER SHARING FRAMEWORK,
	cgroups mailinglist

On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> >
> > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> >
> > Title: "android: binder: Add a buffer flag to relinquish ownership of fds"
> >
> > Please drop the "android:" from the title.
> >
> > > > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > > > that a process sending an fd array to another process over binder IPC
> > > > can set to relinquish ownership of the fds being sent for memory
> > > > accounting purposes. If the flag is found to be set during the fd array
> > > > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > > > the sender's cgroup and charged to the receiving process's cgroup
> > > > instead.
> > > >
> > > > It is up to the sending process to ensure that it closes the fds
> > > > regardless of whether the transfer failed or succeeded.
> > > >
> > > > Most graphics shared memory allocations in Android are done by the
> > > > graphics allocator HAL process. On requests from clients, the HAL process
> > > > allocates memory and sends the fds to the clients over binder IPC.
> > > > The graphics allocator HAL will not retain any references to the
> > > > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > > > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > > > correctly charge the buffers to the client processes instead of the
> > > > graphics allocator HAL.
> > > >
> > > > From: Hridya Valsaraju <hridya@google.com>
> > > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > > ---
> > > > changes in v2
> > > > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > > Christian König.
> > > >
> > > >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> > > >  include/uapi/linux/android/binder.h |  1 +
> > > >  2 files changed, 27 insertions(+)
> > > >
> > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > > index 8351c5638880..f50d88ded188 100644
> > > > --- a/drivers/android/binder.c
> > > > +++ b/drivers/android/binder.c
> > > > @@ -42,6 +42,7 @@
> > > >
> > > >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > >
> > > > +#include <linux/dma-buf.h>
> > > >  #include <linux/fdtable.h>
> > > >  #include <linux/file.h>
> > > >  #include <linux/freezer.h>
> > > > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >  {
> > > >       binder_size_t fdi, fd_buf_size;
> > > >       binder_size_t fda_offset;
> > > > +     bool transfer_gpu_charge = false;
> > > >       const void __user *sender_ufda_base;
> > > >       struct binder_proc *proc = thread->proc;
> > > > +     struct binder_proc *target_proc = t->to_proc;
> > > >       int ret;
> > > >
> > > >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > > > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >       if (ret)
> > > >               return ret;
> > > >
> > > > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > > > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > > > +             transfer_gpu_charge = true;
> > > > +
> > > >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> > > >               u32 fd;
> > > > +             struct dma_buf *dmabuf;
> > > > +             struct gpucg *gpucg;
> > > > +
> > > >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> > > >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> > > >
> > > > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >                                                 in_reply_to);
> > > >               if (ret)
> > > >                       return ret > 0 ? -EINVAL : ret;
> > > > +
> > > > +             if (!transfer_gpu_charge)
> > > > +                     continue;
> > > > +
> > > > +             dmabuf = dma_buf_get(fd);
> > > > +             if (IS_ERR(dmabuf))
> > > > +                     continue;
> > > > +
> > > > +             gpucg = gpucg_get(target_proc->tsk);
> > > > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > > > +             if (ret) {
> > > > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > > > +                             proc->pid, thread->pid, target_proc->pid);
> > > > +                     gpucg_put(gpucg);
> > > > +             }
> > > > +             dma_buf_put(dmabuf);
> >
> > Since we are creating a new gpu cgroup abstraction, couldn't this
> > "transfer" be done in userspace by the target instead of in the kernel
> > driver? Then this patch would reduce to just a flag on the buffer
> > object.
>
> Are you suggesting to have a userspace accessible cgroup interface for
> transferring buffer charges and the target process to use that
> interface for requesting the buffer to be charged to its cgroup?

Well, I'm asking why we need to do these cgroup-ish actions in the
kernel when it seems more natural to do it in userspace.

> I'm worried about the case when the target process does not request
> the transfer after receiving the buffer with this flag set. The charge
> would stay with the wrong process and accounting will be invalid.

I suspect this would be implemented in libbinder wherever the fd array
object is handled, so it wouldn't require changes to every process.

>
> Technically, since the proposed cgroup supports charge transfer from
> the very beginning, the userspace can check if the cgroup is mounted
> and if so then it knows this feature is supported.

Has some userspace code for this been written? I'd like to be
convinced that these changes need to be in the binder kernel driver
instead of in userspace.

>
> > This also solves the issue that Greg brought up about
> > userspace needing to know whether the kernel implements this feature
> > (older kernel running with newer userspace). I think we could just
> > reserve some flags for userspace to use (and since those flags are
> > "reserved" for older kernels, this would enable this feature even for
> > old kernels)
> >
> > > >       }
> > > >       return 0;
> > > >  }
> > > > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > > > index 3246f2c74696..169fd5069a1a 100644
> > > > --- a/include/uapi/linux/android/binder.h
> > > > +++ b/include/uapi/linux/android/binder.h
> > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > >
> > > >  enum {
> > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > >  };
> > > >
> > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > > >
> > >
> > > How does userspace know that binder supports this new flag?  And where
> > > is the userspace test for this new feature?  Isn't there a binder test
> > > framework somewhere?
> > >
> > > thanks,
> > >
> > > greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 20:19           ` Todd Kjos
  0 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 20:19 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Zefan Li, open list:DOCUMENTATION, David Airlie,
	DRI mailing list, Benjamin Gaignard, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Kenny.Ho, Jonathan Corbet,
	Laura Abbott, linux-media, Todd Kjos, Thomas Zimmermann,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Tejun Heo,
	cgroups mailinglist, Martijn Coenen, T.J. Mercier,
	Christian Brauner, Greg Kroah-Hartman, LKML, Liam Mark,
	Christian König, Arve Hjønnevåg, Johannes Weiner,
	Hridya Valsaraju

On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> >
> > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> >
> > Title: "android: binder: Add a buffer flag to relinquish ownership of fds"
> >
> > Please drop the "android:" from the title.
> >
> > > > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > > > that a process sending an fd array to another process over binder IPC
> > > > can set to relinquish ownership of the fds being sent for memory
> > > > accounting purposes. If the flag is found to be set during the fd array
> > > > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > > > the sender's cgroup and charged to the receiving process's cgroup
> > > > instead.
> > > >
> > > > It is up to the sending process to ensure that it closes the fds
> > > > regardless of whether the transfer failed or succeeded.
> > > >
> > > > Most graphics shared memory allocations in Android are done by the
> > > > graphics allocator HAL process. On requests from clients, the HAL process
> > > > allocates memory and sends the fds to the clients over binder IPC.
> > > > The graphics allocator HAL will not retain any references to the
> > > > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > > > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > > > correctly charge the buffers to the client processes instead of the
> > > > graphics allocator HAL.
> > > >
> > > > From: Hridya Valsaraju <hridya@google.com>
> > > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > > ---
> > > > changes in v2
> > > > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > > Christian König.
> > > >
> > > >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> > > >  include/uapi/linux/android/binder.h |  1 +
> > > >  2 files changed, 27 insertions(+)
> > > >
> > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > > index 8351c5638880..f50d88ded188 100644
> > > > --- a/drivers/android/binder.c
> > > > +++ b/drivers/android/binder.c
> > > > @@ -42,6 +42,7 @@
> > > >
> > > >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > >
> > > > +#include <linux/dma-buf.h>
> > > >  #include <linux/fdtable.h>
> > > >  #include <linux/file.h>
> > > >  #include <linux/freezer.h>
> > > > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >  {
> > > >       binder_size_t fdi, fd_buf_size;
> > > >       binder_size_t fda_offset;
> > > > +     bool transfer_gpu_charge = false;
> > > >       const void __user *sender_ufda_base;
> > > >       struct binder_proc *proc = thread->proc;
> > > > +     struct binder_proc *target_proc = t->to_proc;
> > > >       int ret;
> > > >
> > > >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > > > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >       if (ret)
> > > >               return ret;
> > > >
> > > > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > > > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > > > +             transfer_gpu_charge = true;
> > > > +
> > > >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> > > >               u32 fd;
> > > > +             struct dma_buf *dmabuf;
> > > > +             struct gpucg *gpucg;
> > > > +
> > > >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> > > >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> > > >
> > > > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >                                                 in_reply_to);
> > > >               if (ret)
> > > >                       return ret > 0 ? -EINVAL : ret;
> > > > +
> > > > +             if (!transfer_gpu_charge)
> > > > +                     continue;
> > > > +
> > > > +             dmabuf = dma_buf_get(fd);
> > > > +             if (IS_ERR(dmabuf))
> > > > +                     continue;
> > > > +
> > > > +             gpucg = gpucg_get(target_proc->tsk);
> > > > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > > > +             if (ret) {
> > > > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > > > +                             proc->pid, thread->pid, target_proc->pid);
> > > > +                     gpucg_put(gpucg);
> > > > +             }
> > > > +             dma_buf_put(dmabuf);
> >
> > Since we are creating a new gpu cgroup abstraction, couldn't this
> > "transfer" be done in userspace by the target instead of in the kernel
> > driver? Then this patch would reduce to just a flag on the buffer
> > object.
>
> Are you suggesting to have a userspace accessible cgroup interface for
> transferring buffer charges and the target process to use that
> interface for requesting the buffer to be charged to its cgroup?

Well, I'm asking why we need to do these cgroup-ish actions in the
kernel when it seems more natural to do it in userspace.

> I'm worried about the case when the target process does not request
> the transfer after receiving the buffer with this flag set. The charge
> would stay with the wrong process and accounting will be invalid.

I suspect this would be implemented in libbinder wherever the fd array
object is handled, so it wouldn't require changes to every process.

>
> Technically, since the proposed cgroup supports charge transfer from
> the very beginning, the userspace can check if the cgroup is mounted
> and if so then it knows this feature is supported.

Has some userspace code for this been written? I'd like to be
convinced that these changes need to be in the binder kernel driver
instead of in userspace.

>
> > This also solves the issue that Greg brought up about
> > userspace needing to know whether the kernel implements this feature
> > (older kernel running with newer userspace). I think we could just
> > reserve some flags for userspace to use (and since those flags are
> > "reserved" for older kernels, this would enable this feature even for
> > old kernels)
> >
> > > >       }
> > > >       return 0;
> > > >  }
> > > > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > > > index 3246f2c74696..169fd5069a1a 100644
> > > > --- a/include/uapi/linux/android/binder.h
> > > > +++ b/include/uapi/linux/android/binder.h
> > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > >
> > > >  enum {
> > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > >  };
> > > >
> > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > > >
> > >
> > > How does userspace know that binder supports this new flag?  And where
> > > is the userspace test for this new feature?  Isn't there a binder test
> > > framework somewhere?
> > >
> > > thanks,
> > >
> > > greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 20:19           ` Todd Kjos
  0 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 20:19 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Greg Kroah-Hartman, T.J. Mercier, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	Jonathan Corbet, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Hridya Valsaraju, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark

On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> >
> > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> >
> > Title: "android: binder: Add a buffer flag to relinquish ownership of fds"
> >
> > Please drop the "android:" from the title.
> >
> > > > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > > > that a process sending an fd array to another process over binder IPC
> > > > can set to relinquish ownership of the fds being sent for memory
> > > > accounting purposes. If the flag is found to be set during the fd array
> > > > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > > > the sender's cgroup and charged to the receiving process's cgroup
> > > > instead.
> > > >
> > > > It is up to the sending process to ensure that it closes the fds
> > > > regardless of whether the transfer failed or succeeded.
> > > >
> > > > Most graphics shared memory allocations in Android are done by the
> > > > graphics allocator HAL process. On requests from clients, the HAL process
> > > > allocates memory and sends the fds to the clients over binder IPC.
> > > > The graphics allocator HAL will not retain any references to the
> > > > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > > > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > > > correctly charge the buffers to the client processes instead of the
> > > > graphics allocator HAL.
> > > >
> > > > From: Hridya Valsaraju <hridya@google.com>
> > > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > > > ---
> > > > changes in v2
> > > > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > > > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > > > Christian König.
> > > >
> > > >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> > > >  include/uapi/linux/android/binder.h |  1 +
> > > >  2 files changed, 27 insertions(+)
> > > >
> > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > > index 8351c5638880..f50d88ded188 100644
> > > > --- a/drivers/android/binder.c
> > > > +++ b/drivers/android/binder.c
> > > > @@ -42,6 +42,7 @@
> > > >
> > > >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > >
> > > > +#include <linux/dma-buf.h>
> > > >  #include <linux/fdtable.h>
> > > >  #include <linux/file.h>
> > > >  #include <linux/freezer.h>
> > > > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >  {
> > > >       binder_size_t fdi, fd_buf_size;
> > > >       binder_size_t fda_offset;
> > > > +     bool transfer_gpu_charge = false;
> > > >       const void __user *sender_ufda_base;
> > > >       struct binder_proc *proc = thread->proc;
> > > > +     struct binder_proc *target_proc = t->to_proc;
> > > >       int ret;
> > > >
> > > >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > > > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >       if (ret)
> > > >               return ret;
> > > >
> > > > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > > > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > > > +             transfer_gpu_charge = true;
> > > > +
> > > >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> > > >               u32 fd;
> > > > +             struct dma_buf *dmabuf;
> > > > +             struct gpucg *gpucg;
> > > > +
> > > >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> > > >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> > > >
> > > > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> > > >                                                 in_reply_to);
> > > >               if (ret)
> > > >                       return ret > 0 ? -EINVAL : ret;
> > > > +
> > > > +             if (!transfer_gpu_charge)
> > > > +                     continue;
> > > > +
> > > > +             dmabuf = dma_buf_get(fd);
> > > > +             if (IS_ERR(dmabuf))
> > > > +                     continue;
> > > > +
> > > > +             gpucg = gpucg_get(target_proc->tsk);
> > > > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > > > +             if (ret) {
> > > > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > > > +                             proc->pid, thread->pid, target_proc->pid);
> > > > +                     gpucg_put(gpucg);
> > > > +             }
> > > > +             dma_buf_put(dmabuf);
> >
> > Since we are creating a new gpu cgroup abstraction, couldn't this
> > "transfer" be done in userspace by the target instead of in the kernel
> > driver? Then this patch would reduce to just a flag on the buffer
> > object.
>
> Are you suggesting to have a userspace accessible cgroup interface for
> transferring buffer charges and the target process to use that
> interface for requesting the buffer to be charged to its cgroup?

Well, I'm asking why we need to do these cgroup-ish actions in the
kernel when it seems more natural to do it in userspace.

> I'm worried about the case when the target process does not request
> the transfer after receiving the buffer with this flag set. The charge
> would stay with the wrong process and accounting will be invalid.

I suspect this would be implemented in libbinder wherever the fd array
object is handled, so it wouldn't require changes to every process.

>
> Technically, since the proposed cgroup supports charge transfer from
> the very beginning, the userspace can check if the cgroup is mounted
> and if so then it knows this feature is supported.

Has some userspace code for this been written? I'd like to be
convinced that these changes need to be in the binder kernel driver
instead of in userspace.

>
> > This also solves the issue that Greg brought up about
> > userspace needing to know whether the kernel implements this feature
> > (older kernel running with newer userspace). I think we could just
> > reserve some flags for userspace to use (and since those flags are
> > "reserved" for older kernels, this would enable this feature even for
> > old kernels)
> >
> > > >       }
> > > >       return 0;
> > > >  }
> > > > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > > > index 3246f2c74696..169fd5069a1a 100644
> > > > --- a/include/uapi/linux/android/binder.h
> > > > +++ b/include/uapi/linux/android/binder.h
> > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > >
> > > >  enum {
> > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > >  };
> > > >
> > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > > >
> > >
> > > How does userspace know that binder supports this new flag?  And where
> > > is the userspace test for this new feature?  Isn't there a binder test
> > > framework somewhere?
> > >
> > > thanks,
> > >
> > > greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-14 20:19           ` Todd Kjos
  (?)
@ 2022-02-14 20:37             ` John Stultz
  -1 siblings, 0 replies; 63+ messages in thread
From: John Stultz @ 2022-02-14 20:37 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Zefan Li, open list:DOCUMENTATION, David Airlie,
	DRI mailing list, Benjamin Gaignard, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Kenny.Ho, Jonathan Corbet,
	Suren Baghdasaryan, Laura Abbott, linux-media, Todd Kjos,
	Thomas Zimmermann, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Tejun Heo, cgroups mailinglist, Martijn Coenen, T.J. Mercier,
	Christian Brauner, Greg Kroah-Hartman, LKML, Liam Mark,
	Christian König, Arve Hjønnevåg, Johannes Weiner,
	Hridya Valsaraju

On Mon, Feb 14, 2022 at 12:19 PM Todd Kjos <tkjos@google.com> wrote:
> On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> > >
> > > Since we are creating a new gpu cgroup abstraction, couldn't this
> > > "transfer" be done in userspace by the target instead of in the kernel
> > > driver? Then this patch would reduce to just a flag on the buffer
> > > object.
> >
> > Are you suggesting to have a userspace accessible cgroup interface for
> > transferring buffer charges and the target process to use that
> > interface for requesting the buffer to be charged to its cgroup?
>
> Well, I'm asking why we need to do these cgroup-ish actions in the
> kernel when it seems more natural to do it in userspace.
>

In case its useful, some additional context from some of the Linux
Plumber's discussions last fall:

Daniel Stone outlines some concerns with the cgroup userland handling
for accounting:
  https://youtu.be/3OqllZONTiQ?t=3430

And the binder ownership transfer bit was suggested here by Daniel Vetter:
  https://youtu.be/3OqllZONTiQ?t=3730

thanks
-john

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 20:37             ` John Stultz
  0 siblings, 0 replies; 63+ messages in thread
From: John Stultz @ 2022-02-14 20:37 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Suren Baghdasaryan, Greg Kroah-Hartman, T.J. Mercier,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Sumit Semwal, Christian König, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, Tejun Heo, Zefan Li,
	Johannes Weiner, Kalesh Singh, Kenny.Ho, DRI mailing list,
	open list:DOCUMENTATION, LKML, linux-media,
	moderated list:DMA BUFFER SHARING FRAMEWORK, cgroups mailinglist

On Mon, Feb 14, 2022 at 12:19 PM Todd Kjos <tkjos@google.com> wrote:
> On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> > >
> > > Since we are creating a new gpu cgroup abstraction, couldn't this
> > > "transfer" be done in userspace by the target instead of in the kernel
> > > driver? Then this patch would reduce to just a flag on the buffer
> > > object.
> >
> > Are you suggesting to have a userspace accessible cgroup interface for
> > transferring buffer charges and the target process to use that
> > interface for requesting the buffer to be charged to its cgroup?
>
> Well, I'm asking why we need to do these cgroup-ish actions in the
> kernel when it seems more natural to do it in userspace.
>

In case its useful, some additional context from some of the Linux
Plumber's discussions last fall:

Daniel Stone outlines some concerns with the cgroup userland handling
for accounting:
  https://youtu.be/3OqllZONTiQ?t=3430

And the binder ownership transfer bit was suggested here by Daniel Vetter:
  https://youtu.be/3OqllZONTiQ?t=3730

thanks
-john

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 20:37             ` John Stultz
  0 siblings, 0 replies; 63+ messages in thread
From: John Stultz @ 2022-02-14 20:37 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Suren Baghdasaryan, Greg Kroah-Hartman, T.J. Mercier,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Sumit Semwal, Christian König, Benjamin Gaignard

On Mon, Feb 14, 2022 at 12:19 PM Todd Kjos <tkjos@google.com> wrote:
> On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> > >
> > > Since we are creating a new gpu cgroup abstraction, couldn't this
> > > "transfer" be done in userspace by the target instead of in the kernel
> > > driver? Then this patch would reduce to just a flag on the buffer
> > > object.
> >
> > Are you suggesting to have a userspace accessible cgroup interface for
> > transferring buffer charges and the target process to use that
> > interface for requesting the buffer to be charged to its cgroup?
>
> Well, I'm asking why we need to do these cgroup-ish actions in the
> kernel when it seems more natural to do it in userspace.
>

In case its useful, some additional context from some of the Linux
Plumber's discussions last fall:

Daniel Stone outlines some concerns with the cgroup userland handling
for accounting:
  https://youtu.be/3OqllZONTiQ?t=3430

And the binder ownership transfer bit was suggested here by Daniel Vetter:
  https://youtu.be/3OqllZONTiQ?t=3730

thanks
-john

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-14 20:37             ` John Stultz
  (?)
@ 2022-02-14 21:14               ` Hridya Valsaraju
  -1 siblings, 0 replies; 63+ messages in thread
From: Hridya Valsaraju @ 2022-02-14 21:14 UTC (permalink / raw)
  To: John Stultz
  Cc: Zefan Li, open list:DOCUMENTATION, David Airlie,
	DRI mailing list, Benjamin Gaignard, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Kenny.Ho, Jonathan Corbet,
	Suren Baghdasaryan, Laura Abbott, linux-media, Todd Kjos,
	Thomas Zimmermann, moderated list:DMA BUFFER SHARING FRAMEWORK,
	cgroups mailinglist, Martijn Coenen, T.J. Mercier,
	Christian Brauner, Greg Kroah-Hartman, LKML, Liam Mark,
	Christian König, Arve Hjønnevåg, Johannes Weiner,
	Tejun Heo, Todd Kjos

On Mon, Feb 14, 2022 at 12:37 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Mon, Feb 14, 2022 at 12:19 PM Todd Kjos <tkjos@google.com> wrote:
> > On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> > > >
> > > > Since we are creating a new gpu cgroup abstraction, couldn't this
> > > > "transfer" be done in userspace by the target instead of in the kernel
> > > > driver? Then this patch would reduce to just a flag on the buffer
> > > > object.
> > >
> > > Are you suggesting to have a userspace accessible cgroup interface for
> > > transferring buffer charges and the target process to use that
> > > interface for requesting the buffer to be charged to its cgroup?
> >
> > Well, I'm asking why we need to do these cgroup-ish actions in the
> > kernel when it seems more natural to do it in userspace.

This was our plan originally i.e. to create a cgroup interface that
userspace could use for explicit charge transfer. However, in our
initial discussions with all interested parties and cgroup maintainers
we reached a conclusion that an explicit charge transfer UAPI for the
cgroup controller did not fit in with the current cgroup
charge/uncharge mechanisms. Like John mentioned, the charge transfer
during binder IPC was suggested by Daniel during LPC.

Regards,
Hridya

> >
>
> In case its useful, some additional context from some of the Linux
> Plumber's discussions last fall:
>
> Daniel Stone outlines some concerns with the cgroup userland handling
> for accounting:
>   https://youtu.be/3OqllZONTiQ?t=3430
>
> And the binder ownership transfer bit was suggested here by Daniel Vetter:
>   https://youtu.be/3OqllZONTiQ?t=3730
>
> thanks
> -john

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 21:14               ` Hridya Valsaraju
  0 siblings, 0 replies; 63+ messages in thread
From: Hridya Valsaraju @ 2022-02-14 21:14 UTC (permalink / raw)
  To: John Stultz
  Cc: Todd Kjos, Suren Baghdasaryan, Greg Kroah-Hartman, T.J. Mercier,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Sumit Semwal,
	Christian König, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, Tejun Heo, Zefan Li, Johannes Weiner,
	Kalesh Singh, Kenny.Ho, DRI mailing list,
	open list:DOCUMENTATION, LKML, linux-media,
	moderated list:DMA BUFFER SHARING FRAMEWORK, cgroups mailinglist

On Mon, Feb 14, 2022 at 12:37 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Mon, Feb 14, 2022 at 12:19 PM Todd Kjos <tkjos@google.com> wrote:
> > On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> > > >
> > > > Since we are creating a new gpu cgroup abstraction, couldn't this
> > > > "transfer" be done in userspace by the target instead of in the kernel
> > > > driver? Then this patch would reduce to just a flag on the buffer
> > > > object.
> > >
> > > Are you suggesting to have a userspace accessible cgroup interface for
> > > transferring buffer charges and the target process to use that
> > > interface for requesting the buffer to be charged to its cgroup?
> >
> > Well, I'm asking why we need to do these cgroup-ish actions in the
> > kernel when it seems more natural to do it in userspace.

This was our plan originally i.e. to create a cgroup interface that
userspace could use for explicit charge transfer. However, in our
initial discussions with all interested parties and cgroup maintainers
we reached a conclusion that an explicit charge transfer UAPI for the
cgroup controller did not fit in with the current cgroup
charge/uncharge mechanisms. Like John mentioned, the charge transfer
during binder IPC was suggested by Daniel during LPC.

Regards,
Hridya

> >
>
> In case its useful, some additional context from some of the Linux
> Plumber's discussions last fall:
>
> Daniel Stone outlines some concerns with the cgroup userland handling
> for accounting:
>   https://youtu.be/3OqllZONTiQ?t=3430
>
> And the binder ownership transfer bit was suggested here by Daniel Vetter:
>   https://youtu.be/3OqllZONTiQ?t=3730
>
> thanks
> -john

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 21:14               ` Hridya Valsaraju
  0 siblings, 0 replies; 63+ messages in thread
From: Hridya Valsaraju @ 2022-02-14 21:14 UTC (permalink / raw)
  To: John Stultz
  Cc: Todd Kjos, Suren Baghdasaryan, Greg Kroah-Hartman, T.J. Mercier,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Sumit Semwal,
	Christian König, Benjamin Gaignard

On Mon, Feb 14, 2022 at 12:37 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Mon, Feb 14, 2022 at 12:19 PM Todd Kjos <tkjos@google.com> wrote:
> > On Mon, Feb 14, 2022 at 11:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > On Mon, Feb 14, 2022 at 10:33 AM Todd Kjos <tkjos@google.com> wrote:
> > > >
> > > > Since we are creating a new gpu cgroup abstraction, couldn't this
> > > > "transfer" be done in userspace by the target instead of in the kernel
> > > > driver? Then this patch would reduce to just a flag on the buffer
> > > > object.
> > >
> > > Are you suggesting to have a userspace accessible cgroup interface for
> > > transferring buffer charges and the target process to use that
> > > interface for requesting the buffer to be charged to its cgroup?
> >
> > Well, I'm asking why we need to do these cgroup-ish actions in the
> > kernel when it seems more natural to do it in userspace.

This was our plan originally i.e. to create a cgroup interface that
userspace could use for explicit charge transfer. However, in our
initial discussions with all interested parties and cgroup maintainers
we reached a conclusion that an explicit charge transfer UAPI for the
cgroup controller did not fit in with the current cgroup
charge/uncharge mechanisms. Like John mentioned, the charge transfer
during binder IPC was suggested by Daniel during LPC.

Regards,
Hridya

> >
>
> In case its useful, some additional context from some of the Linux
> Plumber's discussions last fall:
>
> Daniel Stone outlines some concerns with the cgroup userland handling
> for accounting:
>   https://youtu.be/3OqllZONTiQ?t=3430
>
> And the binder ownership transfer bit was suggested here by Daniel Vetter:
>   https://youtu.be/3OqllZONTiQ?t=3730
>
> thanks
> -john

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-11 16:18   ` T.J. Mercier
  (?)
@ 2022-02-14 21:25     ` Todd Kjos
  -1 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 21:25 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Liam Mark, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, kaleshsingh, Kenny.Ho, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

On Fri, Feb 11, 2022 at 8:19 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> that a process sending an fd array to another process over binder IPC
> can set to relinquish ownership of the fds being sent for memory
> accounting purposes. If the flag is found to be set during the fd array
> translation and the fd is for a DMA-BUF, the buffer is uncharged from
> the sender's cgroup and charged to the receiving process's cgroup
> instead.
>
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
>
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL process
> allocates memory and sends the fds to the clients over binder IPC.
> The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> correctly charge the buffers to the client processes instead of the
> graphics allocator HAL.
>
> From: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Co-developed-by: T.J. Mercier <tjmercier@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> changes in v2
> - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.
>
>  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/android/binder.h |  1 +
>  2 files changed, 27 insertions(+)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 8351c5638880..f50d88ded188 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,

Is this only needed for the BINDER_TYPE_FDA case (multiple fds)? This
never needs to be done in the BINDER_TYPE_FD case (single fd)?

>  {
>         binder_size_t fdi, fd_buf_size;
>         binder_size_t fda_offset;
> +       bool transfer_gpu_charge = false;
>         const void __user *sender_ufda_base;
>         struct binder_proc *proc = thread->proc;
> +       struct binder_proc *target_proc = t->to_proc;
>         int ret;
>
>         fd_buf_size = sizeof(u32) * fda->num_fds;
> @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>         if (ret)
>                 return ret;
>
> +       if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> +               parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> +               transfer_gpu_charge = true;
> +
>         for (fdi = 0; fdi < fda->num_fds; fdi++) {
>                 u32 fd;
> +               struct dma_buf *dmabuf;
> +               struct gpucg *gpucg;
> +
>                 binder_size_t offset = fda_offset + fdi * sizeof(fd);
>                 binder_size_t sender_uoffset = fdi * sizeof(fd);
>
> @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>                                                   in_reply_to);
>                 if (ret)
>                         return ret > 0 ? -EINVAL : ret;
> +
> +               if (!transfer_gpu_charge)
> +                       continue;
> +
> +               dmabuf = dma_buf_get(fd);
> +               if (IS_ERR(dmabuf))
> +                       continue;
> +
> +               gpucg = gpucg_get(target_proc->tsk);
> +               ret = dma_buf_charge_transfer(dmabuf, gpucg);
> +               if (ret) {
> +                       pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> +                               proc->pid, thread->pid, target_proc->pid);
> +                       gpucg_put(gpucg);
> +               }
> +               dma_buf_put(dmabuf);
>         }
>         return 0;
>  }
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index 3246f2c74696..169fd5069a1a 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -137,6 +137,7 @@ struct binder_buffer_object {
>
>  enum {
>         BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> +       BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
>  };
>
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> --
> 2.35.1.265.g69c8d7142f-goog
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 21:25     ` Todd Kjos
  0 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 21:25 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: linux-doc, David Airlie, dri-devel, Zefan Li, kaleshsingh,
	Joel Fernandes, Sumit Semwal, Kenny.Ho, Jonathan Corbet,
	Martijn Coenen, linux-media, Todd Kjos, linaro-mm-sig, Tejun Heo,
	cgroups, Suren Baghdasaryan, Christian Brauner,
	Greg Kroah-Hartman, linux-kernel, Liam Mark,
	Christian König, Arve Hjønnevåg,
	Thomas Zimmermann, Johannes Weiner, Hridya Valsaraju

On Fri, Feb 11, 2022 at 8:19 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> that a process sending an fd array to another process over binder IPC
> can set to relinquish ownership of the fds being sent for memory
> accounting purposes. If the flag is found to be set during the fd array
> translation and the fd is for a DMA-BUF, the buffer is uncharged from
> the sender's cgroup and charged to the receiving process's cgroup
> instead.
>
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
>
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL process
> allocates memory and sends the fds to the clients over binder IPC.
> The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> correctly charge the buffers to the client processes instead of the
> graphics allocator HAL.
>
> From: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Co-developed-by: T.J. Mercier <tjmercier@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> changes in v2
> - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.
>
>  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/android/binder.h |  1 +
>  2 files changed, 27 insertions(+)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 8351c5638880..f50d88ded188 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,

Is this only needed for the BINDER_TYPE_FDA case (multiple fds)? This
never needs to be done in the BINDER_TYPE_FD case (single fd)?

>  {
>         binder_size_t fdi, fd_buf_size;
>         binder_size_t fda_offset;
> +       bool transfer_gpu_charge = false;
>         const void __user *sender_ufda_base;
>         struct binder_proc *proc = thread->proc;
> +       struct binder_proc *target_proc = t->to_proc;
>         int ret;
>
>         fd_buf_size = sizeof(u32) * fda->num_fds;
> @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>         if (ret)
>                 return ret;
>
> +       if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> +               parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> +               transfer_gpu_charge = true;
> +
>         for (fdi = 0; fdi < fda->num_fds; fdi++) {
>                 u32 fd;
> +               struct dma_buf *dmabuf;
> +               struct gpucg *gpucg;
> +
>                 binder_size_t offset = fda_offset + fdi * sizeof(fd);
>                 binder_size_t sender_uoffset = fdi * sizeof(fd);
>
> @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>                                                   in_reply_to);
>                 if (ret)
>                         return ret > 0 ? -EINVAL : ret;
> +
> +               if (!transfer_gpu_charge)
> +                       continue;
> +
> +               dmabuf = dma_buf_get(fd);
> +               if (IS_ERR(dmabuf))
> +                       continue;
> +
> +               gpucg = gpucg_get(target_proc->tsk);
> +               ret = dma_buf_charge_transfer(dmabuf, gpucg);
> +               if (ret) {
> +                       pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> +                               proc->pid, thread->pid, target_proc->pid);
> +                       gpucg_put(gpucg);
> +               }
> +               dma_buf_put(dmabuf);
>         }
>         return 0;
>  }
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index 3246f2c74696..169fd5069a1a 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -137,6 +137,7 @@ struct binder_buffer_object {
>
>  enum {
>         BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> +       BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
>  };
>
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> --
> 2.35.1.265.g69c8d7142f-goog
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 21:25     ` Todd Kjos
  0 siblings, 0 replies; 63+ messages in thread
From: Todd Kjos @ 2022-02-14 21:25 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Liam Mark, Brian Starkey

On Fri, Feb 11, 2022 at 8:19 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> that a process sending an fd array to another process over binder IPC
> can set to relinquish ownership of the fds being sent for memory
> accounting purposes. If the flag is found to be set during the fd array
> translation and the fd is for a DMA-BUF, the buffer is uncharged from
> the sender's cgroup and charged to the receiving process's cgroup
> instead.
>
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
>
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL process
> allocates memory and sends the fds to the clients over binder IPC.
> The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> correctly charge the buffers to the client processes instead of the
> graphics allocator HAL.
>
> From: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Co-developed-by: T.J. Mercier <tjmercier@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> changes in v2
> - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König.
>
>  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/android/binder.h |  1 +
>  2 files changed, 27 insertions(+)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 8351c5638880..f50d88ded188 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,

Is this only needed for the BINDER_TYPE_FDA case (multiple fds)? This
never needs to be done in the BINDER_TYPE_FD case (single fd)?

>  {
>         binder_size_t fdi, fd_buf_size;
>         binder_size_t fda_offset;
> +       bool transfer_gpu_charge = false;
>         const void __user *sender_ufda_base;
>         struct binder_proc *proc = thread->proc;
> +       struct binder_proc *target_proc = t->to_proc;
>         int ret;
>
>         fd_buf_size = sizeof(u32) * fda->num_fds;
> @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>         if (ret)
>                 return ret;
>
> +       if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> +               parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> +               transfer_gpu_charge = true;
> +
>         for (fdi = 0; fdi < fda->num_fds; fdi++) {
>                 u32 fd;
> +               struct dma_buf *dmabuf;
> +               struct gpucg *gpucg;
> +
>                 binder_size_t offset = fda_offset + fdi * sizeof(fd);
>                 binder_size_t sender_uoffset = fdi * sizeof(fd);
>
> @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>                                                   in_reply_to);
>                 if (ret)
>                         return ret > 0 ? -EINVAL : ret;
> +
> +               if (!transfer_gpu_charge)
> +                       continue;
> +
> +               dmabuf = dma_buf_get(fd);
> +               if (IS_ERR(dmabuf))
> +                       continue;
> +
> +               gpucg = gpucg_get(target_proc->tsk);
> +               ret = dma_buf_charge_transfer(dmabuf, gpucg);
> +               if (ret) {
> +                       pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> +                               proc->pid, thread->pid, target_proc->pid);
> +                       gpucg_put(gpucg);
> +               }
> +               dma_buf_put(dmabuf);
>         }
>         return 0;
>  }
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index 3246f2c74696..169fd5069a1a 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -137,6 +137,7 @@ struct binder_buffer_object {
>
>  enum {
>         BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> +       BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
>  };
>
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> --
> 2.35.1.265.g69c8d7142f-goog
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-12  7:19     ` Greg Kroah-Hartman
  (?)
@ 2022-02-14 22:25       ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-14 22:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Kalesh Singh,
	Kenny.Ho, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups

On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >  {
> >       binder_size_t fdi, fd_buf_size;
> >       binder_size_t fda_offset;
> > +     bool transfer_gpu_charge = false;
> >       const void __user *sender_ufda_base;
> >       struct binder_proc *proc = thread->proc;
> > +     struct binder_proc *target_proc = t->to_proc;
> >       int ret;
> >
> >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >       if (ret)
> >               return ret;
> >
> > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +             transfer_gpu_charge = true;
> > +
> >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >               u32 fd;
> > +             struct dma_buf *dmabuf;
> > +             struct gpucg *gpucg;
> > +
> >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > +
> > +             if (!transfer_gpu_charge)
> > +                     continue;
> > +
> > +             dmabuf = dma_buf_get(fd);
> > +             if (IS_ERR(dmabuf))
> > +                     continue;
> > +
> > +             gpucg = gpucg_get(target_proc->tsk);
> > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     gpucg_put(gpucg);
> > +             }
> > +             dma_buf_put(dmabuf);
> >       }
> >       return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >
>
> How does userspace know that binder supports this new flag?

Sorry, I don't completely follow even after Todd's comment. Doesn't
the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
this? So wouldn't userspace need to be compiled against the wrong
kernel headers for there to be a problem? In that case the allocation
would still succeed, but there would be no charge transfer and
unfortunately no error code.

> And where is the userspace test for this new feature?

I tested this on a Pixel after modifying the gralloc implementation to
mark allocated buffers as not used by the sender. This required
setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
code can be found here:
https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/

Then by inspecting gpu.memory.current files in sysfs I was able to see
the memory attributed to processes other than the graphics allocator
service. Before this change, several megabytes of memory were
attributed to the graphics allocator service but those buffers are
actually used by other processes like surfaceflinger, the camera, etc.
After the change, the gpu.memory.current amount for the graphics
allocator service was 0 and the charges showed up in the
gpu.memory.current files for those other processes like this:

PID: 764 Process Name: zygote64
system 8192
system-uncached 23191552

PID: 529 Process Name: /system/bin/surfaceflinger
system-uncached 109535232
system 92196864

PID: 530 Process Name:
/vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
system-uncached 0
system 0
sensor_direct_heap 0

PID: 806 Process Name:
/apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
system 1196032

PID: 4608 Process Name: com.google.android.GoogleCamera
system 2408448
system-uncached 38887424
sensor_direct_heap 0

PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
system-uncached 91279360
system 20480

PID: 2758 Process Name: com.google.android.youtube
system-uncached 1662976
system 8192

PID: 2517 Process Name: com.google.android.apps.nexuslauncher
system-uncached 115662848
system 122880

PID: 2066 Process Name: com.android.systemui
system 86016
system-uncached 37957632

>  Isn't there a binder test framework somewhere?

Android has the Vendor Test Suite where automated tests could be added
for this. Is that what you're thinking of?

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 22:25       ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-14 22:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	Kalesh Singh, Joel Fernandes, Sumit Semwal, Jonathan Corbet,
	Martijn Coenen, Laura Abbott, linux-media, Todd Kjos,
	linaro-mm-sig, Tejun Heo, cgroups, Suren Baghdasaryan,
	Christian Brauner, Kenny.Ho, linux-kernel, Liam Mark,
	Christian König, Arve Hjønnevåg,
	Thomas Zimmermann, Johannes Weiner, Hridya Valsaraju

On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >  {
> >       binder_size_t fdi, fd_buf_size;
> >       binder_size_t fda_offset;
> > +     bool transfer_gpu_charge = false;
> >       const void __user *sender_ufda_base;
> >       struct binder_proc *proc = thread->proc;
> > +     struct binder_proc *target_proc = t->to_proc;
> >       int ret;
> >
> >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >       if (ret)
> >               return ret;
> >
> > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +             transfer_gpu_charge = true;
> > +
> >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >               u32 fd;
> > +             struct dma_buf *dmabuf;
> > +             struct gpucg *gpucg;
> > +
> >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > +
> > +             if (!transfer_gpu_charge)
> > +                     continue;
> > +
> > +             dmabuf = dma_buf_get(fd);
> > +             if (IS_ERR(dmabuf))
> > +                     continue;
> > +
> > +             gpucg = gpucg_get(target_proc->tsk);
> > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     gpucg_put(gpucg);
> > +             }
> > +             dma_buf_put(dmabuf);
> >       }
> >       return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >
>
> How does userspace know that binder supports this new flag?

Sorry, I don't completely follow even after Todd's comment. Doesn't
the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
this? So wouldn't userspace need to be compiled against the wrong
kernel headers for there to be a problem? In that case the allocation
would still succeed, but there would be no charge transfer and
unfortunately no error code.

> And where is the userspace test for this new feature?

I tested this on a Pixel after modifying the gralloc implementation to
mark allocated buffers as not used by the sender. This required
setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
code can be found here:
https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/

Then by inspecting gpu.memory.current files in sysfs I was able to see
the memory attributed to processes other than the graphics allocator
service. Before this change, several megabytes of memory were
attributed to the graphics allocator service but those buffers are
actually used by other processes like surfaceflinger, the camera, etc.
After the change, the gpu.memory.current amount for the graphics
allocator service was 0 and the charges showed up in the
gpu.memory.current files for those other processes like this:

PID: 764 Process Name: zygote64
system 8192
system-uncached 23191552

PID: 529 Process Name: /system/bin/surfaceflinger
system-uncached 109535232
system 92196864

PID: 530 Process Name:
/vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
system-uncached 0
system 0
sensor_direct_heap 0

PID: 806 Process Name:
/apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
system 1196032

PID: 4608 Process Name: com.google.android.GoogleCamera
system 2408448
system-uncached 38887424
sensor_direct_heap 0

PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
system-uncached 91279360
system 20480

PID: 2758 Process Name: com.google.android.youtube
system-uncached 1662976
system 8192

PID: 2517 Process Name: com.google.android.apps.nexuslauncher
system-uncached 115662848
system 122880

PID: 2066 Process Name: com.android.systemui
system 86016
system-uncached 37957632

>  Isn't there a binder test framework somewhere?

Android has the Vendor Test Suite where automated tests could be added
for this. Is that what you're thinking of?

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-14 22:25       ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-14 22:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott

On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
<gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org> wrote:
>
> On Fri, Feb 11, 2022 at 04:18:29PM +0000, T.J. Mercier wrote:
> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Co-developed-by: T.J. Mercier <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Signed-off-by: T.J. Mercier <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >  {
> >       binder_size_t fdi, fd_buf_size;
> >       binder_size_t fda_offset;
> > +     bool transfer_gpu_charge = false;
> >       const void __user *sender_ufda_base;
> >       struct binder_proc *proc = thread->proc;
> > +     struct binder_proc *target_proc = t->to_proc;
> >       int ret;
> >
> >       fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >       if (ret)
> >               return ret;
> >
> > +     if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +             parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +             transfer_gpu_charge = true;
> > +
> >       for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >               u32 fd;
> > +             struct dma_buf *dmabuf;
> > +             struct gpucg *gpucg;
> > +
> >               binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >               binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > +
> > +             if (!transfer_gpu_charge)
> > +                     continue;
> > +
> > +             dmabuf = dma_buf_get(fd);
> > +             if (IS_ERR(dmabuf))
> > +                     continue;
> > +
> > +             gpucg = gpucg_get(target_proc->tsk);
> > +             ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     gpucg_put(gpucg);
> > +             }
> > +             dma_buf_put(dmabuf);
> >       }
> >       return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >
>
> How does userspace know that binder supports this new flag?

Sorry, I don't completely follow even after Todd's comment. Doesn't
the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
this? So wouldn't userspace need to be compiled against the wrong
kernel headers for there to be a problem? In that case the allocation
would still succeed, but there would be no charge transfer and
unfortunately no error code.

> And where is the userspace test for this new feature?

I tested this on a Pixel after modifying the gralloc implementation to
mark allocated buffers as not used by the sender. This required
setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
code can be found here:
https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/

Then by inspecting gpu.memory.current files in sysfs I was able to see
the memory attributed to processes other than the graphics allocator
service. Before this change, several megabytes of memory were
attributed to the graphics allocator service but those buffers are
actually used by other processes like surfaceflinger, the camera, etc.
After the change, the gpu.memory.current amount for the graphics
allocator service was 0 and the charges showed up in the
gpu.memory.current files for those other processes like this:

PID: 764 Process Name: zygote64
system 8192
system-uncached 23191552

PID: 529 Process Name: /system/bin/surfaceflinger
system-uncached 109535232
system 92196864

PID: 530 Process Name:
/vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
system-uncached 0
system 0
sensor_direct_heap 0

PID: 806 Process Name:
/apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
system 1196032

PID: 4608 Process Name: com.google.android.GoogleCamera
system 2408448
system-uncached 38887424
sensor_direct_heap 0

PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
system-uncached 91279360
system 20480

PID: 2758 Process Name: com.google.android.youtube
system-uncached 1662976
system 8192

PID: 2517 Process Name: com.google.android.apps.nexuslauncher
system-uncached 115662848
system 122880

PID: 2066 Process Name: com.android.systemui
system 86016
system-uncached 37957632

>  Isn't there a binder test framework somewhere?

Android has the Vendor Test Suite where automated tests could be added
for this. Is that what you're thinking of?

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-14 21:25     ` Todd Kjos
  (?)
@ 2022-02-15  0:03       ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-15  0:03 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Liam Mark, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Kalesh Singh, Kenny.Ho, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups

On Mon, Feb 14, 2022 at 1:26 PM Todd Kjos <tkjos@google.com> wrote:
>
> On Fri, Feb 11, 2022 at 8:19 AM T.J. Mercier <tjmercier@google.com> wrote:
> >
> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>
> Is this only needed for the BINDER_TYPE_FDA case (multiple fds)? This
> never needs to be done in the BINDER_TYPE_FD case (single fd)?
>

Currently this is the case as there is no user who would benefit from
the single fd case. The only known user is the gralloc HAL which
always uses BINDER_TYPE_FDA to send dmabufs. I guess we could move the
code into binder_translate_fd if we were willing to bring back
binder_fd_object's flags field. This looks possible, but I think it'd
be a more intrusive change.

> >  {
> >         binder_size_t fdi, fd_buf_size;
> >         binder_size_t fda_offset;
> > +       bool transfer_gpu_charge = false;
> >         const void __user *sender_ufda_base;
> >         struct binder_proc *proc = thread->proc;
> > +       struct binder_proc *target_proc = t->to_proc;
> >         int ret;
> >
> >         fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >         if (ret)
> >                 return ret;
> >
> > +       if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +               parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +               transfer_gpu_charge = true;
> > +
> >         for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >                 u32 fd;
> > +               struct dma_buf *dmabuf;
> > +               struct gpucg *gpucg;
> > +
> >                 binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >                 binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                   in_reply_to);
> >                 if (ret)
> >                         return ret > 0 ? -EINVAL : ret;
> > +
> > +               if (!transfer_gpu_charge)
> > +                       continue;
> > +
> > +               dmabuf = dma_buf_get(fd);
> > +               if (IS_ERR(dmabuf))
> > +                       continue;
> > +
> > +               gpucg = gpucg_get(target_proc->tsk);
> > +               ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +               if (ret) {
> > +                       pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                               proc->pid, thread->pid, target_proc->pid);
> > +                       gpucg_put(gpucg);
> > +               }
> > +               dma_buf_put(dmabuf);
> >         }
> >         return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >         BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +       BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  0:03       ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-15  0:03 UTC (permalink / raw)
  To: Todd Kjos
  Cc: linux-doc, David Airlie, dri-devel, Zefan Li, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Kenny.Ho, Jonathan Corbet,
	Martijn Coenen, linux-media, Todd Kjos, linaro-mm-sig, Tejun Heo,
	cgroups, Suren Baghdasaryan, Christian Brauner,
	Greg Kroah-Hartman, linux-kernel, Liam Mark,
	Christian König, Arve Hjønnevåg,
	Thomas Zimmermann, Johannes Weiner, Hridya Valsaraju

On Mon, Feb 14, 2022 at 1:26 PM Todd Kjos <tkjos@google.com> wrote:
>
> On Fri, Feb 11, 2022 at 8:19 AM T.J. Mercier <tjmercier@google.com> wrote:
> >
> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>
> Is this only needed for the BINDER_TYPE_FDA case (multiple fds)? This
> never needs to be done in the BINDER_TYPE_FD case (single fd)?
>

Currently this is the case as there is no user who would benefit from
the single fd case. The only known user is the gralloc HAL which
always uses BINDER_TYPE_FDA to send dmabufs. I guess we could move the
code into binder_translate_fd if we were willing to bring back
binder_fd_object's flags field. This looks possible, but I think it'd
be a more intrusive change.

> >  {
> >         binder_size_t fdi, fd_buf_size;
> >         binder_size_t fda_offset;
> > +       bool transfer_gpu_charge = false;
> >         const void __user *sender_ufda_base;
> >         struct binder_proc *proc = thread->proc;
> > +       struct binder_proc *target_proc = t->to_proc;
> >         int ret;
> >
> >         fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >         if (ret)
> >                 return ret;
> >
> > +       if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +               parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +               transfer_gpu_charge = true;
> > +
> >         for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >                 u32 fd;
> > +               struct dma_buf *dmabuf;
> > +               struct gpucg *gpucg;
> > +
> >                 binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >                 binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                   in_reply_to);
> >                 if (ret)
> >                         return ret > 0 ? -EINVAL : ret;
> > +
> > +               if (!transfer_gpu_charge)
> > +                       continue;
> > +
> > +               dmabuf = dma_buf_get(fd);
> > +               if (IS_ERR(dmabuf))
> > +                       continue;
> > +
> > +               gpucg = gpucg_get(target_proc->tsk);
> > +               ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +               if (ret) {
> > +                       pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                               proc->pid, thread->pid, target_proc->pid);
> > +                       gpucg_put(gpucg);
> > +               }
> > +               dma_buf_put(dmabuf);
> >         }
> >         return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >         BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +       BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  0:03       ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-15  0:03 UTC (permalink / raw)
  To: Todd Kjos
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Liam Mark, Brian Starkey

On Mon, Feb 14, 2022 at 1:26 PM Todd Kjos <tkjos@google.com> wrote:
>
> On Fri, Feb 11, 2022 at 8:19 AM T.J. Mercier <tjmercier@google.com> wrote:
> >
> > This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
> > that a process sending an fd array to another process over binder IPC
> > can set to relinquish ownership of the fds being sent for memory
> > accounting purposes. If the flag is found to be set during the fd array
> > translation and the fd is for a DMA-BUF, the buffer is uncharged from
> > the sender's cgroup and charged to the receiving process's cgroup
> > instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL process
> > allocates memory and sends the fds to the clients over binder IPC.
> > The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
> > arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
> > correctly charge the buffers to the client processes instead of the
> > graphics allocator HAL.
> >
> > From: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Co-developed-by: T.J. Mercier <tjmercier@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > changes in v2
> > - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> > heap to a single dma-buf function for all heaps per Daniel Vetter and
> > Christian König.
> >
> >  drivers/android/binder.c            | 26 ++++++++++++++++++++++++++
> >  include/uapi/linux/android/binder.h |  1 +
> >  2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 8351c5638880..f50d88ded188 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>
> Is this only needed for the BINDER_TYPE_FDA case (multiple fds)? This
> never needs to be done in the BINDER_TYPE_FD case (single fd)?
>

Currently this is the case as there is no user who would benefit from
the single fd case. The only known user is the gralloc HAL which
always uses BINDER_TYPE_FDA to send dmabufs. I guess we could move the
code into binder_translate_fd if we were willing to bring back
binder_fd_object's flags field. This looks possible, but I think it'd
be a more intrusive change.

> >  {
> >         binder_size_t fdi, fd_buf_size;
> >         binder_size_t fda_offset;
> > +       bool transfer_gpu_charge = false;
> >         const void __user *sender_ufda_base;
> >         struct binder_proc *proc = thread->proc;
> > +       struct binder_proc *target_proc = t->to_proc;
> >         int ret;
> >
> >         fd_buf_size = sizeof(u32) * fda->num_fds;
> > @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >         if (ret)
> >                 return ret;
> >
> > +       if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
> > +               parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
> > +               transfer_gpu_charge = true;
> > +
> >         for (fdi = 0; fdi < fda->num_fds; fdi++) {
> >                 u32 fd;
> > +               struct dma_buf *dmabuf;
> > +               struct gpucg *gpucg;
> > +
> >                 binder_size_t offset = fda_offset + fdi * sizeof(fd);
> >                 binder_size_t sender_uoffset = fdi * sizeof(fd);
> >
> > @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >                                                   in_reply_to);
> >                 if (ret)
> >                         return ret > 0 ? -EINVAL : ret;
> > +
> > +               if (!transfer_gpu_charge)
> > +                       continue;
> > +
> > +               dmabuf = dma_buf_get(fd);
> > +               if (IS_ERR(dmabuf))
> > +                       continue;
> > +
> > +               gpucg = gpucg_get(target_proc->tsk);
> > +               ret = dma_buf_charge_transfer(dmabuf, gpucg);
> > +               if (ret) {
> > +                       pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
> > +                               proc->pid, thread->pid, target_proc->pid);
> > +                       gpucg_put(gpucg);
> > +               }
> > +               dma_buf_put(dmabuf);
> >         }
> >         return 0;
> >  }
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index 3246f2c74696..169fd5069a1a 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> >
> >  enum {
> >         BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > +       BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> >  };
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-14 22:25       ` T.J. Mercier
  (?)
@ 2022-02-15  7:01         ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-15  7:01 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner, Kalesh Singh,
	Kenny.Ho, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups

On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > --- a/include/uapi/linux/android/binder.h
> > > +++ b/include/uapi/linux/android/binder.h
> > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > >
> > >  enum {
> > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > >  };
> > >
> > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >
> >
> > How does userspace know that binder supports this new flag?
> 
> Sorry, I don't completely follow even after Todd's comment. Doesn't
> the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> this?

There is no "header" when running a new kernel on an old userspace,
right?  How about the other way around, old kernel, new userspace?

> So wouldn't userspace need to be compiled against the wrong
> kernel headers for there to be a problem? In that case the allocation
> would still succeed, but there would be no charge transfer and
> unfortunately no error code.

No error code is not good.  People upgrade their kernels all the time,
and do not do a "rebuild the world" when doing so.

> > And where is the userspace test for this new feature?
> 
> I tested this on a Pixel after modifying the gralloc implementation to
> mark allocated buffers as not used by the sender. This required
> setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
> code can be found here:
> https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
> https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/
> 
> Then by inspecting gpu.memory.current files in sysfs I was able to see
> the memory attributed to processes other than the graphics allocator
> service. Before this change, several megabytes of memory were
> attributed to the graphics allocator service but those buffers are
> actually used by other processes like surfaceflinger, the camera, etc.
> After the change, the gpu.memory.current amount for the graphics
> allocator service was 0 and the charges showed up in the
> gpu.memory.current files for those other processes like this:
> 
> PID: 764 Process Name: zygote64
> system 8192
> system-uncached 23191552
> 
> PID: 529 Process Name: /system/bin/surfaceflinger
> system-uncached 109535232
> system 92196864
> 
> PID: 530 Process Name:
> /vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
> system-uncached 0
> system 0
> sensor_direct_heap 0
> 
> PID: 806 Process Name:
> /apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
> system 1196032
> 
> PID: 4608 Process Name: com.google.android.GoogleCamera
> system 2408448
> system-uncached 38887424
> sensor_direct_heap 0
> 
> PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
> system-uncached 91279360
> system 20480
> 
> PID: 2758 Process Name: com.google.android.youtube
> system-uncached 1662976
> system 8192
> 
> PID: 2517 Process Name: com.google.android.apps.nexuslauncher
> system-uncached 115662848
> system 122880
> 
> PID: 2066 Process Name: com.android.systemui
> system 86016
> system-uncached 37957632
> 
> >  Isn't there a binder test framework somewhere?
> 
> Android has the Vendor Test Suite where automated tests could be added
> for this. Is that what you're thinking of?

tools/testing/selftests/ is a good start.  VTS is the worst-case as no
one can really run that on their own, but it is better than nothing.
Having no test at all for this is not ok.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  7:01         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-15  7:01 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	Kalesh Singh, Joel Fernandes, Sumit Semwal, Jonathan Corbet,
	Martijn Coenen, Laura Abbott, linux-media, Todd Kjos,
	linaro-mm-sig, Tejun Heo, cgroups, Suren Baghdasaryan,
	Christian Brauner, Kenny.Ho, linux-kernel, Liam Mark,
	Christian König, Arve Hjønnevåg,
	Thomas Zimmermann, Johannes Weiner, Hridya Valsaraju

On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > --- a/include/uapi/linux/android/binder.h
> > > +++ b/include/uapi/linux/android/binder.h
> > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > >
> > >  enum {
> > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > >  };
> > >
> > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >
> >
> > How does userspace know that binder supports this new flag?
> 
> Sorry, I don't completely follow even after Todd's comment. Doesn't
> the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> this?

There is no "header" when running a new kernel on an old userspace,
right?  How about the other way around, old kernel, new userspace?

> So wouldn't userspace need to be compiled against the wrong
> kernel headers for there to be a problem? In that case the allocation
> would still succeed, but there would be no charge transfer and
> unfortunately no error code.

No error code is not good.  People upgrade their kernels all the time,
and do not do a "rebuild the world" when doing so.

> > And where is the userspace test for this new feature?
> 
> I tested this on a Pixel after modifying the gralloc implementation to
> mark allocated buffers as not used by the sender. This required
> setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
> code can be found here:
> https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
> https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/
> 
> Then by inspecting gpu.memory.current files in sysfs I was able to see
> the memory attributed to processes other than the graphics allocator
> service. Before this change, several megabytes of memory were
> attributed to the graphics allocator service but those buffers are
> actually used by other processes like surfaceflinger, the camera, etc.
> After the change, the gpu.memory.current amount for the graphics
> allocator service was 0 and the charges showed up in the
> gpu.memory.current files for those other processes like this:
> 
> PID: 764 Process Name: zygote64
> system 8192
> system-uncached 23191552
> 
> PID: 529 Process Name: /system/bin/surfaceflinger
> system-uncached 109535232
> system 92196864
> 
> PID: 530 Process Name:
> /vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
> system-uncached 0
> system 0
> sensor_direct_heap 0
> 
> PID: 806 Process Name:
> /apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
> system 1196032
> 
> PID: 4608 Process Name: com.google.android.GoogleCamera
> system 2408448
> system-uncached 38887424
> sensor_direct_heap 0
> 
> PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
> system-uncached 91279360
> system 20480
> 
> PID: 2758 Process Name: com.google.android.youtube
> system-uncached 1662976
> system 8192
> 
> PID: 2517 Process Name: com.google.android.apps.nexuslauncher
> system-uncached 115662848
> system 122880
> 
> PID: 2066 Process Name: com.android.systemui
> system 86016
> system-uncached 37957632
> 
> >  Isn't there a binder test framework somewhere?
> 
> Android has the Vendor Test Suite where automated tests could be added
> for this. Is that what you're thinking of?

tools/testing/selftests/ is a good start.  VTS is the worst-case as no
one can really run that on their own, but it is better than nothing.
Having no test at all for this is not ok.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  7:01         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-15  7:01 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott

On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > --- a/include/uapi/linux/android/binder.h
> > > +++ b/include/uapi/linux/android/binder.h
> > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > >
> > >  enum {
> > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > >  };
> > >
> > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >
> >
> > How does userspace know that binder supports this new flag?
> 
> Sorry, I don't completely follow even after Todd's comment. Doesn't
> the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> this?

There is no "header" when running a new kernel on an old userspace,
right?  How about the other way around, old kernel, new userspace?

> So wouldn't userspace need to be compiled against the wrong
> kernel headers for there to be a problem? In that case the allocation
> would still succeed, but there would be no charge transfer and
> unfortunately no error code.

No error code is not good.  People upgrade their kernels all the time,
and do not do a "rebuild the world" when doing so.

> > And where is the userspace test for this new feature?
> 
> I tested this on a Pixel after modifying the gralloc implementation to
> mark allocated buffers as not used by the sender. This required
> setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
> code can be found here:
> https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
> https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/
> 
> Then by inspecting gpu.memory.current files in sysfs I was able to see
> the memory attributed to processes other than the graphics allocator
> service. Before this change, several megabytes of memory were
> attributed to the graphics allocator service but those buffers are
> actually used by other processes like surfaceflinger, the camera, etc.
> After the change, the gpu.memory.current amount for the graphics
> allocator service was 0 and the charges showed up in the
> gpu.memory.current files for those other processes like this:
> 
> PID: 764 Process Name: zygote64
> system 8192
> system-uncached 23191552
> 
> PID: 529 Process Name: /system/bin/surfaceflinger
> system-uncached 109535232
> system 92196864
> 
> PID: 530 Process Name:
> /vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
> system-uncached 0
> system 0
> sensor_direct_heap 0
> 
> PID: 806 Process Name:
> /apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
> system 1196032
> 
> PID: 4608 Process Name: com.google.android.GoogleCamera
> system 2408448
> system-uncached 38887424
> sensor_direct_heap 0
> 
> PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
> system-uncached 91279360
> system 20480
> 
> PID: 2758 Process Name: com.google.android.youtube
> system-uncached 1662976
> system 8192
> 
> PID: 2517 Process Name: com.google.android.apps.nexuslauncher
> system-uncached 115662848
> system 122880
> 
> PID: 2066 Process Name: com.android.systemui
> system 86016
> system-uncached 37957632
> 
> >  Isn't there a binder test framework somewhere?
> 
> Android has the Vendor Test Suite where automated tests could be added
> for this. Is that what you're thinking of?

tools/testing/selftests/ is a good start.  VTS is the worst-case as no
one can really run that on their own, but it is better than nothing.
Having no test at all for this is not ok.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-15  7:01         ` Greg Kroah-Hartman
  (?)
@ 2022-02-15  7:19           ` Suren Baghdasaryan
  -1 siblings, 0 replies; 63+ messages in thread
From: Suren Baghdasaryan @ 2022-02-15  7:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: T.J. Mercier, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Sumit Semwal, Christian König, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Kalesh Singh, Kenny.Ho, DRI mailing list,
	open list:DOCUMENTATION, LKML, linux-media,
	moderated list:DMA BUFFER SHARING FRAMEWORK, cgroups mailinglist

On Mon, Feb 14, 2022 at 11:01 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > > --- a/include/uapi/linux/android/binder.h
> > > > +++ b/include/uapi/linux/android/binder.h
> > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > >
> > > >  enum {
> > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > >  };
> > > >
> > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > > >
> > >
> > > How does userspace know that binder supports this new flag?
> >
> > Sorry, I don't completely follow even after Todd's comment. Doesn't
> > the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> > this?
>
> There is no "header" when running a new kernel on an old userspace,
> right?  How about the other way around, old kernel, new userspace?

1. new kernel + old userspace = kernel supports the feature but
userspace does not use it. The old userspace won't even mount the new
cgroup controller, accounting is not performed, charge is not
transferred.
2. old kernel + new userspace = the new cgroup controller is not
supported by the kernel, accounting is not performed, charge is not
transferred.
3. old kernel + old userspace = same as #2
4. new kernel + new userspace = cgroup is mounted, feature is
supported and used.
Does that work or do we need a separate indication of whether binder
driver supports the charge transfer feature?

>
> > So wouldn't userspace need to be compiled against the wrong
> > kernel headers for there to be a problem? In that case the allocation
> > would still succeed, but there would be no charge transfer and
> > unfortunately no error code.
>
> No error code is not good.  People upgrade their kernels all the time,
> and do not do a "rebuild the world" when doing so.
>
> > > And where is the userspace test for this new feature?
> >
> > I tested this on a Pixel after modifying the gralloc implementation to
> > mark allocated buffers as not used by the sender. This required
> > setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
> > code can be found here:
> > https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
> > https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/
> >
> > Then by inspecting gpu.memory.current files in sysfs I was able to see
> > the memory attributed to processes other than the graphics allocator
> > service. Before this change, several megabytes of memory were
> > attributed to the graphics allocator service but those buffers are
> > actually used by other processes like surfaceflinger, the camera, etc.
> > After the change, the gpu.memory.current amount for the graphics
> > allocator service was 0 and the charges showed up in the
> > gpu.memory.current files for those other processes like this:
> >
> > PID: 764 Process Name: zygote64
> > system 8192
> > system-uncached 23191552
> >
> > PID: 529 Process Name: /system/bin/surfaceflinger
> > system-uncached 109535232
> > system 92196864
> >
> > PID: 530 Process Name:
> > /vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
> > system-uncached 0
> > system 0
> > sensor_direct_heap 0
> >
> > PID: 806 Process Name:
> > /apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
> > system 1196032
> >
> > PID: 4608 Process Name: com.google.android.GoogleCamera
> > system 2408448
> > system-uncached 38887424
> > sensor_direct_heap 0
> >
> > PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
> > system-uncached 91279360
> > system 20480
> >
> > PID: 2758 Process Name: com.google.android.youtube
> > system-uncached 1662976
> > system 8192
> >
> > PID: 2517 Process Name: com.google.android.apps.nexuslauncher
> > system-uncached 115662848
> > system 122880
> >
> > PID: 2066 Process Name: com.android.systemui
> > system 86016
> > system-uncached 37957632
> >
> > >  Isn't there a binder test framework somewhere?
> >
> > Android has the Vendor Test Suite where automated tests could be added
> > for this. Is that what you're thinking of?
>
> tools/testing/selftests/ is a good start.  VTS is the worst-case as no
> one can really run that on their own, but it is better than nothing.
> Having no test at all for this is not ok.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  7:19           ` Suren Baghdasaryan
  0 siblings, 0 replies; 63+ messages in thread
From: Suren Baghdasaryan @ 2022-02-15  7:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zefan Li, open list:DOCUMENTATION, David Airlie,
	DRI mailing list, Benjamin Gaignard, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Jonathan Corbet, Laura Abbott,
	linux-media, Todd Kjos, Thomas Zimmermann,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Tejun Heo,
	cgroups mailinglist, Martijn Coenen, T.J. Mercier,
	Christian Brauner, Kenny.Ho, LKML, Liam Mark,
	Christian König, Arve Hjønnevåg, Johannes Weiner,
	Hridya Valsaraju

On Mon, Feb 14, 2022 at 11:01 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > > --- a/include/uapi/linux/android/binder.h
> > > > +++ b/include/uapi/linux/android/binder.h
> > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > >
> > > >  enum {
> > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > >  };
> > > >
> > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > > >
> > >
> > > How does userspace know that binder supports this new flag?
> >
> > Sorry, I don't completely follow even after Todd's comment. Doesn't
> > the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> > this?
>
> There is no "header" when running a new kernel on an old userspace,
> right?  How about the other way around, old kernel, new userspace?

1. new kernel + old userspace = kernel supports the feature but
userspace does not use it. The old userspace won't even mount the new
cgroup controller, accounting is not performed, charge is not
transferred.
2. old kernel + new userspace = the new cgroup controller is not
supported by the kernel, accounting is not performed, charge is not
transferred.
3. old kernel + old userspace = same as #2
4. new kernel + new userspace = cgroup is mounted, feature is
supported and used.
Does that work or do we need a separate indication of whether binder
driver supports the charge transfer feature?

>
> > So wouldn't userspace need to be compiled against the wrong
> > kernel headers for there to be a problem? In that case the allocation
> > would still succeed, but there would be no charge transfer and
> > unfortunately no error code.
>
> No error code is not good.  People upgrade their kernels all the time,
> and do not do a "rebuild the world" when doing so.
>
> > > And where is the userspace test for this new feature?
> >
> > I tested this on a Pixel after modifying the gralloc implementation to
> > mark allocated buffers as not used by the sender. This required
> > setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
> > code can be found here:
> > https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
> > https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/
> >
> > Then by inspecting gpu.memory.current files in sysfs I was able to see
> > the memory attributed to processes other than the graphics allocator
> > service. Before this change, several megabytes of memory were
> > attributed to the graphics allocator service but those buffers are
> > actually used by other processes like surfaceflinger, the camera, etc.
> > After the change, the gpu.memory.current amount for the graphics
> > allocator service was 0 and the charges showed up in the
> > gpu.memory.current files for those other processes like this:
> >
> > PID: 764 Process Name: zygote64
> > system 8192
> > system-uncached 23191552
> >
> > PID: 529 Process Name: /system/bin/surfaceflinger
> > system-uncached 109535232
> > system 92196864
> >
> > PID: 530 Process Name:
> > /vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
> > system-uncached 0
> > system 0
> > sensor_direct_heap 0
> >
> > PID: 806 Process Name:
> > /apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
> > system 1196032
> >
> > PID: 4608 Process Name: com.google.android.GoogleCamera
> > system 2408448
> > system-uncached 38887424
> > sensor_direct_heap 0
> >
> > PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
> > system-uncached 91279360
> > system 20480
> >
> > PID: 2758 Process Name: com.google.android.youtube
> > system-uncached 1662976
> > system 8192
> >
> > PID: 2517 Process Name: com.google.android.apps.nexuslauncher
> > system-uncached 115662848
> > system 122880
> >
> > PID: 2066 Process Name: com.android.systemui
> > system 86016
> > system-uncached 37957632
> >
> > >  Isn't there a binder test framework somewhere?
> >
> > Android has the Vendor Test Suite where automated tests could be added
> > for this. Is that what you're thinking of?
>
> tools/testing/selftests/ is a good start.  VTS is the worst-case as no
> one can really run that on their own, but it is better than nothing.
> Having no test at all for this is not ok.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  7:19           ` Suren Baghdasaryan
  0 siblings, 0 replies; 63+ messages in thread
From: Suren Baghdasaryan @ 2022-02-15  7:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: T.J. Mercier, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Sumit Semwal, Christian König, Benjamin Gaignard, Liam Mark,
	Laura Abbott

On Mon, Feb 14, 2022 at 11:01 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > > --- a/include/uapi/linux/android/binder.h
> > > > +++ b/include/uapi/linux/android/binder.h
> > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > >
> > > >  enum {
> > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > >  };
> > > >
> > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > --
> > > > 2.35.1.265.g69c8d7142f-goog
> > > >
> > >
> > > How does userspace know that binder supports this new flag?
> >
> > Sorry, I don't completely follow even after Todd's comment. Doesn't
> > the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> > this?
>
> There is no "header" when running a new kernel on an old userspace,
> right?  How about the other way around, old kernel, new userspace?

1. new kernel + old userspace = kernel supports the feature but
userspace does not use it. The old userspace won't even mount the new
cgroup controller, accounting is not performed, charge is not
transferred.
2. old kernel + new userspace = the new cgroup controller is not
supported by the kernel, accounting is not performed, charge is not
transferred.
3. old kernel + old userspace = same as #2
4. new kernel + new userspace = cgroup is mounted, feature is
supported and used.
Does that work or do we need a separate indication of whether binder
driver supports the charge transfer feature?

>
> > So wouldn't userspace need to be compiled against the wrong
> > kernel headers for there to be a problem? In that case the allocation
> > would still succeed, but there would be no charge transfer and
> > unfortunately no error code.
>
> No error code is not good.  People upgrade their kernels all the time,
> and do not do a "rebuild the world" when doing so.
>
> > > And where is the userspace test for this new feature?
> >
> > I tested this on a Pixel after modifying the gralloc implementation to
> > mark allocated buffers as not used by the sender. This required
> > setting the BINDER_BUFFER_FLAG_SENDER_NO_NEED in libhwbinder. That
> > code can be found here:
> > https://android-review.googlesource.com/c/platform/system/libhwbinder/+/1910752/1/Parcel.cpp
> > https://android-review.googlesource.com/c/platform/system/libhidl/+/1910611/
> >
> > Then by inspecting gpu.memory.current files in sysfs I was able to see
> > the memory attributed to processes other than the graphics allocator
> > service. Before this change, several megabytes of memory were
> > attributed to the graphics allocator service but those buffers are
> > actually used by other processes like surfaceflinger, the camera, etc.
> > After the change, the gpu.memory.current amount for the graphics
> > allocator service was 0 and the charges showed up in the
> > gpu.memory.current files for those other processes like this:
> >
> > PID: 764 Process Name: zygote64
> > system 8192
> > system-uncached 23191552
> >
> > PID: 529 Process Name: /system/bin/surfaceflinger
> > system-uncached 109535232
> > system 92196864
> >
> > PID: 530 Process Name:
> > /vendor/bin/hw/android.hardware.graphics.allocator@4.0-service
> > system-uncached 0
> > system 0
> > sensor_direct_heap 0
> >
> > PID: 806 Process Name:
> > /apex/com.google.pixel.camera.hal/bin/hw/android.hardware.camera.provider@2.7-service-google
> > system 1196032
> >
> > PID: 4608 Process Name: com.google.android.GoogleCamera
> > system 2408448
> > system-uncached 38887424
> > sensor_direct_heap 0
> >
> > PID: 32102 Process Name: com.google.android.googlequicksearchbox:search
> > system-uncached 91279360
> > system 20480
> >
> > PID: 2758 Process Name: com.google.android.youtube
> > system-uncached 1662976
> > system 8192
> >
> > PID: 2517 Process Name: com.google.android.apps.nexuslauncher
> > system-uncached 115662848
> > system 122880
> >
> > PID: 2066 Process Name: com.android.systemui
> > system 86016
> > system-uncached 37957632
> >
> > >  Isn't there a binder test framework somewhere?
> >
> > Android has the Vendor Test Suite where automated tests could be added
> > for this. Is that what you're thinking of?
>
> tools/testing/selftests/ is a good start.  VTS is the worst-case as no
> one can really run that on their own, but it is better than nothing.
> Having no test at all for this is not ok.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-02-15  7:19           ` Suren Baghdasaryan
  (?)
@ 2022-02-15  7:30             ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-15  7:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: T.J. Mercier, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Sumit Semwal, Christian König, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Kalesh Singh, Kenny.Ho, DRI mailing list,
	open list:DOCUMENTATION, LKML, linux-media,
	moderated list:DMA BUFFER SHARING FRAMEWORK, cgroups mailinglist

On Mon, Feb 14, 2022 at 11:19:35PM -0800, Suren Baghdasaryan wrote:
> On Mon, Feb 14, 2022 at 11:01 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> > > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > > > --- a/include/uapi/linux/android/binder.h
> > > > > +++ b/include/uapi/linux/android/binder.h
> > > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > > >
> > > > >  enum {
> > > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > > >  };
> > > > >
> > > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > > --
> > > > > 2.35.1.265.g69c8d7142f-goog
> > > > >
> > > >
> > > > How does userspace know that binder supports this new flag?
> > >
> > > Sorry, I don't completely follow even after Todd's comment. Doesn't
> > > the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> > > this?
> >
> > There is no "header" when running a new kernel on an old userspace,
> > right?  How about the other way around, old kernel, new userspace?
> 
> 1. new kernel + old userspace = kernel supports the feature but
> userspace does not use it. The old userspace won't even mount the new
> cgroup controller, accounting is not performed, charge is not
> transferred.
> 2. old kernel + new userspace = the new cgroup controller is not
> supported by the kernel, accounting is not performed, charge is not
> transferred.
> 3. old kernel + old userspace = same as #2
> 4. new kernel + new userspace = cgroup is mounted, feature is
> supported and used.
> Does that work or do we need a separate indication of whether binder
> driver supports the charge transfer feature?

Ok, if that's all working, this is fine, it just seemed odd to add a new
type like this.  Perhaps this can go into the changelog text...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  7:30             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-15  7:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Zefan Li, open list:DOCUMENTATION, David Airlie,
	DRI mailing list, Benjamin Gaignard, Kalesh Singh,
	Joel Fernandes, Sumit Semwal, Jonathan Corbet, Laura Abbott,
	linux-media, Todd Kjos, Thomas Zimmermann,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Tejun Heo,
	cgroups mailinglist, Martijn Coenen, T.J. Mercier,
	Christian Brauner, Kenny.Ho, LKML, Liam Mark,
	Christian König, Arve Hjønnevåg, Johannes Weiner,
	Hridya Valsaraju

On Mon, Feb 14, 2022 at 11:19:35PM -0800, Suren Baghdasaryan wrote:
> On Mon, Feb 14, 2022 at 11:01 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> > > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > > > --- a/include/uapi/linux/android/binder.h
> > > > > +++ b/include/uapi/linux/android/binder.h
> > > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > > >
> > > > >  enum {
> > > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > > >  };
> > > > >
> > > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > > --
> > > > > 2.35.1.265.g69c8d7142f-goog
> > > > >
> > > >
> > > > How does userspace know that binder supports this new flag?
> > >
> > > Sorry, I don't completely follow even after Todd's comment. Doesn't
> > > the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> > > this?
> >
> > There is no "header" when running a new kernel on an old userspace,
> > right?  How about the other way around, old kernel, new userspace?
> 
> 1. new kernel + old userspace = kernel supports the feature but
> userspace does not use it. The old userspace won't even mount the new
> cgroup controller, accounting is not performed, charge is not
> transferred.
> 2. old kernel + new userspace = the new cgroup controller is not
> supported by the kernel, accounting is not performed, charge is not
> transferred.
> 3. old kernel + old userspace = same as #2
> 4. new kernel + new userspace = cgroup is mounted, feature is
> supported and used.
> Does that work or do we need a separate indication of whether binder
> driver supports the charge transfer feature?

Ok, if that's all working, this is fine, it just seemed odd to add a new
type like this.  Perhaps this can go into the changelog text...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-02-15  7:30             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 63+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-15  7:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: T.J. Mercier, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Sumit Semwal, Christian König, Benjamin Gaignard, Liam Mark,
	Laura Abbott

On Mon, Feb 14, 2022 at 11:19:35PM -0800, Suren Baghdasaryan wrote:
> On Mon, Feb 14, 2022 at 11:01 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Feb 14, 2022 at 02:25:47PM -0800, T.J. Mercier wrote:
> > > On Fri, Feb 11, 2022 at 11:19 PM Greg Kroah-Hartman
> > > > > --- a/include/uapi/linux/android/binder.h
> > > > > +++ b/include/uapi/linux/android/binder.h
> > > > > @@ -137,6 +137,7 @@ struct binder_buffer_object {
> > > > >
> > > > >  enum {
> > > > >       BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
> > > > > +     BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
> > > > >  };
> > > > >
> > > > >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> > > > > --
> > > > > 2.35.1.265.g69c8d7142f-goog
> > > > >
> > > >
> > > > How does userspace know that binder supports this new flag?
> > >
> > > Sorry, I don't completely follow even after Todd's comment. Doesn't
> > > the presence of BINDER_BUFFER_FLAG_SENDER_NO_NEED in the header do
> > > this?
> >
> > There is no "header" when running a new kernel on an old userspace,
> > right?  How about the other way around, old kernel, new userspace?
> 
> 1. new kernel + old userspace = kernel supports the feature but
> userspace does not use it. The old userspace won't even mount the new
> cgroup controller, accounting is not performed, charge is not
> transferred.
> 2. old kernel + new userspace = the new cgroup controller is not
> supported by the kernel, accounting is not performed, charge is not
> transferred.
> 3. old kernel + old userspace = same as #2
> 4. new kernel + new userspace = cgroup is mounted, feature is
> supported and used.
> Does that work or do we need a separate indication of whether binder
> driver supports the charge transfer feature?

Ok, if that's all working, this is fine, it just seemed odd to add a new
type like this.  Perhaps this can go into the changelog text...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 0/6] Proposal for a GPU cgroup controller
  2022-02-11 16:18 ` T.J. Mercier
  (?)
@ 2022-02-18 19:12   ` T.J. Mercier
  -1 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-18 19:12 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: Kalesh Singh, Kenny.Ho, dri-devel, linux-doc, linux-kernel,
	linux-media, linaro-mm-sig, cgroups

On Fri, Feb 11, 2022 at 8:18 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> This patch series revisits the proposal for a GPU cgroup controller to
> track and limit memory allocations by various device/allocator
> subsystems. The patch series also contains a simple prototype to
> illustrate how Android intends to implement DMA-BUF allocator
> attribution using the GPU cgroup controller. The prototype does not
> include resource limit enforcements.
>
> Changelog:
>
> v2:
> See the previous revision of this change submitted by Hridya Valsaraju
> at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
>
> Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König. Pointers to struct gpucg and struct gpucg_device
> tracking the current associations were added to the dma_buf struct to
> achieve this.
>
> Fix incorrect Kconfig help section indentation per Randy Dunlap.
>
> History of the GPU cgroup controller
> ====================================
> The GPU/DRM cgroup controller came into being when a consensus[1]
> was reached that the resources it tracked were unsuitable to be integrated
> into memcg. Originally, the proposed controller was specific to the DRM
> subsystem and was intended to track GEM buffers and GPU-specific
> resources[2]. In order to help establish a unified memory accounting model
> for all GPU and all related subsystems, Daniel Vetter put forth a
> suggestion to move it out of the DRM subsystem so that it can be used by
> other DMA-BUF exporters as well[3]. This RFC proposes an interface that
> does the same.
>
> [1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
> [2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
> [3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
>
> T.J. Mercier (6):
>   gpu: rfc: Proposal for a GPU cgroup controller
>   cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
>     memory
>   dmabuf: Use the GPU cgroup charge/uncharge APIs
>   dmabuf: heaps: export system_heap buffers with GPU cgroup charging
>   dmabuf: Add gpu cgroup charge transfer function
>   android: binder: Add a buffer flag to relinquish ownership of fds
>
>  Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
>  Documentation/gpu/rfc/index.rst      |   4 +
>  drivers/android/binder.c             |  26 +++
>  drivers/dma-buf/dma-buf.c            | 100 +++++++++
>  drivers/dma-buf/dma-heap.c           |  27 +++
>  drivers/dma-buf/heaps/system_heap.c  |   3 +
>  include/linux/cgroup_gpu.h           | 127 +++++++++++
>  include/linux/cgroup_subsys.h        |   4 +
>  include/linux/dma-buf.h              |  22 +-
>  include/linux/dma-heap.h             |  11 +
>  include/uapi/linux/android/binder.h  |   1 +
>  init/Kconfig                         |   7 +
>  kernel/cgroup/Makefile               |   1 +
>  kernel/cgroup/gpu.c                  | 304 +++++++++++++++++++++++++++
>  14 files changed, 830 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
>  create mode 100644 include/linux/cgroup_gpu.h
>  create mode 100644 kernel/cgroup/gpu.c
>
> --
> 2.35.1.265.g69c8d7142f-goog
>

Gentle nudge to GPU maintainers to please provide their feedback on
this RFC. Thanks!

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-18 19:12   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-18 19:12 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark, Laura Abbott, Brian Starkey,
	John Stultz, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: linux-doc, Kenny.Ho, linux-kernel, dri-devel, linaro-mm-sig,
	Kalesh Singh, cgroups, linux-media

On Fri, Feb 11, 2022 at 8:18 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> This patch series revisits the proposal for a GPU cgroup controller to
> track and limit memory allocations by various device/allocator
> subsystems. The patch series also contains a simple prototype to
> illustrate how Android intends to implement DMA-BUF allocator
> attribution using the GPU cgroup controller. The prototype does not
> include resource limit enforcements.
>
> Changelog:
>
> v2:
> See the previous revision of this change submitted by Hridya Valsaraju
> at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
>
> Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König. Pointers to struct gpucg and struct gpucg_device
> tracking the current associations were added to the dma_buf struct to
> achieve this.
>
> Fix incorrect Kconfig help section indentation per Randy Dunlap.
>
> History of the GPU cgroup controller
> ====================================
> The GPU/DRM cgroup controller came into being when a consensus[1]
> was reached that the resources it tracked were unsuitable to be integrated
> into memcg. Originally, the proposed controller was specific to the DRM
> subsystem and was intended to track GEM buffers and GPU-specific
> resources[2]. In order to help establish a unified memory accounting model
> for all GPU and all related subsystems, Daniel Vetter put forth a
> suggestion to move it out of the DRM subsystem so that it can be used by
> other DMA-BUF exporters as well[3]. This RFC proposes an interface that
> does the same.
>
> [1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
> [2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
> [3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
>
> T.J. Mercier (6):
>   gpu: rfc: Proposal for a GPU cgroup controller
>   cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
>     memory
>   dmabuf: Use the GPU cgroup charge/uncharge APIs
>   dmabuf: heaps: export system_heap buffers with GPU cgroup charging
>   dmabuf: Add gpu cgroup charge transfer function
>   android: binder: Add a buffer flag to relinquish ownership of fds
>
>  Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
>  Documentation/gpu/rfc/index.rst      |   4 +
>  drivers/android/binder.c             |  26 +++
>  drivers/dma-buf/dma-buf.c            | 100 +++++++++
>  drivers/dma-buf/dma-heap.c           |  27 +++
>  drivers/dma-buf/heaps/system_heap.c  |   3 +
>  include/linux/cgroup_gpu.h           | 127 +++++++++++
>  include/linux/cgroup_subsys.h        |   4 +
>  include/linux/dma-buf.h              |  22 +-
>  include/linux/dma-heap.h             |  11 +
>  include/uapi/linux/android/binder.h  |   1 +
>  init/Kconfig                         |   7 +
>  kernel/cgroup/Makefile               |   1 +
>  kernel/cgroup/gpu.c                  | 304 +++++++++++++++++++++++++++
>  14 files changed, 830 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
>  create mode 100644 include/linux/cgroup_gpu.h
>  create mode 100644 kernel/cgroup/gpu.c
>
> --
> 2.35.1.265.g69c8d7142f-goog
>

Gentle nudge to GPU maintainers to please provide their feedback on
this RFC. Thanks!

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC v2 0/6] Proposal for a GPU cgroup controller
@ 2022-02-18 19:12   ` T.J. Mercier
  0 siblings, 0 replies; 63+ messages in thread
From: T.J. Mercier @ 2022-02-18 19:12 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Benjamin Gaignard, Liam Mark
  Cc: Kalesh Singh, Kenny.Ho-5C7GfCeVMHo,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri, Feb 11, 2022 at 8:18 AM T.J. Mercier <tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> This patch series revisits the proposal for a GPU cgroup controller to
> track and limit memory allocations by various device/allocator
> subsystems. The patch series also contains a simple prototype to
> illustrate how Android intends to implement DMA-BUF allocator
> attribution using the GPU cgroup controller. The prototype does not
> include resource limit enforcements.
>
> Changelog:
>
> v2:
> See the previous revision of this change submitted by Hridya Valsaraju
> at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya-hpIqsD4AKldhl2p70BpVqQ@public.gmane.orgm/
>
> Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
> heap to a single dma-buf function for all heaps per Daniel Vetter and
> Christian König. Pointers to struct gpucg and struct gpucg_device
> tracking the current associations were added to the dma_buf struct to
> achieve this.
>
> Fix incorrect Kconfig help section indentation per Randy Dunlap.
>
> History of the GPU cgroup controller
> ====================================
> The GPU/DRM cgroup controller came into being when a consensus[1]
> was reached that the resources it tracked were unsuitable to be integrated
> into memcg. Originally, the proposed controller was specific to the DRM
> subsystem and was intended to track GEM buffers and GPU-specific
> resources[2]. In order to help establish a unified memory accounting model
> for all GPU and all related subsystems, Daniel Vetter put forth a
> suggestion to move it out of the DRM subsystem so that it can be used by
> other DMA-BUF exporters as well[3]. This RFC proposes an interface that
> does the same.
>
> [1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
> [2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
> [3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei-dv86pmgwkMBes7Z6vYuT8fd9D2ou9A/h@public.gmane.orgl/
>
> T.J. Mercier (6):
>   gpu: rfc: Proposal for a GPU cgroup controller
>   cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
>     memory
>   dmabuf: Use the GPU cgroup charge/uncharge APIs
>   dmabuf: heaps: export system_heap buffers with GPU cgroup charging
>   dmabuf: Add gpu cgroup charge transfer function
>   android: binder: Add a buffer flag to relinquish ownership of fds
>
>  Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
>  Documentation/gpu/rfc/index.rst      |   4 +
>  drivers/android/binder.c             |  26 +++
>  drivers/dma-buf/dma-buf.c            | 100 +++++++++
>  drivers/dma-buf/dma-heap.c           |  27 +++
>  drivers/dma-buf/heaps/system_heap.c  |   3 +
>  include/linux/cgroup_gpu.h           | 127 +++++++++++
>  include/linux/cgroup_subsys.h        |   4 +
>  include/linux/dma-buf.h              |  22 +-
>  include/linux/dma-heap.h             |  11 +
>  include/uapi/linux/android/binder.h  |   1 +
>  init/Kconfig                         |   7 +
>  kernel/cgroup/Makefile               |   1 +
>  kernel/cgroup/gpu.c                  | 304 +++++++++++++++++++++++++++
>  14 files changed, 830 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
>  create mode 100644 include/linux/cgroup_gpu.h
>  create mode 100644 kernel/cgroup/gpu.c
>
> --
> 2.35.1.265.g69c8d7142f-goog
>

Gentle nudge to GPU maintainers to please provide their feedback on
this RFC. Thanks!

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2022-02-18 19:13 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-11 16:18 [RFC v2 0/6] Proposal for a GPU cgroup controller T.J. Mercier
2022-02-11 16:18 ` T.J. Mercier
2022-02-11 16:18 ` T.J. Mercier
2022-02-11 16:18 ` [RFC v2 1/6] gpu: rfc: " T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18 ` [RFC v2 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18 ` [RFC v2 3/6] dmabuf: Use the GPU cgroup charge/uncharge APIs T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18 ` [RFC v2 4/6] dmabuf: heaps: export system_heap buffers with GPU cgroup charging T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18 ` [RFC v2 5/6] dmabuf: Add gpu cgroup charge transfer function T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18 ` [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-11 16:18   ` T.J. Mercier
2022-02-12  7:19   ` Greg Kroah-Hartman
2022-02-12  7:19     ` Greg Kroah-Hartman
2022-02-12  7:19     ` Greg Kroah-Hartman
2022-02-14 18:33     ` Todd Kjos
2022-02-14 18:33       ` Todd Kjos
2022-02-14 18:33       ` Todd Kjos
2022-02-14 19:29       ` Suren Baghdasaryan
2022-02-14 19:29         ` Suren Baghdasaryan
2022-02-14 19:29         ` Suren Baghdasaryan
2022-02-14 20:19         ` Todd Kjos
2022-02-14 20:19           ` Todd Kjos
2022-02-14 20:19           ` Todd Kjos
2022-02-14 20:37           ` John Stultz
2022-02-14 20:37             ` John Stultz
2022-02-14 20:37             ` John Stultz
2022-02-14 21:14             ` Hridya Valsaraju
2022-02-14 21:14               ` Hridya Valsaraju
2022-02-14 21:14               ` Hridya Valsaraju
2022-02-14 22:25     ` T.J. Mercier
2022-02-14 22:25       ` T.J. Mercier
2022-02-14 22:25       ` T.J. Mercier
2022-02-15  7:01       ` Greg Kroah-Hartman
2022-02-15  7:01         ` Greg Kroah-Hartman
2022-02-15  7:01         ` Greg Kroah-Hartman
2022-02-15  7:19         ` Suren Baghdasaryan
2022-02-15  7:19           ` Suren Baghdasaryan
2022-02-15  7:19           ` Suren Baghdasaryan
2022-02-15  7:30           ` Greg Kroah-Hartman
2022-02-15  7:30             ` Greg Kroah-Hartman
2022-02-15  7:30             ` Greg Kroah-Hartman
2022-02-14 21:25   ` Todd Kjos
2022-02-14 21:25     ` Todd Kjos
2022-02-14 21:25     ` Todd Kjos
2022-02-15  0:03     ` T.J. Mercier
2022-02-15  0:03       ` T.J. Mercier
2022-02-15  0:03       ` T.J. Mercier
2022-02-14 19:23 ` [RFC v2 0/6] Proposal for a GPU cgroup controller Tejun Heo
2022-02-14 19:23   ` Tejun Heo
2022-02-14 19:23   ` Tejun Heo
2022-02-18 19:12 ` T.J. Mercier
2022-02-18 19:12   ` T.J. Mercier
2022-02-18 19:12   ` T.J. Mercier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.