All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/6] Proposal for a GPU cgroup controller
@ 2022-01-15  1:05 ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:05 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Matthew Brost,
	Kenneth Graunke, Matthew Auld, Li Li, Marco Ballesio,
	Finn Behrens, Hang Lu, Wedson Almeida Filho, Masahiro Yamada,
	Andrew Morton, Nathan Chancellor, Kees Cook, Nick Desaulniers,
	Miguel Ojeda, Vipin Sharma, Chris Down, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific resources[2].
In order to help establish a unified memory accounting model for all GPU and
all related subsystems, Daniel Vetter put forth a suggestion to move it out of
the DRM subsystem so that it can be used by other DMA-BUF exporters as well[3].
This RFC proposes an interface that does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

Hridya Valsaraju (6):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs
  dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  dmabuf: system_heap: implement dma-buf op for GPU cgroup charge
    transfer
  android: binder: Add a buffer flag to relinquish ownership of fds

 Documentation/gpu/rfc/gpu-cgroup.rst | 192 +++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 drivers/android/binder.c             |  32 +++
 drivers/dma-buf/dma-heap.c           |  27 +++
 drivers/dma-buf/heaps/system_heap.c  |  68 ++++++
 include/linux/cgroup_gpu.h           | 120 +++++++++++
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/dma-buf.h              |  18 ++
 include/linux/dma-heap.h             |  11 +
 include/uapi/linux/android/binder.h  |   1 +
 init/Kconfig                         |   7 +
 kernel/cgroup/Makefile               |   1 +
 kernel/cgroup/gpu.c                  | 305 +++++++++++++++++++++++++++
 13 files changed, 790 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC 0/6] Proposal for a GPU cgroup controller
@ 2022-01-15  1:05 ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:05 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Matthew Brost,
	Kenneth Graunke, Matthew Auld, Li Li, Marco Ballesio,
	Finn Behrens, Hang Lu, Wedson Almeida Filho, Masahiro Yamada,
	Andrew Morton, Nathan Chancellor, Kees Cook, Nick Desaulniers,
	Miguel Ojeda, Vipin Sharma, Chris Down, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific resources[2].
In order to help establish a unified memory accounting model for all GPU and
all related subsystems, Daniel Vetter put forth a suggestion to move it out of
the DRM subsystem so that it can be used by other DMA-BUF exporters as well[3].
This RFC proposes an interface that does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/

Hridya Valsaraju (6):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs
  dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  dmabuf: system_heap: implement dma-buf op for GPU cgroup charge
    transfer
  android: binder: Add a buffer flag to relinquish ownership of fds

 Documentation/gpu/rfc/gpu-cgroup.rst | 192 +++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 drivers/android/binder.c             |  32 +++
 drivers/dma-buf/dma-heap.c           |  27 +++
 drivers/dma-buf/heaps/system_heap.c  |  68 ++++++
 include/linux/cgroup_gpu.h           | 120 +++++++++++
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/dma-buf.h              |  18 ++
 include/linux/dma-heap.h             |  11 +
 include/uapi/linux/android/binder.h  |   1 +
 init/Kconfig                         |   7 +
 kernel/cgroup/Makefile               |   1 +
 kernel/cgroup/gpu.c                  | 305 +++++++++++++++++++++++++++
 13 files changed, 790 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC 0/6] Proposal for a GPU cgroup controller
@ 2022-01-15  1:05 ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:05 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.

History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific resources[2].
In order to help establish a unified memory accounting model for all GPU and
all related subsystems, Daniel Vetter put forth a suggestion to move it out of
the DRM subsystem so that it can be used by other DMA-BUF exporters as well[3].
This RFC proposes an interface that does the same.

[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org/

Hridya Valsaraju (6):
  gpu: rfc: Proposal for a GPU cgroup controller
  cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
    memory
  dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs
  dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  dmabuf: system_heap: implement dma-buf op for GPU cgroup charge
    transfer
  android: binder: Add a buffer flag to relinquish ownership of fds

 Documentation/gpu/rfc/gpu-cgroup.rst | 192 +++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 drivers/android/binder.c             |  32 +++
 drivers/dma-buf/dma-heap.c           |  27 +++
 drivers/dma-buf/heaps/system_heap.c  |  68 ++++++
 include/linux/cgroup_gpu.h           | 120 +++++++++++
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/dma-buf.h              |  18 ++
 include/linux/dma-heap.h             |  11 +
 include/uapi/linux/android/binder.h  |   1 +
 init/Kconfig                         |   7 +
 kernel/cgroup/Makefile               |   1 +
 kernel/cgroup/gpu.c                  | 305 +++++++++++++++++++++++++++
 13 files changed, 790 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC 1/6] gpu: rfc: Proposal for a GPU cgroup controller
  2022-01-15  1:05 ` Hridya Valsaraju
@ 2022-01-15  1:05   ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:05 UTC (permalink / raw)
  To: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Kenneth Graunke, Simon Ser, Jason Ekstrand, Matthew Auld,
	Matthew Brost, Li Li, Marco Ballesio, Finn Behrens, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches are only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---

Hi all,

Here is the RFC documentation for the GPU cgroup controller that we
talked about at LPC 2021 along with a prototype. I reached out to Tejun
with the idea recently and he mentioned that cgroup-aware BPF(by Kenny
Ho) or the new misc cgroup controller can also be considered as
alternatives to track GPU resources. I am sending the RFC to the list to
give everyone else a chance to chime in with their thoughts as well so
that we can reach an agreement on how to proceed. Thanks in advance!

Regards,
Hridya


 Documentation/gpu/rfc/gpu-cgroup.rst | 192 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 196 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..9bff23007b22
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,192 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document intends to outline a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer(hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting(see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo “dev1 4194304” >  /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem where each cgroup maintains a list of resource pools.
+Each resource pool contains a struct device and the counter to track current total,
+and the maximum limit set for the device.
+
+The below code block is a preliminary estimation on how the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list  of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @device: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to charge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
+
+Upstreaming Plan
+================
+* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem
+  as well as for Android.
+
+* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF
+  system heap.
+
+* Demonstrate its usage from userspace in the Android Open Space Project.
+
+* Send out RFCs to LKML for the GPU cgroup controller and iterate.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 1/6] gpu: rfc: Proposal for a GPU cgroup controller
@ 2022-01-15  1:05   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:05 UTC (permalink / raw)
  To: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Kenneth Graunke, Simon Ser, Jason Ekstrand, Matthew Auld,
	Matthew Brost, Li Li, Marco Ballesio, Finn Behrens, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches are only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---

Hi all,

Here is the RFC documentation for the GPU cgroup controller that we
talked about at LPC 2021 along with a prototype. I reached out to Tejun
with the idea recently and he mentioned that cgroup-aware BPF(by Kenny
Ho) or the new misc cgroup controller can also be considered as
alternatives to track GPU resources. I am sending the RFC to the list to
give everyone else a chance to chime in with their thoughts as well so
that we can reach an agreement on how to proceed. Thanks in advance!

Regards,
Hridya


 Documentation/gpu/rfc/gpu-cgroup.rst | 192 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 196 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..9bff23007b22
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,192 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document intends to outline a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer(hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting(see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo “dev1 4194304” >  /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem where each cgroup maintains a list of resource pools.
+Each resource pool contains a struct device and the counter to track current total,
+and the maximum limit set for the device.
+
+The below code block is a preliminary estimation on how the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list  of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @device: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to charge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
+
+Upstreaming Plan
+================
+* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem
+  as well as for Android.
+
+* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF
+  system heap.
+
+* Demonstrate its usage from userspace in the Android Open Space Project.
+
+* Send out RFCs to LKML for the GPU cgroup controller and iterate.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 1/6] gpu: rfc: Proposal for a GPU cgroup controller
       [not found] ` <20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-01-15  1:05   ` Hridya Valsaraju
  2022-01-15  1:06   ` [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory Hridya Valsaraju
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:05 UTC (permalink / raw)
  To: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

This patch adds a proposal for a new GPU cgroup controller for
accounting/limiting GPU and GPU-related memory allocations.
The proposed controller is based on the DRM cgroup controller[1] and
follows the design of the RDMA cgroup controller.

The new cgroup controller would:
* Allow setting per-cgroup limits on the total size of buffers charged
  to it.
* Allow setting per-device limits on the total size of buffers
  allocated by device within a cgroup.
* Expose a per-device/allocator breakdown of the buffers charged to a
  cgroup.

The prototype in the following patches are only for memory accounting
using the GPU cgroup controller and does not implement limit setting.

[1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---

Hi all,

Here is the RFC documentation for the GPU cgroup controller that we
talked about at LPC 2021 along with a prototype. I reached out to Tejun
with the idea recently and he mentioned that cgroup-aware BPF(by Kenny
Ho) or the new misc cgroup controller can also be considered as
alternatives to track GPU resources. I am sending the RFC to the list to
give everyone else a chance to chime in with their thoughts as well so
that we can reach an agreement on how to proceed. Thanks in advance!

Regards,
Hridya


 Documentation/gpu/rfc/gpu-cgroup.rst | 192 +++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst      |   4 +
 2 files changed, 196 insertions(+)
 create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst

diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst
new file mode 100644
index 000000000000..9bff23007b22
--- /dev/null
+++ b/Documentation/gpu/rfc/gpu-cgroup.rst
@@ -0,0 +1,192 @@
+===================================
+GPU cgroup controller
+===================================
+
+Goals
+=====
+This document intends to outline a plan to create a cgroup v2 controller subsystem
+for the per-cgroup accounting of device and system memory allocated by the GPU
+and related subsystems.
+
+The new cgroup controller would:
+
+* Allow setting per-cgroup limits on the total size of buffers charged to it.
+
+* Allow setting per-device limits on the total size of buffers allocated by a
+  device/allocator within a cgroup.
+
+* Expose a per-device/allocator breakdown of the buffers charged to a cgroup.
+
+Alternatives Considered
+=======================
+
+The following alternatives were considered:
+
+The memory cgroup controller
+____________________________
+
+1. As was noted in [1], memory accounting provided by the GPU cgroup
+controller is not a good fit for integration into memcg due to the
+differences in how accounting is performed. It implements a mechanism
+for the allocator attribution of GPU and GPU-related memory by
+charging each buffer to the cgroup of the process on behalf of which
+the memory was allocated. The buffer stays charged to the cgroup until
+it is freed regardless of whether the process retains any references
+to it. On the other hand, the memory cgroup controller offers a more
+fine-grained charging and uncharging behavior depending on the kind of
+page being accounted.
+
+2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model,
+a process takes a reference to the entire buffer(hence keeping it alive) even if
+it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF
+memory accounting would only introduce additional overhead without any benefits.
+
+[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org/#22624705
+
+Userspace service to keep track of buffer allocations and releases
+__________________________________________________________________
+
+1. There is no way for a userspace service to intercept all allocations and releases.
+2. In case the process gets killed or restarted, we lose all accounting so far.
+
+UAPI
+====
+When enabled, the new cgroup controller would create the following files in every cgroup.
+
+::
+
+        gpu.memory.current (R)
+        gpu.memory.max (R/W)
+
+gpu.memory.current is a read-only file and would contain per-device memory allocations
+in a key-value format where key is a string representing the device name
+and the value is the size of memory charged to the device in the cgroup in bytes.
+
+For example:
+
+::
+
+        cat /sys/kernel/fs/cgroup1/gpu.memory.current
+        dev1 4194304
+        dev2 4194304
+
+The string key for each device is set by the device driver when the device registers
+with the GPU cgroup controller to participate in resource accounting(see section
+'Design and Implementation' for more details).
+
+gpu.memory.max is a read/write file. It would show the current total
+size limits on memory usage for the cgroup and the limits on total memory usage
+for each allocator/device.
+
+Setting a total limit for a cgroup can be done as follows:
+
+::
+
+        echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max
+
+Setting a total limit for a particular device/allocator can be done as follows:
+
+::
+
+        echo “dev1 4194304” >  /sys/kernel/fs/cgroup1/gpu.memory.max
+
+In this example, 'dev1' is the string key set by the device driver during
+registration.
+
+Design and Implementation
+=========================
+
+The cgroup controller would closely follow the design of the RDMA cgroup controller
+subsystem where each cgroup maintains a list of resource pools.
+Each resource pool contains a struct device and the counter to track current total,
+and the maximum limit set for the device.
+
+The below code block is a preliminary estimation on how the core kernel data structures
+and APIs would look like.
+
+.. code-block:: c
+
+        /**
+         * The GPU cgroup controller data structure.
+         */
+        struct gpucg {
+                struct cgroup_subsys_state css;
+                /* list of all resource pools that belong to this cgroup */
+                struct list_head rpools;
+        };
+
+        struct gpucg_device {
+                /*
+                 * list  of various resource pools in various cgroups that the device is
+                 * part of.
+                 */
+                struct list_head rpools;
+                /* list of all devices registered for GPU cgroup accounting */
+                struct list_head dev_node;
+                /* name to be used as identifier for accounting and limit setting */
+                const char *name;
+        };
+
+        struct gpucg_resource_pool {
+                /* The device whose resource usage is tracked by this resource pool */
+                struct gpucg_device *device;
+
+                /* list of all resource pools for the cgroup */
+                struct list_head cg_node;
+
+                /*
+                 * list maintained by the gpucg_device to keep track of its
+                 * resource pools
+                 */
+                struct list_head dev_node;
+
+                /* tracks memory usage of the resource pool */
+                struct page_counter total;
+        };
+
+        /**
+         * gpucg_register_device - Registers a device for memory accounting using the
+         * GPU cgroup controller.
+         *
+         * @device: The device to register for memory accounting. Must remain valid
+         * after registration.
+         * @name: Pointer to a string literal to denote the name of the device.
+         */
+        void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+
+        /**
+         * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to charge the memory to.
+         * @device: The device to charge the memory to.
+         * @usage: size of memory to charge in bytes.
+         *
+         * Return: returns 0 if the charging is successful and otherwise returns an
+         * error code.
+         */
+        int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+        /**
+         * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+         *
+         * @gpucg: The gpu cgroup to uncharge the memory from.
+         * @device: The device to charge the memory from.
+         * @usage: size of memory to uncharge in bytes.
+         */
+        void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+
+Future Work
+===========
+Additional GPU resources can be supported by adding new controller files.
+
+Upstreaming Plan
+================
+* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem
+  as well as for Android.
+
+* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF
+  system heap.
+
+* Demonstrate its usage from userspace in the Android Open Space Project.
+
+* Send out RFCs to LKML for the GPU cgroup controller and iterate.
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..0a9bcd94e95d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -23,3 +23,7 @@ host such documentation:
 .. toctree::
 
     i915_scheduler.rst
+
+.. toctree::
+
+    gpu-cgroup.rst
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-01-15  1:05 ` Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Rodrigo Vivi,
	Matthew Auld, Matthew Brost, Li Li, Marco Ballesio, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
-allow a device to register for memory accounting using the GPU cgroup
controller.
-charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it would expose information about
the memory allocated by each device(registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 include/linux/cgroup_gpu.h    | 120 +++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 305 ++++++++++++++++++++++++++++++++++
 5 files changed, 437 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..0ac303ce6179
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list  of various resource pool in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+static inline int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device,
+				  u64 usage) {}
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif	/* CONFIG_CGROUP_GPU */
+#endif	/* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index cd23faa163d1..408910b21387 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -990,6 +990,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+       bool "gpu cgroup controller (EXPERIMENTAL)"
+       select PAGE_COUNTER
+       help
+	Provides accounting and limit setting for memory allocations by the GPU
+	and GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..b171fae06b0d
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	 /* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/*
+	 * list maintained by the gpucg_device to keep track of its
+	 * resource pools
+	 */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(struct gpucg *cg,
+							struct gpucg_device *device)
+
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+						    GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is created
+ * in a hierarchical manner in the cgroup and its ancestor cgroups who do not
+ * already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation to
+	 * enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and the caller is holding a reference to the gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (IS_ERR(rp))
+		return PTR_ERR(rp);
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	if (page_counter_try_charge(&rp->total, nr_pages,
+				    &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device,
+		   u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and there are active refs on gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device andd @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc	= gpucg_css_alloc,
+	.css_free	= gpucg_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
@ 2022-01-15  1:06   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Rodrigo Vivi,
	Matthew Auld, Matthew Brost, Li Li, Marco Ballesio, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
-allow a device to register for memory accounting using the GPU cgroup
controller.
-charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it would expose information about
the memory allocated by each device(registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 include/linux/cgroup_gpu.h    | 120 +++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 305 ++++++++++++++++++++++++++++++++++
 5 files changed, 437 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..0ac303ce6179
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list  of various resource pool in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+static inline int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device,
+				  u64 usage) {}
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif	/* CONFIG_CGROUP_GPU */
+#endif	/* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index cd23faa163d1..408910b21387 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -990,6 +990,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+       bool "gpu cgroup controller (EXPERIMENTAL)"
+       select PAGE_COUNTER
+       help
+	Provides accounting and limit setting for memory allocations by the GPU
+	and GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..b171fae06b0d
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	 /* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/*
+	 * list maintained by the gpucg_device to keep track of its
+	 * resource pools
+	 */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(struct gpucg *cg,
+							struct gpucg_device *device)
+
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+						    GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is created
+ * in a hierarchical manner in the cgroup and its ancestor cgroups who do not
+ * already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation to
+	 * enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and the caller is holding a reference to the gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (IS_ERR(rp))
+		return PTR_ERR(rp);
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	if (page_counter_try_charge(&rp->total, nr_pages,
+				    &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device,
+		   u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and there are active refs on gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device andd @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc	= gpucg_css_alloc,
+	.css_free	= gpucg_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found] ` <20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2022-01-15  1:05   ` [RFC 1/6] gpu: rfc: Proposal for a GPU cgroup controller Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  2022-01-15  1:06   ` [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs Hridya Valsaraju
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

The cgroup controller provides accounting for GPU and GPU-related
memory allocations. The memory being accounted can be device memory or
memory allocated from pools dedicated to serve GPU-related tasks.

This patch adds APIs to:
-allow a device to register for memory accounting using the GPU cgroup
controller.
-charge and uncharge allocated memory to a cgroup.

When the cgroup controller is enabled, it would expose information about
the memory allocated by each device(registered for GPU cgroup memory
accounting) for each cgroup.

The API/UAPI can be extended to set per-device/total allocation limits
in the future.

The cgroup controller has been named following the discussion in [1].

[1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org/

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 include/linux/cgroup_gpu.h    | 120 +++++++++++++
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |   7 +
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/gpu.c           | 305 ++++++++++++++++++++++++++++++++++
 5 files changed, 437 insertions(+)
 create mode 100644 include/linux/cgroup_gpu.h
 create mode 100644 kernel/cgroup/gpu.c

diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h
new file mode 100644
index 000000000000..0ac303ce6179
--- /dev/null
+++ b/include/linux/cgroup_gpu.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: MIT
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * Copyright (C) 2022 Google LLC.
+ */
+#ifndef _CGROUP_GPU_H
+#define _CGROUP_GPU_H
+
+#include <linux/cgroup.h>
+#include <linux/page_counter.h>
+
+#ifdef CONFIG_CGROUP_GPU
+ /* The GPU cgroup controller data structure */
+struct gpucg {
+	struct cgroup_subsys_state css;
+	/* list of all resource pools that belong to this cgroup */
+	struct list_head rpools;
+};
+
+struct gpucg_device {
+	/*
+	 * list  of various resource pool in various cgroups that the device is
+	 * part of.
+	 */
+	struct list_head rpools;
+	/* list of all devices registered for GPU cgroup accounting */
+	struct list_head dev_node;
+	/*
+	 * pointer to string literal to be used as identifier for accounting and
+	 * limit setting
+	 */
+	const char *name;
+};
+
+/**
+ * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state
+ * @css: the target cgroup_subsys_state
+ *
+ * Returns: gpu cgroup that contains the @css
+ */
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct gpucg, css) : NULL;
+}
+
+/**
+ * gpucg_get - get the gpucg reference that a task belongs to
+ * @task: the target task
+ *
+ * This increases the reference count of the css that the @task belongs to.
+ *
+ * Returns: reference to the gpu cgroup the task belongs to.
+ */
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	if (!cgroup_subsys_enabled(gpu_cgrp_subsys))
+		return NULL;
+	return css_to_gpucg(task_get_css(task, gpu_cgrp_id));
+}
+
+/**
+ * gpucg_put - put a gpucg reference
+ * @gpucg: the target gpucg
+ *
+ * Put a reference obtained via gpucg_get
+ */
+static inline void gpucg_put(struct gpucg *gpucg)
+{
+	if (gpucg)
+		css_put(&gpucg->css);
+}
+
+/**
+ * gpucg_parent - find the parent of a gpu cgroup
+ * @cg: the target gpucg
+ *
+ * This does not increase the reference count of the parent cgroup
+ *
+ * Returns: parent gpu cgroup of @cg
+ */
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return css_to_gpucg(cg->css.parent);
+}
+
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage);
+void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name);
+#else /* CONFIG_CGROUP_GPU */
+
+struct gpucg;
+struct gpucg_device;
+
+static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css)
+{
+	return NULL;
+}
+
+static inline struct gpucg *gpucg_get(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void gpucg_put(struct gpucg *gpucg) {}
+
+static inline struct gpucg *gpucg_parent(struct gpucg *cg)
+{
+	return NULL;
+}
+static inline int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device,
+				   u64 usage)
+{
+	return 0;
+}
+
+static inline void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device,
+				  u64 usage) {}
+static inline void gpucg_register_device(struct gpucg_device *gpucg_dev,
+					 const char *name) {}
+#endif	/* CONFIG_CGROUP_GPU */
+#endif	/* _CGROUP_GPU_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..46a2a7b93c41 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_GPU)
+SUBSYS(gpu)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index cd23faa163d1..408910b21387 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -990,6 +990,13 @@ config BLK_CGROUP
 
 	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
+config CGROUP_GPU
+       bool "gpu cgroup controller (EXPERIMENTAL)"
+       select PAGE_COUNTER
+       help
+	Provides accounting and limit setting for memory allocations by the GPU
+	and GPU-related subsystems.
+
 config CGROUP_WRITEBACK
 	bool
 	depends on MEMCG && BLK_CGROUP
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..be95a5a532fc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
+obj-$(CONFIG_CGROUP_GPU) += gpu.o
diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c
new file mode 100644
index 000000000000..b171fae06b0d
--- /dev/null
+++ b/kernel/cgroup/gpu.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: MIT
+// Copyright 2019 Advanced Micro Devices, Inc.
+// Copyright (C) 2022 Google LLC.
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_gpu.h>
+#include <linux/page_counter.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static struct gpucg *root_gpucg __read_mostly;
+
+/*
+ * Protects list of resource pools maintained on per cgroup basis
+ * and list of devices registered for memory accounting using the GPU cgroup
+ * controller.
+ */
+static DEFINE_MUTEX(gpucg_mutex);
+static LIST_HEAD(gpucg_devices);
+
+struct gpucg_resource_pool {
+	 /* The device whose resource usage is tracked by this resource pool */
+	struct gpucg_device *device;
+
+	/* list of all resource pools for the cgroup */
+	struct list_head cg_node;
+
+	/*
+	 * list maintained by the gpucg_device to keep track of its
+	 * resource pools
+	 */
+	struct list_head dev_node;
+
+	/* tracks memory usage of the resource pool */
+	struct page_counter total;
+};
+
+static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool)
+{
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_del(&rpool->cg_node);
+	list_del(&rpool->dev_node);
+	kfree(rpool);
+}
+
+static void gpucg_css_free(struct cgroup_subsys_state *css)
+{
+	struct gpucg_resource_pool *rpool, *tmp;
+	struct gpucg *gpucg = css_to_gpucg(css);
+
+	// delete all resource pools
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node)
+		free_cg_rpool_locked(rpool);
+	mutex_unlock(&gpucg_mutex);
+
+	kfree(gpucg);
+}
+
+static struct cgroup_subsys_state *
+gpucg_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct gpucg *gpucg, *parent;
+
+	gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL);
+	if (!gpucg)
+		return ERR_PTR(-ENOMEM);
+
+	parent = css_to_gpucg(parent_css);
+	if (!parent)
+		root_gpucg = gpucg;
+
+	INIT_LIST_HEAD(&gpucg->rpools);
+
+	return &gpucg->css;
+}
+
+static struct gpucg_resource_pool *find_cg_rpool_locked(struct gpucg *cg,
+							struct gpucg_device *device)
+
+{
+	struct gpucg_resource_pool *pool;
+
+	lockdep_assert_held(&gpucg_mutex);
+
+	list_for_each_entry(pool, &cg->rpools, cg_node)
+		if (pool->device == device)
+			return pool;
+
+	return NULL;
+}
+
+static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg,
+						 struct gpucg_device *device)
+{
+	struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool),
+						    GFP_KERNEL);
+	if (!rpool)
+		return ERR_PTR(-ENOMEM);
+
+	rpool->device = device;
+
+	page_counter_init(&rpool->total, NULL);
+	INIT_LIST_HEAD(&rpool->cg_node);
+	INIT_LIST_HEAD(&rpool->dev_node);
+	list_add_tail(&rpool->cg_node, &cg->rpools);
+	list_add_tail(&rpool->dev_node, &device->rpools);
+
+	return rpool;
+}
+
+/**
+ * get_cg_rpool_locked - find the resource pool for the specified device and
+ * specified cgroup. If the resource pool does not exist for the cg, it is created
+ * in a hierarchical manner in the cgroup and its ancestor cgroups who do not
+ * already have a resource pool entry for the device.
+ *
+ * @cg: The cgroup to find the resource pool for.
+ * @device: The device associated with the returned resource pool.
+ *
+ * Return: return resource pool entry corresponding to the specified device in
+ * the specified cgroup (hierarchically creating them if not existing already).
+ *
+ */
+static struct gpucg_resource_pool *
+get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device)
+{
+	struct gpucg *parent_cg, *p, *stop_cg;
+	struct gpucg_resource_pool *rpool, *tmp_rpool;
+	struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL;
+
+	rpool = find_cg_rpool_locked(cg, device);
+	if (rpool)
+		return rpool;
+
+	stop_cg = cg;
+	do {
+		rpool = init_cg_rpool(stop_cg, device);
+		if (IS_ERR(rpool))
+			goto err;
+
+		if (!leaf_rpool)
+			leaf_rpool = rpool;
+
+		stop_cg = gpucg_parent(stop_cg);
+		if (!stop_cg)
+			break;
+
+		rpool = find_cg_rpool_locked(stop_cg, device);
+	} while (!rpool);
+
+	/*
+	 * Re-initialize page counters of all rpools created in this invocation to
+	 * enable hierarchical charging.
+	 * stop_cg is the first ancestor cg who already had a resource pool for
+	 * the device. It can also be NULL if no ancestors had a pre-existing
+	 * resource pool for the device before this invocation.
+	 */
+	rpool = leaf_rpool;
+	for (p = cg; p != stop_cg; p = parent_cg) {
+		parent_cg = gpucg_parent(p);
+		if (!parent_cg)
+			break;
+		parent_rpool = find_cg_rpool_locked(parent_cg, device);
+		page_counter_init(&rpool->total, &parent_rpool->total);
+
+		rpool = parent_rpool;
+	}
+
+	return leaf_rpool;
+err:
+	for (p = cg; p != stop_cg; p = gpucg_parent(p)) {
+		tmp_rpool = find_cg_rpool_locked(p, device);
+		free_cg_rpool_locked(tmp_rpool);
+	}
+	return rpool;
+}
+
+/**
+ * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device.
+ * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size
+ * of the memory is rounded up to be a multiple of the page size.
+ *
+ * @gpucg: The gpu cgroup to charge the memory to.
+ * @device: The device to charge the memory to.
+ * @usage: size of memory to charge in bytes.
+ *
+ * Return: returns 0 if the charging is successful and otherwise returns an
+ * error code.
+ */
+int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage)
+{
+	struct page_counter *counter;
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+	int ret = 0;
+
+	mutex_lock(&gpucg_mutex);
+	rp = get_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and the caller is holding a reference to the gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (IS_ERR(rp))
+		return PTR_ERR(rp);
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	if (page_counter_try_charge(&rp->total, nr_pages,
+				    &counter))
+		css_get_many(&gpucg->css, nr_pages);
+	else
+		ret = -ENOMEM;
+
+	return ret;
+}
+
+/**
+ * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device.
+ * The caller must hold a reference to @gpucg obtained through gpucg_get().
+ *
+ * @gpucg: The gpu cgroup to uncharge the memory from.
+ * @device: The device to uncharge the memory from.
+ * @usage: size of memory to uncharge in bytes.
+ */
+void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device,
+		   u64 usage)
+{
+	u64 nr_pages;
+	struct gpucg_resource_pool *rp;
+
+	mutex_lock(&gpucg_mutex);
+	rp = find_cg_rpool_locked(gpucg, device);
+	/*
+	 * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is
+	 * freed and there are active refs on gpucg.
+	 */
+	mutex_unlock(&gpucg_mutex);
+
+	if (unlikely(!rp)) {
+		pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n");
+		return;
+	}
+
+	nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT;
+	page_counter_uncharge(&rp->total, nr_pages);
+	css_put_many(&gpucg->css, nr_pages);
+}
+
+/**
+ * gpucg_register_device - Registers a device for memory accounting using the
+ * GPU cgroup controller.
+ *
+ * @device: The device to register for memory accounting.
+ * @name: Pointer to a string literal to denote the name of the device.
+ *
+ * Both @device andd @name must remain valid.
+ */
+void gpucg_register_device(struct gpucg_device *device, const char *name)
+{
+	if (!device)
+		return;
+
+	INIT_LIST_HEAD(&device->dev_node);
+	INIT_LIST_HEAD(&device->rpools);
+
+	mutex_lock(&gpucg_mutex);
+	list_add_tail(&device->dev_node, &gpucg_devices);
+	mutex_unlock(&gpucg_mutex);
+
+	device->name = name;
+}
+
+static int gpucg_resource_show(struct seq_file *sf, void *v)
+{
+	struct gpucg_resource_pool *rpool;
+	struct gpucg *cg = css_to_gpucg(seq_css(sf));
+
+	mutex_lock(&gpucg_mutex);
+	list_for_each_entry(rpool, &cg->rpools, cg_node) {
+		seq_printf(sf, "%s %lu\n", rpool->device->name,
+			   page_counter_read(&rpool->total) * PAGE_SIZE);
+	}
+	mutex_unlock(&gpucg_mutex);
+
+	return 0;
+}
+
+struct cftype files[] = {
+	{
+		.name = "memory.current",
+		.seq_show = gpucg_resource_show,
+	},
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys gpu_cgrp_subsys = {
+	.css_alloc	= gpucg_css_alloc,
+	.css_free	= gpucg_css_free,
+	.early_init	= false,
+	.legacy_cftypes	= files,
+	.dfl_cftypes	= files,
+};
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs
  2022-01-15  1:05 ` Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Matthew Auld,
	Jason Ekstrand, Jon Bloomfield, Matthew Brost, Li Li,
	Marco Ballesio, Wedson Almeida Filho, Hang Lu, Masahiro Yamada,
	Andrew Morton, Nathan Chancellor, Kees Cook, Nick Desaulniers,
	Miguel Ojeda, Vipin Sharma, Chris Down, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by the DMA-BUF system heap to the processes who allocated them.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c | 25 +++++++++++++++++++++++++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 63 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 56bf5ad01ad5..6e74690f4b83 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -6,6 +6,7 @@
  * Copyright (C) 2019 Linaro Ltd.
  */
 
+#include <linux/cgroup_gpu.h>
 #include <linux/cdev.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
@@ -30,6 +31,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev           gpu cg device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -40,6 +42,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -214,6 +219,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -286,6 +311,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..adfdc8c576f2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -31,6 +31,7 @@ struct system_heap_buffer {
 	struct sg_table sg_table;
 	int vmap_cnt;
 	void *vaddr;
+	struct gpucg *gpucg;
 };
 
 struct dma_heap_attachment {
@@ -296,6 +297,13 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 		__free_pages(page, compound_order(page));
 	}
 	sg_free_table(table);
+
+	gpucg_uncharge(buffer->gpucg,
+		       dma_heap_get_gpucg_dev(buffer->heap),
+		       buffer->len);
+
+	gpucg_put(buffer->gpucg);
+
 	kfree(buffer);
 }
 
@@ -356,6 +364,16 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	mutex_init(&buffer->lock);
 	buffer->heap = heap;
 	buffer->len = len;
+	buffer->gpucg = gpucg_get(current);
+
+	ret = gpucg_try_charge(buffer->gpucg,
+			       dma_heap_get_gpucg_dev(buffer->heap),
+			       len);
+	if (ret) {
+		gpucg_put(buffer->gpucg);
+		kfree(buffer);
+		return ERR_PTR(ret);
+	}
 
 	INIT_LIST_HEAD(&pages);
 	i = 0;
@@ -413,6 +431,13 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 free_buffer:
 	list_for_each_entry_safe(page, tmp_page, &pages, lru)
 		__free_pages(page, compound_order(page));
+
+	gpucg_uncharge(buffer->gpucg,
+		       dma_heap_get_gpucg_dev(buffer->heap),
+		       buffer->len);
+
+	gpucg_put(buffer->gpucg);
+
 	kfree(buffer);
 
 	return ERR_PTR(ret);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs
@ 2022-01-15  1:06   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Matthew Auld,
	Jason Ekstrand, Jon Bloomfield, Matthew Brost, Li Li,
	Marco Ballesio, Wedson Almeida Filho, Hang Lu, Masahiro Yamada,
	Andrew Morton, Nathan Chancellor, Kees Cook, Nick Desaulniers,
	Miguel Ojeda, Vipin Sharma, Chris Down, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by the DMA-BUF system heap to the processes who allocated them.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c | 25 +++++++++++++++++++++++++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 63 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 56bf5ad01ad5..6e74690f4b83 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -6,6 +6,7 @@
  * Copyright (C) 2019 Linaro Ltd.
  */
 
+#include <linux/cgroup_gpu.h>
 #include <linux/cdev.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
@@ -30,6 +31,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev           gpu cg device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -40,6 +42,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -214,6 +219,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -286,6 +311,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..adfdc8c576f2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -31,6 +31,7 @@ struct system_heap_buffer {
 	struct sg_table sg_table;
 	int vmap_cnt;
 	void *vaddr;
+	struct gpucg *gpucg;
 };
 
 struct dma_heap_attachment {
@@ -296,6 +297,13 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 		__free_pages(page, compound_order(page));
 	}
 	sg_free_table(table);
+
+	gpucg_uncharge(buffer->gpucg,
+		       dma_heap_get_gpucg_dev(buffer->heap),
+		       buffer->len);
+
+	gpucg_put(buffer->gpucg);
+
 	kfree(buffer);
 }
 
@@ -356,6 +364,16 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	mutex_init(&buffer->lock);
 	buffer->heap = heap;
 	buffer->len = len;
+	buffer->gpucg = gpucg_get(current);
+
+	ret = gpucg_try_charge(buffer->gpucg,
+			       dma_heap_get_gpucg_dev(buffer->heap),
+			       len);
+	if (ret) {
+		gpucg_put(buffer->gpucg);
+		kfree(buffer);
+		return ERR_PTR(ret);
+	}
 
 	INIT_LIST_HEAD(&pages);
 	i = 0;
@@ -413,6 +431,13 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 free_buffer:
 	list_for_each_entry_safe(page, tmp_page, &pages, lru)
 		__free_pages(page, compound_order(page));
+
+	gpucg_uncharge(buffer->gpucg,
+		       dma_heap_get_gpucg_dev(buffer->heap),
+		       buffer->len);
+
+	gpucg_put(buffer->gpucg);
+
 	kfree(buffer);
 
 	return ERR_PTR(ret);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs
       [not found] ` <20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2022-01-15  1:05   ` [RFC 1/6] gpu: rfc: Proposal for a GPU cgroup controller Hridya Valsaraju
  2022-01-15  1:06   ` [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  2022-01-15  1:06   ` [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup Hridya Valsaraju
  2022-01-15  1:06   ` [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds Hridya Valsaraju
  4 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

This patch uses the GPU cgroup charge/uncharge APIs to charge buffers
allocated by the DMA-BUF system heap to the processes who allocated them.

By doing so, it becomes possible to track who allocated/exported a
DMA-BUF even after the allocating process drops all references to a
buffer.

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 drivers/dma-buf/dma-heap.c          | 27 +++++++++++++++++++++++++++
 drivers/dma-buf/heaps/system_heap.c | 25 +++++++++++++++++++++++++
 include/linux/dma-heap.h            | 11 +++++++++++
 3 files changed, 63 insertions(+)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 56bf5ad01ad5..6e74690f4b83 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -6,6 +6,7 @@
  * Copyright (C) 2019 Linaro Ltd.
  */
 
+#include <linux/cgroup_gpu.h>
 #include <linux/cdev.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
@@ -30,6 +31,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @gpucg_dev           gpu cg device for memory accounting
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -40,6 +42,9 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+#ifdef CONFIG_CGROUP_GPU
+	struct gpucg_device gpucg_dev;
+#endif
 };
 
 static LIST_HEAD(heap_list);
@@ -214,6 +219,26 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 	return heap->name;
 }
 
+#ifdef CONFIG_CGROUP_GPU
+/**
+ * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap.
+ * @heap: DMA-Heap to get the gpucg_device struct for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is
+ * not enabled.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return &heap->gpucg_dev;
+}
+#else
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap)
+{
+	return NULL;
+}
+#endif
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
@@ -286,6 +311,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	list_add(&heap->list, &heap_list);
 	mutex_unlock(&heap_list_lock);
 
+	gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name);
+
 	return heap;
 
 err2:
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index ab7fd896d2c4..adfdc8c576f2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -31,6 +31,7 @@ struct system_heap_buffer {
 	struct sg_table sg_table;
 	int vmap_cnt;
 	void *vaddr;
+	struct gpucg *gpucg;
 };
 
 struct dma_heap_attachment {
@@ -296,6 +297,13 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 		__free_pages(page, compound_order(page));
 	}
 	sg_free_table(table);
+
+	gpucg_uncharge(buffer->gpucg,
+		       dma_heap_get_gpucg_dev(buffer->heap),
+		       buffer->len);
+
+	gpucg_put(buffer->gpucg);
+
 	kfree(buffer);
 }
 
@@ -356,6 +364,16 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	mutex_init(&buffer->lock);
 	buffer->heap = heap;
 	buffer->len = len;
+	buffer->gpucg = gpucg_get(current);
+
+	ret = gpucg_try_charge(buffer->gpucg,
+			       dma_heap_get_gpucg_dev(buffer->heap),
+			       len);
+	if (ret) {
+		gpucg_put(buffer->gpucg);
+		kfree(buffer);
+		return ERR_PTR(ret);
+	}
 
 	INIT_LIST_HEAD(&pages);
 	i = 0;
@@ -413,6 +431,13 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 free_buffer:
 	list_for_each_entry_safe(page, tmp_page, &pages, lru)
 		__free_pages(page, compound_order(page));
+
+	gpucg_uncharge(buffer->gpucg,
+		       dma_heap_get_gpucg_dev(buffer->heap),
+		       buffer->len);
+
+	gpucg_put(buffer->gpucg);
+
 	kfree(buffer);
 
 	return ERR_PTR(ret);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561cad6e..e447a61d054e 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H
 
 #include <linux/cdev.h>
+#include <linux/cgroup_gpu.h>
 #include <linux/types.h>
 
 struct dma_heap;
@@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 const char *dma_heap_get_name(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the
+ * heap.
+ * @heap: DMA-Heap to retrieve gpucg_device for.
+ *
+ * Returns:
+ * The gpucg_device struct for the heap.
+ */
+struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  2022-01-15  1:05 ` Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Jason Ekstrand, Matthew Auld, Matthew Brost, Li Li,
	Marco Ballesio, Miguel Ojeda, Hang Lu, Wedson Almeida Filho,
	Masahiro Yamada, Andrew Morton, Nathan Chancellor, Kees Cook,
	Nick Desaulniers, Chris Down, Vipin Sharma, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

The optional exporter op provides a way for processes to transfer
charge of a buffer to a different process. This is essential for the
cases where a central allocator process does allocations for various
subsystems, hands over the fd to the client who
requested the memory and drops all references to the allocated memory.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 include/linux/dma-buf.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..d5e52f81cc6f 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -285,6 +286,23 @@ struct dma_buf_ops {
 
 	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
 	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+	/**
+	 * @charge_to_cgroup:
+	 *
+	 * This is called by an exporter to charge a buffer to the specified
+	 * cgroup. The caller must hold a reference to @gpucg obtained via
+	 * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
+	 * currently charged to before being charged to @gpucg. The caller must
+	 * belong to the cgroup the buffer is currently charged to.
+	 *
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or negative error code on failure.
+	 */
+	int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
 };
 
 /**
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-15  1:06   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Jason Ekstrand, Matthew Auld, Matthew Brost, Li Li,
	Marco Ballesio, Miguel Ojeda, Hang Lu, Wedson Almeida Filho,
	Masahiro Yamada, Andrew Morton, Nathan Chancellor, Kees Cook,
	Nick Desaulniers, Chris Down, Vipin Sharma, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

The optional exporter op provides a way for processes to transfer
charge of a buffer to a different process. This is essential for the
cases where a central allocator process does allocations for various
subsystems, hands over the fd to the client who
requested the memory and drops all references to the allocated memory.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 include/linux/dma-buf.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..d5e52f81cc6f 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -285,6 +286,23 @@ struct dma_buf_ops {
 
 	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
 	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+	/**
+	 * @charge_to_cgroup:
+	 *
+	 * This is called by an exporter to charge a buffer to the specified
+	 * cgroup. The caller must hold a reference to @gpucg obtained via
+	 * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
+	 * currently charged to before being charged to @gpucg. The caller must
+	 * belong to the cgroup the buffer is currently charged to.
+	 *
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or negative error code on failure.
+	 */
+	int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
 };
 
 /**
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
       [not found] ` <20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2022-01-15  1:06   ` [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  2022-01-15  1:06   ` [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds Hridya Valsaraju
  4 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

The optional exporter op provides a way for processes to transfer
charge of a buffer to a different process. This is essential for the
cases where a central allocator process does allocations for various
subsystems, hands over the fd to the client who
requested the memory and drops all references to the allocated memory.

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 include/linux/dma-buf.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..d5e52f81cc6f 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -13,6 +13,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__
 
+#include <linux/cgroup_gpu.h>
 #include <linux/dma-buf-map.h>
 #include <linux/file.h>
 #include <linux/err.h>
@@ -285,6 +286,23 @@ struct dma_buf_ops {
 
 	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
 	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+	/**
+	 * @charge_to_cgroup:
+	 *
+	 * This is called by an exporter to charge a buffer to the specified
+	 * cgroup. The caller must hold a reference to @gpucg obtained via
+	 * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
+	 * currently charged to before being charged to @gpucg. The caller must
+	 * belong to the cgroup the buffer is currently charged to.
+	 *
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or negative error code on failure.
+	 */
+	int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
 };
 
 /**
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 5/6] dmabuf: system_heap: implement dma-buf op for GPU cgroup charge transfer
  2022-01-15  1:05 ` Hridya Valsaraju
  (?)
@ 2022-01-15  1:06   ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Kenneth Graunke, Rodrigo Vivi, Matthew Brost, Matthew Auld,
	Li Li, Marco Ballesio, Miguel Ojeda, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Chris Down,
	Vipin Sharma, Daniel Borkmann, Vlastimil Babka, Arnd Bergmann,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

The DMA-BUF op can be invoked when a process that allocated a buffer
relinquishes its ownership and passes it over to another process.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/dma-buf/heaps/system_heap.c | 43 +++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index adfdc8c576f2..70f5b98f1157 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -307,6 +307,48 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 	kfree(buffer);
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static int system_heap_dma_buf_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+	struct gpucg *current_gpucg;
+	struct gpucg_device *gpucg_dev;
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	size_t len = buffer->len;
+	int ret = 0;
+
+	/*
+	 * Check that the process requesting the transfer is the same as the one
+	 * to whom the buffer is currently charged to.
+	 */
+	current_gpucg = gpucg_get(current);
+	if (current_gpucg != buffer->gpucg)
+		ret = -EPERM;
+
+	gpucg_put(current_gpucg);
+	if (ret)
+		return ret;
+
+	gpucg_dev = dma_heap_get_gpucg_dev(buffer->heap);
+
+	ret = gpucg_try_charge(gpucg, gpucg_dev, len);
+	if (ret)
+		return ret;
+
+	/* uncharge the buffer from the cgroup its currently charged to. */
+	gpucg_uncharge(buffer->gpucg, gpucg_dev, buffer->len);
+	gpucg_put(buffer->gpucg);
+
+	buffer->gpucg = gpucg;
+
+	return 0;
+}
+#else
+static int system_heap_dma_buf_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+	return 0;
+}
+#endif
+
 static const struct dma_buf_ops system_heap_buf_ops = {
 	.attach = system_heap_attach,
 	.detach = system_heap_detach,
@@ -318,6 +360,7 @@ static const struct dma_buf_ops system_heap_buf_ops = {
 	.vmap = system_heap_vmap,
 	.vunmap = system_heap_vunmap,
 	.release = system_heap_dma_buf_release,
+	.charge_to_cgroup = system_heap_dma_buf_charge,
 };
 
 static struct page *alloc_largest_available(unsigned long size,
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 5/6] dmabuf: system_heap: implement dma-buf op for GPU cgroup charge transfer
@ 2022-01-15  1:06   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Kenneth Graunke, Rodrigo Vivi, Matthew Brost, Matthew Auld,
	Li Li, Marco Ballesio, Miguel Ojeda, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Chris Down,
	Vipin Sharma, Daniel Borkmann, Vlastimil Babka, Arnd Bergmann,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

The DMA-BUF op can be invoked when a process that allocated a buffer
relinquishes its ownership and passes it over to another process.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/dma-buf/heaps/system_heap.c | 43 +++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index adfdc8c576f2..70f5b98f1157 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -307,6 +307,48 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 	kfree(buffer);
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static int system_heap_dma_buf_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+	struct gpucg *current_gpucg;
+	struct gpucg_device *gpucg_dev;
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	size_t len = buffer->len;
+	int ret = 0;
+
+	/*
+	 * Check that the process requesting the transfer is the same as the one
+	 * to whom the buffer is currently charged to.
+	 */
+	current_gpucg = gpucg_get(current);
+	if (current_gpucg != buffer->gpucg)
+		ret = -EPERM;
+
+	gpucg_put(current_gpucg);
+	if (ret)
+		return ret;
+
+	gpucg_dev = dma_heap_get_gpucg_dev(buffer->heap);
+
+	ret = gpucg_try_charge(gpucg, gpucg_dev, len);
+	if (ret)
+		return ret;
+
+	/* uncharge the buffer from the cgroup its currently charged to. */
+	gpucg_uncharge(buffer->gpucg, gpucg_dev, buffer->len);
+	gpucg_put(buffer->gpucg);
+
+	buffer->gpucg = gpucg;
+
+	return 0;
+}
+#else
+static int system_heap_dma_buf_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+	return 0;
+}
+#endif
+
 static const struct dma_buf_ops system_heap_buf_ops = {
 	.attach = system_heap_attach,
 	.detach = system_heap_detach,
@@ -318,6 +360,7 @@ static const struct dma_buf_ops system_heap_buf_ops = {
 	.vmap = system_heap_vmap,
 	.vunmap = system_heap_vunmap,
 	.release = system_heap_dma_buf_release,
+	.charge_to_cgroup = system_heap_dma_buf_charge,
 };
 
 static struct page *alloc_largest_available(unsigned long size,
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 5/6] dmabuf: system_heap: implement dma-buf op for GPU cgroup charge transfer
@ 2022-01-15  1:06   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

The DMA-BUF op can be invoked when a process that allocated a buffer
relinquishes its ownership and passes it over to another process.

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 drivers/dma-buf/heaps/system_heap.c | 43 +++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index adfdc8c576f2..70f5b98f1157 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -307,6 +307,48 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 	kfree(buffer);
 }
 
+#ifdef CONFIG_CGROUP_GPU
+static int system_heap_dma_buf_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+	struct gpucg *current_gpucg;
+	struct gpucg_device *gpucg_dev;
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	size_t len = buffer->len;
+	int ret = 0;
+
+	/*
+	 * Check that the process requesting the transfer is the same as the one
+	 * to whom the buffer is currently charged to.
+	 */
+	current_gpucg = gpucg_get(current);
+	if (current_gpucg != buffer->gpucg)
+		ret = -EPERM;
+
+	gpucg_put(current_gpucg);
+	if (ret)
+		return ret;
+
+	gpucg_dev = dma_heap_get_gpucg_dev(buffer->heap);
+
+	ret = gpucg_try_charge(gpucg, gpucg_dev, len);
+	if (ret)
+		return ret;
+
+	/* uncharge the buffer from the cgroup its currently charged to. */
+	gpucg_uncharge(buffer->gpucg, gpucg_dev, buffer->len);
+	gpucg_put(buffer->gpucg);
+
+	buffer->gpucg = gpucg;
+
+	return 0;
+}
+#else
+static int system_heap_dma_buf_charge(struct dma_buf *dmabuf, struct gpucg *gpucg)
+{
+	return 0;
+}
+#endif
+
 static const struct dma_buf_ops system_heap_buf_ops = {
 	.attach = system_heap_attach,
 	.detach = system_heap_detach,
@@ -318,6 +360,7 @@ static const struct dma_buf_ops system_heap_buf_ops = {
 	.vmap = system_heap_vmap,
 	.vunmap = system_heap_vunmap,
 	.release = system_heap_dma_buf_release,
+	.charge_to_cgroup = system_heap_dma_buf_charge,
 };
 
 static struct page *alloc_largest_available(unsigned long size,
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
  2022-01-15  1:05 ` Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Kenneth Graunke, Jason Ekstrand, Matthew Auld, Matthew Brost,
	Li Li, Marco Ballesio, Hang Lu, Wedson Almeida Filho,
	Masahiro Yamada, Nathan Chancellor, Andrew Morton, Kees Cook,
	Nick Desaulniers, Miguel Ojeda, Chris Down, Vipin Sharma,
	Daniel Borkmann, Vlastimil Babka, Arnd Bergmann, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is upto the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/android/binder.c            | 32 +++++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 5497797ab258..83082fd1ab6a 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,11 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
+
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2520,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+	    parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2531,6 +2542,27 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		if (dmabuf->ops->charge_to_cgroup) {
+			gpucg = gpucg_get(target_proc->tsk);
+			ret = dmabuf->ops->charge_to_cgroup(dmabuf, gpucg);
+			if (ret) {
+				pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
+					proc->pid, thread->pid, target_proc->pid);
+				gpucg_put(gpucg);
+			}
+		} else {
+			pr_warn("%d:%d DMA-BUF exporter %s is not configured correctly for GPU cgroup memory accounting",
+				proc->pid, thread->pid, dmabuf->exp_name);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index ad619623571e..c85f0014c341 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
@ 2022-01-15  1:06   ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie,
	Kenneth Graunke, Jason Ekstrand, Matthew Auld, Matthew Brost,
	Li Li, Marco Ballesio, Hang Lu, Wedson Almeida Filho,
	Masahiro Yamada, Nathan Chancellor, Andrew Morton, Kees Cook,
	Nick Desaulniers, Miguel Ojeda, Chris Down, Vipin Sharma,
	Daniel Borkmann, Vlastimil Babka, Arnd Bergmann, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is upto the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/android/binder.c            | 32 +++++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 5497797ab258..83082fd1ab6a 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,11 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
+
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2520,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+	    parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2531,6 +2542,27 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		if (dmabuf->ops->charge_to_cgroup) {
+			gpucg = gpucg_get(target_proc->tsk);
+			ret = dmabuf->ops->charge_to_cgroup(dmabuf, gpucg);
+			if (ret) {
+				pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
+					proc->pid, thread->pid, target_proc->pid);
+				gpucg_put(gpucg);
+			}
+		} else {
+			pr_warn("%d:%d DMA-BUF exporter %s is not configured correctly for GPU cgroup memory accounting",
+				proc->pid, thread->pid, dmabuf->exp_name);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index ad619623571e..c85f0014c341 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds
       [not found] ` <20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2022-01-15  1:06   ` [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup Hridya Valsaraju
@ 2022-01-15  1:06   ` Hridya Valsaraju
  4 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-15  1:06 UTC (permalink / raw)
  To: David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED
that a process sending an fd array to another process over binder IPC
can set to relinquish ownership of the fds being sent for memory
accounting purposes. If the flag is found to be set during the fd array
translation and the fd is for a DMA-BUF, the buffer is uncharged from
the sender's cgroup and charged to the receiving process's cgroup
instead.

It is upto the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL process
allocates memory and sends the fds to the clients over binder IPC.
The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd
arrays holding DMA-BUF fds, the gpu cgroup controller will be able to
correctly charge the buffers to the client processes instead of the
graphics allocator HAL.

Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 drivers/android/binder.c            | 32 +++++++++++++++++++++++++++++
 include/uapi/linux/android/binder.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 5497797ab258..83082fd1ab6a 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2482,8 +2483,11 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 {
 	binder_size_t fdi, fd_buf_size;
 	binder_size_t fda_offset;
+	bool transfer_gpu_charge = false;
 	const void __user *sender_ufda_base;
 	struct binder_proc *proc = thread->proc;
+	struct binder_proc *target_proc = t->to_proc;
+
 	int ret;
 
 	fd_buf_size = sizeof(u32) * fda->num_fds;
@@ -2520,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 	if (ret)
 		return ret;
 
+	if (IS_ENABLED(CONFIG_CGROUP_GPU) &&
+	    parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED)
+		transfer_gpu_charge = true;
+
 	for (fdi = 0; fdi < fda->num_fds; fdi++) {
 		u32 fd;
+		struct dma_buf *dmabuf;
+		struct gpucg *gpucg;
+
 		binder_size_t offset = fda_offset + fdi * sizeof(fd);
 		binder_size_t sender_uoffset = fdi * sizeof(fd);
 
@@ -2531,6 +2542,27 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
+
+		if (!transfer_gpu_charge)
+			continue;
+
+		dmabuf = dma_buf_get(fd);
+		if (IS_ERR(dmabuf))
+			continue;
+
+		if (dmabuf->ops->charge_to_cgroup) {
+			gpucg = gpucg_get(target_proc->tsk);
+			ret = dmabuf->ops->charge_to_cgroup(dmabuf, gpucg);
+			if (ret) {
+				pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d",
+					proc->pid, thread->pid, target_proc->pid);
+				gpucg_put(gpucg);
+			}
+		} else {
+			pr_warn("%d:%d DMA-BUF exporter %s is not configured correctly for GPU cgroup memory accounting",
+				proc->pid, thread->pid, dmabuf->exp_name);
+		}
+		dma_buf_put(dmabuf);
 	}
 	return 0;
 }
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index ad619623571e..c85f0014c341 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -137,6 +137,7 @@ struct binder_buffer_object {
 
 enum {
 	BINDER_BUFFER_FLAG_HAS_PARENT = 0x01,
+	BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02,
 };
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  2022-01-15  1:06   ` Hridya Valsaraju
@ 2022-01-17  7:46     ` Christian König
  -1 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-01-17  7:46 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Dave Airlie, Jason Ekstrand, Matthew Auld,
	Matthew Brost, Li Li, Marco Ballesio, Miguel Ojeda, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Chris Down,
	Vipin Sharma, Daniel Borkmann, Vlastimil Babka, Arnd Bergmann,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> The optional exporter op provides a way for processes to transfer
> charge of a buffer to a different process. This is essential for the
> cases where a central allocator process does allocations for various
> subsystems, hands over the fd to the client who
> requested the memory and drops all references to the allocated memory.
>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> ---
>   include/linux/dma-buf.h | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 7ab50076e7a6..d5e52f81cc6f 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -13,6 +13,7 @@
>   #ifndef __DMA_BUF_H__
>   #define __DMA_BUF_H__
>   
> +#include <linux/cgroup_gpu.h>
>   #include <linux/dma-buf-map.h>
>   #include <linux/file.h>
>   #include <linux/err.h>
> @@ -285,6 +286,23 @@ struct dma_buf_ops {
>   
>   	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>   	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> +
> +	/**
> +	 * @charge_to_cgroup:
> +	 *
> +	 * This is called by an exporter to charge a buffer to the specified
> +	 * cgroup.

Well that sentence makes absolutely no sense at all.

The dma_buf_ops are supposed to be called by the DMA-buf subsystem on 
behalves of the importer and never by the exporter itself.

I hope that this is just a documentation mixup.

Regards,
Christian.

>   The caller must hold a reference to @gpucg obtained via
> +	 * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> +	 * currently charged to before being charged to @gpucg. The caller must
> +	 * belong to the cgroup the buffer is currently charged to.
> +	 *
> +	 * This callback is optional.
> +	 *
> +	 * Returns:
> +	 *
> +	 * 0 on success or negative error code on failure.
> +	 */
> +	int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-17  7:46     ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-01-17  7:46 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Dave Airlie, Jason Ekstrand, Matthew Auld,
	Matthew Brost, Li Li, Marco Ballesio, Miguel Ojeda, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Chris Down,
	Vipin Sharma, Daniel Borkmann, Vlastimil Babka, Arnd Bergmann,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> The optional exporter op provides a way for processes to transfer
> charge of a buffer to a different process. This is essential for the
> cases where a central allocator process does allocations for various
> subsystems, hands over the fd to the client who
> requested the memory and drops all references to the allocated memory.
>
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> ---
>   include/linux/dma-buf.h | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 7ab50076e7a6..d5e52f81cc6f 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -13,6 +13,7 @@
>   #ifndef __DMA_BUF_H__
>   #define __DMA_BUF_H__
>   
> +#include <linux/cgroup_gpu.h>
>   #include <linux/dma-buf-map.h>
>   #include <linux/file.h>
>   #include <linux/err.h>
> @@ -285,6 +286,23 @@ struct dma_buf_ops {
>   
>   	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>   	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> +
> +	/**
> +	 * @charge_to_cgroup:
> +	 *
> +	 * This is called by an exporter to charge a buffer to the specified
> +	 * cgroup.

Well that sentence makes absolutely no sense at all.

The dma_buf_ops are supposed to be called by the DMA-buf subsystem on 
behalves of the importer and never by the exporter itself.

I hope that this is just a documentation mixup.

Regards,
Christian.

>   The caller must hold a reference to @gpucg obtained via
> +	 * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> +	 * currently charged to before being charged to @gpucg. The caller must
> +	 * belong to the cgroup the buffer is currently charged to.
> +	 *
> +	 * This callback is optional.
> +	 *
> +	 * Returns:
> +	 *
> +	 * 0 on success or negative error code on failure.
> +	 */
> +	int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
       [not found]   ` <20220115010622.3185921-5-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-01-17  7:46     ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-01-17  7:46 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> The optional exporter op provides a way for processes to transfer
> charge of a buffer to a different process. This is essential for the
> cases where a central allocator process does allocations for various
> subsystems, hands over the fd to the client who
> requested the memory and drops all references to the allocated memory.
>
> Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
>   include/linux/dma-buf.h | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 7ab50076e7a6..d5e52f81cc6f 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -13,6 +13,7 @@
>   #ifndef __DMA_BUF_H__
>   #define __DMA_BUF_H__
>   
> +#include <linux/cgroup_gpu.h>
>   #include <linux/dma-buf-map.h>
>   #include <linux/file.h>
>   #include <linux/err.h>
> @@ -285,6 +286,23 @@ struct dma_buf_ops {
>   
>   	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>   	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> +
> +	/**
> +	 * @charge_to_cgroup:
> +	 *
> +	 * This is called by an exporter to charge a buffer to the specified
> +	 * cgroup.

Well that sentence makes absolutely no sense at all.

The dma_buf_ops are supposed to be called by the DMA-buf subsystem on 
behalves of the importer and never by the exporter itself.

I hope that this is just a documentation mixup.

Regards,
Christian.

>   The caller must hold a reference to @gpucg obtained via
> +	 * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> +	 * currently charged to before being charged to @gpucg. The caller must
> +	 * belong to the cgroup the buffer is currently charged to.
> +	 *
> +	 * This callback is optional.
> +	 *
> +	 * Returns:
> +	 *
> +	 * 0 on success or negative error code on failure.
> +	 */
> +	int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  2022-01-17  7:46     ` Christian König
  (?)
@ 2022-01-18 18:54       ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-18 18:54 UTC (permalink / raw)
  To: Christian König
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
	Sumit Semwal, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Dave Airlie, Jason Ekstrand, Matthew Auld, Matthew Brost, Li Li,
	Marco Ballesio, Miguel Ojeda, Hang Lu, Wedson Almeida Filho,
	Masahiro Yamada, Andrew Morton, Nathan Chancellor, Kees Cook,
	Nick Desaulniers, Chris Down, Vipin Sharma, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups, Kenny.Ho,
	daniels, kaleshsingh, tjmercier

On Sun, Jan 16, 2022 at 11:46 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> > The optional exporter op provides a way for processes to transfer
> > charge of a buffer to a different process. This is essential for the
> > cases where a central allocator process does allocations for various
> > subsystems, hands over the fd to the client who
> > requested the memory and drops all references to the allocated memory.
> >
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > ---
> >   include/linux/dma-buf.h | 18 ++++++++++++++++++
> >   1 file changed, 18 insertions(+)
> >
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 7ab50076e7a6..d5e52f81cc6f 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -13,6 +13,7 @@
> >   #ifndef __DMA_BUF_H__
> >   #define __DMA_BUF_H__
> >
> > +#include <linux/cgroup_gpu.h>
> >   #include <linux/dma-buf-map.h>
> >   #include <linux/file.h>
> >   #include <linux/err.h>
> > @@ -285,6 +286,23 @@ struct dma_buf_ops {
> >
> >       int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > +
> > +     /**
> > +      * @charge_to_cgroup:
> > +      *
> > +      * This is called by an exporter to charge a buffer to the specified
> > +      * cgroup.
>
> Well that sentence makes absolutely no sense at all.
>
> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> behalves of the importer and never by the exporter itself.
>
> I hope that this is just a documentation mixup.

Thank you for taking a look Christian!

Yes, that was poor wording, sorry about that. It should instead say
that the op would be called by the process the buffer is currently
charged to in order to transfer the buffer's charge to a different
cgroup. This is helpful in the case where a process acts as an
allocator for multiple client processes and we would like the
allocated buffers to be charged to the clients who requested their
allocation(instead of the allocating process as is the default
behavior). In Android, the graphics allocator HAL process[1] does
most of the graphics allocations on behalf of various clients. After
allocation, the HAL process passes the fd to the client over binder
IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
uncharge the buffer from the HAL process and charge it to the client
process instead.

[1]: https://source.android.com/devices/graphics/arch-bq-gralloc

Regards,
Hridya


>
> Regards,
> Christian.
>
> >   The caller must hold a reference to @gpucg obtained via
> > +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> > +      * currently charged to before being charged to @gpucg. The caller must
> > +      * belong to the cgroup the buffer is currently charged to.
> > +      *
> > +      * This callback is optional.
> > +      *
> > +      * Returns:
> > +      *
> > +      * 0 on success or negative error code on failure.
> > +      */
> > +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> >   };
> >
> >   /**
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-18 18:54       ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-18 18:54 UTC (permalink / raw)
  To: Christian König
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, Kees Cook, Matthew Brost, Kenny.Ho,
	Daniel Borkmann, Jonathan Corbet, Martijn Coenen,
	Masahiro Yamada, Wedson Almeida Filho, Matthew Auld,
	Miguel Ojeda, Dave Airlie, Laura Abbott, Marco Ballesio,
	Jason Ekstrand, linux-media, Li Li, Todd Kjos, Arnd Bergmann,
	Vlastimil Babka, Vipin Sharma, Nathan Chancellor, cgroups,
	Suren Baghdasaryan, tjmercier, Christian Brauner, linaro-mm-sig,
	Hang Lu, daniels, Chris Down, Greg Kroah-Hartman,
	Nick Desaulniers, linux-kernel, Liam Mark,
	Arve Hjønnevåg, Thomas Zimmermann, Johannes Weiner,
	Tejun Heo, Andrew Morton

On Sun, Jan 16, 2022 at 11:46 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> > The optional exporter op provides a way for processes to transfer
> > charge of a buffer to a different process. This is essential for the
> > cases where a central allocator process does allocations for various
> > subsystems, hands over the fd to the client who
> > requested the memory and drops all references to the allocated memory.
> >
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > ---
> >   include/linux/dma-buf.h | 18 ++++++++++++++++++
> >   1 file changed, 18 insertions(+)
> >
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 7ab50076e7a6..d5e52f81cc6f 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -13,6 +13,7 @@
> >   #ifndef __DMA_BUF_H__
> >   #define __DMA_BUF_H__
> >
> > +#include <linux/cgroup_gpu.h>
> >   #include <linux/dma-buf-map.h>
> >   #include <linux/file.h>
> >   #include <linux/err.h>
> > @@ -285,6 +286,23 @@ struct dma_buf_ops {
> >
> >       int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > +
> > +     /**
> > +      * @charge_to_cgroup:
> > +      *
> > +      * This is called by an exporter to charge a buffer to the specified
> > +      * cgroup.
>
> Well that sentence makes absolutely no sense at all.
>
> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> behalves of the importer and never by the exporter itself.
>
> I hope that this is just a documentation mixup.

Thank you for taking a look Christian!

Yes, that was poor wording, sorry about that. It should instead say
that the op would be called by the process the buffer is currently
charged to in order to transfer the buffer's charge to a different
cgroup. This is helpful in the case where a process acts as an
allocator for multiple client processes and we would like the
allocated buffers to be charged to the clients who requested their
allocation(instead of the allocating process as is the default
behavior). In Android, the graphics allocator HAL process[1] does
most of the graphics allocations on behalf of various clients. After
allocation, the HAL process passes the fd to the client over binder
IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
uncharge the buffer from the HAL process and charge it to the client
process instead.

[1]: https://source.android.com/devices/graphics/arch-bq-gralloc

Regards,
Hridya


>
> Regards,
> Christian.
>
> >   The caller must hold a reference to @gpucg obtained via
> > +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> > +      * currently charged to before being charged to @gpucg. The caller must
> > +      * belong to the cgroup the buffer is currently charged to.
> > +      *
> > +      * This callback is optional.
> > +      *
> > +      * Returns:
> > +      *
> > +      * 0 on success or negative error code on failure.
> > +      */
> > +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> >   };
> >
> >   /**
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-18 18:54       ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-18 18:54 UTC (permalink / raw)
  To: Christian König
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
	Sumit Semwal, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz

On Sun, Jan 16, 2022 at 11:46 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> > The optional exporter op provides a way for processes to transfer
> > charge of a buffer to a different process. This is essential for the
> > cases where a central allocator process does allocations for various
> > subsystems, hands over the fd to the client who
> > requested the memory and drops all references to the allocated memory.
> >
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > ---
> >   include/linux/dma-buf.h | 18 ++++++++++++++++++
> >   1 file changed, 18 insertions(+)
> >
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 7ab50076e7a6..d5e52f81cc6f 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -13,6 +13,7 @@
> >   #ifndef __DMA_BUF_H__
> >   #define __DMA_BUF_H__
> >
> > +#include <linux/cgroup_gpu.h>
> >   #include <linux/dma-buf-map.h>
> >   #include <linux/file.h>
> >   #include <linux/err.h>
> > @@ -285,6 +286,23 @@ struct dma_buf_ops {
> >
> >       int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > +
> > +     /**
> > +      * @charge_to_cgroup:
> > +      *
> > +      * This is called by an exporter to charge a buffer to the specified
> > +      * cgroup.
>
> Well that sentence makes absolutely no sense at all.
>
> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> behalves of the importer and never by the exporter itself.
>
> I hope that this is just a documentation mixup.

Thank you for taking a look Christian!

Yes, that was poor wording, sorry about that. It should instead say
that the op would be called by the process the buffer is currently
charged to in order to transfer the buffer's charge to a different
cgroup. This is helpful in the case where a process acts as an
allocator for multiple client processes and we would like the
allocated buffers to be charged to the clients who requested their
allocation(instead of the allocating process as is the default
behavior). In Android, the graphics allocator HAL process[1] does
most of the graphics allocations on behalf of various clients. After
allocation, the HAL process passes the fd to the client over binder
IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
uncharge the buffer from the HAL process and charge it to the client
process instead.

[1]: https://source.android.com/devices/graphics/arch-bq-gralloc

Regards,
Hridya


>
> Regards,
> Christian.
>
> >   The caller must hold a reference to @gpucg obtained via
> > +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> > +      * currently charged to before being charged to @gpucg. The caller must
> > +      * belong to the cgroup the buffer is currently charged to.
> > +      *
> > +      * This callback is optional.
> > +      *
> > +      * Returns:
> > +      *
> > +      * 0 on success or negative error code on failure.
> > +      */
> > +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> >   };
> >
> >   /**
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-01-15  1:06   ` Hridya Valsaraju
@ 2022-01-19 15:40     ` Randy Dunlap
  -1 siblings, 0 replies; 41+ messages in thread
From: Randy Dunlap @ 2022-01-19 15:40 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Rodrigo Vivi,
	Matthew Auld, Matthew Brost, Li Li, Marco Ballesio, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, kaleshsingh, tjmercier

Hi--

On 1/14/22 17:06, Hridya Valsaraju wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index cd23faa163d1..408910b21387 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -990,6 +990,13 @@ config BLK_CGROUP
>  
>  	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
>  
> +config CGROUP_GPU
> +       bool "gpu cgroup controller (EXPERIMENTAL)"
> +       select PAGE_COUNTER
> +       help
> +	Provides accounting and limit setting for memory allocations by the GPU
> +	and GPU-related subsystems.

Please follow coding-style for Kconfig files:

(from Documentation/process/coding-style.rst, section 10):

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different.  Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces.


thanks.

-- 
~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
@ 2022-01-19 15:40     ` Randy Dunlap
  0 siblings, 0 replies; 41+ messages in thread
From: Randy Dunlap @ 2022-01-19 15:40 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Christian König,
	Tejun Heo, Zefan Li, Johannes Weiner, Dave Airlie, Rodrigo Vivi,
	Matthew Auld, Matthew Brost, Li Li, Marco Ballesio, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups
  Cc: Kenny.Ho, daniels, tjmercier, kaleshsingh

Hi--

On 1/14/22 17:06, Hridya Valsaraju wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index cd23faa163d1..408910b21387 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -990,6 +990,13 @@ config BLK_CGROUP
>  
>  	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
>  
> +config CGROUP_GPU
> +       bool "gpu cgroup controller (EXPERIMENTAL)"
> +       select PAGE_COUNTER
> +       help
> +	Provides accounting and limit setting for memory allocations by the GPU
> +	and GPU-related subsystems.

Please follow coding-style for Kconfig files:

(from Documentation/process/coding-style.rst, section 10):

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different.  Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces.


thanks.

-- 
~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
       [not found]   ` <20220115010622.3185921-3-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2022-01-19 15:40     ` Randy Dunlap
  0 siblings, 0 replies; 41+ messages in thread
From: Randy Dunlap @ 2022-01-19 15:40 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott
  Cc: Kenny.Ho-5C7GfCeVMHo, daniels-ZGY8ohtN/8qB+jHODAdFcQ,
	kaleshsingh-hpIqsD4AKlfQT0dZR+AlfA,
	tjmercier-hpIqsD4AKlfQT0dZR+AlfA

Hi--

On 1/14/22 17:06, Hridya Valsaraju wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index cd23faa163d1..408910b21387 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -990,6 +990,13 @@ config BLK_CGROUP
>  
>  	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
>  
> +config CGROUP_GPU
> +       bool "gpu cgroup controller (EXPERIMENTAL)"
> +       select PAGE_COUNTER
> +       help
> +	Provides accounting and limit setting for memory allocations by the GPU
> +	and GPU-related subsystems.

Please follow coding-style for Kconfig files:

(from Documentation/process/coding-style.rst, section 10):

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different.  Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces.


thanks.

-- 
~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  2022-01-18 18:54       ` Hridya Valsaraju
  (?)
@ 2022-01-19 15:54         ` Daniel Vetter
  -1 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-01-19 15:54 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Dave Airlie, Jason Ekstrand, Matthew Auld,
	Matthew Brost, Li Li, Marco Ballesio, Miguel Ojeda, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Chris Down,
	Vipin Sharma, Daniel Borkmann, Vlastimil Babka, Arnd Bergmann,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups, Kenny.Ho, daniels, kaleshsingh, tjmercier

On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
> On Sun, Jan 16, 2022 at 11:46 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> > > The optional exporter op provides a way for processes to transfer
> > > charge of a buffer to a different process. This is essential for the
> > > cases where a central allocator process does allocations for various
> > > subsystems, hands over the fd to the client who
> > > requested the memory and drops all references to the allocated memory.
> > >
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > ---
> > >   include/linux/dma-buf.h | 18 ++++++++++++++++++
> > >   1 file changed, 18 insertions(+)
> > >
> > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > index 7ab50076e7a6..d5e52f81cc6f 100644
> > > --- a/include/linux/dma-buf.h
> > > +++ b/include/linux/dma-buf.h
> > > @@ -13,6 +13,7 @@
> > >   #ifndef __DMA_BUF_H__
> > >   #define __DMA_BUF_H__
> > >
> > > +#include <linux/cgroup_gpu.h>
> > >   #include <linux/dma-buf-map.h>
> > >   #include <linux/file.h>
> > >   #include <linux/err.h>
> > > @@ -285,6 +286,23 @@ struct dma_buf_ops {
> > >
> > >       int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > > +
> > > +     /**
> > > +      * @charge_to_cgroup:
> > > +      *
> > > +      * This is called by an exporter to charge a buffer to the specified
> > > +      * cgroup.
> >
> > Well that sentence makes absolutely no sense at all.
> >
> > The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> > behalves of the importer and never by the exporter itself.
> >
> > I hope that this is just a documentation mixup.
> 
> Thank you for taking a look Christian!
> 
> Yes, that was poor wording, sorry about that. It should instead say
> that the op would be called by the process the buffer is currently
> charged to in order to transfer the buffer's charge to a different
> cgroup. This is helpful in the case where a process acts as an
> allocator for multiple client processes and we would like the
> allocated buffers to be charged to the clients who requested their
> allocation(instead of the allocating process as is the default
> behavior). In Android, the graphics allocator HAL process[1] does
> most of the graphics allocations on behalf of various clients. After
> allocation, the HAL process passes the fd to the client over binder
> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
> uncharge the buffer from the HAL process and charge it to the client
> process instead.
> 
> [1]: https://source.android.com/devices/graphics/arch-bq-gralloc

For that use-case, do we really need to have the vfunc abstraction and
force all exporters to do something reasonable with it?

I think just storing the cgrpus gpu memory bucket this is charged against
and doing this in a generic way would be a lot better.

That way we can also easily add other neat features in the future, like
e.g. ttm could take care of charge-assignement automatically maybe, or we
could print the current cgroups charge relationship in the sysfs info
file. Or anything else really.

I do feel that in general for gpu memory cgroups to be useful, we should
really have memory pools as a fairly strong concept. Otherwise every
driver/allocator/thing is going to come up with their own, and very likely
incompatible interpretation. And we end up with a supposed generic cgroups
interface which cannot actually be used in a driver/vendor agnostic way at
all.
-Daniel

> 
> Regards,
> Hridya
> 
> 
> >
> > Regards,
> > Christian.
> >
> > >   The caller must hold a reference to @gpucg obtained via
> > > +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> > > +      * currently charged to before being charged to @gpucg. The caller must
> > > +      * belong to the cgroup the buffer is currently charged to.
> > > +      *
> > > +      * This callback is optional.
> > > +      *
> > > +      * Returns:
> > > +      *
> > > +      * 0 on success or negative error code on failure.
> > > +      */
> > > +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> > >   };
> > >
> > >   /**
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-19 15:54         ` Daniel Vetter
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-01-19 15:54 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, Kees Cook, Matthew Brost, Kenny.Ho,
	Daniel Borkmann, Jonathan Corbet, Martijn Coenen,
	Masahiro Yamada, Wedson Almeida Filho, Matthew Auld,
	Miguel Ojeda, Dave Airlie, Laura Abbott, Marco Ballesio,
	Jason Ekstrand, linux-media, Li Li, Todd Kjos, Thomas Zimmermann,
	Arnd Bergmann, Vlastimil Babka, Vipin Sharma, Nathan Chancellor,
	cgroups, Suren Baghdasaryan, tjmercier, Christian Brauner,
	linaro-mm-sig, Hang Lu, daniels, Chris Down, Greg Kroah-Hartman,
	Nick Desaulniers, linux-kernel, Liam Mark, Christian König,
	Arve Hjønnevåg, Johannes Weiner, Tejun Heo,
	Andrew Morton

On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
> On Sun, Jan 16, 2022 at 11:46 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> > > The optional exporter op provides a way for processes to transfer
> > > charge of a buffer to a different process. This is essential for the
> > > cases where a central allocator process does allocations for various
> > > subsystems, hands over the fd to the client who
> > > requested the memory and drops all references to the allocated memory.
> > >
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > ---
> > >   include/linux/dma-buf.h | 18 ++++++++++++++++++
> > >   1 file changed, 18 insertions(+)
> > >
> > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > index 7ab50076e7a6..d5e52f81cc6f 100644
> > > --- a/include/linux/dma-buf.h
> > > +++ b/include/linux/dma-buf.h
> > > @@ -13,6 +13,7 @@
> > >   #ifndef __DMA_BUF_H__
> > >   #define __DMA_BUF_H__
> > >
> > > +#include <linux/cgroup_gpu.h>
> > >   #include <linux/dma-buf-map.h>
> > >   #include <linux/file.h>
> > >   #include <linux/err.h>
> > > @@ -285,6 +286,23 @@ struct dma_buf_ops {
> > >
> > >       int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > > +
> > > +     /**
> > > +      * @charge_to_cgroup:
> > > +      *
> > > +      * This is called by an exporter to charge a buffer to the specified
> > > +      * cgroup.
> >
> > Well that sentence makes absolutely no sense at all.
> >
> > The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> > behalves of the importer and never by the exporter itself.
> >
> > I hope that this is just a documentation mixup.
> 
> Thank you for taking a look Christian!
> 
> Yes, that was poor wording, sorry about that. It should instead say
> that the op would be called by the process the buffer is currently
> charged to in order to transfer the buffer's charge to a different
> cgroup. This is helpful in the case where a process acts as an
> allocator for multiple client processes and we would like the
> allocated buffers to be charged to the clients who requested their
> allocation(instead of the allocating process as is the default
> behavior). In Android, the graphics allocator HAL process[1] does
> most of the graphics allocations on behalf of various clients. After
> allocation, the HAL process passes the fd to the client over binder
> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
> uncharge the buffer from the HAL process and charge it to the client
> process instead.
> 
> [1]: https://source.android.com/devices/graphics/arch-bq-gralloc

For that use-case, do we really need to have the vfunc abstraction and
force all exporters to do something reasonable with it?

I think just storing the cgrpus gpu memory bucket this is charged against
and doing this in a generic way would be a lot better.

That way we can also easily add other neat features in the future, like
e.g. ttm could take care of charge-assignement automatically maybe, or we
could print the current cgroups charge relationship in the sysfs info
file. Or anything else really.

I do feel that in general for gpu memory cgroups to be useful, we should
really have memory pools as a fairly strong concept. Otherwise every
driver/allocator/thing is going to come up with their own, and very likely
incompatible interpretation. And we end up with a supposed generic cgroups
interface which cannot actually be used in a driver/vendor agnostic way at
all.
-Daniel

> 
> Regards,
> Hridya
> 
> 
> >
> > Regards,
> > Christian.
> >
> > >   The caller must hold a reference to @gpucg obtained via
> > > +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> > > +      * currently charged to before being charged to @gpucg. The caller must
> > > +      * belong to the cgroup the buffer is currently charged to.
> > > +      *
> > > +      * This callback is optional.
> > > +      *
> > > +      * Returns:
> > > +      *
> > > +      * 0 on success or negative error code on failure.
> > > +      */
> > > +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> > >   };
> > >
> > >   /**
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-19 15:54         ` Daniel Vetter
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2022-01-19 15:54 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Christian König, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott

On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
> On Sun, Jan 16, 2022 at 11:46 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> > > The optional exporter op provides a way for processes to transfer
> > > charge of a buffer to a different process. This is essential for the
> > > cases where a central allocator process does allocations for various
> > > subsystems, hands over the fd to the client who
> > > requested the memory and drops all references to the allocated memory.
> > >
> > > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > > ---
> > >   include/linux/dma-buf.h | 18 ++++++++++++++++++
> > >   1 file changed, 18 insertions(+)
> > >
> > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > index 7ab50076e7a6..d5e52f81cc6f 100644
> > > --- a/include/linux/dma-buf.h
> > > +++ b/include/linux/dma-buf.h
> > > @@ -13,6 +13,7 @@
> > >   #ifndef __DMA_BUF_H__
> > >   #define __DMA_BUF_H__
> > >
> > > +#include <linux/cgroup_gpu.h>
> > >   #include <linux/dma-buf-map.h>
> > >   #include <linux/file.h>
> > >   #include <linux/err.h>
> > > @@ -285,6 +286,23 @@ struct dma_buf_ops {
> > >
> > >       int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > > +
> > > +     /**
> > > +      * @charge_to_cgroup:
> > > +      *
> > > +      * This is called by an exporter to charge a buffer to the specified
> > > +      * cgroup.
> >
> > Well that sentence makes absolutely no sense at all.
> >
> > The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> > behalves of the importer and never by the exporter itself.
> >
> > I hope that this is just a documentation mixup.
> 
> Thank you for taking a look Christian!
> 
> Yes, that was poor wording, sorry about that. It should instead say
> that the op would be called by the process the buffer is currently
> charged to in order to transfer the buffer's charge to a different
> cgroup. This is helpful in the case where a process acts as an
> allocator for multiple client processes and we would like the
> allocated buffers to be charged to the clients who requested their
> allocation(instead of the allocating process as is the default
> behavior). In Android, the graphics allocator HAL process[1] does
> most of the graphics allocations on behalf of various clients. After
> allocation, the HAL process passes the fd to the client over binder
> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
> uncharge the buffer from the HAL process and charge it to the client
> process instead.
> 
> [1]: https://source.android.com/devices/graphics/arch-bq-gralloc

For that use-case, do we really need to have the vfunc abstraction and
force all exporters to do something reasonable with it?

I think just storing the cgrpus gpu memory bucket this is charged against
and doing this in a generic way would be a lot better.

That way we can also easily add other neat features in the future, like
e.g. ttm could take care of charge-assignement automatically maybe, or we
could print the current cgroups charge relationship in the sysfs info
file. Or anything else really.

I do feel that in general for gpu memory cgroups to be useful, we should
really have memory pools as a fairly strong concept. Otherwise every
driver/allocator/thing is going to come up with their own, and very likely
incompatible interpretation. And we end up with a supposed generic cgroups
interface which cannot actually be used in a driver/vendor agnostic way at
all.
-Daniel

> 
> Regards,
> Hridya
> 
> 
> >
> > Regards,
> > Christian.
> >
> > >   The caller must hold a reference to @gpucg obtained via
> > > +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> > > +      * currently charged to before being charged to @gpucg. The caller must
> > > +      * belong to the cgroup the buffer is currently charged to.
> > > +      *
> > > +      * This callback is optional.
> > > +      *
> > > +      * Returns:
> > > +      *
> > > +      * 0 on success or negative error code on failure.
> > > +      */
> > > +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> > >   };
> > >
> > >   /**
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  2022-01-19 15:54         ` Daniel Vetter
  (?)
  (?)
@ 2022-01-19 15:58         ` Christian König
  2022-01-19 18:21             ` Hridya Valsaraju
  -1 siblings, 1 reply; 41+ messages in thread
From: Christian König @ 2022-01-19 15:58 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey, John Stultz, Tejun Heo, Zefan Li,
	Johannes Weiner, Dave Airlie, Jason Ekstrand, Matthew Auld,
	Matthew Brost, Li Li, Marco Ballesio, Miguel Ojeda, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Chris Down,
	Vipin Sharma, Daniel Borkmann, Vlastimil Babka, Arnd Bergmann,
	dri-devel, linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	cgroups, Kenny.Ho, daniels, kaleshsingh, tjmercier

Am 19.01.22 um 16:54 schrieb Daniel Vetter:
> On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
>> On Sun, Jan 16, 2022 at 11:46 PM Christian König
>> <christian.koenig@amd.com> wrote:
>>> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
>>>> The optional exporter op provides a way for processes to transfer
>>>> charge of a buffer to a different process. This is essential for the
>>>> cases where a central allocator process does allocations for various
>>>> subsystems, hands over the fd to the client who
>>>> requested the memory and drops all references to the allocated memory.
>>>>
>>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
>>>> ---
>>>>    include/linux/dma-buf.h | 18 ++++++++++++++++++
>>>>    1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>> index 7ab50076e7a6..d5e52f81cc6f 100644
>>>> --- a/include/linux/dma-buf.h
>>>> +++ b/include/linux/dma-buf.h
>>>> @@ -13,6 +13,7 @@
>>>>    #ifndef __DMA_BUF_H__
>>>>    #define __DMA_BUF_H__
>>>>
>>>> +#include <linux/cgroup_gpu.h>
>>>>    #include <linux/dma-buf-map.h>
>>>>    #include <linux/file.h>
>>>>    #include <linux/err.h>
>>>> @@ -285,6 +286,23 @@ struct dma_buf_ops {
>>>>
>>>>        int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>> +
>>>> +     /**
>>>> +      * @charge_to_cgroup:
>>>> +      *
>>>> +      * This is called by an exporter to charge a buffer to the specified
>>>> +      * cgroup.
>>> Well that sentence makes absolutely no sense at all.
>>>
>>> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
>>> behalves of the importer and never by the exporter itself.
>>>
>>> I hope that this is just a documentation mixup.
>> Thank you for taking a look Christian!
>>
>> Yes, that was poor wording, sorry about that. It should instead say
>> that the op would be called by the process the buffer is currently
>> charged to in order to transfer the buffer's charge to a different
>> cgroup. This is helpful in the case where a process acts as an
>> allocator for multiple client processes and we would like the
>> allocated buffers to be charged to the clients who requested their
>> allocation(instead of the allocating process as is the default
>> behavior). In Android, the graphics allocator HAL process[1] does
>> most of the graphics allocations on behalf of various clients. After
>> allocation, the HAL process passes the fd to the client over binder
>> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
>> uncharge the buffer from the HAL process and charge it to the client
>> process instead.
>>
>> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.android.com%2Fdevices%2Fgraphics%2Farch-bq-gralloc&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C838d25da974d4ea4257508d9db63eb70%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637782044488604857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Qn7JeyF5Rq9tnrGw1KgNuQkpu5RbcrvPhDOa1OBJ6TU%3D&amp;reserved=0
> For that use-case, do we really need to have the vfunc abstraction and
> force all exporters to do something reasonable with it?

I was about to write up a similar answer, but more from the technical side.

Why in the world should that be done on the DMA-buf object as a 
communication function between importer and exporter?

That design makes absolutely no sense at all to me.

Regards,
Christian.

>
> I think just storing the cgrpus gpu memory bucket this is charged against
> and doing this in a generic way would be a lot better.
>
> That way we can also easily add other neat features in the future, like
> e.g. ttm could take care of charge-assignement automatically maybe, or we
> could print the current cgroups charge relationship in the sysfs info
> file. Or anything else really.
>
> I do feel that in general for gpu memory cgroups to be useful, we should
> really have memory pools as a fairly strong concept. Otherwise every
> driver/allocator/thing is going to come up with their own, and very likely
> incompatible interpretation. And we end up with a supposed generic cgroups
> interface which cannot actually be used in a driver/vendor agnostic way at
> all.
> -Daniel
>
>> Regards,
>> Hridya
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>>    The caller must hold a reference to @gpucg obtained via
>>>> +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
>>>> +      * currently charged to before being charged to @gpucg. The caller must
>>>> +      * belong to the cgroup the buffer is currently charged to.
>>>> +      *
>>>> +      * This callback is optional.
>>>> +      *
>>>> +      * Returns:
>>>> +      *
>>>> +      * 0 on success or negative error code on failure.
>>>> +      */
>>>> +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
>>>>    };
>>>>
>>>>    /**


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
       [not found]         ` <Yeg0GGi0tdnnCLHg-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2022-01-19 15:58           ` Christian König
  0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2022-01-19 15:58 UTC (permalink / raw)
  To: Hridya Valsaraju, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Benjamin Gaignard, Liam Mark,
	Laura Abbott, Brian Starkey

Am 19.01.22 um 16:54 schrieb Daniel Vetter:
> On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
>> On Sun, Jan 16, 2022 at 11:46 PM Christian König
>> <christian.koenig-5C7GfCeVMHo@public.gmane.org> wrote:
>>> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
>>>> The optional exporter op provides a way for processes to transfer
>>>> charge of a buffer to a different process. This is essential for the
>>>> cases where a central allocator process does allocations for various
>>>> subsystems, hands over the fd to the client who
>>>> requested the memory and drops all references to the allocated memory.
>>>>
>>>> Signed-off-by: Hridya Valsaraju <hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>    include/linux/dma-buf.h | 18 ++++++++++++++++++
>>>>    1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>> index 7ab50076e7a6..d5e52f81cc6f 100644
>>>> --- a/include/linux/dma-buf.h
>>>> +++ b/include/linux/dma-buf.h
>>>> @@ -13,6 +13,7 @@
>>>>    #ifndef __DMA_BUF_H__
>>>>    #define __DMA_BUF_H__
>>>>
>>>> +#include <linux/cgroup_gpu.h>
>>>>    #include <linux/dma-buf-map.h>
>>>>    #include <linux/file.h>
>>>>    #include <linux/err.h>
>>>> @@ -285,6 +286,23 @@ struct dma_buf_ops {
>>>>
>>>>        int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>> +
>>>> +     /**
>>>> +      * @charge_to_cgroup:
>>>> +      *
>>>> +      * This is called by an exporter to charge a buffer to the specified
>>>> +      * cgroup.
>>> Well that sentence makes absolutely no sense at all.
>>>
>>> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
>>> behalves of the importer and never by the exporter itself.
>>>
>>> I hope that this is just a documentation mixup.
>> Thank you for taking a look Christian!
>>
>> Yes, that was poor wording, sorry about that. It should instead say
>> that the op would be called by the process the buffer is currently
>> charged to in order to transfer the buffer's charge to a different
>> cgroup. This is helpful in the case where a process acts as an
>> allocator for multiple client processes and we would like the
>> allocated buffers to be charged to the clients who requested their
>> allocation(instead of the allocating process as is the default
>> behavior). In Android, the graphics allocator HAL process[1] does
>> most of the graphics allocations on behalf of various clients. After
>> allocation, the HAL process passes the fd to the client over binder
>> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
>> uncharge the buffer from the HAL process and charge it to the client
>> process instead.
>>
>> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.android.com%2Fdevices%2Fgraphics%2Farch-bq-gralloc&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C838d25da974d4ea4257508d9db63eb70%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637782044488604857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Qn7JeyF5Rq9tnrGw1KgNuQkpu5RbcrvPhDOa1OBJ6TU%3D&amp;reserved=0
> For that use-case, do we really need to have the vfunc abstraction and
> force all exporters to do something reasonable with it?

I was about to write up a similar answer, but more from the technical side.

Why in the world should that be done on the DMA-buf object as a 
communication function between importer and exporter?

That design makes absolutely no sense at all to me.

Regards,
Christian.

>
> I think just storing the cgrpus gpu memory bucket this is charged against
> and doing this in a generic way would be a lot better.
>
> That way we can also easily add other neat features in the future, like
> e.g. ttm could take care of charge-assignement automatically maybe, or we
> could print the current cgroups charge relationship in the sysfs info
> file. Or anything else really.
>
> I do feel that in general for gpu memory cgroups to be useful, we should
> really have memory pools as a fairly strong concept. Otherwise every
> driver/allocator/thing is going to come up with their own, and very likely
> incompatible interpretation. And we end up with a supposed generic cgroups
> interface which cannot actually be used in a driver/vendor agnostic way at
> all.
> -Daniel
>
>> Regards,
>> Hridya
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>>    The caller must hold a reference to @gpucg obtained via
>>>> +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
>>>> +      * currently charged to before being charged to @gpucg. The caller must
>>>> +      * belong to the cgroup the buffer is currently charged to.
>>>> +      *
>>>> +      * This callback is optional.
>>>> +      *
>>>> +      * Returns:
>>>> +      *
>>>> +      * 0 on success or negative error code on failure.
>>>> +      */
>>>> +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
>>>>    };
>>>>
>>>>    /**


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
  2022-01-19 15:58         ` Christian König
  2022-01-19 18:21             ` Hridya Valsaraju
@ 2022-01-19 18:21             ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-19 18:21 UTC (permalink / raw)
  To: Christian König
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
	Sumit Semwal, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo, Zefan Li, Johannes Weiner,
	Dave Airlie, Jason Ekstrand, Matthew Auld, Matthew Brost, Li Li,
	Marco Ballesio, Miguel Ojeda, Hang Lu, Wedson Almeida Filho,
	Masahiro Yamada, Andrew Morton, Nathan Chancellor, Kees Cook,
	Nick Desaulniers, Chris Down, Vipin Sharma, Daniel Borkmann,
	Vlastimil Babka, Arnd Bergmann, dri-devel, linux-doc,
	linux-kernel, linux-media, linaro-mm-sig, cgroups, Kenny.Ho,
	daniels, kaleshsingh, tjmercier

On Wed, Jan 19, 2022 at 7:58 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 19.01.22 um 16:54 schrieb Daniel Vetter:
> > On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
> >> On Sun, Jan 16, 2022 at 11:46 PM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> >>>> The optional exporter op provides a way for processes to transfer
> >>>> charge of a buffer to a different process. This is essential for the
> >>>> cases where a central allocator process does allocations for various
> >>>> subsystems, hands over the fd to the client who
> >>>> requested the memory and drops all references to the allocated memory.
> >>>>
> >>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> >>>> ---
> >>>>    include/linux/dma-buf.h | 18 ++++++++++++++++++
> >>>>    1 file changed, 18 insertions(+)
> >>>>
> >>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>> index 7ab50076e7a6..d5e52f81cc6f 100644
> >>>> --- a/include/linux/dma-buf.h
> >>>> +++ b/include/linux/dma-buf.h
> >>>> @@ -13,6 +13,7 @@
> >>>>    #ifndef __DMA_BUF_H__
> >>>>    #define __DMA_BUF_H__
> >>>>
> >>>> +#include <linux/cgroup_gpu.h>
> >>>>    #include <linux/dma-buf-map.h>
> >>>>    #include <linux/file.h>
> >>>>    #include <linux/err.h>
> >>>> @@ -285,6 +286,23 @@ struct dma_buf_ops {
> >>>>
> >>>>        int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>> +
> >>>> +     /**
> >>>> +      * @charge_to_cgroup:
> >>>> +      *
> >>>> +      * This is called by an exporter to charge a buffer to the specified
> >>>> +      * cgroup.
> >>> Well that sentence makes absolutely no sense at all.
> >>>
> >>> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> >>> behalves of the importer and never by the exporter itself.
> >>>
> >>> I hope that this is just a documentation mixup.
> >> Thank you for taking a look Christian!
> >>
> >> Yes, that was poor wording, sorry about that. It should instead say
> >> that the op would be called by the process the buffer is currently
> >> charged to in order to transfer the buffer's charge to a different
> >> cgroup. This is helpful in the case where a process acts as an
> >> allocator for multiple client processes and we would like the
> >> allocated buffers to be charged to the clients who requested their
> >> allocation(instead of the allocating process as is the default
> >> behavior). In Android, the graphics allocator HAL process[1] does
> >> most of the graphics allocations on behalf of various clients. After
> >> allocation, the HAL process passes the fd to the client over binder
> >> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
> >> uncharge the buffer from the HAL process and charge it to the client
> >> process instead.
> >>
> >> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.android.com%2Fdevices%2Fgraphics%2Farch-bq-gralloc&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C838d25da974d4ea4257508d9db63eb70%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637782044488604857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Qn7JeyF5Rq9tnrGw1KgNuQkpu5RbcrvPhDOa1OBJ6TU%3D&amp;reserved=0
> > For that use-case, do we really need to have the vfunc abstraction and
> > force all exporters to do something reasonable with it?
>
> I was about to write up a similar answer, but more from the technical side.
>
> Why in the world should that be done on the DMA-buf object as a
> communication function between importer and exporter?
>
> That design makes absolutely no sense at all to me.
>
> Regards,
> Christian.
>
> >
> > I think just storing the cgrpus gpu memory bucket this is charged against
> > and doing this in a generic way would be a lot better.
> >
> > That way we can also easily add other neat features in the future, like
> > e.g. ttm could take care of charge-assignement automatically maybe, or we
> > could print the current cgroups charge relationship in the sysfs info
> > file. Or anything else really.

Thank you for the comments Daniel and Christian! I made the
charge/uncharge/transfer part of the exporter implementation since it
provided exporters a choice on whether they wanted to enable cgroup
memory accounting for the buffers they were exporting. I also see the
benefits of making the charge/uncharge/transfer generic by moving it
to the DMA-BUF framework like you are suggesting though. We will move
to a more generic design in the next version of the RFC.

Regards,
Hridya

> >
> > I do feel that in general for gpu memory cgroups to be useful, we should
> > really have memory pools as a fairly strong concept. Otherwise every
> > driver/allocator/thing is going to come up with their own, and very likely
> > incompatible interpretation. And we end up with a supposed generic cgroups
> > interface which cannot actually be used in a driver/vendor agnostic way at
> > all.
> > -Daniel
> >
> >> Regards,
> >> Hridya
> >>
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>    The caller must hold a reference to @gpucg obtained via
> >>>> +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> >>>> +      * currently charged to before being charged to @gpucg. The caller must
> >>>> +      * belong to the cgroup the buffer is currently charged to.
> >>>> +      *
> >>>> +      * This callback is optional.
> >>>> +      *
> >>>> +      * Returns:
> >>>> +      *
> >>>> +      * 0 on success or negative error code on failure.
> >>>> +      */
> >>>> +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> >>>>    };
> >>>>
> >>>>    /**
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-19 18:21             ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-19 18:21 UTC (permalink / raw)
  To: Christian König
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, daniels, Matthew Brost, Kenny.Ho,
	Daniel Borkmann, Jonathan Corbet, Martijn Coenen,
	Masahiro Yamada, Wedson Almeida Filho, Matthew Auld,
	Miguel Ojeda, Dave Airlie, Laura Abbott, Marco Ballesio,
	Jason Ekstrand, linux-media, Li Li, Todd Kjos, Kees Cook,
	Arnd Bergmann, Vlastimil Babka, Vipin Sharma, Nathan Chancellor,
	cgroups, Suren Baghdasaryan, tjmercier, Christian Brauner,
	linaro-mm-sig, Hang Lu, Chris Down, Greg Kroah-Hartman,
	Nick Desaulniers, linux-kernel, Liam Mark,
	Arve Hjønnevåg, Thomas Zimmermann, Johannes Weiner,
	Tejun Heo, Andrew Morton

On Wed, Jan 19, 2022 at 7:58 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 19.01.22 um 16:54 schrieb Daniel Vetter:
> > On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
> >> On Sun, Jan 16, 2022 at 11:46 PM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> >>>> The optional exporter op provides a way for processes to transfer
> >>>> charge of a buffer to a different process. This is essential for the
> >>>> cases where a central allocator process does allocations for various
> >>>> subsystems, hands over the fd to the client who
> >>>> requested the memory and drops all references to the allocated memory.
> >>>>
> >>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> >>>> ---
> >>>>    include/linux/dma-buf.h | 18 ++++++++++++++++++
> >>>>    1 file changed, 18 insertions(+)
> >>>>
> >>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>> index 7ab50076e7a6..d5e52f81cc6f 100644
> >>>> --- a/include/linux/dma-buf.h
> >>>> +++ b/include/linux/dma-buf.h
> >>>> @@ -13,6 +13,7 @@
> >>>>    #ifndef __DMA_BUF_H__
> >>>>    #define __DMA_BUF_H__
> >>>>
> >>>> +#include <linux/cgroup_gpu.h>
> >>>>    #include <linux/dma-buf-map.h>
> >>>>    #include <linux/file.h>
> >>>>    #include <linux/err.h>
> >>>> @@ -285,6 +286,23 @@ struct dma_buf_ops {
> >>>>
> >>>>        int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>> +
> >>>> +     /**
> >>>> +      * @charge_to_cgroup:
> >>>> +      *
> >>>> +      * This is called by an exporter to charge a buffer to the specified
> >>>> +      * cgroup.
> >>> Well that sentence makes absolutely no sense at all.
> >>>
> >>> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> >>> behalves of the importer and never by the exporter itself.
> >>>
> >>> I hope that this is just a documentation mixup.
> >> Thank you for taking a look Christian!
> >>
> >> Yes, that was poor wording, sorry about that. It should instead say
> >> that the op would be called by the process the buffer is currently
> >> charged to in order to transfer the buffer's charge to a different
> >> cgroup. This is helpful in the case where a process acts as an
> >> allocator for multiple client processes and we would like the
> >> allocated buffers to be charged to the clients who requested their
> >> allocation(instead of the allocating process as is the default
> >> behavior). In Android, the graphics allocator HAL process[1] does
> >> most of the graphics allocations on behalf of various clients. After
> >> allocation, the HAL process passes the fd to the client over binder
> >> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
> >> uncharge the buffer from the HAL process and charge it to the client
> >> process instead.
> >>
> >> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.android.com%2Fdevices%2Fgraphics%2Farch-bq-gralloc&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C838d25da974d4ea4257508d9db63eb70%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637782044488604857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Qn7JeyF5Rq9tnrGw1KgNuQkpu5RbcrvPhDOa1OBJ6TU%3D&amp;reserved=0
> > For that use-case, do we really need to have the vfunc abstraction and
> > force all exporters to do something reasonable with it?
>
> I was about to write up a similar answer, but more from the technical side.
>
> Why in the world should that be done on the DMA-buf object as a
> communication function between importer and exporter?
>
> That design makes absolutely no sense at all to me.
>
> Regards,
> Christian.
>
> >
> > I think just storing the cgrpus gpu memory bucket this is charged against
> > and doing this in a generic way would be a lot better.
> >
> > That way we can also easily add other neat features in the future, like
> > e.g. ttm could take care of charge-assignement automatically maybe, or we
> > could print the current cgroups charge relationship in the sysfs info
> > file. Or anything else really.

Thank you for the comments Daniel and Christian! I made the
charge/uncharge/transfer part of the exporter implementation since it
provided exporters a choice on whether they wanted to enable cgroup
memory accounting for the buffers they were exporting. I also see the
benefits of making the charge/uncharge/transfer generic by moving it
to the DMA-BUF framework like you are suggesting though. We will move
to a more generic design in the next version of the RFC.

Regards,
Hridya

> >
> > I do feel that in general for gpu memory cgroups to be useful, we should
> > really have memory pools as a fairly strong concept. Otherwise every
> > driver/allocator/thing is going to come up with their own, and very likely
> > incompatible interpretation. And we end up with a supposed generic cgroups
> > interface which cannot actually be used in a driver/vendor agnostic way at
> > all.
> > -Daniel
> >
> >> Regards,
> >> Hridya
> >>
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>    The caller must hold a reference to @gpucg obtained via
> >>>> +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> >>>> +      * currently charged to before being charged to @gpucg. The caller must
> >>>> +      * belong to the cgroup the buffer is currently charged to.
> >>>> +      *
> >>>> +      * This callback is optional.
> >>>> +      *
> >>>> +      * Returns:
> >>>> +      *
> >>>> +      * 0 on success or negative error code on failure.
> >>>> +      */
> >>>> +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> >>>>    };
> >>>>
> >>>>    /**
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup.
@ 2022-01-19 18:21             ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-19 18:21 UTC (permalink / raw)
  To: Christian König
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
	Sumit Semwal, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Tejun Heo

On Wed, Jan 19, 2022 at 7:58 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 19.01.22 um 16:54 schrieb Daniel Vetter:
> > On Tue, Jan 18, 2022 at 10:54:16AM -0800, Hridya Valsaraju wrote:
> >> On Sun, Jan 16, 2022 at 11:46 PM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Am 15.01.22 um 02:06 schrieb Hridya Valsaraju:
> >>>> The optional exporter op provides a way for processes to transfer
> >>>> charge of a buffer to a different process. This is essential for the
> >>>> cases where a central allocator process does allocations for various
> >>>> subsystems, hands over the fd to the client who
> >>>> requested the memory and drops all references to the allocated memory.
> >>>>
> >>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> >>>> ---
> >>>>    include/linux/dma-buf.h | 18 ++++++++++++++++++
> >>>>    1 file changed, 18 insertions(+)
> >>>>
> >>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>> index 7ab50076e7a6..d5e52f81cc6f 100644
> >>>> --- a/include/linux/dma-buf.h
> >>>> +++ b/include/linux/dma-buf.h
> >>>> @@ -13,6 +13,7 @@
> >>>>    #ifndef __DMA_BUF_H__
> >>>>    #define __DMA_BUF_H__
> >>>>
> >>>> +#include <linux/cgroup_gpu.h>
> >>>>    #include <linux/dma-buf-map.h>
> >>>>    #include <linux/file.h>
> >>>>    #include <linux/err.h>
> >>>> @@ -285,6 +286,23 @@ struct dma_buf_ops {
> >>>>
> >>>>        int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>> +
> >>>> +     /**
> >>>> +      * @charge_to_cgroup:
> >>>> +      *
> >>>> +      * This is called by an exporter to charge a buffer to the specified
> >>>> +      * cgroup.
> >>> Well that sentence makes absolutely no sense at all.
> >>>
> >>> The dma_buf_ops are supposed to be called by the DMA-buf subsystem on
> >>> behalves of the importer and never by the exporter itself.
> >>>
> >>> I hope that this is just a documentation mixup.
> >> Thank you for taking a look Christian!
> >>
> >> Yes, that was poor wording, sorry about that. It should instead say
> >> that the op would be called by the process the buffer is currently
> >> charged to in order to transfer the buffer's charge to a different
> >> cgroup. This is helpful in the case where a process acts as an
> >> allocator for multiple client processes and we would like the
> >> allocated buffers to be charged to the clients who requested their
> >> allocation(instead of the allocating process as is the default
> >> behavior). In Android, the graphics allocator HAL process[1] does
> >> most of the graphics allocations on behalf of various clients. After
> >> allocation, the HAL process passes the fd to the client over binder
> >> IPC and the binder driver invokes the charge_to_cgroup() DMA-BUF op to
> >> uncharge the buffer from the HAL process and charge it to the client
> >> process instead.
> >>
> >> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.android.com%2Fdevices%2Fgraphics%2Farch-bq-gralloc&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C838d25da974d4ea4257508d9db63eb70%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637782044488604857%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=Qn7JeyF5Rq9tnrGw1KgNuQkpu5RbcrvPhDOa1OBJ6TU%3D&amp;reserved=0
> > For that use-case, do we really need to have the vfunc abstraction and
> > force all exporters to do something reasonable with it?
>
> I was about to write up a similar answer, but more from the technical side.
>
> Why in the world should that be done on the DMA-buf object as a
> communication function between importer and exporter?
>
> That design makes absolutely no sense at all to me.
>
> Regards,
> Christian.
>
> >
> > I think just storing the cgrpus gpu memory bucket this is charged against
> > and doing this in a generic way would be a lot better.
> >
> > That way we can also easily add other neat features in the future, like
> > e.g. ttm could take care of charge-assignement automatically maybe, or we
> > could print the current cgroups charge relationship in the sysfs info
> > file. Or anything else really.

Thank you for the comments Daniel and Christian! I made the
charge/uncharge/transfer part of the exporter implementation since it
provided exporters a choice on whether they wanted to enable cgroup
memory accounting for the buffers they were exporting. I also see the
benefits of making the charge/uncharge/transfer generic by moving it
to the DMA-BUF framework like you are suggesting though. We will move
to a more generic design in the next version of the RFC.

Regards,
Hridya

> >
> > I do feel that in general for gpu memory cgroups to be useful, we should
> > really have memory pools as a fairly strong concept. Otherwise every
> > driver/allocator/thing is going to come up with their own, and very likely
> > incompatible interpretation. And we end up with a supposed generic cgroups
> > interface which cannot actually be used in a driver/vendor agnostic way at
> > all.
> > -Daniel
> >
> >> Regards,
> >> Hridya
> >>
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>    The caller must hold a reference to @gpucg obtained via
> >>>> +      * gpucg_get(). The DMA-BUF will be uncharged from the cgroup it is
> >>>> +      * currently charged to before being charged to @gpucg. The caller must
> >>>> +      * belong to the cgroup the buffer is currently charged to.
> >>>> +      *
> >>>> +      * This callback is optional.
> >>>> +      *
> >>>> +      * Returns:
> >>>> +      *
> >>>> +      * 0 on success or negative error code on failure.
> >>>> +      */
> >>>> +     int (*charge_to_cgroup)(struct dma_buf *dmabuf, struct gpucg *gpucg);
> >>>>    };
> >>>>
> >>>>    /**
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
  2022-01-19 15:40     ` Randy Dunlap
  (?)
@ 2022-01-19 18:24       ` Hridya Valsaraju
  -1 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-19 18:24 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
	Sumit Semwal, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz, Christian König, Tejun Heo,
	Zefan Li, Johannes Weiner, Dave Airlie, Rodrigo Vivi,
	Matthew Auld, Matthew Brost, Li Li, Marco Ballesio, Hang Lu,
	Wedson Almeida Filho, Masahiro Yamada, Andrew Morton,
	Nathan Chancellor, Kees Cook, Nick Desaulniers, Miguel Ojeda,
	Vipin Sharma, Chris Down, Daniel Borkmann, Vlastimil Babka,
	Arnd Bergmann, dri-devel, linux-doc, linux-kernel, linux-media,
	linaro-mm-sig, cgroups, Kenny.Ho, daniels, kaleshsingh,
	tjmercier

On Wed, Jan 19, 2022 at 7:40 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> Hi--
>
> On 1/14/22 17:06, Hridya Valsaraju wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index cd23faa163d1..408910b21387 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -990,6 +990,13 @@ config BLK_CGROUP
> >
> >       See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
> >
> > +config CGROUP_GPU
> > +       bool "gpu cgroup controller (EXPERIMENTAL)"
> > +       select PAGE_COUNTER
> > +       help
> > +     Provides accounting and limit setting for memory allocations by the GPU
> > +     and GPU-related subsystems.
>
> Please follow coding-style for Kconfig files:
>
> (from Documentation/process/coding-style.rst, section 10):
>
> For all of the Kconfig* configuration files throughout the source tree,
> the indentation is somewhat different.  Lines under a ``config`` definition
> are indented with one tab, while help text is indented an additional two
> spaces.

Thanks Randy, sounds good! Will fix it in the next version!

>
>
> thanks.

>
> --
> ~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
@ 2022-01-19 18:24       ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-19 18:24 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Zefan Li, linux-doc, David Airlie, dri-devel, Benjamin Gaignard,
	kaleshsingh, Joel Fernandes, Kees Cook, Matthew Brost, Kenny.Ho,
	Daniel Borkmann, Jonathan Corbet, Martijn Coenen,
	Masahiro Yamada, Wedson Almeida Filho, Matthew Auld,
	Miguel Ojeda, Dave Airlie, Laura Abbott, Marco Ballesio,
	linux-media, Li Li, Todd Kjos, Arnd Bergmann, Vlastimil Babka,
	Vipin Sharma, Nathan Chancellor, Rodrigo Vivi, cgroups,
	Suren Baghdasaryan, tjmercier, Christian Brauner, linaro-mm-sig,
	Hang Lu, daniels, Chris Down, Greg Kroah-Hartman,
	Nick Desaulniers, linux-kernel, Liam Mark, Christian König,
	Arve Hjønnevåg, Thomas Zimmermann, Johannes Weiner,
	Tejun Heo, Andrew Morton

On Wed, Jan 19, 2022 at 7:40 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> Hi--
>
> On 1/14/22 17:06, Hridya Valsaraju wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index cd23faa163d1..408910b21387 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -990,6 +990,13 @@ config BLK_CGROUP
> >
> >       See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
> >
> > +config CGROUP_GPU
> > +       bool "gpu cgroup controller (EXPERIMENTAL)"
> > +       select PAGE_COUNTER
> > +       help
> > +     Provides accounting and limit setting for memory allocations by the GPU
> > +     and GPU-related subsystems.
>
> Please follow coding-style for Kconfig files:
>
> (from Documentation/process/coding-style.rst, section 10):
>
> For all of the Kconfig* configuration files throughout the source tree,
> the indentation is somewhat different.  Lines under a ``config`` definition
> are indented with one tab, while help text is indented an additional two
> spaces.

Thanks Randy, sounds good! Will fix it in the next version!

>
>
> thanks.

>
> --
> ~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory
@ 2022-01-19 18:24       ` Hridya Valsaraju
  0 siblings, 0 replies; 41+ messages in thread
From: Hridya Valsaraju @ 2022-01-19 18:24 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Jonathan Corbet, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Suren Baghdasaryan,
	Sumit Semwal, Benjamin Gaignard, Liam Mark, Laura Abbott,
	Brian Starkey, John Stultz

On Wed, Jan 19, 2022 at 7:40 AM Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
>
> Hi--
>
> On 1/14/22 17:06, Hridya Valsaraju wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index cd23faa163d1..408910b21387 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -990,6 +990,13 @@ config BLK_CGROUP
> >
> >       See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
> >
> > +config CGROUP_GPU
> > +       bool "gpu cgroup controller (EXPERIMENTAL)"
> > +       select PAGE_COUNTER
> > +       help
> > +     Provides accounting and limit setting for memory allocations by the GPU
> > +     and GPU-related subsystems.
>
> Please follow coding-style for Kconfig files:
>
> (from Documentation/process/coding-style.rst, section 10):
>
> For all of the Kconfig* configuration files throughout the source tree,
> the indentation is somewhat different.  Lines under a ``config`` definition
> are indented with one tab, while help text is indented an additional two
> spaces.

Thanks Randy, sounds good! Will fix it in the next version!

>
>
> thanks.

>
> --
> ~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2022-01-19 18:25 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-15  1:05 [RFC 0/6] Proposal for a GPU cgroup controller Hridya Valsaraju
2022-01-15  1:05 ` Hridya Valsaraju
2022-01-15  1:05 ` Hridya Valsaraju
2022-01-15  1:05 ` [RFC 1/6] gpu: rfc: " Hridya Valsaraju
2022-01-15  1:05   ` Hridya Valsaraju
2022-01-15  1:06 ` [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory Hridya Valsaraju
2022-01-15  1:06   ` Hridya Valsaraju
     [not found]   ` <20220115010622.3185921-3-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-01-19 15:40     ` Randy Dunlap
2022-01-19 15:40   ` Randy Dunlap
2022-01-19 15:40     ` Randy Dunlap
2022-01-19 18:24     ` Hridya Valsaraju
2022-01-19 18:24       ` Hridya Valsaraju
2022-01-19 18:24       ` Hridya Valsaraju
2022-01-15  1:06 ` [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs Hridya Valsaraju
2022-01-15  1:06   ` Hridya Valsaraju
     [not found] ` <20220115010622.3185921-1-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-01-15  1:05   ` [RFC 1/6] gpu: rfc: Proposal for a GPU cgroup controller Hridya Valsaraju
2022-01-15  1:06   ` [RFC 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory Hridya Valsaraju
2022-01-15  1:06   ` [RFC 3/6] dmabuf: heaps: Use the GPU cgroup charge/uncharge APIs Hridya Valsaraju
2022-01-15  1:06   ` [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup Hridya Valsaraju
2022-01-15  1:06   ` [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds Hridya Valsaraju
2022-01-15  1:06 ` [RFC 4/6] dma-buf: Add DMA-BUF exporter op to charge a DMA-BUF to a cgroup Hridya Valsaraju
2022-01-15  1:06   ` Hridya Valsaraju
2022-01-17  7:46   ` Christian König
2022-01-17  7:46     ` Christian König
2022-01-18 18:54     ` Hridya Valsaraju
2022-01-18 18:54       ` Hridya Valsaraju
2022-01-18 18:54       ` Hridya Valsaraju
2022-01-19 15:54       ` Daniel Vetter
2022-01-19 15:54         ` Daniel Vetter
2022-01-19 15:54         ` Daniel Vetter
2022-01-19 15:58         ` Christian König
2022-01-19 18:21           ` Hridya Valsaraju
2022-01-19 18:21             ` Hridya Valsaraju
2022-01-19 18:21             ` Hridya Valsaraju
     [not found]         ` <Yeg0GGi0tdnnCLHg-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2022-01-19 15:58           ` Christian König
     [not found]   ` <20220115010622.3185921-5-hridya-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-01-17  7:46     ` Christian König
2022-01-15  1:06 ` [RFC 5/6] dmabuf: system_heap: implement dma-buf op for GPU cgroup charge transfer Hridya Valsaraju
2022-01-15  1:06   ` Hridya Valsaraju
2022-01-15  1:06   ` Hridya Valsaraju
2022-01-15  1:06 ` [RFC 6/6] android: binder: Add a buffer flag to relinquish ownership of fds Hridya Valsaraju
2022-01-15  1:06   ` Hridya Valsaraju

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.