linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Track exported dma-buffers with memcg
@ 2023-01-09 21:38 T.J. Mercier
  2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: T.J. Mercier @ 2023-01-09 21:38 UTC (permalink / raw)
  To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Eric Paris
  Cc: daniel.vetter, android-mm, jstultz, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux

Based on discussions at LPC, this series adds a memory.stat counter for
exported dmabufs. This counter allows us to continue tracking
system-wide total exported buffer sizes which there is no longer any
way to get without DMABUF_SYSFS_STATS, and adds a new capability to
track per-cgroup exported buffer sizes. The total (root counter) is
helpful for accounting in-kernel dmabuf use (by comparing with the sum
of child nodes or with the sum of sizes of mapped buffers or FD
references in procfs) in addition to helping identify driver memory
leaks when in-kernel use continually increases over time. With
per-application cgroups, the per-cgroup counter allows us to quickly
see how much dma-buf memory an application has caused to be allocated.
This avoids the need to read through all of procfs which can be a
lengthy process, and causes the charge to "stick" to the allocating
process/cgroup as long as the buffer is alive, regardless of how the
buffer is shared (unless the charge is transferred).

The first patch adds the counter to memcg. The next two patches allow
the charge for a buffer to be transferred across cgroups which is
necessary because of the way most dmabufs are allocated from a central
process on Android. The fourth patch adds a SELinux hook to binder in
order to control who is allowed to transfer buffer charges.

[1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/

Hridya Valsaraju (1):
  binder: Add flags to relinquish ownership of fds

T.J. Mercier (3):
  memcg: Track exported dma-buffers
  dmabuf: Add cgroup charge transfer function
  security: binder: Add transfer_charge SElinux hook

 Documentation/admin-guide/cgroup-v2.rst |  5 +++
 drivers/android/binder.c                | 36 +++++++++++++++--
 drivers/dma-buf/dma-buf.c               | 54 +++++++++++++++++++++++--
 include/linux/dma-buf.h                 |  5 +++
 include/linux/lsm_hook_defs.h           |  2 +
 include/linux/lsm_hooks.h               |  6 +++
 include/linux/memcontrol.h              |  7 ++++
 include/linux/security.h                |  2 +
 include/uapi/linux/android/binder.h     | 23 +++++++++--
 mm/memcontrol.c                         |  4 ++
 security/security.c                     |  6 +++
 security/selinux/hooks.c                |  9 +++++
 security/selinux/include/classmap.h     |  2 +-
 13 files changed, 149 insertions(+), 12 deletions(-)


base-commit: b7bfaa761d760e72a969d116517eaa12e404c262
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] memcg: Track exported dma-buffers
  2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier
@ 2023-01-09 21:38 ` T.J. Mercier
  2023-01-10  8:58   ` Michal Hocko
  2023-01-09 21:38 ` [PATCH 2/4] dmabuf: Add cgroup charge transfer function T.J. Mercier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: T.J. Mercier @ 2023-01-09 21:38 UTC (permalink / raw)
  To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton
  Cc: daniel.vetter, android-mm, jstultz, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.

Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++++
 drivers/dma-buf/dma-buf.c               | 5 +++++
 include/linux/dma-buf.h                 | 3 +++
 include/linux/memcontrol.h              | 1 +
 mm/memcontrol.c                         | 4 ++++
 5 files changed, 17 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
 		Amount of memory used for storing in-kernel data
 		structures.
 
+	  dmabuf (npn)
+		Amount of memory used for exported DMA buffers allocated by the cgroup.
+		Stays with the allocating cgroup regardless of how the buffer is shared.
+
 	  workingset_refault_anon
 		Number of refaults of previously evicted anonymous pages.
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..ac45dd101c4d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,8 @@ static void dma_buf_release(struct dentry *dentry)
 	 */
 	BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
 
+	mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, -dmabuf->size);
+	mem_cgroup_put(dmabuf->memcg);
 	dma_buf_stats_teardown(dmabuf);
 	dmabuf->ops->release(dmabuf);
 
@@ -673,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	if (ret)
 		goto err_dmabuf;
 
+	dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+	mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, dmabuf->size);
+
 	file->private_data = dmabuf;
 	file->f_path.dentry->d_fsdata = dmabuf;
 	dmabuf->file = file;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include <linux/fs.h>
 #include <linux/dma-fence.h>
 #include <linux/wait.h>
+#include <linux/memcontrol.h>
 
 struct device;
 struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+	/* The cgroup to which this buffer is currently attributed */
+	struct mem_cgroup *memcg;
 };
 
 /**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..1c1da2da20a6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
+	MEMCG_DMABUF,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0394ab..680189bec7e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1502,6 +1502,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "unevictable",		NR_UNEVICTABLE			},
 	{ "slab_reclaimable",		NR_SLAB_RECLAIMABLE_B		},
 	{ "slab_unreclaimable",		NR_SLAB_UNRECLAIMABLE_B		},
+	{ "dmabuf",			MEMCG_DMABUF			},
 
 	/* The memory events */
 	{ "workingset_refault_anon",	WORKINGSET_REFAULT_ANON		},
@@ -1519,6 +1520,7 @@ static int memcg_page_state_unit(int item)
 	switch (item) {
 	case MEMCG_PERCPU_B:
 	case MEMCG_ZSWAP_B:
+	case MEMCG_DMABUF:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 	case WORKINGSET_REFAULT_ANON:
@@ -4042,6 +4044,7 @@ static const unsigned int memcg1_stats[] = {
 	WORKINGSET_REFAULT_ANON,
 	WORKINGSET_REFAULT_FILE,
 	MEMCG_SWAP,
+	MEMCG_DMABUF,
 };
 
 static const char *const memcg1_stat_names[] = {
@@ -4057,6 +4060,7 @@ static const char *const memcg1_stat_names[] = {
 	"workingset_refault_anon",
 	"workingset_refault_file",
 	"swap",
+	"dmabuf",
 };
 
 /* Universal VM events cgroup1 shows, original sort order */
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] dmabuf: Add cgroup charge transfer function
  2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier
  2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier
@ 2023-01-09 21:38 ` T.J. Mercier
  2023-01-09 21:38 ` [PATCH 3/4] binder: Add flags to relinquish ownership of fds T.J. Mercier
  2023-01-10  0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt
  3 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2023-01-09 21:38 UTC (permalink / raw)
  To: tjmercier, Sumit Semwal, Christian König
  Cc: hannes, daniel.vetter, android-mm, jstultz, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel

The dma_buf_transfer_charge function provides a way for processes to
transfer charge of a buffer to a different cgroup. This is essential
for the cases where a central allocator process does allocations for
various subsystems, hands over the fd to the client who requested the
memory, and drops all references to the allocated memory.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 drivers/dma-buf/dma-buf.c  | 45 ++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h    |  1 +
 include/linux/memcontrol.h |  6 +++++
 3 files changed, 52 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index ac45dd101c4d..fd6c5002032b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -11,6 +11,7 @@
  * refining of this idea.
  */
 
+#include <linux/atomic.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/dma-buf.h>
@@ -1618,6 +1619,50 @@ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
 
+/**
+ * dma_buf_transfer_charge - Change the cgroup to which the provided dma_buf is charged.
+ * @dmabuf:	[in]	buffer whose charge will be migrated to a different cgroup
+ * @target:	[in]	the task_struct of the destination process for the cgroup charge
+ *
+ * Only tasks that belong to the same cgroup the buffer is currently charged to
+ * may call this function, otherwise it will return -EPERM.
+ *
+ * Returns 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct *target)
+{
+	struct mem_cgroup *current_cg, *target_cg;
+	int ret = 0;
+
+	if (!IS_ENABLED(CONFIG_MEMCG))
+		return 0;
+
+	if (WARN_ON(!dmabuf) || WARN_ON(!target))
+		return -EINVAL;
+
+	current_cg = mem_cgroup_from_task(current);
+	target_cg = get_mem_cgroup_from_mm(target->mm);
+
+	if (current_cg == target_cg)
+		goto skip_transfer;
+
+	if (cmpxchg(&dmabuf->memcg, current_cg, target_cg) != current_cg) {
+		/* Only the current owner can transfer the charge */
+		ret = -EPERM;
+		goto skip_transfer;
+	}
+
+	mod_memcg_state(current_cg, MEMCG_DMABUF, -dmabuf->size);
+	mod_memcg_state(target_cg, MEMCG_DMABUF, dmabuf->size);
+
+	mem_cgroup_put(current_cg); /* unref from buffer - buffer keeps new ref to target_cg */
+	return 0;
+
+skip_transfer:
+	mem_cgroup_put(target_cg);
+	return ret;
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 1f0ffb8e4bf5..6aa128d76aa7 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -634,4 +634,5 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
 int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
+int dma_buf_transfer_charge(struct dma_buf *dmabuf, struct task_struct *target);
 #endif /* __DMA_BUF_H__ */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1c1da2da20a6..e5aec27044c7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1298,6 +1298,12 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
 	return NULL;
 }
 
+static inline
+struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
+{
+	return NULL;
+}
+
 static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
 }
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] binder: Add flags to relinquish ownership of fds
  2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier
  2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier
  2023-01-09 21:38 ` [PATCH 2/4] dmabuf: Add cgroup charge transfer function T.J. Mercier
@ 2023-01-09 21:38 ` T.J. Mercier
       [not found]   ` <20230110014720.281-1-hdanton@sina.com>
  2023-01-20 21:25   ` Carlos Llamas
  2023-01-10  0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt
  3 siblings, 2 replies; 16+ messages in thread
From: T.J. Mercier @ 2023-01-09 21:38 UTC (permalink / raw)
  To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, Sumit Semwal, Christian König
  Cc: daniel.vetter, android-mm, jstultz, Hridya Valsaraju, cgroups,
	linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig

From: Hridya Valsaraju <hridya@google.com>

This patch introduces flags BINDER_FD_FLAG_XFER_CHARGE, and
BINDER_FD_FLAG_XFER_CHARGE that a process sending an individual fd or
fd array to another process over binder IPC can set to relinquish
ownership of the fd(s) being sent for memory accounting purposes. If the
flag is found to be set during the fd or fd array translation and the
fd is for a DMA-BUF, the buffer is uncharged from the sender's cgroup
and charged to the receiving process's cgroup instead.

It is up to the sending process to ensure that it closes the fds
regardless of whether the transfer failed or succeeded.

Most graphics shared memory allocations in Android are done by the
graphics allocator HAL process. On requests from clients, the HAL
process allocates memory and sends the fds to the clients over binder
IPC. The graphics allocator HAL will not retain any references to the
buffers. When the HAL sets *_FLAG_XFER_CHARGE for fd arrays holding
DMA-BUF fds, or individual fd objects, binder will transfer the charge
for the buffer from the allocator process cgroup to the client process
cgroup.

The pad [1] and pad_flags [2] fields of binder_fd_object and
binder_fda_array_object come from alignment with flat_binder_object and
have never been exposed for use from userspace. This new flags use
follows the pattern set by binder_buffer_object.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=feba3900cabb8e7c87368faa28e7a6936809ba22
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=5cdcf4c6a638591ec0e98c57404a19e7f9997567

Signed-off-by: Hridya Valsaraju <hridya@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  3 ++-
 drivers/android/binder.c                | 31 +++++++++++++++++++++----
 drivers/dma-buf/dma-buf.c               |  4 +---
 include/linux/dma-buf.h                 |  1 +
 include/uapi/linux/android/binder.h     | 23 ++++++++++++++----
 5 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 538ae22bc514..d225295932c0 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1457,7 +1457,8 @@ PAGE_SIZE multiple when read back.
 
 	  dmabuf (npn)
 		Amount of memory used for exported DMA buffers allocated by the cgroup.
-		Stays with the allocating cgroup regardless of how the buffer is shared.
+		Stays with the allocating cgroup regardless of how the buffer is shared
+		unless explicitly transferred.
 
 	  workingset_refault_anon
 		Number of refaults of previously evicted anonymous pages.
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 880224ec6abb..9830848c8d25 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -42,6 +42,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/dma-buf.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
@@ -2237,7 +2238,7 @@ static int binder_translate_handle(struct flat_binder_object *fp,
 	return ret;
 }
 
-static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
+static int binder_translate_fd(u32 fd, binder_size_t fd_offset, __u32 flags,
 			       struct binder_transaction *t,
 			       struct binder_thread *thread,
 			       struct binder_transaction *in_reply_to)
@@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
 		goto err_security;
 	}
 
+	if (IS_ENABLED(CONFIG_MEMCG) && (flags & BINDER_FD_FLAG_XFER_CHARGE)) {
+		struct dma_buf *dmabuf;
+
+		if (unlikely(!is_dma_buf_file(file))) {
+			binder_user_error(
+				"%d:%d got transaction with XFER_CHARGE for non-dmabuf fd, %d\n",
+				proc->pid, thread->pid, fd);
+			ret = -EINVAL;
+			goto err_dmabuf;
+		}
+
+		dmabuf = file->private_data;
+		ret = dma_buf_transfer_charge(dmabuf, target_proc->tsk);
+		if (ret) {
+			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d\n",
+				proc->pid, thread->pid, target_proc->pid);
+			goto err_xfer;
+		}
+	}
+
 	/*
 	 * Add fixup record for this transaction. The allocation
 	 * of the fd in the target needs to be done from a
@@ -2294,6 +2315,8 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
 	return ret;
 
 err_alloc:
+err_xfer:
+err_dmabuf:
 err_security:
 	fput(file);
 err_fget:
@@ -2604,7 +2627,7 @@ static int binder_translate_fd_array(struct list_head *pf_head,
 
 		ret = copy_from_user(&fd, sender_ufda_base + sender_uoffset, sizeof(fd));
 		if (!ret)
-			ret = binder_translate_fd(fd, offset, t, thread,
+			ret = binder_translate_fd(fd, offset, fda->flags, t, thread,
 						  in_reply_to);
 		if (ret)
 			return ret > 0 ? -EINVAL : ret;
@@ -3383,8 +3406,8 @@ static void binder_transaction(struct binder_proc *proc,
 			struct binder_fd_object *fp = to_binder_fd_object(hdr);
 			binder_size_t fd_offset = object_offset +
 				(uintptr_t)&fp->fd - (uintptr_t)fp;
-			int ret = binder_translate_fd(fp->fd, fd_offset, t,
-						      thread, in_reply_to);
+			int ret = binder_translate_fd(fp->fd, fd_offset, fp->flags,
+						      t, thread, in_reply_to);
 
 			fp->pad_binder = 0;
 			if (ret < 0 ||
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index fd6c5002032b..a65b42433099 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,8 +34,6 @@
 
 #include "dma-buf-sysfs-stats.h"
 
-static inline int is_dma_buf_file(struct file *);
-
 struct dma_buf_list {
 	struct list_head head;
 	struct mutex lock;
@@ -527,7 +525,7 @@ static const struct file_operations dma_buf_fops = {
 /*
  * is_dma_buf_file - Check if struct file* is associated with dma_buf
  */
-static inline int is_dma_buf_file(struct file *file)
+int is_dma_buf_file(struct file *file)
 {
 	return file->f_op == &dma_buf_fops;
 }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6aa128d76aa7..092d572ce528 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -595,6 +595,7 @@ dma_buf_attachment_is_dynamic(struct dma_buf_attachment *attach)
 	return !!attach->importer_ops;
 }
 
+int is_dma_buf_file(struct file *file);
 struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 					  struct device *dev);
 struct dma_buf_attachment *
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index e72e4de8f452..696c2bdb8a7e 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -91,14 +91,14 @@ struct flat_binder_object {
 /**
  * struct binder_fd_object - describes a filedescriptor to be fixed up.
  * @hdr:	common header structure
- * @pad_flags:	padding to remain compatible with old userspace code
+ * @flags:	One or more BINDER_FD_FLAG_* flags
  * @pad_binder:	padding to remain compatible with old userspace code
  * @fd:		file descriptor
  * @cookie:	opaque data, used by user-space
  */
 struct binder_fd_object {
 	struct binder_object_header	hdr;
-	__u32				pad_flags;
+	__u32				flags;
 	union {
 		binder_uintptr_t	pad_binder;
 		__u32			fd;
@@ -107,6 +107,17 @@ struct binder_fd_object {
 	binder_uintptr_t		cookie;
 };
 
+enum {
+	/**
+	 * @BINDER_FD_FLAG_XFER_CHARGE
+	 *
+	 * When set, the sender of a binder_fd_object wishes to relinquish ownership of the fd for
+	 * memory accounting purposes. If the fd is for a DMA-BUF, the buffer is uncharged from the
+	 * sender's cgroup and charged to the receiving process's cgroup instead.
+	 */
+	BINDER_FD_FLAG_XFER_CHARGE = 0x01,
+};
+
 /* struct binder_buffer_object - object describing a userspace buffer
  * @hdr:		common header structure
  * @flags:		one or more BINDER_BUFFER_* flags
@@ -141,7 +152,7 @@ enum {
 
 /* struct binder_fd_array_object - object describing an array of fds in a buffer
  * @hdr:		common header structure
- * @pad:		padding to ensure correct alignment
+ * @flags:		One or more BINDER_FDA_FLAG_* flags
  * @num_fds:		number of file descriptors in the buffer
  * @parent:		index in offset array to buffer holding the fd array
  * @parent_offset:	start offset of fd array in the buffer
@@ -162,12 +173,16 @@ enum {
  */
 struct binder_fd_array_object {
 	struct binder_object_header	hdr;
-	__u32				pad;
+	__u32				flags;
 	binder_size_t			num_fds;
 	binder_size_t			parent;
 	binder_size_t			parent_offset;
 };
 
+enum {
+	BINDER_FDA_FLAG_XFER_CHARGE = BINDER_FD_FLAG_XFER_CHARGE,
+};
+
 /*
  * On 64-bit platforms where user code may run in 32-bits the driver must
  * translate the buffer (and local binder) addresses appropriately.
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier
                   ` (2 preceding siblings ...)
  2023-01-09 21:38 ` [PATCH 3/4] binder: Add flags to relinquish ownership of fds T.J. Mercier
@ 2023-01-10  0:18 ` Shakeel Butt
  2023-01-11 22:56   ` Daniel Vetter
  3 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2023-01-10  0:18 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, daniel.vetter, android-mm, jstultz, cgroups,
	linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig,
	linux-mm, linux-security-module, selinux

Hi T.J.,

On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@google.com> wrote:
>
> Based on discussions at LPC, this series adds a memory.stat counter for
> exported dmabufs. This counter allows us to continue tracking
> system-wide total exported buffer sizes which there is no longer any
> way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> track per-cgroup exported buffer sizes. The total (root counter) is
> helpful for accounting in-kernel dmabuf use (by comparing with the sum
> of child nodes or with the sum of sizes of mapped buffers or FD
> references in procfs) in addition to helping identify driver memory
> leaks when in-kernel use continually increases over time. With
> per-application cgroups, the per-cgroup counter allows us to quickly
> see how much dma-buf memory an application has caused to be allocated.
> This avoids the need to read through all of procfs which can be a
> lengthy process, and causes the charge to "stick" to the allocating
> process/cgroup as long as the buffer is alive, regardless of how the
> buffer is shared (unless the charge is transferred).
>
> The first patch adds the counter to memcg. The next two patches allow
> the charge for a buffer to be transferred across cgroups which is
> necessary because of the way most dmabufs are allocated from a central
> process on Android. The fourth patch adds a SELinux hook to binder in
> order to control who is allowed to transfer buffer charges.
>
> [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/
>

I am a bit confused by the term "charge" used in this patch series.
From the patches, it seems like only a memcg stat is added and nothing
is charged to the memcg.

This leads me to the question: Why add this stat in memcg if the
underlying memory is not charged to the memcg and if we don't really
want to limit the usage?

I see two ways forward:

1. Instead of memcg, use bpf-rstat [1] infra to implement the
per-cgroup stat for dmabuf. (You may need an additional hook for the
stat transfer).

2. Charge the actual memory to the memcg. Since the size of dmabuf is
immutable across its lifetime, you will not need to do accounting at
page level and instead use something similar to the network memory
accounting interface/mechanism (or even more simple). However you
would need to handle the reclaim, OOM and charge context and failure
cases. However if you are not looking to limit the usage of dmabuf
then this option is an overkill.

Please let me know if I misunderstood something.

[1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] memcg: Track exported dma-buffers
  2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier
@ 2023-01-10  8:58   ` Michal Hocko
  2023-01-10 19:08     ` T.J. Mercier
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2023-01-10  8:58 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Sumit Semwal, Christian König, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, daniel.vetter, android-mm, jstultz,
	cgroups, linux-doc, linux-kernel, linux-media, dri-devel,
	linaro-mm-sig, linux-mm

On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> When a buffer is exported to userspace, use memcg to attribute the
> buffer to the allocating cgroup until all buffer references are
> released.
> 
> Unlike the dmabuf sysfs stats implementation, this memcg accounting
> avoids contention over the kernfs_rwsem incurred when creating or
> removing nodes.

I am not familiar with dmabuf infrastructure so please bear with me.
AFAIU this patch adds a dmabuf specific counter to find out the amount
of dmabuf memory used. But I do not see any actual charging implemented
for that memory.

I have looked at two random users of dma_buf_export cma_heap_allocate
and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
which doesn't account to memcg, system_heap_allocate uses
alloc_largest_available which relies on order_flags which doesn't seem
to ever use __GFP_ACCOUNT.

This would mean that the counter doesn't represent any actual memory
reflected in the overall memory consumption of a memcg. I believe this
is rather unexpected and confusing behavior. While some counters
overlap and their sum would exceed the charged memory we do not have any
that doesn't correspond to any memory (at least not for non-root memcgs).

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] memcg: Track exported dma-buffers
  2023-01-10  8:58   ` Michal Hocko
@ 2023-01-10 19:08     ` T.J. Mercier
  0 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2023-01-10 19:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Sumit Semwal, Christian König, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, daniel.vetter, android-mm, jstultz,
	cgroups, linux-doc, linux-kernel, linux-media, dri-devel,
	linaro-mm-sig, linux-mm

On Tue, Jan 10, 2023 at 12:58 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
> >
> > Unlike the dmabuf sysfs stats implementation, this memcg accounting
> > avoids contention over the kernfs_rwsem incurred when creating or
> > removing nodes.
>
> I am not familiar with dmabuf infrastructure so please bear with me.
> AFAIU this patch adds a dmabuf specific counter to find out the amount
> of dmabuf memory used. But I do not see any actual charging implemented
> for that memory.
>
> I have looked at two random users of dma_buf_export cma_heap_allocate
> and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
> which doesn't account to memcg, system_heap_allocate uses
> alloc_largest_available which relies on order_flags which doesn't seem
> to ever use __GFP_ACCOUNT.
>
> This would mean that the counter doesn't represent any actual memory
> reflected in the overall memory consumption of a memcg. I believe this
> is rather unexpected and confusing behavior. While some counters
> overlap and their sum would exceed the charged memory we do not have any
> that doesn't correspond to any memory (at least not for non-root memcgs).
>
> --
> Michal Hocko
> SUSE Labs

Thank you, that behavior is not intentional. I'm not looking at the
overall memcg charge yet otherwise I would have noticed this. I think
I understand what's needed for the charging part, but Shakeel
mentioned some additional work for "reclaim, OOM and charge context
and failure cases" on the cover letter which I need to look into.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] binder: Add flags to relinquish ownership of fds
       [not found]   ` <20230110014720.281-1-hdanton@sina.com>
@ 2023-01-10 21:20     ` T.J. Mercier
  0 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2023-01-10 21:20 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Greg Kroah-Hartman, linux-kernel, dri-devel, daniel.vetter,
	Tejun Heo, linux-media

On Mon, Jan 9, 2023 at 6:07 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On 9 Jan 2023 21:38:06 +0000 T.J. Mercier <tjmercier@google.com>
> >
> > @@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
> >               goto err_security;
> >       }
> >
> > +     if (IS_ENABLED(CONFIG_MEMCG) && (flags & BINDER_FD_FLAG_XFER_CHARGE)) {
> > +             struct dma_buf *dmabuf;
> > +
> > +             if (unlikely(!is_dma_buf_file(file))) {
> > +                     binder_user_error(
> > +                             "%d:%d got transaction with XFER_CHARGE for non-dmabuf fd, %d\n",
> > +                             proc->pid, thread->pid, fd);
> > +                     ret = -EINVAL;
> > +                     goto err_dmabuf;
> > +             }
>
> It barely makes sense to expose is_dma_buf_file() only for this.
> > +
> > +             dmabuf = file->private_data;
> > +             ret = dma_buf_transfer_charge(dmabuf, target_proc->tsk);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d\n",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     goto err_xfer;
> > +             }
> > +     }
> > +
>
> This whole hunk should go to dma-buf instead by adding change to
> dma_buf_transfer_charge() for instance.

Fair enough, will change this for v2. I think we'd still want to
distinguish between the two failure modes for logging purposes, so
I'll use the return value of dma_buf_transfer_charge to do that.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-10  0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt
@ 2023-01-11 22:56   ` Daniel Vetter
  2023-01-12  0:49     ` T.J. Mercier
  2023-01-12  7:56     ` Shakeel Butt
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Vetter @ 2023-01-11 22:56 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg,
	Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner,
	Carlos Llamas, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Eric Paris, daniel.vetter, android-mm, jstultz,
	cgroups, linux-doc, linux-kernel, linux-media, dri-devel,
	linaro-mm-sig, linux-mm, linux-security-module, selinux

On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote:
> Hi T.J.,
> 
> On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@google.com> wrote:
> >
> > Based on discussions at LPC, this series adds a memory.stat counter for
> > exported dmabufs. This counter allows us to continue tracking
> > system-wide total exported buffer sizes which there is no longer any
> > way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> > track per-cgroup exported buffer sizes. The total (root counter) is
> > helpful for accounting in-kernel dmabuf use (by comparing with the sum
> > of child nodes or with the sum of sizes of mapped buffers or FD
> > references in procfs) in addition to helping identify driver memory
> > leaks when in-kernel use continually increases over time. With
> > per-application cgroups, the per-cgroup counter allows us to quickly
> > see how much dma-buf memory an application has caused to be allocated.
> > This avoids the need to read through all of procfs which can be a
> > lengthy process, and causes the charge to "stick" to the allocating
> > process/cgroup as long as the buffer is alive, regardless of how the
> > buffer is shared (unless the charge is transferred).
> >
> > The first patch adds the counter to memcg. The next two patches allow
> > the charge for a buffer to be transferred across cgroups which is
> > necessary because of the way most dmabufs are allocated from a central
> > process on Android. The fourth patch adds a SELinux hook to binder in
> > order to control who is allowed to transfer buffer charges.
> >
> > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/
> >
> 
> I am a bit confused by the term "charge" used in this patch series.
> From the patches, it seems like only a memcg stat is added and nothing
> is charged to the memcg.
> 
> This leads me to the question: Why add this stat in memcg if the
> underlying memory is not charged to the memcg and if we don't really
> want to limit the usage?
> 
> I see two ways forward:
> 
> 1. Instead of memcg, use bpf-rstat [1] infra to implement the
> per-cgroup stat for dmabuf. (You may need an additional hook for the
> stat transfer).
> 
> 2. Charge the actual memory to the memcg. Since the size of dmabuf is
> immutable across its lifetime, you will not need to do accounting at
> page level and instead use something similar to the network memory
> accounting interface/mechanism (or even more simple). However you
> would need to handle the reclaim, OOM and charge context and failure
> cases. However if you are not looking to limit the usage of dmabuf
> then this option is an overkill.

I think eventually, at least for other "account gpu stuff in cgroups" use
case we do want to actually charge the memory.

The problem is a bit that with gpu allocations reclaim is essentially "we
pass the error to userspace and they get to sort the mess out". There are
some exceptions (some gpu drivers to have shrinkers) would we need to make
sure these shrinkers are tied into the cgroup stuff before we could enable
charging for them?

Also note that at least from the gpu driver side this is all a huge
endeavour, so if we can split up the steps as much as possible (and get
something interim useable that doesn't break stuff ofc), that is
practically need to make headway here. TJ has been trying out various
approaches for quite some time now already :-/
-Daniel

> Please let me know if I misunderstood something.
> 
> [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/
> 
> thanks,
> Shakeel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-11 22:56   ` Daniel Vetter
@ 2023-01-12  0:49     ` T.J. Mercier
  2023-01-12  8:13       ` Shakeel Butt
  2023-01-12  7:56     ` Shakeel Butt
  1 sibling, 1 reply; 16+ messages in thread
From: T.J. Mercier @ 2023-01-12  0:49 UTC (permalink / raw)
  To: Shakeel Butt, T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg,
	Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner,
	Carlos Llamas, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups,
	linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig,
	linux-mm, linux-security-module, selinux
  Cc: daniel.vetter

On Wed, Jan 11, 2023 at 2:56 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote:
> > Hi T.J.,
> >
> > On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@google.com> wrote:
> > >
> > > Based on discussions at LPC, this series adds a memory.stat counter for
> > > exported dmabufs. This counter allows us to continue tracking
> > > system-wide total exported buffer sizes which there is no longer any
> > > way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> > > track per-cgroup exported buffer sizes. The total (root counter) is
> > > helpful for accounting in-kernel dmabuf use (by comparing with the sum
> > > of child nodes or with the sum of sizes of mapped buffers or FD
> > > references in procfs) in addition to helping identify driver memory
> > > leaks when in-kernel use continually increases over time. With
> > > per-application cgroups, the per-cgroup counter allows us to quickly
> > > see how much dma-buf memory an application has caused to be allocated.
> > > This avoids the need to read through all of procfs which can be a
> > > lengthy process, and causes the charge to "stick" to the allocating
> > > process/cgroup as long as the buffer is alive, regardless of how the
> > > buffer is shared (unless the charge is transferred).
> > >
> > > The first patch adds the counter to memcg. The next two patches allow
> > > the charge for a buffer to be transferred across cgroups which is
> > > necessary because of the way most dmabufs are allocated from a central
> > > process on Android. The fourth patch adds a SELinux hook to binder in
> > > order to control who is allowed to transfer buffer charges.
> > >
> > > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/
> > >
> >
> > I am a bit confused by the term "charge" used in this patch series.
> > From the patches, it seems like only a memcg stat is added and nothing
> > is charged to the memcg.
> >
> > This leads me to the question: Why add this stat in memcg if the
> > underlying memory is not charged to the memcg and if we don't really
> > want to limit the usage?
> >
> > I see two ways forward:
> >
> > 1. Instead of memcg, use bpf-rstat [1] infra to implement the
> > per-cgroup stat for dmabuf. (You may need an additional hook for the
> > stat transfer).
> >
> > 2. Charge the actual memory to the memcg. Since the size of dmabuf is
> > immutable across its lifetime, you will not need to do accounting at
> > page level and instead use something similar to the network memory
> > accounting interface/mechanism (or even more simple). However you
> > would need to handle the reclaim, OOM and charge context and failure
> > cases. However if you are not looking to limit the usage of dmabuf
> > then this option is an overkill.
>
> I think eventually, at least for other "account gpu stuff in cgroups" use
> case we do want to actually charge the memory.
>
Yes, I've been looking at this today.

> The problem is a bit that with gpu allocations reclaim is essentially "we
> pass the error to userspace and they get to sort the mess out". There are
> some exceptions (some gpu drivers to have shrinkers) would we need to make
> sure these shrinkers are tied into the cgroup stuff before we could enable
> charging for them?
>
I'm also not sure that we can depend on the dmabuf being backed at
export time 100% of the time? (They are for dmabuf heaps.) If not,
that'd make calling the existing memcg folio based functions a bit
difficult.

> Also note that at least from the gpu driver side this is all a huge
> endeavour, so if we can split up the steps as much as possible (and get
> something interim useable that doesn't break stuff ofc), that is
> practically need to make headway here. TJ has been trying out various
> approaches for quite some time now already :-/
> -Daniel
>
> > Please let me know if I misunderstood something.
> >
> > [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/
> >
> > thanks,
> > Shakeel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-11 22:56   ` Daniel Vetter
  2023-01-12  0:49     ` T.J. Mercier
@ 2023-01-12  7:56     ` Shakeel Butt
  2023-01-12 10:25       ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2023-01-12  7:56 UTC (permalink / raw)
  To: T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg,
	Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner,
	Carlos Llamas, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups,
	linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig,
	linux-mm, linux-security-module, selinux

On Wed, Jan 11, 2023 at 11:56:45PM +0100, Daniel Vetter wrote:
> 
[...]
> I think eventually, at least for other "account gpu stuff in cgroups" use
> case we do want to actually charge the memory.
> 
> The problem is a bit that with gpu allocations reclaim is essentially "we
> pass the error to userspace and they get to sort the mess out". There are
> some exceptions (some gpu drivers to have shrinkers) would we need to make
> sure these shrinkers are tied into the cgroup stuff before we could enable
> charging for them?
> 

No, there is no requirement to have shrinkers or making such memory
reclaimable before charging it. Though existing shrinkers and the
possible future shrinkers would need to be converted into memcg aware
shrinkers.

Though there will be a need to update user expectations that if they 
use memcgs with hard limits, they may start seeing memcg OOMs after the
charging of dmabuf.

> Also note that at least from the gpu driver side this is all a huge
> endeavour, so if we can split up the steps as much as possible (and get
> something interim useable that doesn't break stuff ofc), that is
> practically need to make headway here. 

This sounds reasonable to me.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-12  0:49     ` T.J. Mercier
@ 2023-01-12  8:13       ` Shakeel Butt
  2023-01-12  8:17         ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Shakeel Butt @ 2023-01-12  8:13 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, android-mm, jstultz, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, daniel.vetter

On Wed, Jan 11, 2023 at 04:49:36PM -0800, T.J. Mercier wrote:
> 
[...]
> > The problem is a bit that with gpu allocations reclaim is essentially "we
> > pass the error to userspace and they get to sort the mess out". There are
> > some exceptions (some gpu drivers to have shrinkers) would we need to make
> > sure these shrinkers are tied into the cgroup stuff before we could enable
> > charging for them?
> >
> I'm also not sure that we can depend on the dmabuf being backed at
> export time 100% of the time? (They are for dmabuf heaps.) If not,
> that'd make calling the existing memcg folio based functions a bit
> difficult.
> 

Where does the actual memory get allocated? I see the first patch is
updating the stat in dma_buf_export() and dma_buf_release(). Does the
memory get allocated and freed in those code paths?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-12  8:13       ` Shakeel Butt
@ 2023-01-12  8:17         ` Christian König
  0 siblings, 0 replies; 16+ messages in thread
From: Christian König @ 2023-01-12  8:17 UTC (permalink / raw)
  To: Shakeel Butt, T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, Sumit Semwal, Michal Hocko, Roman Gushchin,
	Muchun Song, Andrew Morton, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, android-mm,
	jstultz, cgroups, linux-doc, linux-kernel, linux-media,
	dri-devel, linaro-mm-sig, linux-mm, linux-security-module,
	selinux, daniel.vetter

Am 12.01.23 um 09:13 schrieb Shakeel Butt:
> On Wed, Jan 11, 2023 at 04:49:36PM -0800, T.J. Mercier wrote:
> [...]
>>> The problem is a bit that with gpu allocations reclaim is essentially "we
>>> pass the error to userspace and they get to sort the mess out". There are
>>> some exceptions (some gpu drivers to have shrinkers) would we need to make
>>> sure these shrinkers are tied into the cgroup stuff before we could enable
>>> charging for them?
>>>
>> I'm also not sure that we can depend on the dmabuf being backed at
>> export time 100% of the time? (They are for dmabuf heaps.) If not,
>> that'd make calling the existing memcg folio based functions a bit
>> difficult.
>>
> Where does the actual memory get allocated? I see the first patch is
> updating the stat in dma_buf_export() and dma_buf_release(). Does the
> memory get allocated and freed in those code paths?

Nope, dma_buf_export() just makes the memory available to others.

The driver which calls dma_buf_export() is the one allocating the memory.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] Track exported dma-buffers with memcg
  2023-01-12  7:56     ` Shakeel Butt
@ 2023-01-12 10:25       ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2023-01-12 10:25 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg,
	Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner,
	Carlos Llamas, Suren Baghdasaryan, Sumit Semwal,
	Christian König, Roman Gushchin, Muchun Song, Andrew Morton,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, android-mm, jstultz, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux

On Thu 12-01-23 07:56:31, Shakeel Butt wrote:
> On Wed, Jan 11, 2023 at 11:56:45PM +0100, Daniel Vetter wrote:
> > 
> [...]
> > I think eventually, at least for other "account gpu stuff in cgroups" use
> > case we do want to actually charge the memory.
> > 
> > The problem is a bit that with gpu allocations reclaim is essentially "we
> > pass the error to userspace and they get to sort the mess out". There are
> > some exceptions (some gpu drivers to have shrinkers) would we need to make
> > sure these shrinkers are tied into the cgroup stuff before we could enable
> > charging for them?
> > 
> 
> No, there is no requirement to have shrinkers or making such memory
> reclaimable before charging it. Though existing shrinkers and the
> possible future shrinkers would need to be converted into memcg aware
> shrinkers.
> 
> Though there will be a need to update user expectations that if they 
> use memcgs with hard limits, they may start seeing memcg OOMs after the
> charging of dmabuf.

Agreed. This wouldn't be the first in kernel memory charged memory that
is not directly reclaimable. With a dedicated counter an excessive
dmabuf usage would be visible in the oom report because we do print
memcg stats.

It is definitely preferable to have a shrinker mechanism but if that is
to be done in a follow up step then this is acceptable. But leaving out
charging from early on sounds like a bad choice to me.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] binder: Add flags to relinquish ownership of fds
  2023-01-09 21:38 ` [PATCH 3/4] binder: Add flags to relinquish ownership of fds T.J. Mercier
       [not found]   ` <20230110014720.281-1-hdanton@sina.com>
@ 2023-01-20 21:25   ` Carlos Llamas
  2023-01-20 21:52     ` T.J. Mercier
  1 sibling, 1 reply; 16+ messages in thread
From: Carlos Llamas @ 2023-01-20 21:25 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	daniel.vetter, android-mm, jstultz, Hridya Valsaraju, cgroups,
	linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig

On Mon, Jan 09, 2023 at 09:38:06PM +0000, T.J. Mercier wrote:
> From: Hridya Valsaraju <hridya@google.com>
> 
> This patch introduces flags BINDER_FD_FLAG_XFER_CHARGE, and
> BINDER_FD_FLAG_XFER_CHARGE that a process sending an individual fd or

I believe the second one was meant to be BINDER_FDA_FLAG_XFER_CHARGE.
However, I don't think a separation of flags is needed. We process each
fd in the array individually anyway. So, it's OK to reuse the FD flags
for FDAs too.

> fd array to another process over binder IPC can set to relinquish
> ownership of the fd(s) being sent for memory accounting purposes. If the
> flag is found to be set during the fd or fd array translation and the
> fd is for a DMA-BUF, the buffer is uncharged from the sender's cgroup
> and charged to the receiving process's cgroup instead.
> 
> It is up to the sending process to ensure that it closes the fds
> regardless of whether the transfer failed or succeeded.
> 
> Most graphics shared memory allocations in Android are done by the
> graphics allocator HAL process. On requests from clients, the HAL
> process allocates memory and sends the fds to the clients over binder
> IPC. The graphics allocator HAL will not retain any references to the
> buffers. When the HAL sets *_FLAG_XFER_CHARGE for fd arrays holding
> DMA-BUF fds, or individual fd objects, binder will transfer the charge
> for the buffer from the allocator process cgroup to the client process
> cgroup.
> 
> The pad [1] and pad_flags [2] fields of binder_fd_object and
> binder_fda_array_object come from alignment with flat_binder_object and
> have never been exposed for use from userspace. This new flags use
> follows the pattern set by binder_buffer_object.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=feba3900cabb8e7c87368faa28e7a6936809ba22
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=5cdcf4c6a638591ec0e98c57404a19e7f9997567
> 
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  3 ++-
>  drivers/android/binder.c                | 31 +++++++++++++++++++++----
>  drivers/dma-buf/dma-buf.c               |  4 +---
>  include/linux/dma-buf.h                 |  1 +
>  include/uapi/linux/android/binder.h     | 23 ++++++++++++++----
>  5 files changed, 50 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 538ae22bc514..d225295932c0 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1457,7 +1457,8 @@ PAGE_SIZE multiple when read back.
>  
>  	  dmabuf (npn)
>  		Amount of memory used for exported DMA buffers allocated by the cgroup.
> -		Stays with the allocating cgroup regardless of how the buffer is shared.
> +		Stays with the allocating cgroup regardless of how the buffer is shared
> +		unless explicitly transferred.
>  
>  	  workingset_refault_anon
>  		Number of refaults of previously evicted anonymous pages.
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 880224ec6abb..9830848c8d25 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -42,6 +42,7 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/dma-buf.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -2237,7 +2238,7 @@ static int binder_translate_handle(struct flat_binder_object *fp,
>  	return ret;
>  }
>  
> -static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
> +static int binder_translate_fd(u32 fd, binder_size_t fd_offset, __u32 flags,
>  			       struct binder_transaction *t,
>  			       struct binder_thread *thread,
>  			       struct binder_transaction *in_reply_to)
> @@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
>  		goto err_security;
>  	}
>  
> +	if (IS_ENABLED(CONFIG_MEMCG) && (flags & BINDER_FD_FLAG_XFER_CHARGE)) {
> +		struct dma_buf *dmabuf;
> +
> +		if (unlikely(!is_dma_buf_file(file))) {
> +			binder_user_error(
> +				"%d:%d got transaction with XFER_CHARGE for non-dmabuf fd, %d\n",
> +				proc->pid, thread->pid, fd);
> +			ret = -EINVAL;
> +			goto err_dmabuf;
> +		}
> +
> +		dmabuf = file->private_data;
> +		ret = dma_buf_transfer_charge(dmabuf, target_proc->tsk);
> +		if (ret) {
> +			pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d\n",
> +				proc->pid, thread->pid, target_proc->pid);
> +			goto err_xfer;
> +		}
> +	}
> +
>  	/*
>  	 * Add fixup record for this transaction. The allocation
>  	 * of the fd in the target needs to be done from a
> @@ -2294,6 +2315,8 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
>  	return ret;
>  
>  err_alloc:
> +err_xfer:
> +err_dmabuf:
>  err_security:
>  	fput(file);
>  err_fget:
> @@ -2604,7 +2627,7 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>  
>  		ret = copy_from_user(&fd, sender_ufda_base + sender_uoffset, sizeof(fd));
>  		if (!ret)
> -			ret = binder_translate_fd(fd, offset, t, thread,
> +			ret = binder_translate_fd(fd, offset, fda->flags, t, thread,
>  						  in_reply_to);
>  		if (ret)
>  			return ret > 0 ? -EINVAL : ret;
> @@ -3383,8 +3406,8 @@ static void binder_transaction(struct binder_proc *proc,
>  			struct binder_fd_object *fp = to_binder_fd_object(hdr);
>  			binder_size_t fd_offset = object_offset +
>  				(uintptr_t)&fp->fd - (uintptr_t)fp;
> -			int ret = binder_translate_fd(fp->fd, fd_offset, t,
> -						      thread, in_reply_to);
> +			int ret = binder_translate_fd(fp->fd, fd_offset, fp->flags,
> +						      t, thread, in_reply_to);
>  
>  			fp->pad_binder = 0;
>  			if (ret < 0 ||

IMO the changes to the dma-buf api should some in a separate patch. So
those can be approved and managed separately.

> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index fd6c5002032b..a65b42433099 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -34,8 +34,6 @@
>  
>  #include "dma-buf-sysfs-stats.h"
>  
> -static inline int is_dma_buf_file(struct file *);
> -
>  struct dma_buf_list {
>  	struct list_head head;
>  	struct mutex lock;
> @@ -527,7 +525,7 @@ static const struct file_operations dma_buf_fops = {
>  /*
>   * is_dma_buf_file - Check if struct file* is associated with dma_buf
>   */
> -static inline int is_dma_buf_file(struct file *file)
> +int is_dma_buf_file(struct file *file)
>  {
>  	return file->f_op == &dma_buf_fops;
>  }
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 6aa128d76aa7..092d572ce528 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -595,6 +595,7 @@ dma_buf_attachment_is_dynamic(struct dma_buf_attachment *attach)
>  	return !!attach->importer_ops;
>  }
>  
> +int is_dma_buf_file(struct file *file);
>  struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>  					  struct device *dev);
>  struct dma_buf_attachment *
> diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> index e72e4de8f452..696c2bdb8a7e 100644
> --- a/include/uapi/linux/android/binder.h
> +++ b/include/uapi/linux/android/binder.h
> @@ -91,14 +91,14 @@ struct flat_binder_object {
>  /**
>   * struct binder_fd_object - describes a filedescriptor to be fixed up.
>   * @hdr:	common header structure
> - * @pad_flags:	padding to remain compatible with old userspace code
> + * @flags:	One or more BINDER_FD_FLAG_* flags
>   * @pad_binder:	padding to remain compatible with old userspace code
>   * @fd:		file descriptor
>   * @cookie:	opaque data, used by user-space
>   */
>  struct binder_fd_object {
>  	struct binder_object_header	hdr;
> -	__u32				pad_flags;
> +	__u32				flags;
>  	union {
>  		binder_uintptr_t	pad_binder;
>  		__u32			fd;
> @@ -107,6 +107,17 @@ struct binder_fd_object {
>  	binder_uintptr_t		cookie;
>  };
>  
> +enum {
> +	/**
> +	 * @BINDER_FD_FLAG_XFER_CHARGE
> +	 *
> +	 * When set, the sender of a binder_fd_object wishes to relinquish ownership of the fd for
> +	 * memory accounting purposes. If the fd is for a DMA-BUF, the buffer is uncharged from the
> +	 * sender's cgroup and charged to the receiving process's cgroup instead.
> +	 */
> +	BINDER_FD_FLAG_XFER_CHARGE = 0x01,
> +};
> +
>  /* struct binder_buffer_object - object describing a userspace buffer
>   * @hdr:		common header structure
>   * @flags:		one or more BINDER_BUFFER_* flags
> @@ -141,7 +152,7 @@ enum {
>  
>  /* struct binder_fd_array_object - object describing an array of fds in a buffer
>   * @hdr:		common header structure
> - * @pad:		padding to ensure correct alignment
> + * @flags:		One or more BINDER_FDA_FLAG_* flags
>   * @num_fds:		number of file descriptors in the buffer
>   * @parent:		index in offset array to buffer holding the fd array
>   * @parent_offset:	start offset of fd array in the buffer
> @@ -162,12 +173,16 @@ enum {
>   */
>  struct binder_fd_array_object {
>  	struct binder_object_header	hdr;
> -	__u32				pad;
> +	__u32				flags;
>  	binder_size_t			num_fds;
>  	binder_size_t			parent;
>  	binder_size_t			parent_offset;
>  };
>  
> +enum {
> +	BINDER_FDA_FLAG_XFER_CHARGE = BINDER_FD_FLAG_XFER_CHARGE,
> +};
> +

I would prefer to drop this. It should avoid silly mistakes in
userspace similar to the typo in the commit message above.

>  /*
>   * On 64-bit platforms where user code may run in 32-bits the driver must
>   * translate the buffer (and local binder) addresses appropriately.
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 

Thanks,
--
Carlos Llamas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] binder: Add flags to relinquish ownership of fds
  2023-01-20 21:25   ` Carlos Llamas
@ 2023-01-20 21:52     ` T.J. Mercier
  0 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2023-01-20 21:52 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos,
	Martijn Coenen, Joel Fernandes, Christian Brauner,
	Suren Baghdasaryan, Sumit Semwal, Christian König,
	daniel.vetter, android-mm, jstultz, Hridya Valsaraju, cgroups,
	linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig

On Fri, Jan 20, 2023 at 1:25 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> On Mon, Jan 09, 2023 at 09:38:06PM +0000, T.J. Mercier wrote:
> > From: Hridya Valsaraju <hridya@google.com>
> >
> > This patch introduces flags BINDER_FD_FLAG_XFER_CHARGE, and
> > BINDER_FD_FLAG_XFER_CHARGE that a process sending an individual fd or
>
> I believe the second one was meant to be BINDER_FDA_FLAG_XFER_CHARGE.
> However, I don't think a separation of flags is needed. We process each
> fd in the array individually anyway. So, it's OK to reuse the FD flags
> for FDAs too.

Yes, thanks.
>
>
> > fd array to another process over binder IPC can set to relinquish
> > ownership of the fd(s) being sent for memory accounting purposes. If the
> > flag is found to be set during the fd or fd array translation and the
> > fd is for a DMA-BUF, the buffer is uncharged from the sender's cgroup
> > and charged to the receiving process's cgroup instead.
> >
> > It is up to the sending process to ensure that it closes the fds
> > regardless of whether the transfer failed or succeeded.
> >
> > Most graphics shared memory allocations in Android are done by the
> > graphics allocator HAL process. On requests from clients, the HAL
> > process allocates memory and sends the fds to the clients over binder
> > IPC. The graphics allocator HAL will not retain any references to the
> > buffers. When the HAL sets *_FLAG_XFER_CHARGE for fd arrays holding
> > DMA-BUF fds, or individual fd objects, binder will transfer the charge
> > for the buffer from the allocator process cgroup to the client process
> > cgroup.
> >
> > The pad [1] and pad_flags [2] fields of binder_fd_object and
> > binder_fda_array_object come from alignment with flat_binder_object and
> > have never been exposed for use from userspace. This new flags use
> > follows the pattern set by binder_buffer_object.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=feba3900cabb8e7c87368faa28e7a6936809ba22
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux/android/binder.h?id=5cdcf4c6a638591ec0e98c57404a19e7f9997567
> >
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst |  3 ++-
> >  drivers/android/binder.c                | 31 +++++++++++++++++++++----
> >  drivers/dma-buf/dma-buf.c               |  4 +---
> >  include/linux/dma-buf.h                 |  1 +
> >  include/uapi/linux/android/binder.h     | 23 ++++++++++++++----
> >  5 files changed, 50 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 538ae22bc514..d225295932c0 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1457,7 +1457,8 @@ PAGE_SIZE multiple when read back.
> >
> >         dmabuf (npn)
> >               Amount of memory used for exported DMA buffers allocated by the cgroup.
> > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > +             Stays with the allocating cgroup regardless of how the buffer is shared
> > +             unless explicitly transferred.
> >
> >         workingset_refault_anon
> >               Number of refaults of previously evicted anonymous pages.
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index 880224ec6abb..9830848c8d25 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -42,6 +42,7 @@
> >
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> > +#include <linux/dma-buf.h>
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/freezer.h>
> > @@ -2237,7 +2238,7 @@ static int binder_translate_handle(struct flat_binder_object *fp,
> >       return ret;
> >  }
> >
> > -static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
> > +static int binder_translate_fd(u32 fd, binder_size_t fd_offset, __u32 flags,
> >                              struct binder_transaction *t,
> >                              struct binder_thread *thread,
> >                              struct binder_transaction *in_reply_to)
> > @@ -2275,6 +2276,26 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
> >               goto err_security;
> >       }
> >
> > +     if (IS_ENABLED(CONFIG_MEMCG) && (flags & BINDER_FD_FLAG_XFER_CHARGE)) {
> > +             struct dma_buf *dmabuf;
> > +
> > +             if (unlikely(!is_dma_buf_file(file))) {
> > +                     binder_user_error(
> > +                             "%d:%d got transaction with XFER_CHARGE for non-dmabuf fd, %d\n",
> > +                             proc->pid, thread->pid, fd);
> > +                     ret = -EINVAL;
> > +                     goto err_dmabuf;
> > +             }
> > +
> > +             dmabuf = file->private_data;
> > +             ret = dma_buf_transfer_charge(dmabuf, target_proc->tsk);
> > +             if (ret) {
> > +                     pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d\n",
> > +                             proc->pid, thread->pid, target_proc->pid);
> > +                     goto err_xfer;
> > +             }
> > +     }
> > +
> >       /*
> >        * Add fixup record for this transaction. The allocation
> >        * of the fd in the target needs to be done from a
> > @@ -2294,6 +2315,8 @@ static int binder_translate_fd(u32 fd, binder_size_t fd_offset,
> >       return ret;
> >
> >  err_alloc:
> > +err_xfer:
> > +err_dmabuf:
> >  err_security:
> >       fput(file);
> >  err_fget:
> > @@ -2604,7 +2627,7 @@ static int binder_translate_fd_array(struct list_head *pf_head,
> >
> >               ret = copy_from_user(&fd, sender_ufda_base + sender_uoffset, sizeof(fd));
> >               if (!ret)
> > -                     ret = binder_translate_fd(fd, offset, t, thread,
> > +                     ret = binder_translate_fd(fd, offset, fda->flags, t, thread,
> >                                                 in_reply_to);
> >               if (ret)
> >                       return ret > 0 ? -EINVAL : ret;
> > @@ -3383,8 +3406,8 @@ static void binder_transaction(struct binder_proc *proc,
> >                       struct binder_fd_object *fp = to_binder_fd_object(hdr);
> >                       binder_size_t fd_offset = object_offset +
> >                               (uintptr_t)&fp->fd - (uintptr_t)fp;
> > -                     int ret = binder_translate_fd(fp->fd, fd_offset, t,
> > -                                                   thread, in_reply_to);
> > +                     int ret = binder_translate_fd(fp->fd, fd_offset, fp->flags,
> > +                                                   t, thread, in_reply_to);
> >
> >                       fp->pad_binder = 0;
> >                       if (ret < 0 ||
>
> IMO the changes to the dma-buf api should some in a separate patch. So
> those can be approved and managed separately.
>
I've actually already dropped these based on feedback from Hillf, so
there are no longer any dma-buf.c changes on this patch for the v2 I
have queued up.

> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index fd6c5002032b..a65b42433099 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -34,8 +34,6 @@
> >
> >  #include "dma-buf-sysfs-stats.h"
> >
> > -static inline int is_dma_buf_file(struct file *);
> > -
> >  struct dma_buf_list {
> >       struct list_head head;
> >       struct mutex lock;
> > @@ -527,7 +525,7 @@ static const struct file_operations dma_buf_fops = {
> >  /*
> >   * is_dma_buf_file - Check if struct file* is associated with dma_buf
> >   */
> > -static inline int is_dma_buf_file(struct file *file)
> > +int is_dma_buf_file(struct file *file)
> >  {
> >       return file->f_op == &dma_buf_fops;
> >  }
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 6aa128d76aa7..092d572ce528 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -595,6 +595,7 @@ dma_buf_attachment_is_dynamic(struct dma_buf_attachment *attach)
> >       return !!attach->importer_ops;
> >  }
> >
> > +int is_dma_buf_file(struct file *file);
> >  struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >                                         struct device *dev);
> >  struct dma_buf_attachment *
> > diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
> > index e72e4de8f452..696c2bdb8a7e 100644
> > --- a/include/uapi/linux/android/binder.h
> > +++ b/include/uapi/linux/android/binder.h
> > @@ -91,14 +91,14 @@ struct flat_binder_object {
> >  /**
> >   * struct binder_fd_object - describes a filedescriptor to be fixed up.
> >   * @hdr:     common header structure
> > - * @pad_flags:       padding to remain compatible with old userspace code
> > + * @flags:   One or more BINDER_FD_FLAG_* flags
> >   * @pad_binder:      padding to remain compatible with old userspace code
> >   * @fd:              file descriptor
> >   * @cookie:  opaque data, used by user-space
> >   */
> >  struct binder_fd_object {
> >       struct binder_object_header     hdr;
> > -     __u32                           pad_flags;
> > +     __u32                           flags;
> >       union {
> >               binder_uintptr_t        pad_binder;
> >               __u32                   fd;
> > @@ -107,6 +107,17 @@ struct binder_fd_object {
> >       binder_uintptr_t                cookie;
> >  };
> >
> > +enum {
> > +     /**
> > +      * @BINDER_FD_FLAG_XFER_CHARGE
> > +      *
> > +      * When set, the sender of a binder_fd_object wishes to relinquish ownership of the fd for
> > +      * memory accounting purposes. If the fd is for a DMA-BUF, the buffer is uncharged from the
> > +      * sender's cgroup and charged to the receiving process's cgroup instead.
> > +      */
> > +     BINDER_FD_FLAG_XFER_CHARGE = 0x01,
> > +};
> > +
> >  /* struct binder_buffer_object - object describing a userspace buffer
> >   * @hdr:             common header structure
> >   * @flags:           one or more BINDER_BUFFER_* flags
> > @@ -141,7 +152,7 @@ enum {
> >
> >  /* struct binder_fd_array_object - object describing an array of fds in a buffer
> >   * @hdr:             common header structure
> > - * @pad:             padding to ensure correct alignment
> > + * @flags:           One or more BINDER_FDA_FLAG_* flags
> >   * @num_fds:         number of file descriptors in the buffer
> >   * @parent:          index in offset array to buffer holding the fd array
> >   * @parent_offset:   start offset of fd array in the buffer
> > @@ -162,12 +173,16 @@ enum {
> >   */
> >  struct binder_fd_array_object {
> >       struct binder_object_header     hdr;
> > -     __u32                           pad;
> > +     __u32                           flags;
> >       binder_size_t                   num_fds;
> >       binder_size_t                   parent;
> >       binder_size_t                   parent_offset;
> >  };
> >
> > +enum {
> > +     BINDER_FDA_FLAG_XFER_CHARGE = BINDER_FD_FLAG_XFER_CHARGE,
> > +};
> > +
>
> I would prefer to drop this. It should avoid silly mistakes in
> userspace similar to the typo in the commit message above.
>
Ok I'll make that work.
>
> >  /*
> >   * On 64-bit platforms where user code may run in 32-bits the driver must
> >   * translate the buffer (and local binder) addresses appropriately.
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >
>
> Thanks,
> --
> Carlos Llamas

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-01-20 21:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier
2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier
2023-01-10  8:58   ` Michal Hocko
2023-01-10 19:08     ` T.J. Mercier
2023-01-09 21:38 ` [PATCH 2/4] dmabuf: Add cgroup charge transfer function T.J. Mercier
2023-01-09 21:38 ` [PATCH 3/4] binder: Add flags to relinquish ownership of fds T.J. Mercier
     [not found]   ` <20230110014720.281-1-hdanton@sina.com>
2023-01-10 21:20     ` T.J. Mercier
2023-01-20 21:25   ` Carlos Llamas
2023-01-20 21:52     ` T.J. Mercier
2023-01-10  0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt
2023-01-11 22:56   ` Daniel Vetter
2023-01-12  0:49     ` T.J. Mercier
2023-01-12  8:13       ` Shakeel Butt
2023-01-12  8:17         ` Christian König
2023-01-12  7:56     ` Shakeel Butt
2023-01-12 10:25       ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).