linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
@ 2022-01-04 23:51 Hridya Valsaraju
  2022-01-05 15:13 ` Greg Kroah-Hartman
  2022-01-06  8:59 ` Christian König
  0 siblings, 2 replies; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-04 23:51 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Hridya Valsaraju,
	Daniel Vetter, Greg Kroah-Hartman, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel
  Cc: john.stultz, surenb, kaleshsingh, tjmercier, keescook

Recently, we noticed an issue where a process went into direct reclaim
while holding the kernfs rw semaphore for sysfs in write(exclusive)
mode. This caused processes who were doing DMA-BUF exports and releases
to go into uninterruptible sleep since they needed to acquire the same
semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
blocking DMA-BUF export/release for an indeterminate amount of time
while another process is holding the sysfs rw semaphore in exclusive
mode, this patch moves the per-buffer sysfs file creation/deleteion to
a kthread.

Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
Signed-off-by: Hridya Valsaraju <hridya@google.com>
---
 drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
 include/linux/dma-buf.h               |  46 ++++
 2 files changed, 366 insertions(+), 23 deletions(-)

diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
index 053baadcada9..3251fdf2f05f 100644
--- a/drivers/dma-buf/dma-buf-sysfs-stats.c
+++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
@@ -7,13 +7,39 @@
 
 #include <linux/dma-buf.h>
 #include <linux/dma-resv.h>
+#include <linux/freezer.h>
 #include <linux/kobject.h>
+#include <linux/kthread.h>
+#include <linux/list.h>
 #include <linux/printk.h>
+#include <linux/sched/signal.h>
 #include <linux/slab.h>
 #include <linux/sysfs.h>
 
 #include "dma-buf-sysfs-stats.h"
 
+struct dmabuf_kobj_work {
+	struct list_head list;
+	struct dma_buf_sysfs_entry *sysfs_entry;
+	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
+	unsigned long uid;
+};
+
+/* Both kobject setup and teardown work gets queued on the list. */
+static LIST_HEAD(dmabuf_kobj_work_list);
+
+/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
+static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
+
+/*
+ * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
+ * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
+ */
+static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
+
+static struct task_struct *dmabuf_kobject_task;
+static wait_queue_head_t dmabuf_kobject_waitqueue;
+
 #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
 
 /**
@@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
 	struct dma_buf_stats_attribute *attribute;
 	struct dma_buf_sysfs_entry *sysfs_entry;
 	struct dma_buf *dmabuf;
+	int ret;
 
 	attribute = to_dma_buf_stats_attr(attr);
 	sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
+
+	/*
+	 * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
+	 * being freed while sysfs_entry->dmabuf is being accessed.
+	 */
+	spin_lock(&dmabuf_sysfs_show_lock);
 	dmabuf = sysfs_entry->dmabuf;
 
-	if (!dmabuf || !attribute->show)
+	if (!dmabuf || !attribute->show) {
+		spin_unlock(&dmabuf_sysfs_show_lock);
 		return -EIO;
+	}
 
-	return attribute->show(dmabuf, attribute, buf);
+	ret = attribute->show(dmabuf, attribute, buf);
+	spin_unlock(&dmabuf_sysfs_show_lock);
+	return ret;
 }
 
 static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
@@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
 	.default_groups = dma_buf_stats_default_groups,
 };
 
-void dma_buf_stats_teardown(struct dma_buf *dmabuf)
+/* Statistics files do not need to send uevents. */
+static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
 {
-	struct dma_buf_sysfs_entry *sysfs_entry;
+	return 0;
+}
 
-	sysfs_entry = dmabuf->sysfs_entry;
-	if (!sysfs_entry)
-		return;
+static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
+	.filter = dmabuf_sysfs_uevent_filter,
+};
+
+/* setup of sysfs entries done asynchronously in the worker thread. */
+static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
+{
+	struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
+	struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
+			kobject_work->sysfs_metadata;
+	bool free_metadata = false;
+
+	int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
+				       "%lu", kobject_work->uid);
+	if (ret) {
+		kobject_put(&sysfs_entry->kobj);
+
+		spin_lock(&sysfs_metadata->sysfs_entry_lock);
+		if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
+			/*
+			 * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
+			 * been freed. At this point, its safe to free the memory for
+			 * the sysfs_metadata;
+			 */
+			free_metadata = true;
+		} else {
+			/*
+			 * The DMA-BUF has not yet been freed, set the status to
+			 * sysfs_entry_error so that when the DMA-BUF gets
+			 * freed, we know there is no need to teardown the sysfs
+			 * entry.
+			 */
+			sysfs_metadata->status = SYSFS_ENTRY_ERROR;
+		}
+		goto unlock;
+	}
+
+	/*
+	 * If the DMA-BUF has not yet been released, status would still be
+	 * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
+	 */
+	spin_lock(&sysfs_metadata->sysfs_entry_lock);
+	if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
+		sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
+		goto unlock;
+	}
 
+	/*
+	 * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
+	 * that the DMA-BUF has already been freed. Hence, we cleanup the
+	 * sysfs_entry and its metadata since neither of them are needed
+	 * anymore.
+	 */
+	free_metadata = true;
 	kobject_del(&sysfs_entry->kobj);
 	kobject_put(&sysfs_entry->kobj);
+
+unlock:
+	spin_unlock(&sysfs_metadata->sysfs_entry_lock);
+	if (free_metadata) {
+		kfree(kobject_work->sysfs_metadata);
+		kobject_work->sysfs_metadata = NULL;
+	}
 }
 
+/* teardown of sysfs entries done asynchronously in the worker thread. */
+static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
+{
+	struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
 
-/* Statistics files do not need to send uevents. */
-static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
+	kobject_del(&sysfs_entry->kobj);
+	kobject_put(&sysfs_entry->kobj);
+
+	kfree(kobject_work->sysfs_metadata);
+	kobject_work->sysfs_metadata = NULL;
+}
+
+/* do setup or teardown of sysfs entries as required */
+static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
 {
+	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
+	bool setup_needed = false;
+	bool teardown_needed = false;
+
+	sysfs_metadata = kobject_work->sysfs_metadata;
+	spin_lock(&sysfs_metadata->sysfs_entry_lock);
+	if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
+		setup_needed = true;
+		sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
+	} else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
+		teardown_needed = true;
+	}
+
+	/*
+	 * It is ok to release the sysfs_entry_lock here.
+	 *
+	 * If setup_needed is true, we check the status again after the kobject
+	 * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
+	 * and if so teardown the kobject.
+	 *
+	 * If teardown_needed is true, there are no more changes expected to the
+	 * status.
+	 *
+	 * If neither setup_needed nor teardown needed are true, it
+	 * means the DMA-BUF was freed before we got around to setting up the
+	 * sysfs entry and hence we just need to release the metadata and
+	 * return.
+	 */
+	spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
+
+	if (setup_needed)
+		dma_buf_sysfs_stats_setup_work(kobject_work);
+	else if (teardown_needed)
+		dma_buf_sysfs_stats_teardown_work(kobject_work);
+	else
+		kfree(kobject_work->sysfs_metadata);
+
+	kfree(kobject_work);
+}
+
+static struct dmabuf_kobj_work *get_next_kobj_work(void)
+{
+	struct dmabuf_kobj_work *kobject_work;
+
+	spin_lock(&dmabuf_kobj_list_lock);
+	kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
+						struct dmabuf_kobj_work, list);
+	if (kobject_work)
+		list_del(&kobject_work->list);
+	spin_unlock(&dmabuf_kobj_list_lock);
+	return kobject_work;
+}
+
+static int kobject_work_thread(void *data)
+{
+	struct dmabuf_kobj_work *kobject_work;
+
+	while (1) {
+		wait_event_freezable(dmabuf_kobject_waitqueue,
+				     (kobject_work = get_next_kobj_work()));
+		do_kobject_work(kobject_work);
+	}
+
 	return 0;
 }
 
-static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
-	.filter = dmabuf_sysfs_uevent_filter,
-};
+static int kobject_worklist_init(void)
+{
+	init_waitqueue_head(&dmabuf_kobject_waitqueue);
+	dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
+					  "%s", "dmabuf-kobject-worker");
+	if (IS_ERR(dmabuf_kobject_task)) {
+		pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
+		return PTR_ERR(dmabuf_kobject_task);
+	}
+	sched_set_normal(dmabuf_kobject_task, MAX_NICE);
+
+	return 0;
+}
+
+static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
+{
+	INIT_LIST_HEAD(&kobject_work->list);
+
+	spin_lock(&dmabuf_kobj_list_lock);
+
+	list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
+
+	spin_unlock(&dmabuf_kobj_list_lock);
+
+	wake_up(&dmabuf_kobject_waitqueue);
+}
+
+
+void dma_buf_stats_teardown(struct dma_buf *dmabuf)
+{
+	struct dma_buf_sysfs_entry *sysfs_entry;
+	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
+	struct dmabuf_kobj_work *kobj_work;
+
+	sysfs_entry = dmabuf->sysfs_entry;
+	if (!sysfs_entry)
+		return;
+
+	sysfs_metadata = dmabuf->sysfs_entry_metadata;
+	if (!sysfs_metadata)
+		return;
+
+	spin_lock(&sysfs_metadata->sysfs_entry_lock);
+
+	if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
+	    sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
+		/*
+		 * The sysfs entry for this buffer has not yet been initialized,
+		 * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
+		 * initialization.
+		 */
+		sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
+		spin_unlock(&sysfs_metadata->sysfs_entry_lock);
+
+		/*
+		 * In case kobject initialization completes right as we release
+		 * the sysfs_entry_lock, disable show() for the sysfs entry by
+		 * setting sysfs_entry->dmabuf to NULL to prevent a race.
+		 */
+		spin_lock(&dmabuf_sysfs_show_lock);
+		sysfs_entry->dmabuf = NULL;
+		spin_unlock(&dmabuf_sysfs_show_lock);
+
+		return;
+	}
+
+	if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
+		/*
+		 * queue teardown work only if sysfs_entry is fully inititalized.
+		 * It is ok to release the sysfs_entry_lock here since the
+		 * status can no longer change.
+		 */
+		spin_unlock(&sysfs_metadata->sysfs_entry_lock);
+
+		/*
+		 * Meanwhile disable show() for the sysfs entry to avoid a race
+		 * between teardown and show().
+		 */
+		spin_lock(&dmabuf_sysfs_show_lock);
+		sysfs_entry->dmabuf = NULL;
+		spin_unlock(&dmabuf_sysfs_show_lock);
+
+		kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
+		if (!kobj_work) {
+			/* do the teardown immediately. */
+			kobject_del(&sysfs_entry->kobj);
+			kobject_put(&sysfs_entry->kobj);
+			kfree(sysfs_metadata);
+		} else {
+			/* queue teardown work. */
+			kobj_work->sysfs_entry = dmabuf->sysfs_entry;
+			kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
+			deferred_kobject_create(kobj_work);
+		}
+
+		return;
+	}
+
+	/*
+	 * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
+	 * metadata.
+	 */
+	spin_unlock(&sysfs_metadata->sysfs_entry_lock);
+	kfree(dmabuf->sysfs_entry_metadata);
+	dmabuf->sysfs_entry_metadata = NULL;
+}
 
 static struct kset *dma_buf_stats_kset;
 static struct kset *dma_buf_per_buffer_stats_kset;
 int dma_buf_init_sysfs_statistics(void)
 {
+	int ret;
+
+	ret = kobject_worklist_init();
+	if (ret)
+		return ret;
+
 	dma_buf_stats_kset = kset_create_and_add("dmabuf",
 						 &dmabuf_sysfs_no_uevent_ops,
 						 kernel_kobj);
@@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
 int dma_buf_stats_setup(struct dma_buf *dmabuf)
 {
 	struct dma_buf_sysfs_entry *sysfs_entry;
-	int ret;
+	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
+	struct dmabuf_kobj_work *kobj_work;
 
 	if (!dmabuf || !dmabuf->file)
 		return -EINVAL;
@@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
 	sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
 	sysfs_entry->dmabuf = dmabuf;
 
+	sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
+				 GFP_KERNEL);
+	if (!sysfs_metadata) {
+		kfree(sysfs_entry);
+		return -ENOMEM;
+	}
+
 	dmabuf->sysfs_entry = sysfs_entry;
 
-	/* create the directory for buffer stats */
-	ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
-				   "%lu", file_inode(dmabuf->file)->i_ino);
-	if (ret)
-		goto err_sysfs_dmabuf;
+	sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
+	spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
 
-	return 0;
+	dmabuf->sysfs_entry_metadata = sysfs_metadata;
 
-err_sysfs_dmabuf:
-	kobject_put(&sysfs_entry->kobj);
-	dmabuf->sysfs_entry = NULL;
-	return ret;
+	kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
+	if (!kobj_work) {
+		kfree(sysfs_entry);
+		kfree(sysfs_metadata);
+		return -ENOMEM;
+	}
+
+	kobj_work->sysfs_entry = dmabuf->sysfs_entry;
+	kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
+	/*
+	 * stash the inode number in struct dmabuf_kobj_work since setup
+	 * might race with DMA-BUF teardown.
+	 */
+	kobj_work->uid = file_inode(dmabuf->file)->i_ino;
+
+	deferred_kobject_create(kobj_work);
+	return 0;
 }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7ab50076e7a6..0597690023a0 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -287,6 +287,50 @@ struct dma_buf_ops {
 	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
 };
 
+#ifdef CONFIG_DMABUF_SYSFS_STATS
+enum sysfs_entry_status {
+	SYSFS_ENTRY_UNINITIALIZED,
+	SYSFS_ENTRY_INIT_IN_PROGRESS,
+	SYSFS_ENTRY_ERROR,
+	SYSFS_ENTRY_INIT_ABORTED,
+	SYSFS_ENTRY_INITIALIZED,
+};
+
+/*
+ * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
+ * DMA-BUF sysfs entry.
+ *
+ * @status: holds the current status for the DMA-BUF sysfs entry. The status of
+ * the sysfs entry has the following path.
+ *
+ *			SYSFS_ENTRY_UNINITIALIZED
+ *		 __________________|____________________
+ *		|					|
+ *	  SYSFS_ENTRY_INIT_IN_PROGRESS	    SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
+ *		|						      gets freed
+ *		|						      before
+ *		|						      init)
+ *	________|_____________________________________
+ *	|			  |		      |
+ * SYSFS_ENTRY_INITIALIZED	  |	  SYSFS_ENTRY_INIT_ABORTED
+ *				  |		  (DMA-BUF gets freed during kobject
+ *				  |		  init)
+ *				  |
+ *				  |
+ *		      SYSFS_ENTRY_ERROR
+ *		      (error during kobject init)
+ *
+ * @sysfs_entry_lock: protects access to @status.
+ */
+struct dma_buf_sysfs_entry_metadata {
+	enum sysfs_entry_status status;
+	/*
+	 * Protects sysfs_entry_metadata->status
+	 */
+	spinlock_t sysfs_entry_lock;
+};
+#endif
+
 /**
  * struct dma_buf - shared buffer object
  *
@@ -452,6 +496,8 @@ struct dma_buf {
 		struct kobject kobj;
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
+
+	struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
 #endif
 };
 
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-04 23:51 [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path Hridya Valsaraju
@ 2022-01-05 15:13 ` Greg Kroah-Hartman
  2022-01-05 18:38   ` Hridya Valsaraju
  2022-01-06  8:59 ` Christian König
  1 sibling, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-05 15:13 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Sumit Semwal, Christian König, Daniel Vetter, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Tue, Jan 04, 2022 at 03:51:48PM -0800, Hridya Valsaraju wrote:
> Recently, we noticed an issue where a process went into direct reclaim
> while holding the kernfs rw semaphore for sysfs in write(exclusive)
> mode. This caused processes who were doing DMA-BUF exports and releases
> to go into uninterruptible sleep since they needed to acquire the same
> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> blocking DMA-BUF export/release for an indeterminate amount of time
> while another process is holding the sysfs rw semaphore in exclusive
> mode, this patch moves the per-buffer sysfs file creation/deleteion to
> a kthread.
> 
> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> ---
>  drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
>  include/linux/dma-buf.h               |  46 ++++
>  2 files changed, 366 insertions(+), 23 deletions(-)

Crazy, but if this works in your testing, it looks ok to me.  Nice work.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-05 15:13 ` Greg Kroah-Hartman
@ 2022-01-05 18:38   ` Hridya Valsaraju
  0 siblings, 0 replies; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-05 18:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sumit Semwal, Christian König, Daniel Vetter, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Wed, Jan 5, 2022 at 7:13 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jan 04, 2022 at 03:51:48PM -0800, Hridya Valsaraju wrote:
> > Recently, we noticed an issue where a process went into direct reclaim
> > while holding the kernfs rw semaphore for sysfs in write(exclusive)
> > mode. This caused processes who were doing DMA-BUF exports and releases
> > to go into uninterruptible sleep since they needed to acquire the same
> > semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> > blocking DMA-BUF export/release for an indeterminate amount of time
> > while another process is holding the sysfs rw semaphore in exclusive
> > mode, this patch moves the per-buffer sysfs file creation/deleteion to
> > a kthread.
> >
> > Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > ---
> >  drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
> >  include/linux/dma-buf.h               |  46 ++++
> >  2 files changed, 366 insertions(+), 23 deletions(-)
>
> Crazy, but if this works in your testing, it looks ok to me.  Nice work.
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thank you for the review Greg :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-04 23:51 [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path Hridya Valsaraju
  2022-01-05 15:13 ` Greg Kroah-Hartman
@ 2022-01-06  8:59 ` Christian König
  2022-01-06 19:04   ` Hridya Valsaraju
  1 sibling, 1 reply; 15+ messages in thread
From: Christian König @ 2022-01-06  8:59 UTC (permalink / raw)
  To: Hridya Valsaraju, Sumit Semwal, Daniel Vetter,
	Greg Kroah-Hartman, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel
  Cc: john.stultz, surenb, kaleshsingh, tjmercier, keescook

Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
> Recently, we noticed an issue where a process went into direct reclaim
> while holding the kernfs rw semaphore for sysfs in write(exclusive)
> mode. This caused processes who were doing DMA-BUF exports and releases
> to go into uninterruptible sleep since they needed to acquire the same
> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> blocking DMA-BUF export/release for an indeterminate amount of time
> while another process is holding the sysfs rw semaphore in exclusive
> mode, this patch moves the per-buffer sysfs file creation/deleteion to
> a kthread.

Well I absolutely don't think that this is justified.

You adding tons of complexity here just to avoid the overhead of 
creating the sysfs files while exporting DMA-bufs which is an operation 
which should be done exactly once in the lifecycle for the most common 
use case.

Please explain further why that should be necessary.

Regards,
Christian.

>
> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> ---
>   drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
>   include/linux/dma-buf.h               |  46 ++++
>   2 files changed, 366 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> index 053baadcada9..3251fdf2f05f 100644
> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> @@ -7,13 +7,39 @@
>   
>   #include <linux/dma-buf.h>
>   #include <linux/dma-resv.h>
> +#include <linux/freezer.h>
>   #include <linux/kobject.h>
> +#include <linux/kthread.h>
> +#include <linux/list.h>
>   #include <linux/printk.h>
> +#include <linux/sched/signal.h>
>   #include <linux/slab.h>
>   #include <linux/sysfs.h>
>   
>   #include "dma-buf-sysfs-stats.h"
>   
> +struct dmabuf_kobj_work {
> +	struct list_head list;
> +	struct dma_buf_sysfs_entry *sysfs_entry;
> +	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> +	unsigned long uid;
> +};
> +
> +/* Both kobject setup and teardown work gets queued on the list. */
> +static LIST_HEAD(dmabuf_kobj_work_list);
> +
> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
> +
> +/*
> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
> + */
> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
> +
> +static struct task_struct *dmabuf_kobject_task;
> +static wait_queue_head_t dmabuf_kobject_waitqueue;
> +
>   #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
>   
>   /**
> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
>   	struct dma_buf_stats_attribute *attribute;
>   	struct dma_buf_sysfs_entry *sysfs_entry;
>   	struct dma_buf *dmabuf;
> +	int ret;
>   
>   	attribute = to_dma_buf_stats_attr(attr);
>   	sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
> +
> +	/*
> +	 * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
> +	 * being freed while sysfs_entry->dmabuf is being accessed.
> +	 */
> +	spin_lock(&dmabuf_sysfs_show_lock);
>   	dmabuf = sysfs_entry->dmabuf;
>   
> -	if (!dmabuf || !attribute->show)
> +	if (!dmabuf || !attribute->show) {
> +		spin_unlock(&dmabuf_sysfs_show_lock);
>   		return -EIO;
> +	}
>   
> -	return attribute->show(dmabuf, attribute, buf);
> +	ret = attribute->show(dmabuf, attribute, buf);
> +	spin_unlock(&dmabuf_sysfs_show_lock);
> +	return ret;
>   }
>   
>   static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
>   	.default_groups = dma_buf_stats_default_groups,
>   };
>   
> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> +/* Statistics files do not need to send uevents. */
> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>   {
> -	struct dma_buf_sysfs_entry *sysfs_entry;
> +	return 0;
> +}
>   
> -	sysfs_entry = dmabuf->sysfs_entry;
> -	if (!sysfs_entry)
> -		return;
> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> +	.filter = dmabuf_sysfs_uevent_filter,
> +};
> +
> +/* setup of sysfs entries done asynchronously in the worker thread. */
> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
> +{
> +	struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> +	struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
> +			kobject_work->sysfs_metadata;
> +	bool free_metadata = false;
> +
> +	int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> +				       "%lu", kobject_work->uid);
> +	if (ret) {
> +		kobject_put(&sysfs_entry->kobj);
> +
> +		spin_lock(&sysfs_metadata->sysfs_entry_lock);
> +		if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
> +			/*
> +			 * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
> +			 * been freed. At this point, its safe to free the memory for
> +			 * the sysfs_metadata;
> +			 */
> +			free_metadata = true;
> +		} else {
> +			/*
> +			 * The DMA-BUF has not yet been freed, set the status to
> +			 * sysfs_entry_error so that when the DMA-BUF gets
> +			 * freed, we know there is no need to teardown the sysfs
> +			 * entry.
> +			 */
> +			sysfs_metadata->status = SYSFS_ENTRY_ERROR;
> +		}
> +		goto unlock;
> +	}
> +
> +	/*
> +	 * If the DMA-BUF has not yet been released, status would still be
> +	 * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
> +	 */
> +	spin_lock(&sysfs_metadata->sysfs_entry_lock);
> +	if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> +		sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
> +		goto unlock;
> +	}
>   
> +	/*
> +	 * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
> +	 * that the DMA-BUF has already been freed. Hence, we cleanup the
> +	 * sysfs_entry and its metadata since neither of them are needed
> +	 * anymore.
> +	 */
> +	free_metadata = true;
>   	kobject_del(&sysfs_entry->kobj);
>   	kobject_put(&sysfs_entry->kobj);
> +
> +unlock:
> +	spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> +	if (free_metadata) {
> +		kfree(kobject_work->sysfs_metadata);
> +		kobject_work->sysfs_metadata = NULL;
> +	}
>   }
>   
> +/* teardown of sysfs entries done asynchronously in the worker thread. */
> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
> +{
> +	struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>   
> -/* Statistics files do not need to send uevents. */
> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> +	kobject_del(&sysfs_entry->kobj);
> +	kobject_put(&sysfs_entry->kobj);
> +
> +	kfree(kobject_work->sysfs_metadata);
> +	kobject_work->sysfs_metadata = NULL;
> +}
> +
> +/* do setup or teardown of sysfs entries as required */
> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
>   {
> +	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> +	bool setup_needed = false;
> +	bool teardown_needed = false;
> +
> +	sysfs_metadata = kobject_work->sysfs_metadata;
> +	spin_lock(&sysfs_metadata->sysfs_entry_lock);
> +	if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
> +		setup_needed = true;
> +		sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
> +	} else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> +		teardown_needed = true;
> +	}
> +
> +	/*
> +	 * It is ok to release the sysfs_entry_lock here.
> +	 *
> +	 * If setup_needed is true, we check the status again after the kobject
> +	 * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
> +	 * and if so teardown the kobject.
> +	 *
> +	 * If teardown_needed is true, there are no more changes expected to the
> +	 * status.
> +	 *
> +	 * If neither setup_needed nor teardown needed are true, it
> +	 * means the DMA-BUF was freed before we got around to setting up the
> +	 * sysfs entry and hence we just need to release the metadata and
> +	 * return.
> +	 */
> +	spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
> +
> +	if (setup_needed)
> +		dma_buf_sysfs_stats_setup_work(kobject_work);
> +	else if (teardown_needed)
> +		dma_buf_sysfs_stats_teardown_work(kobject_work);
> +	else
> +		kfree(kobject_work->sysfs_metadata);
> +
> +	kfree(kobject_work);
> +}
> +
> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
> +{
> +	struct dmabuf_kobj_work *kobject_work;
> +
> +	spin_lock(&dmabuf_kobj_list_lock);
> +	kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
> +						struct dmabuf_kobj_work, list);
> +	if (kobject_work)
> +		list_del(&kobject_work->list);
> +	spin_unlock(&dmabuf_kobj_list_lock);
> +	return kobject_work;
> +}
> +
> +static int kobject_work_thread(void *data)
> +{
> +	struct dmabuf_kobj_work *kobject_work;
> +
> +	while (1) {
> +		wait_event_freezable(dmabuf_kobject_waitqueue,
> +				     (kobject_work = get_next_kobj_work()));
> +		do_kobject_work(kobject_work);
> +	}
> +
>   	return 0;
>   }
>   
> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> -	.filter = dmabuf_sysfs_uevent_filter,
> -};
> +static int kobject_worklist_init(void)
> +{
> +	init_waitqueue_head(&dmabuf_kobject_waitqueue);
> +	dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
> +					  "%s", "dmabuf-kobject-worker");
> +	if (IS_ERR(dmabuf_kobject_task)) {
> +		pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
> +		return PTR_ERR(dmabuf_kobject_task);
> +	}
> +	sched_set_normal(dmabuf_kobject_task, MAX_NICE);
> +
> +	return 0;
> +}
> +
> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
> +{
> +	INIT_LIST_HEAD(&kobject_work->list);
> +
> +	spin_lock(&dmabuf_kobj_list_lock);
> +
> +	list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
> +
> +	spin_unlock(&dmabuf_kobj_list_lock);
> +
> +	wake_up(&dmabuf_kobject_waitqueue);
> +}
> +
> +
> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> +{
> +	struct dma_buf_sysfs_entry *sysfs_entry;
> +	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> +	struct dmabuf_kobj_work *kobj_work;
> +
> +	sysfs_entry = dmabuf->sysfs_entry;
> +	if (!sysfs_entry)
> +		return;
> +
> +	sysfs_metadata = dmabuf->sysfs_entry_metadata;
> +	if (!sysfs_metadata)
> +		return;
> +
> +	spin_lock(&sysfs_metadata->sysfs_entry_lock);
> +
> +	if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
> +	    sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> +		/*
> +		 * The sysfs entry for this buffer has not yet been initialized,
> +		 * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
> +		 * initialization.
> +		 */
> +		sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
> +		spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> +
> +		/*
> +		 * In case kobject initialization completes right as we release
> +		 * the sysfs_entry_lock, disable show() for the sysfs entry by
> +		 * setting sysfs_entry->dmabuf to NULL to prevent a race.
> +		 */
> +		spin_lock(&dmabuf_sysfs_show_lock);
> +		sysfs_entry->dmabuf = NULL;
> +		spin_unlock(&dmabuf_sysfs_show_lock);
> +
> +		return;
> +	}
> +
> +	if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> +		/*
> +		 * queue teardown work only if sysfs_entry is fully inititalized.
> +		 * It is ok to release the sysfs_entry_lock here since the
> +		 * status can no longer change.
> +		 */
> +		spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> +
> +		/*
> +		 * Meanwhile disable show() for the sysfs entry to avoid a race
> +		 * between teardown and show().
> +		 */
> +		spin_lock(&dmabuf_sysfs_show_lock);
> +		sysfs_entry->dmabuf = NULL;
> +		spin_unlock(&dmabuf_sysfs_show_lock);
> +
> +		kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> +		if (!kobj_work) {
> +			/* do the teardown immediately. */
> +			kobject_del(&sysfs_entry->kobj);
> +			kobject_put(&sysfs_entry->kobj);
> +			kfree(sysfs_metadata);
> +		} else {
> +			/* queue teardown work. */
> +			kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> +			kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> +			deferred_kobject_create(kobj_work);
> +		}
> +
> +		return;
> +	}
> +
> +	/*
> +	 * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
> +	 * metadata.
> +	 */
> +	spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> +	kfree(dmabuf->sysfs_entry_metadata);
> +	dmabuf->sysfs_entry_metadata = NULL;
> +}
>   
>   static struct kset *dma_buf_stats_kset;
>   static struct kset *dma_buf_per_buffer_stats_kset;
>   int dma_buf_init_sysfs_statistics(void)
>   {
> +	int ret;
> +
> +	ret = kobject_worklist_init();
> +	if (ret)
> +		return ret;
> +
>   	dma_buf_stats_kset = kset_create_and_add("dmabuf",
>   						 &dmabuf_sysfs_no_uevent_ops,
>   						 kernel_kobj);
> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
>   int dma_buf_stats_setup(struct dma_buf *dmabuf)
>   {
>   	struct dma_buf_sysfs_entry *sysfs_entry;
> -	int ret;
> +	struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> +	struct dmabuf_kobj_work *kobj_work;
>   
>   	if (!dmabuf || !dmabuf->file)
>   		return -EINVAL;
> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
>   	sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
>   	sysfs_entry->dmabuf = dmabuf;
>   
> +	sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
> +				 GFP_KERNEL);
> +	if (!sysfs_metadata) {
> +		kfree(sysfs_entry);
> +		return -ENOMEM;
> +	}
> +
>   	dmabuf->sysfs_entry = sysfs_entry;
>   
> -	/* create the directory for buffer stats */
> -	ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> -				   "%lu", file_inode(dmabuf->file)->i_ino);
> -	if (ret)
> -		goto err_sysfs_dmabuf;
> +	sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
> +	spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
>   
> -	return 0;
> +	dmabuf->sysfs_entry_metadata = sysfs_metadata;
>   
> -err_sysfs_dmabuf:
> -	kobject_put(&sysfs_entry->kobj);
> -	dmabuf->sysfs_entry = NULL;
> -	return ret;
> +	kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> +	if (!kobj_work) {
> +		kfree(sysfs_entry);
> +		kfree(sysfs_metadata);
> +		return -ENOMEM;
> +	}
> +
> +	kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> +	kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> +	/*
> +	 * stash the inode number in struct dmabuf_kobj_work since setup
> +	 * might race with DMA-BUF teardown.
> +	 */
> +	kobj_work->uid = file_inode(dmabuf->file)->i_ino;
> +
> +	deferred_kobject_create(kobj_work);
> +	return 0;
>   }
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 7ab50076e7a6..0597690023a0 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -287,6 +287,50 @@ struct dma_buf_ops {
>   	void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>   };
>   
> +#ifdef CONFIG_DMABUF_SYSFS_STATS
> +enum sysfs_entry_status {
> +	SYSFS_ENTRY_UNINITIALIZED,
> +	SYSFS_ENTRY_INIT_IN_PROGRESS,
> +	SYSFS_ENTRY_ERROR,
> +	SYSFS_ENTRY_INIT_ABORTED,
> +	SYSFS_ENTRY_INITIALIZED,
> +};
> +
> +/*
> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
> + * DMA-BUF sysfs entry.
> + *
> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
> + * the sysfs entry has the following path.
> + *
> + *			SYSFS_ENTRY_UNINITIALIZED
> + *		 __________________|____________________
> + *		|					|
> + *	  SYSFS_ENTRY_INIT_IN_PROGRESS	    SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
> + *		|						      gets freed
> + *		|						      before
> + *		|						      init)
> + *	________|_____________________________________
> + *	|			  |		      |
> + * SYSFS_ENTRY_INITIALIZED	  |	  SYSFS_ENTRY_INIT_ABORTED
> + *				  |		  (DMA-BUF gets freed during kobject
> + *				  |		  init)
> + *				  |
> + *				  |
> + *		      SYSFS_ENTRY_ERROR
> + *		      (error during kobject init)
> + *
> + * @sysfs_entry_lock: protects access to @status.
> + */
> +struct dma_buf_sysfs_entry_metadata {
> +	enum sysfs_entry_status status;
> +	/*
> +	 * Protects sysfs_entry_metadata->status
> +	 */
> +	spinlock_t sysfs_entry_lock;
> +};
> +#endif
> +
>   /**
>    * struct dma_buf - shared buffer object
>    *
> @@ -452,6 +496,8 @@ struct dma_buf {
>   		struct kobject kobj;
>   		struct dma_buf *dmabuf;
>   	} *sysfs_entry;
> +
> +	struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
>   #endif
>   };
>   


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-06  8:59 ` Christian König
@ 2022-01-06 19:04   ` Hridya Valsaraju
  2022-01-07 10:22     ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-06 19:04 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Daniel Vetter, Greg Kroah-Hartman, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Thu, Jan 6, 2022 at 12:59 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
> > Recently, we noticed an issue where a process went into direct reclaim
> > while holding the kernfs rw semaphore for sysfs in write(exclusive)
> > mode. This caused processes who were doing DMA-BUF exports and releases
> > to go into uninterruptible sleep since they needed to acquire the same
> > semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> > blocking DMA-BUF export/release for an indeterminate amount of time
> > while another process is holding the sysfs rw semaphore in exclusive
> > mode, this patch moves the per-buffer sysfs file creation/deleteion to
> > a kthread.
>
> Well I absolutely don't think that this is justified.
>
> You adding tons of complexity here just to avoid the overhead of
> creating the sysfs files while exporting DMA-bufs which is an operation
> which should be done exactly once in the lifecycle for the most common
> use case.
>
> Please explain further why that should be necessary.

Hi Christian,

We noticed that the issue sometimes causes the exporting process to go
to the uninterrupted sleep state for tens of milliseconds which
unfortunately leads to user-perceptible UI jank. This is the reason
why we are trying to move the sysfs entry creation and deletion out of
the DMA-BUF export/release path. I will update the commit message to
include the same in the next revision.

Thanks,
Hridya


>
> Regards,
> Christian.
>
> >
> > Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> > Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > ---
> >   drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
> >   include/linux/dma-buf.h               |  46 ++++
> >   2 files changed, 366 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > index 053baadcada9..3251fdf2f05f 100644
> > --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> > +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > @@ -7,13 +7,39 @@
> >
> >   #include <linux/dma-buf.h>
> >   #include <linux/dma-resv.h>
> > +#include <linux/freezer.h>
> >   #include <linux/kobject.h>
> > +#include <linux/kthread.h>
> > +#include <linux/list.h>
> >   #include <linux/printk.h>
> > +#include <linux/sched/signal.h>
> >   #include <linux/slab.h>
> >   #include <linux/sysfs.h>
> >
> >   #include "dma-buf-sysfs-stats.h"
> >
> > +struct dmabuf_kobj_work {
> > +     struct list_head list;
> > +     struct dma_buf_sysfs_entry *sysfs_entry;
> > +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > +     unsigned long uid;
> > +};
> > +
> > +/* Both kobject setup and teardown work gets queued on the list. */
> > +static LIST_HEAD(dmabuf_kobj_work_list);
> > +
> > +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
> > +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
> > +
> > +/*
> > + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
> > + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
> > + */
> > +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
> > +
> > +static struct task_struct *dmabuf_kobject_task;
> > +static wait_queue_head_t dmabuf_kobject_waitqueue;
> > +
> >   #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
> >
> >   /**
> > @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
> >       struct dma_buf_stats_attribute *attribute;
> >       struct dma_buf_sysfs_entry *sysfs_entry;
> >       struct dma_buf *dmabuf;
> > +     int ret;
> >
> >       attribute = to_dma_buf_stats_attr(attr);
> >       sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
> > +
> > +     /*
> > +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
> > +      * being freed while sysfs_entry->dmabuf is being accessed.
> > +      */
> > +     spin_lock(&dmabuf_sysfs_show_lock);
> >       dmabuf = sysfs_entry->dmabuf;
> >
> > -     if (!dmabuf || !attribute->show)
> > +     if (!dmabuf || !attribute->show) {
> > +             spin_unlock(&dmabuf_sysfs_show_lock);
> >               return -EIO;
> > +     }
> >
> > -     return attribute->show(dmabuf, attribute, buf);
> > +     ret = attribute->show(dmabuf, attribute, buf);
> > +     spin_unlock(&dmabuf_sysfs_show_lock);
> > +     return ret;
> >   }
> >
> >   static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
> > @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
> >       .default_groups = dma_buf_stats_default_groups,
> >   };
> >
> > -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> > +/* Statistics files do not need to send uevents. */
> > +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> >   {
> > -     struct dma_buf_sysfs_entry *sysfs_entry;
> > +     return 0;
> > +}
> >
> > -     sysfs_entry = dmabuf->sysfs_entry;
> > -     if (!sysfs_entry)
> > -             return;
> > +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> > +     .filter = dmabuf_sysfs_uevent_filter,
> > +};
> > +
> > +/* setup of sysfs entries done asynchronously in the worker thread. */
> > +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
> > +{
> > +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> > +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
> > +                     kobject_work->sysfs_metadata;
> > +     bool free_metadata = false;
> > +
> > +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> > +                                    "%lu", kobject_work->uid);
> > +     if (ret) {
> > +             kobject_put(&sysfs_entry->kobj);
> > +
> > +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
> > +                     /*
> > +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
> > +                      * been freed. At this point, its safe to free the memory for
> > +                      * the sysfs_metadata;
> > +                      */
> > +                     free_metadata = true;
> > +             } else {
> > +                     /*
> > +                      * The DMA-BUF has not yet been freed, set the status to
> > +                      * sysfs_entry_error so that when the DMA-BUF gets
> > +                      * freed, we know there is no need to teardown the sysfs
> > +                      * entry.
> > +                      */
> > +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
> > +             }
> > +             goto unlock;
> > +     }
> > +
> > +     /*
> > +      * If the DMA-BUF has not yet been released, status would still be
> > +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
> > +      */
> > +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> > +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
> > +             goto unlock;
> > +     }
> >
> > +     /*
> > +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
> > +      * that the DMA-BUF has already been freed. Hence, we cleanup the
> > +      * sysfs_entry and its metadata since neither of them are needed
> > +      * anymore.
> > +      */
> > +     free_metadata = true;
> >       kobject_del(&sysfs_entry->kobj);
> >       kobject_put(&sysfs_entry->kobj);
> > +
> > +unlock:
> > +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > +     if (free_metadata) {
> > +             kfree(kobject_work->sysfs_metadata);
> > +             kobject_work->sysfs_metadata = NULL;
> > +     }
> >   }
> >
> > +/* teardown of sysfs entries done asynchronously in the worker thread. */
> > +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
> > +{
> > +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> >
> > -/* Statistics files do not need to send uevents. */
> > -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> > +     kobject_del(&sysfs_entry->kobj);
> > +     kobject_put(&sysfs_entry->kobj);
> > +
> > +     kfree(kobject_work->sysfs_metadata);
> > +     kobject_work->sysfs_metadata = NULL;
> > +}
> > +
> > +/* do setup or teardown of sysfs entries as required */
> > +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
> >   {
> > +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > +     bool setup_needed = false;
> > +     bool teardown_needed = false;
> > +
> > +     sysfs_metadata = kobject_work->sysfs_metadata;
> > +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
> > +             setup_needed = true;
> > +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
> > +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> > +             teardown_needed = true;
> > +     }
> > +
> > +     /*
> > +      * It is ok to release the sysfs_entry_lock here.
> > +      *
> > +      * If setup_needed is true, we check the status again after the kobject
> > +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
> > +      * and if so teardown the kobject.
> > +      *
> > +      * If teardown_needed is true, there are no more changes expected to the
> > +      * status.
> > +      *
> > +      * If neither setup_needed nor teardown needed are true, it
> > +      * means the DMA-BUF was freed before we got around to setting up the
> > +      * sysfs entry and hence we just need to release the metadata and
> > +      * return.
> > +      */
> > +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
> > +
> > +     if (setup_needed)
> > +             dma_buf_sysfs_stats_setup_work(kobject_work);
> > +     else if (teardown_needed)
> > +             dma_buf_sysfs_stats_teardown_work(kobject_work);
> > +     else
> > +             kfree(kobject_work->sysfs_metadata);
> > +
> > +     kfree(kobject_work);
> > +}
> > +
> > +static struct dmabuf_kobj_work *get_next_kobj_work(void)
> > +{
> > +     struct dmabuf_kobj_work *kobject_work;
> > +
> > +     spin_lock(&dmabuf_kobj_list_lock);
> > +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
> > +                                             struct dmabuf_kobj_work, list);
> > +     if (kobject_work)
> > +             list_del(&kobject_work->list);
> > +     spin_unlock(&dmabuf_kobj_list_lock);
> > +     return kobject_work;
> > +}
> > +
> > +static int kobject_work_thread(void *data)
> > +{
> > +     struct dmabuf_kobj_work *kobject_work;
> > +
> > +     while (1) {
> > +             wait_event_freezable(dmabuf_kobject_waitqueue,
> > +                                  (kobject_work = get_next_kobj_work()));
> > +             do_kobject_work(kobject_work);
> > +     }
> > +
> >       return 0;
> >   }
> >
> > -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> > -     .filter = dmabuf_sysfs_uevent_filter,
> > -};
> > +static int kobject_worklist_init(void)
> > +{
> > +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
> > +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
> > +                                       "%s", "dmabuf-kobject-worker");
> > +     if (IS_ERR(dmabuf_kobject_task)) {
> > +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
> > +             return PTR_ERR(dmabuf_kobject_task);
> > +     }
> > +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
> > +
> > +     return 0;
> > +}
> > +
> > +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
> > +{
> > +     INIT_LIST_HEAD(&kobject_work->list);
> > +
> > +     spin_lock(&dmabuf_kobj_list_lock);
> > +
> > +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
> > +
> > +     spin_unlock(&dmabuf_kobj_list_lock);
> > +
> > +     wake_up(&dmabuf_kobject_waitqueue);
> > +}
> > +
> > +
> > +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> > +{
> > +     struct dma_buf_sysfs_entry *sysfs_entry;
> > +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > +     struct dmabuf_kobj_work *kobj_work;
> > +
> > +     sysfs_entry = dmabuf->sysfs_entry;
> > +     if (!sysfs_entry)
> > +             return;
> > +
> > +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > +     if (!sysfs_metadata)
> > +             return;
> > +
> > +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > +
> > +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
> > +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> > +             /*
> > +              * The sysfs entry for this buffer has not yet been initialized,
> > +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
> > +              * initialization.
> > +              */
> > +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
> > +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > +
> > +             /*
> > +              * In case kobject initialization completes right as we release
> > +              * the sysfs_entry_lock, disable show() for the sysfs entry by
> > +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
> > +              */
> > +             spin_lock(&dmabuf_sysfs_show_lock);
> > +             sysfs_entry->dmabuf = NULL;
> > +             spin_unlock(&dmabuf_sysfs_show_lock);
> > +
> > +             return;
> > +     }
> > +
> > +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> > +             /*
> > +              * queue teardown work only if sysfs_entry is fully inititalized.
> > +              * It is ok to release the sysfs_entry_lock here since the
> > +              * status can no longer change.
> > +              */
> > +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > +
> > +             /*
> > +              * Meanwhile disable show() for the sysfs entry to avoid a race
> > +              * between teardown and show().
> > +              */
> > +             spin_lock(&dmabuf_sysfs_show_lock);
> > +             sysfs_entry->dmabuf = NULL;
> > +             spin_unlock(&dmabuf_sysfs_show_lock);
> > +
> > +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> > +             if (!kobj_work) {
> > +                     /* do the teardown immediately. */
> > +                     kobject_del(&sysfs_entry->kobj);
> > +                     kobject_put(&sysfs_entry->kobj);
> > +                     kfree(sysfs_metadata);
> > +             } else {
> > +                     /* queue teardown work. */
> > +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> > +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > +                     deferred_kobject_create(kobj_work);
> > +             }
> > +
> > +             return;
> > +     }
> > +
> > +     /*
> > +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
> > +      * metadata.
> > +      */
> > +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > +     kfree(dmabuf->sysfs_entry_metadata);
> > +     dmabuf->sysfs_entry_metadata = NULL;
> > +}
> >
> >   static struct kset *dma_buf_stats_kset;
> >   static struct kset *dma_buf_per_buffer_stats_kset;
> >   int dma_buf_init_sysfs_statistics(void)
> >   {
> > +     int ret;
> > +
> > +     ret = kobject_worklist_init();
> > +     if (ret)
> > +             return ret;
> > +
> >       dma_buf_stats_kset = kset_create_and_add("dmabuf",
> >                                                &dmabuf_sysfs_no_uevent_ops,
> >                                                kernel_kobj);
> > @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
> >   int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >   {
> >       struct dma_buf_sysfs_entry *sysfs_entry;
> > -     int ret;
> > +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > +     struct dmabuf_kobj_work *kobj_work;
> >
> >       if (!dmabuf || !dmabuf->file)
> >               return -EINVAL;
> > @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >       sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
> >       sysfs_entry->dmabuf = dmabuf;
> >
> > +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
> > +                              GFP_KERNEL);
> > +     if (!sysfs_metadata) {
> > +             kfree(sysfs_entry);
> > +             return -ENOMEM;
> > +     }
> > +
> >       dmabuf->sysfs_entry = sysfs_entry;
> >
> > -     /* create the directory for buffer stats */
> > -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> > -                                "%lu", file_inode(dmabuf->file)->i_ino);
> > -     if (ret)
> > -             goto err_sysfs_dmabuf;
> > +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
> > +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
> >
> > -     return 0;
> > +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
> >
> > -err_sysfs_dmabuf:
> > -     kobject_put(&sysfs_entry->kobj);
> > -     dmabuf->sysfs_entry = NULL;
> > -     return ret;
> > +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> > +     if (!kobj_work) {
> > +             kfree(sysfs_entry);
> > +             kfree(sysfs_metadata);
> > +             return -ENOMEM;
> > +     }
> > +
> > +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> > +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > +     /*
> > +      * stash the inode number in struct dmabuf_kobj_work since setup
> > +      * might race with DMA-BUF teardown.
> > +      */
> > +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
> > +
> > +     deferred_kobject_create(kobj_work);
> > +     return 0;
> >   }
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 7ab50076e7a6..0597690023a0 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -287,6 +287,50 @@ struct dma_buf_ops {
> >       void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >   };
> >
> > +#ifdef CONFIG_DMABUF_SYSFS_STATS
> > +enum sysfs_entry_status {
> > +     SYSFS_ENTRY_UNINITIALIZED,
> > +     SYSFS_ENTRY_INIT_IN_PROGRESS,
> > +     SYSFS_ENTRY_ERROR,
> > +     SYSFS_ENTRY_INIT_ABORTED,
> > +     SYSFS_ENTRY_INITIALIZED,
> > +};
> > +
> > +/*
> > + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
> > + * DMA-BUF sysfs entry.
> > + *
> > + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
> > + * the sysfs entry has the following path.
> > + *
> > + *                   SYSFS_ENTRY_UNINITIALIZED
> > + *            __________________|____________________
> > + *           |                                       |
> > + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
> > + *           |                                                     gets freed
> > + *           |                                                     before
> > + *           |                                                     init)
> > + *   ________|_____________________________________
> > + *   |                         |                   |
> > + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
> > + *                             |               (DMA-BUF gets freed during kobject
> > + *                             |               init)
> > + *                             |
> > + *                             |
> > + *                 SYSFS_ENTRY_ERROR
> > + *                 (error during kobject init)
> > + *
> > + * @sysfs_entry_lock: protects access to @status.
> > + */
> > +struct dma_buf_sysfs_entry_metadata {
> > +     enum sysfs_entry_status status;
> > +     /*
> > +      * Protects sysfs_entry_metadata->status
> > +      */
> > +     spinlock_t sysfs_entry_lock;
> > +};
> > +#endif
> > +
> >   /**
> >    * struct dma_buf - shared buffer object
> >    *
> > @@ -452,6 +496,8 @@ struct dma_buf {
> >               struct kobject kobj;
> >               struct dma_buf *dmabuf;
> >       } *sysfs_entry;
> > +
> > +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
> >   #endif
> >   };
> >
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-06 19:04   ` Hridya Valsaraju
@ 2022-01-07 10:22     ` Christian König
  2022-01-07 18:17       ` Hridya Valsaraju
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-01-07 10:22 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Sumit Semwal, Daniel Vetter, Greg Kroah-Hartman, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
> On Thu, Jan 6, 2022 at 12:59 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
>>> Recently, we noticed an issue where a process went into direct reclaim
>>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
>>> mode. This caused processes who were doing DMA-BUF exports and releases
>>> to go into uninterruptible sleep since they needed to acquire the same
>>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
>>> blocking DMA-BUF export/release for an indeterminate amount of time
>>> while another process is holding the sysfs rw semaphore in exclusive
>>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
>>> a kthread.
>> Well I absolutely don't think that this is justified.
>>
>> You adding tons of complexity here just to avoid the overhead of
>> creating the sysfs files while exporting DMA-bufs which is an operation
>> which should be done exactly once in the lifecycle for the most common
>> use case.
>>
>> Please explain further why that should be necessary.
> Hi Christian,
>
> We noticed that the issue sometimes causes the exporting process to go
> to the uninterrupted sleep state for tens of milliseconds which
> unfortunately leads to user-perceptible UI jank. This is the reason
> why we are trying to move the sysfs entry creation and deletion out of
> the DMA-BUF export/release path. I will update the commit message to
> include the same in the next revision.

That is still not a justification for this change. The question is why 
do you need that in the first place?

Exporting a DMA-buf should be something would should be very rarely, 
e.g. only at the start of an application.

So this strongly looks like you are trying to optimize for an use case 
where we should probably rethink what userspace is doing here instead.

If we would want to go down this route you would need to change all the 
drivers implementing the DMA-buf export functionality to avoid 
uninterruptible sleep as well and that is certainly something I would NAK.

Regards,
Christian.

>
> Thanks,
> Hridya
>
>
>> Regards,
>> Christian.
>>
>>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
>>> ---
>>>    drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
>>>    include/linux/dma-buf.h               |  46 ++++
>>>    2 files changed, 366 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
>>> index 053baadcada9..3251fdf2f05f 100644
>>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
>>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
>>> @@ -7,13 +7,39 @@
>>>
>>>    #include <linux/dma-buf.h>
>>>    #include <linux/dma-resv.h>
>>> +#include <linux/freezer.h>
>>>    #include <linux/kobject.h>
>>> +#include <linux/kthread.h>
>>> +#include <linux/list.h>
>>>    #include <linux/printk.h>
>>> +#include <linux/sched/signal.h>
>>>    #include <linux/slab.h>
>>>    #include <linux/sysfs.h>
>>>
>>>    #include "dma-buf-sysfs-stats.h"
>>>
>>> +struct dmabuf_kobj_work {
>>> +     struct list_head list;
>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>> +     unsigned long uid;
>>> +};
>>> +
>>> +/* Both kobject setup and teardown work gets queued on the list. */
>>> +static LIST_HEAD(dmabuf_kobj_work_list);
>>> +
>>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
>>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
>>> +
>>> +/*
>>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
>>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
>>> + */
>>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
>>> +
>>> +static struct task_struct *dmabuf_kobject_task;
>>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
>>> +
>>>    #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
>>>
>>>    /**
>>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
>>>        struct dma_buf_stats_attribute *attribute;
>>>        struct dma_buf_sysfs_entry *sysfs_entry;
>>>        struct dma_buf *dmabuf;
>>> +     int ret;
>>>
>>>        attribute = to_dma_buf_stats_attr(attr);
>>>        sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
>>> +
>>> +     /*
>>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
>>> +      * being freed while sysfs_entry->dmabuf is being accessed.
>>> +      */
>>> +     spin_lock(&dmabuf_sysfs_show_lock);
>>>        dmabuf = sysfs_entry->dmabuf;
>>>
>>> -     if (!dmabuf || !attribute->show)
>>> +     if (!dmabuf || !attribute->show) {
>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>                return -EIO;
>>> +     }
>>>
>>> -     return attribute->show(dmabuf, attribute, buf);
>>> +     ret = attribute->show(dmabuf, attribute, buf);
>>> +     spin_unlock(&dmabuf_sysfs_show_lock);
>>> +     return ret;
>>>    }
>>>
>>>    static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
>>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
>>>        .default_groups = dma_buf_stats_default_groups,
>>>    };
>>>
>>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>>> +/* Statistics files do not need to send uevents. */
>>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>>>    {
>>> -     struct dma_buf_sysfs_entry *sysfs_entry;
>>> +     return 0;
>>> +}
>>>
>>> -     sysfs_entry = dmabuf->sysfs_entry;
>>> -     if (!sysfs_entry)
>>> -             return;
>>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
>>> +     .filter = dmabuf_sysfs_uevent_filter,
>>> +};
>>> +
>>> +/* setup of sysfs entries done asynchronously in the worker thread. */
>>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
>>> +{
>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
>>> +                     kobject_work->sysfs_metadata;
>>> +     bool free_metadata = false;
>>> +
>>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
>>> +                                    "%lu", kobject_work->uid);
>>> +     if (ret) {
>>> +             kobject_put(&sysfs_entry->kobj);
>>> +
>>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
>>> +                     /*
>>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
>>> +                      * been freed. At this point, its safe to free the memory for
>>> +                      * the sysfs_metadata;
>>> +                      */
>>> +                     free_metadata = true;
>>> +             } else {
>>> +                     /*
>>> +                      * The DMA-BUF has not yet been freed, set the status to
>>> +                      * sysfs_entry_error so that when the DMA-BUF gets
>>> +                      * freed, we know there is no need to teardown the sysfs
>>> +                      * entry.
>>> +                      */
>>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
>>> +             }
>>> +             goto unlock;
>>> +     }
>>> +
>>> +     /*
>>> +      * If the DMA-BUF has not yet been released, status would still be
>>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
>>> +      */
>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
>>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
>>> +             goto unlock;
>>> +     }
>>>
>>> +     /*
>>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
>>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
>>> +      * sysfs_entry and its metadata since neither of them are needed
>>> +      * anymore.
>>> +      */
>>> +     free_metadata = true;
>>>        kobject_del(&sysfs_entry->kobj);
>>>        kobject_put(&sysfs_entry->kobj);
>>> +
>>> +unlock:
>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>> +     if (free_metadata) {
>>> +             kfree(kobject_work->sysfs_metadata);
>>> +             kobject_work->sysfs_metadata = NULL;
>>> +     }
>>>    }
>>>
>>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
>>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
>>> +{
>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>>>
>>> -/* Statistics files do not need to send uevents. */
>>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>>> +     kobject_del(&sysfs_entry->kobj);
>>> +     kobject_put(&sysfs_entry->kobj);
>>> +
>>> +     kfree(kobject_work->sysfs_metadata);
>>> +     kobject_work->sysfs_metadata = NULL;
>>> +}
>>> +
>>> +/* do setup or teardown of sysfs entries as required */
>>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
>>>    {
>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>> +     bool setup_needed = false;
>>> +     bool teardown_needed = false;
>>> +
>>> +     sysfs_metadata = kobject_work->sysfs_metadata;
>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
>>> +             setup_needed = true;
>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
>>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
>>> +             teardown_needed = true;
>>> +     }
>>> +
>>> +     /*
>>> +      * It is ok to release the sysfs_entry_lock here.
>>> +      *
>>> +      * If setup_needed is true, we check the status again after the kobject
>>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
>>> +      * and if so teardown the kobject.
>>> +      *
>>> +      * If teardown_needed is true, there are no more changes expected to the
>>> +      * status.
>>> +      *
>>> +      * If neither setup_needed nor teardown needed are true, it
>>> +      * means the DMA-BUF was freed before we got around to setting up the
>>> +      * sysfs entry and hence we just need to release the metadata and
>>> +      * return.
>>> +      */
>>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
>>> +
>>> +     if (setup_needed)
>>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
>>> +     else if (teardown_needed)
>>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
>>> +     else
>>> +             kfree(kobject_work->sysfs_metadata);
>>> +
>>> +     kfree(kobject_work);
>>> +}
>>> +
>>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
>>> +{
>>> +     struct dmabuf_kobj_work *kobject_work;
>>> +
>>> +     spin_lock(&dmabuf_kobj_list_lock);
>>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
>>> +                                             struct dmabuf_kobj_work, list);
>>> +     if (kobject_work)
>>> +             list_del(&kobject_work->list);
>>> +     spin_unlock(&dmabuf_kobj_list_lock);
>>> +     return kobject_work;
>>> +}
>>> +
>>> +static int kobject_work_thread(void *data)
>>> +{
>>> +     struct dmabuf_kobj_work *kobject_work;
>>> +
>>> +     while (1) {
>>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
>>> +                                  (kobject_work = get_next_kobj_work()));
>>> +             do_kobject_work(kobject_work);
>>> +     }
>>> +
>>>        return 0;
>>>    }
>>>
>>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
>>> -     .filter = dmabuf_sysfs_uevent_filter,
>>> -};
>>> +static int kobject_worklist_init(void)
>>> +{
>>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
>>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
>>> +                                       "%s", "dmabuf-kobject-worker");
>>> +     if (IS_ERR(dmabuf_kobject_task)) {
>>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
>>> +             return PTR_ERR(dmabuf_kobject_task);
>>> +     }
>>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
>>> +{
>>> +     INIT_LIST_HEAD(&kobject_work->list);
>>> +
>>> +     spin_lock(&dmabuf_kobj_list_lock);
>>> +
>>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
>>> +
>>> +     spin_unlock(&dmabuf_kobj_list_lock);
>>> +
>>> +     wake_up(&dmabuf_kobject_waitqueue);
>>> +}
>>> +
>>> +
>>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>>> +{
>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>> +     struct dmabuf_kobj_work *kobj_work;
>>> +
>>> +     sysfs_entry = dmabuf->sysfs_entry;
>>> +     if (!sysfs_entry)
>>> +             return;
>>> +
>>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>> +     if (!sysfs_metadata)
>>> +             return;
>>> +
>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>> +
>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
>>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
>>> +             /*
>>> +              * The sysfs entry for this buffer has not yet been initialized,
>>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
>>> +              * initialization.
>>> +              */
>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>> +
>>> +             /*
>>> +              * In case kobject initialization completes right as we release
>>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
>>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
>>> +              */
>>> +             spin_lock(&dmabuf_sysfs_show_lock);
>>> +             sysfs_entry->dmabuf = NULL;
>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>> +
>>> +             return;
>>> +     }
>>> +
>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
>>> +             /*
>>> +              * queue teardown work only if sysfs_entry is fully inititalized.
>>> +              * It is ok to release the sysfs_entry_lock here since the
>>> +              * status can no longer change.
>>> +              */
>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>> +
>>> +             /*
>>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
>>> +              * between teardown and show().
>>> +              */
>>> +             spin_lock(&dmabuf_sysfs_show_lock);
>>> +             sysfs_entry->dmabuf = NULL;
>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>> +
>>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
>>> +             if (!kobj_work) {
>>> +                     /* do the teardown immediately. */
>>> +                     kobject_del(&sysfs_entry->kobj);
>>> +                     kobject_put(&sysfs_entry->kobj);
>>> +                     kfree(sysfs_metadata);
>>> +             } else {
>>> +                     /* queue teardown work. */
>>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
>>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>> +                     deferred_kobject_create(kobj_work);
>>> +             }
>>> +
>>> +             return;
>>> +     }
>>> +
>>> +     /*
>>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
>>> +      * metadata.
>>> +      */
>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>> +     kfree(dmabuf->sysfs_entry_metadata);
>>> +     dmabuf->sysfs_entry_metadata = NULL;
>>> +}
>>>
>>>    static struct kset *dma_buf_stats_kset;
>>>    static struct kset *dma_buf_per_buffer_stats_kset;
>>>    int dma_buf_init_sysfs_statistics(void)
>>>    {
>>> +     int ret;
>>> +
>>> +     ret = kobject_worklist_init();
>>> +     if (ret)
>>> +             return ret;
>>> +
>>>        dma_buf_stats_kset = kset_create_and_add("dmabuf",
>>>                                                 &dmabuf_sysfs_no_uevent_ops,
>>>                                                 kernel_kobj);
>>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
>>>    int dma_buf_stats_setup(struct dma_buf *dmabuf)
>>>    {
>>>        struct dma_buf_sysfs_entry *sysfs_entry;
>>> -     int ret;
>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>> +     struct dmabuf_kobj_work *kobj_work;
>>>
>>>        if (!dmabuf || !dmabuf->file)
>>>                return -EINVAL;
>>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
>>>        sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
>>>        sysfs_entry->dmabuf = dmabuf;
>>>
>>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
>>> +                              GFP_KERNEL);
>>> +     if (!sysfs_metadata) {
>>> +             kfree(sysfs_entry);
>>> +             return -ENOMEM;
>>> +     }
>>> +
>>>        dmabuf->sysfs_entry = sysfs_entry;
>>>
>>> -     /* create the directory for buffer stats */
>>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
>>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
>>> -     if (ret)
>>> -             goto err_sysfs_dmabuf;
>>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
>>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
>>>
>>> -     return 0;
>>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
>>>
>>> -err_sysfs_dmabuf:
>>> -     kobject_put(&sysfs_entry->kobj);
>>> -     dmabuf->sysfs_entry = NULL;
>>> -     return ret;
>>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
>>> +     if (!kobj_work) {
>>> +             kfree(sysfs_entry);
>>> +             kfree(sysfs_metadata);
>>> +             return -ENOMEM;
>>> +     }
>>> +
>>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
>>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>> +     /*
>>> +      * stash the inode number in struct dmabuf_kobj_work since setup
>>> +      * might race with DMA-BUF teardown.
>>> +      */
>>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
>>> +
>>> +     deferred_kobject_create(kobj_work);
>>> +     return 0;
>>>    }
>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>> index 7ab50076e7a6..0597690023a0 100644
>>> --- a/include/linux/dma-buf.h
>>> +++ b/include/linux/dma-buf.h
>>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
>>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>    };
>>>
>>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
>>> +enum sysfs_entry_status {
>>> +     SYSFS_ENTRY_UNINITIALIZED,
>>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
>>> +     SYSFS_ENTRY_ERROR,
>>> +     SYSFS_ENTRY_INIT_ABORTED,
>>> +     SYSFS_ENTRY_INITIALIZED,
>>> +};
>>> +
>>> +/*
>>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
>>> + * DMA-BUF sysfs entry.
>>> + *
>>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
>>> + * the sysfs entry has the following path.
>>> + *
>>> + *                   SYSFS_ENTRY_UNINITIALIZED
>>> + *            __________________|____________________
>>> + *           |                                       |
>>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
>>> + *           |                                                     gets freed
>>> + *           |                                                     before
>>> + *           |                                                     init)
>>> + *   ________|_____________________________________
>>> + *   |                         |                   |
>>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
>>> + *                             |               (DMA-BUF gets freed during kobject
>>> + *                             |               init)
>>> + *                             |
>>> + *                             |
>>> + *                 SYSFS_ENTRY_ERROR
>>> + *                 (error during kobject init)
>>> + *
>>> + * @sysfs_entry_lock: protects access to @status.
>>> + */
>>> +struct dma_buf_sysfs_entry_metadata {
>>> +     enum sysfs_entry_status status;
>>> +     /*
>>> +      * Protects sysfs_entry_metadata->status
>>> +      */
>>> +     spinlock_t sysfs_entry_lock;
>>> +};
>>> +#endif
>>> +
>>>    /**
>>>     * struct dma_buf - shared buffer object
>>>     *
>>> @@ -452,6 +496,8 @@ struct dma_buf {
>>>                struct kobject kobj;
>>>                struct dma_buf *dmabuf;
>>>        } *sysfs_entry;
>>> +
>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
>>>    #endif
>>>    };
>>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-07 10:22     ` Christian König
@ 2022-01-07 18:17       ` Hridya Valsaraju
  2022-01-07 21:25         ` Hridya Valsaraju
  0 siblings, 1 reply; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-07 18:17 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Daniel Vetter, Greg Kroah-Hartman, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Fri, Jan 7, 2022 at 2:22 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
> > On Thu, Jan 6, 2022 at 12:59 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
> >>> Recently, we noticed an issue where a process went into direct reclaim
> >>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
> >>> mode. This caused processes who were doing DMA-BUF exports and releases
> >>> to go into uninterruptible sleep since they needed to acquire the same
> >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> >>> blocking DMA-BUF export/release for an indeterminate amount of time
> >>> while another process is holding the sysfs rw semaphore in exclusive
> >>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
> >>> a kthread.
> >> Well I absolutely don't think that this is justified.
> >>
> >> You adding tons of complexity here just to avoid the overhead of
> >> creating the sysfs files while exporting DMA-bufs which is an operation
> >> which should be done exactly once in the lifecycle for the most common
> >> use case.
> >>
> >> Please explain further why that should be necessary.
> > Hi Christian,
> >
> > We noticed that the issue sometimes causes the exporting process to go
> > to the uninterrupted sleep state for tens of milliseconds which
> > unfortunately leads to user-perceptible UI jank. This is the reason
> > why we are trying to move the sysfs entry creation and deletion out of
> > the DMA-BUF export/release path. I will update the commit message to
> > include the same in the next revision.
>
> That is still not a justification for this change. The question is why
> do you need that in the first place?
>
> Exporting a DMA-buf should be something would should be very rarely,
> e.g. only at the start of an application.

Hi Christian,

Yes, that is correct. Noticeable jank caused by the issue is not
present at all times and happens on UI transitions(for example during
app launches and exits) when there are several DMA-BUFs being exported
and released. This is especially true in the case of the camera app
since it exports and releases a relatively larger number of DMA-BUFs
during launch and exit respectively.

Regards,
Hridya

>
> So this strongly looks like you are trying to optimize for an use case
> where we should probably rethink what userspace is doing here instead.
>
> If we would want to go down this route you would need to change all the
> drivers implementing the DMA-buf export functionality to avoid
> uninterruptible sleep as well and that is certainly something I would NAK.
>
> Regards,
> Christian.
>
> >
> > Thanks,
> > Hridya
> >
> >
> >> Regards,
> >> Christian.
> >>
> >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> >>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> >>> ---
> >>>    drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
> >>>    include/linux/dma-buf.h               |  46 ++++
> >>>    2 files changed, 366 insertions(+), 23 deletions(-)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> >>> index 053baadcada9..3251fdf2f05f 100644
> >>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> >>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> >>> @@ -7,13 +7,39 @@
> >>>
> >>>    #include <linux/dma-buf.h>
> >>>    #include <linux/dma-resv.h>
> >>> +#include <linux/freezer.h>
> >>>    #include <linux/kobject.h>
> >>> +#include <linux/kthread.h>
> >>> +#include <linux/list.h>
> >>>    #include <linux/printk.h>
> >>> +#include <linux/sched/signal.h>
> >>>    #include <linux/slab.h>
> >>>    #include <linux/sysfs.h>
> >>>
> >>>    #include "dma-buf-sysfs-stats.h"
> >>>
> >>> +struct dmabuf_kobj_work {
> >>> +     struct list_head list;
> >>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>> +     unsigned long uid;
> >>> +};
> >>> +
> >>> +/* Both kobject setup and teardown work gets queued on the list. */
> >>> +static LIST_HEAD(dmabuf_kobj_work_list);
> >>> +
> >>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
> >>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
> >>> +
> >>> +/*
> >>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
> >>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
> >>> + */
> >>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
> >>> +
> >>> +static struct task_struct *dmabuf_kobject_task;
> >>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
> >>> +
> >>>    #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
> >>>
> >>>    /**
> >>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
> >>>        struct dma_buf_stats_attribute *attribute;
> >>>        struct dma_buf_sysfs_entry *sysfs_entry;
> >>>        struct dma_buf *dmabuf;
> >>> +     int ret;
> >>>
> >>>        attribute = to_dma_buf_stats_attr(attr);
> >>>        sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
> >>> +
> >>> +     /*
> >>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
> >>> +      * being freed while sysfs_entry->dmabuf is being accessed.
> >>> +      */
> >>> +     spin_lock(&dmabuf_sysfs_show_lock);
> >>>        dmabuf = sysfs_entry->dmabuf;
> >>>
> >>> -     if (!dmabuf || !attribute->show)
> >>> +     if (!dmabuf || !attribute->show) {
> >>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> >>>                return -EIO;
> >>> +     }
> >>>
> >>> -     return attribute->show(dmabuf, attribute, buf);
> >>> +     ret = attribute->show(dmabuf, attribute, buf);
> >>> +     spin_unlock(&dmabuf_sysfs_show_lock);
> >>> +     return ret;
> >>>    }
> >>>
> >>>    static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
> >>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
> >>>        .default_groups = dma_buf_stats_default_groups,
> >>>    };
> >>>
> >>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> >>> +/* Statistics files do not need to send uevents. */
> >>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> >>>    {
> >>> -     struct dma_buf_sysfs_entry *sysfs_entry;
> >>> +     return 0;
> >>> +}
> >>>
> >>> -     sysfs_entry = dmabuf->sysfs_entry;
> >>> -     if (!sysfs_entry)
> >>> -             return;
> >>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> >>> +     .filter = dmabuf_sysfs_uevent_filter,
> >>> +};
> >>> +
> >>> +/* setup of sysfs entries done asynchronously in the worker thread. */
> >>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
> >>> +{
> >>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
> >>> +                     kobject_work->sysfs_metadata;
> >>> +     bool free_metadata = false;
> >>> +
> >>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> >>> +                                    "%lu", kobject_work->uid);
> >>> +     if (ret) {
> >>> +             kobject_put(&sysfs_entry->kobj);
> >>> +
> >>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
> >>> +                     /*
> >>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
> >>> +                      * been freed. At this point, its safe to free the memory for
> >>> +                      * the sysfs_metadata;
> >>> +                      */
> >>> +                     free_metadata = true;
> >>> +             } else {
> >>> +                     /*
> >>> +                      * The DMA-BUF has not yet been freed, set the status to
> >>> +                      * sysfs_entry_error so that when the DMA-BUF gets
> >>> +                      * freed, we know there is no need to teardown the sysfs
> >>> +                      * entry.
> >>> +                      */
> >>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
> >>> +             }
> >>> +             goto unlock;
> >>> +     }
> >>> +
> >>> +     /*
> >>> +      * If the DMA-BUF has not yet been released, status would still be
> >>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
> >>> +      */
> >>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> >>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
> >>> +             goto unlock;
> >>> +     }
> >>>
> >>> +     /*
> >>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
> >>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
> >>> +      * sysfs_entry and its metadata since neither of them are needed
> >>> +      * anymore.
> >>> +      */
> >>> +     free_metadata = true;
> >>>        kobject_del(&sysfs_entry->kobj);
> >>>        kobject_put(&sysfs_entry->kobj);
> >>> +
> >>> +unlock:
> >>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>> +     if (free_metadata) {
> >>> +             kfree(kobject_work->sysfs_metadata);
> >>> +             kobject_work->sysfs_metadata = NULL;
> >>> +     }
> >>>    }
> >>>
> >>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
> >>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
> >>> +{
> >>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> >>>
> >>> -/* Statistics files do not need to send uevents. */
> >>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> >>> +     kobject_del(&sysfs_entry->kobj);
> >>> +     kobject_put(&sysfs_entry->kobj);
> >>> +
> >>> +     kfree(kobject_work->sysfs_metadata);
> >>> +     kobject_work->sysfs_metadata = NULL;
> >>> +}
> >>> +
> >>> +/* do setup or teardown of sysfs entries as required */
> >>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
> >>>    {
> >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>> +     bool setup_needed = false;
> >>> +     bool teardown_needed = false;
> >>> +
> >>> +     sysfs_metadata = kobject_work->sysfs_metadata;
> >>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
> >>> +             setup_needed = true;
> >>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
> >>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> >>> +             teardown_needed = true;
> >>> +     }
> >>> +
> >>> +     /*
> >>> +      * It is ok to release the sysfs_entry_lock here.
> >>> +      *
> >>> +      * If setup_needed is true, we check the status again after the kobject
> >>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
> >>> +      * and if so teardown the kobject.
> >>> +      *
> >>> +      * If teardown_needed is true, there are no more changes expected to the
> >>> +      * status.
> >>> +      *
> >>> +      * If neither setup_needed nor teardown needed are true, it
> >>> +      * means the DMA-BUF was freed before we got around to setting up the
> >>> +      * sysfs entry and hence we just need to release the metadata and
> >>> +      * return.
> >>> +      */
> >>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
> >>> +
> >>> +     if (setup_needed)
> >>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
> >>> +     else if (teardown_needed)
> >>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
> >>> +     else
> >>> +             kfree(kobject_work->sysfs_metadata);
> >>> +
> >>> +     kfree(kobject_work);
> >>> +}
> >>> +
> >>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
> >>> +{
> >>> +     struct dmabuf_kobj_work *kobject_work;
> >>> +
> >>> +     spin_lock(&dmabuf_kobj_list_lock);
> >>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
> >>> +                                             struct dmabuf_kobj_work, list);
> >>> +     if (kobject_work)
> >>> +             list_del(&kobject_work->list);
> >>> +     spin_unlock(&dmabuf_kobj_list_lock);
> >>> +     return kobject_work;
> >>> +}
> >>> +
> >>> +static int kobject_work_thread(void *data)
> >>> +{
> >>> +     struct dmabuf_kobj_work *kobject_work;
> >>> +
> >>> +     while (1) {
> >>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
> >>> +                                  (kobject_work = get_next_kobj_work()));
> >>> +             do_kobject_work(kobject_work);
> >>> +     }
> >>> +
> >>>        return 0;
> >>>    }
> >>>
> >>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> >>> -     .filter = dmabuf_sysfs_uevent_filter,
> >>> -};
> >>> +static int kobject_worklist_init(void)
> >>> +{
> >>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
> >>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
> >>> +                                       "%s", "dmabuf-kobject-worker");
> >>> +     if (IS_ERR(dmabuf_kobject_task)) {
> >>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
> >>> +             return PTR_ERR(dmabuf_kobject_task);
> >>> +     }
> >>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
> >>> +{
> >>> +     INIT_LIST_HEAD(&kobject_work->list);
> >>> +
> >>> +     spin_lock(&dmabuf_kobj_list_lock);
> >>> +
> >>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
> >>> +
> >>> +     spin_unlock(&dmabuf_kobj_list_lock);
> >>> +
> >>> +     wake_up(&dmabuf_kobject_waitqueue);
> >>> +}
> >>> +
> >>> +
> >>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> >>> +{
> >>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>> +     struct dmabuf_kobj_work *kobj_work;
> >>> +
> >>> +     sysfs_entry = dmabuf->sysfs_entry;
> >>> +     if (!sysfs_entry)
> >>> +             return;
> >>> +
> >>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
> >>> +     if (!sysfs_metadata)
> >>> +             return;
> >>> +
> >>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>> +
> >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
> >>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> >>> +             /*
> >>> +              * The sysfs entry for this buffer has not yet been initialized,
> >>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
> >>> +              * initialization.
> >>> +              */
> >>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
> >>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>> +
> >>> +             /*
> >>> +              * In case kobject initialization completes right as we release
> >>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
> >>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
> >>> +              */
> >>> +             spin_lock(&dmabuf_sysfs_show_lock);
> >>> +             sysfs_entry->dmabuf = NULL;
> >>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> >>> +
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> >>> +             /*
> >>> +              * queue teardown work only if sysfs_entry is fully inititalized.
> >>> +              * It is ok to release the sysfs_entry_lock here since the
> >>> +              * status can no longer change.
> >>> +              */
> >>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>> +
> >>> +             /*
> >>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
> >>> +              * between teardown and show().
> >>> +              */
> >>> +             spin_lock(&dmabuf_sysfs_show_lock);
> >>> +             sysfs_entry->dmabuf = NULL;
> >>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> >>> +
> >>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> >>> +             if (!kobj_work) {
> >>> +                     /* do the teardown immediately. */
> >>> +                     kobject_del(&sysfs_entry->kobj);
> >>> +                     kobject_put(&sysfs_entry->kobj);
> >>> +                     kfree(sysfs_metadata);
> >>> +             } else {
> >>> +                     /* queue teardown work. */
> >>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> >>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> >>> +                     deferred_kobject_create(kobj_work);
> >>> +             }
> >>> +
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     /*
> >>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
> >>> +      * metadata.
> >>> +      */
> >>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>> +     kfree(dmabuf->sysfs_entry_metadata);
> >>> +     dmabuf->sysfs_entry_metadata = NULL;
> >>> +}
> >>>
> >>>    static struct kset *dma_buf_stats_kset;
> >>>    static struct kset *dma_buf_per_buffer_stats_kset;
> >>>    int dma_buf_init_sysfs_statistics(void)
> >>>    {
> >>> +     int ret;
> >>> +
> >>> +     ret = kobject_worklist_init();
> >>> +     if (ret)
> >>> +             return ret;
> >>> +
> >>>        dma_buf_stats_kset = kset_create_and_add("dmabuf",
> >>>                                                 &dmabuf_sysfs_no_uevent_ops,
> >>>                                                 kernel_kobj);
> >>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
> >>>    int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >>>    {
> >>>        struct dma_buf_sysfs_entry *sysfs_entry;
> >>> -     int ret;
> >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>> +     struct dmabuf_kobj_work *kobj_work;
> >>>
> >>>        if (!dmabuf || !dmabuf->file)
> >>>                return -EINVAL;
> >>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >>>        sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
> >>>        sysfs_entry->dmabuf = dmabuf;
> >>>
> >>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
> >>> +                              GFP_KERNEL);
> >>> +     if (!sysfs_metadata) {
> >>> +             kfree(sysfs_entry);
> >>> +             return -ENOMEM;
> >>> +     }
> >>> +
> >>>        dmabuf->sysfs_entry = sysfs_entry;
> >>>
> >>> -     /* create the directory for buffer stats */
> >>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> >>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
> >>> -     if (ret)
> >>> -             goto err_sysfs_dmabuf;
> >>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
> >>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
> >>>
> >>> -     return 0;
> >>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
> >>>
> >>> -err_sysfs_dmabuf:
> >>> -     kobject_put(&sysfs_entry->kobj);
> >>> -     dmabuf->sysfs_entry = NULL;
> >>> -     return ret;
> >>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> >>> +     if (!kobj_work) {
> >>> +             kfree(sysfs_entry);
> >>> +             kfree(sysfs_metadata);
> >>> +             return -ENOMEM;
> >>> +     }
> >>> +
> >>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> >>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> >>> +     /*
> >>> +      * stash the inode number in struct dmabuf_kobj_work since setup
> >>> +      * might race with DMA-BUF teardown.
> >>> +      */
> >>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
> >>> +
> >>> +     deferred_kobject_create(kobj_work);
> >>> +     return 0;
> >>>    }
> >>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>> index 7ab50076e7a6..0597690023a0 100644
> >>> --- a/include/linux/dma-buf.h
> >>> +++ b/include/linux/dma-buf.h
> >>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
> >>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>    };
> >>>
> >>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
> >>> +enum sysfs_entry_status {
> >>> +     SYSFS_ENTRY_UNINITIALIZED,
> >>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
> >>> +     SYSFS_ENTRY_ERROR,
> >>> +     SYSFS_ENTRY_INIT_ABORTED,
> >>> +     SYSFS_ENTRY_INITIALIZED,
> >>> +};
> >>> +
> >>> +/*
> >>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
> >>> + * DMA-BUF sysfs entry.
> >>> + *
> >>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
> >>> + * the sysfs entry has the following path.
> >>> + *
> >>> + *                   SYSFS_ENTRY_UNINITIALIZED
> >>> + *            __________________|____________________
> >>> + *           |                                       |
> >>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
> >>> + *           |                                                     gets freed
> >>> + *           |                                                     before
> >>> + *           |                                                     init)
> >>> + *   ________|_____________________________________
> >>> + *   |                         |                   |
> >>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
> >>> + *                             |               (DMA-BUF gets freed during kobject
> >>> + *                             |               init)
> >>> + *                             |
> >>> + *                             |
> >>> + *                 SYSFS_ENTRY_ERROR
> >>> + *                 (error during kobject init)
> >>> + *
> >>> + * @sysfs_entry_lock: protects access to @status.
> >>> + */
> >>> +struct dma_buf_sysfs_entry_metadata {
> >>> +     enum sysfs_entry_status status;
> >>> +     /*
> >>> +      * Protects sysfs_entry_metadata->status
> >>> +      */
> >>> +     spinlock_t sysfs_entry_lock;
> >>> +};
> >>> +#endif
> >>> +
> >>>    /**
> >>>     * struct dma_buf - shared buffer object
> >>>     *
> >>> @@ -452,6 +496,8 @@ struct dma_buf {
> >>>                struct kobject kobj;
> >>>                struct dma_buf *dmabuf;
> >>>        } *sysfs_entry;
> >>> +
> >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
> >>>    #endif
> >>>    };
> >>>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-07 18:17       ` Hridya Valsaraju
@ 2022-01-07 21:25         ` Hridya Valsaraju
  2022-01-10  7:28           ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-07 21:25 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Daniel Vetter, Greg Kroah-Hartman, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Fri, Jan 7, 2022 at 10:17 AM Hridya Valsaraju <hridya@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 2:22 AM Christian König <christian.koenig@amd.com> wrote:
> >
> > Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
> > > On Thu, Jan 6, 2022 at 12:59 AM Christian König
> > > <christian.koenig@amd.com> wrote:
> > >> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
> > >>> Recently, we noticed an issue where a process went into direct reclaim
> > >>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
> > >>> mode. This caused processes who were doing DMA-BUF exports and releases
> > >>> to go into uninterruptible sleep since they needed to acquire the same
> > >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> > >>> blocking DMA-BUF export/release for an indeterminate amount of time
> > >>> while another process is holding the sysfs rw semaphore in exclusive
> > >>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
> > >>> a kthread.
> > >> Well I absolutely don't think that this is justified.
> > >>
> > >> You adding tons of complexity here just to avoid the overhead of
> > >> creating the sysfs files while exporting DMA-bufs which is an operation
> > >> which should be done exactly once in the lifecycle for the most common
> > >> use case.
> > >>
> > >> Please explain further why that should be necessary.
> > > Hi Christian,
> > >
> > > We noticed that the issue sometimes causes the exporting process to go
> > > to the uninterrupted sleep state for tens of milliseconds which
> > > unfortunately leads to user-perceptible UI jank. This is the reason
> > > why we are trying to move the sysfs entry creation and deletion out of
> > > the DMA-BUF export/release path. I will update the commit message to
> > > include the same in the next revision.
> >
> > That is still not a justification for this change. The question is why
> > do you need that in the first place?
> >
> > Exporting a DMA-buf should be something would should be very rarely,
> > e.g. only at the start of an application.
>
> Hi Christian,
>
> Yes, that is correct. Noticeable jank caused by the issue is not
> present at all times and happens on UI transitions(for example during
> app launches and exits) when there are several DMA-BUFs being exported
> and released. This is especially true in the case of the camera app
> since it exports and releases a relatively larger number of DMA-BUFs
> during launch and exit respectively.
>
> Regards,
> Hridya
>
> >
> > So this strongly looks like you are trying to optimize for an use case
> > where we should probably rethink what userspace is doing here instead.

Hello Christian,

If you don't mind, could you please elaborate on the above statement?
Thanks in advance for the guidance!

Regards,
Hridya

> >
> > If we would want to go down this route you would need to change all the
> > drivers implementing the DMA-buf export functionality to avoid
> > uninterruptible sleep as well and that is certainly something I would NAK.
> >
> > Regards,
> > Christian.
> >
> > >
> > > Thanks,
> > > Hridya
> > >
> > >
> > >> Regards,
> > >> Christian.
> > >>
> > >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> > >>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > >>> ---
> > >>>    drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
> > >>>    include/linux/dma-buf.h               |  46 ++++
> > >>>    2 files changed, 366 insertions(+), 23 deletions(-)
> > >>>
> > >>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > >>> index 053baadcada9..3251fdf2f05f 100644
> > >>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> > >>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > >>> @@ -7,13 +7,39 @@
> > >>>
> > >>>    #include <linux/dma-buf.h>
> > >>>    #include <linux/dma-resv.h>
> > >>> +#include <linux/freezer.h>
> > >>>    #include <linux/kobject.h>
> > >>> +#include <linux/kthread.h>
> > >>> +#include <linux/list.h>
> > >>>    #include <linux/printk.h>
> > >>> +#include <linux/sched/signal.h>
> > >>>    #include <linux/slab.h>
> > >>>    #include <linux/sysfs.h>
> > >>>
> > >>>    #include "dma-buf-sysfs-stats.h"
> > >>>
> > >>> +struct dmabuf_kobj_work {
> > >>> +     struct list_head list;
> > >>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> > >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>> +     unsigned long uid;
> > >>> +};
> > >>> +
> > >>> +/* Both kobject setup and teardown work gets queued on the list. */
> > >>> +static LIST_HEAD(dmabuf_kobj_work_list);
> > >>> +
> > >>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
> > >>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
> > >>> +
> > >>> +/*
> > >>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
> > >>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
> > >>> + */
> > >>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
> > >>> +
> > >>> +static struct task_struct *dmabuf_kobject_task;
> > >>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
> > >>> +
> > >>>    #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
> > >>>
> > >>>    /**
> > >>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
> > >>>        struct dma_buf_stats_attribute *attribute;
> > >>>        struct dma_buf_sysfs_entry *sysfs_entry;
> > >>>        struct dma_buf *dmabuf;
> > >>> +     int ret;
> > >>>
> > >>>        attribute = to_dma_buf_stats_attr(attr);
> > >>>        sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
> > >>> +
> > >>> +     /*
> > >>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
> > >>> +      * being freed while sysfs_entry->dmabuf is being accessed.
> > >>> +      */
> > >>> +     spin_lock(&dmabuf_sysfs_show_lock);
> > >>>        dmabuf = sysfs_entry->dmabuf;
> > >>>
> > >>> -     if (!dmabuf || !attribute->show)
> > >>> +     if (!dmabuf || !attribute->show) {
> > >>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> > >>>                return -EIO;
> > >>> +     }
> > >>>
> > >>> -     return attribute->show(dmabuf, attribute, buf);
> > >>> +     ret = attribute->show(dmabuf, attribute, buf);
> > >>> +     spin_unlock(&dmabuf_sysfs_show_lock);
> > >>> +     return ret;
> > >>>    }
> > >>>
> > >>>    static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
> > >>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
> > >>>        .default_groups = dma_buf_stats_default_groups,
> > >>>    };
> > >>>
> > >>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> > >>> +/* Statistics files do not need to send uevents. */
> > >>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> > >>>    {
> > >>> -     struct dma_buf_sysfs_entry *sysfs_entry;
> > >>> +     return 0;
> > >>> +}
> > >>>
> > >>> -     sysfs_entry = dmabuf->sysfs_entry;
> > >>> -     if (!sysfs_entry)
> > >>> -             return;
> > >>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> > >>> +     .filter = dmabuf_sysfs_uevent_filter,
> > >>> +};
> > >>> +
> > >>> +/* setup of sysfs entries done asynchronously in the worker thread. */
> > >>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
> > >>> +{
> > >>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> > >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
> > >>> +                     kobject_work->sysfs_metadata;
> > >>> +     bool free_metadata = false;
> > >>> +
> > >>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> > >>> +                                    "%lu", kobject_work->uid);
> > >>> +     if (ret) {
> > >>> +             kobject_put(&sysfs_entry->kobj);
> > >>> +
> > >>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
> > >>> +                     /*
> > >>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
> > >>> +                      * been freed. At this point, its safe to free the memory for
> > >>> +                      * the sysfs_metadata;
> > >>> +                      */
> > >>> +                     free_metadata = true;
> > >>> +             } else {
> > >>> +                     /*
> > >>> +                      * The DMA-BUF has not yet been freed, set the status to
> > >>> +                      * sysfs_entry_error so that when the DMA-BUF gets
> > >>> +                      * freed, we know there is no need to teardown the sysfs
> > >>> +                      * entry.
> > >>> +                      */
> > >>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
> > >>> +             }
> > >>> +             goto unlock;
> > >>> +     }
> > >>> +
> > >>> +     /*
> > >>> +      * If the DMA-BUF has not yet been released, status would still be
> > >>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
> > >>> +      */
> > >>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> > >>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
> > >>> +             goto unlock;
> > >>> +     }
> > >>>
> > >>> +     /*
> > >>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
> > >>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
> > >>> +      * sysfs_entry and its metadata since neither of them are needed
> > >>> +      * anymore.
> > >>> +      */
> > >>> +     free_metadata = true;
> > >>>        kobject_del(&sysfs_entry->kobj);
> > >>>        kobject_put(&sysfs_entry->kobj);
> > >>> +
> > >>> +unlock:
> > >>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +     if (free_metadata) {
> > >>> +             kfree(kobject_work->sysfs_metadata);
> > >>> +             kobject_work->sysfs_metadata = NULL;
> > >>> +     }
> > >>>    }
> > >>>
> > >>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
> > >>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
> > >>> +{
> > >>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> > >>>
> > >>> -/* Statistics files do not need to send uevents. */
> > >>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> > >>> +     kobject_del(&sysfs_entry->kobj);
> > >>> +     kobject_put(&sysfs_entry->kobj);
> > >>> +
> > >>> +     kfree(kobject_work->sysfs_metadata);
> > >>> +     kobject_work->sysfs_metadata = NULL;
> > >>> +}
> > >>> +
> > >>> +/* do setup or teardown of sysfs entries as required */
> > >>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
> > >>>    {
> > >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>> +     bool setup_needed = false;
> > >>> +     bool teardown_needed = false;
> > >>> +
> > >>> +     sysfs_metadata = kobject_work->sysfs_metadata;
> > >>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
> > >>> +             setup_needed = true;
> > >>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
> > >>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> > >>> +             teardown_needed = true;
> > >>> +     }
> > >>> +
> > >>> +     /*
> > >>> +      * It is ok to release the sysfs_entry_lock here.
> > >>> +      *
> > >>> +      * If setup_needed is true, we check the status again after the kobject
> > >>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
> > >>> +      * and if so teardown the kobject.
> > >>> +      *
> > >>> +      * If teardown_needed is true, there are no more changes expected to the
> > >>> +      * status.
> > >>> +      *
> > >>> +      * If neither setup_needed nor teardown needed are true, it
> > >>> +      * means the DMA-BUF was freed before we got around to setting up the
> > >>> +      * sysfs entry and hence we just need to release the metadata and
> > >>> +      * return.
> > >>> +      */
> > >>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
> > >>> +
> > >>> +     if (setup_needed)
> > >>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
> > >>> +     else if (teardown_needed)
> > >>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
> > >>> +     else
> > >>> +             kfree(kobject_work->sysfs_metadata);
> > >>> +
> > >>> +     kfree(kobject_work);
> > >>> +}
> > >>> +
> > >>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
> > >>> +{
> > >>> +     struct dmabuf_kobj_work *kobject_work;
> > >>> +
> > >>> +     spin_lock(&dmabuf_kobj_list_lock);
> > >>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
> > >>> +                                             struct dmabuf_kobj_work, list);
> > >>> +     if (kobject_work)
> > >>> +             list_del(&kobject_work->list);
> > >>> +     spin_unlock(&dmabuf_kobj_list_lock);
> > >>> +     return kobject_work;
> > >>> +}
> > >>> +
> > >>> +static int kobject_work_thread(void *data)
> > >>> +{
> > >>> +     struct dmabuf_kobj_work *kobject_work;
> > >>> +
> > >>> +     while (1) {
> > >>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
> > >>> +                                  (kobject_work = get_next_kobj_work()));
> > >>> +             do_kobject_work(kobject_work);
> > >>> +     }
> > >>> +
> > >>>        return 0;
> > >>>    }
> > >>>
> > >>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> > >>> -     .filter = dmabuf_sysfs_uevent_filter,
> > >>> -};
> > >>> +static int kobject_worklist_init(void)
> > >>> +{
> > >>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
> > >>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
> > >>> +                                       "%s", "dmabuf-kobject-worker");
> > >>> +     if (IS_ERR(dmabuf_kobject_task)) {
> > >>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
> > >>> +             return PTR_ERR(dmabuf_kobject_task);
> > >>> +     }
> > >>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
> > >>> +
> > >>> +     return 0;
> > >>> +}
> > >>> +
> > >>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
> > >>> +{
> > >>> +     INIT_LIST_HEAD(&kobject_work->list);
> > >>> +
> > >>> +     spin_lock(&dmabuf_kobj_list_lock);
> > >>> +
> > >>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
> > >>> +
> > >>> +     spin_unlock(&dmabuf_kobj_list_lock);
> > >>> +
> > >>> +     wake_up(&dmabuf_kobject_waitqueue);
> > >>> +}
> > >>> +
> > >>> +
> > >>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> > >>> +{
> > >>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> > >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>> +     struct dmabuf_kobj_work *kobj_work;
> > >>> +
> > >>> +     sysfs_entry = dmabuf->sysfs_entry;
> > >>> +     if (!sysfs_entry)
> > >>> +             return;
> > >>> +
> > >>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > >>> +     if (!sysfs_metadata)
> > >>> +             return;
> > >>> +
> > >>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +
> > >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
> > >>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> > >>> +             /*
> > >>> +              * The sysfs entry for this buffer has not yet been initialized,
> > >>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
> > >>> +              * initialization.
> > >>> +              */
> > >>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
> > >>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +
> > >>> +             /*
> > >>> +              * In case kobject initialization completes right as we release
> > >>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
> > >>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
> > >>> +              */
> > >>> +             spin_lock(&dmabuf_sysfs_show_lock);
> > >>> +             sysfs_entry->dmabuf = NULL;
> > >>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> > >>> +
> > >>> +             return;
> > >>> +     }
> > >>> +
> > >>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> > >>> +             /*
> > >>> +              * queue teardown work only if sysfs_entry is fully inititalized.
> > >>> +              * It is ok to release the sysfs_entry_lock here since the
> > >>> +              * status can no longer change.
> > >>> +              */
> > >>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +
> > >>> +             /*
> > >>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
> > >>> +              * between teardown and show().
> > >>> +              */
> > >>> +             spin_lock(&dmabuf_sysfs_show_lock);
> > >>> +             sysfs_entry->dmabuf = NULL;
> > >>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> > >>> +
> > >>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> > >>> +             if (!kobj_work) {
> > >>> +                     /* do the teardown immediately. */
> > >>> +                     kobject_del(&sysfs_entry->kobj);
> > >>> +                     kobject_put(&sysfs_entry->kobj);
> > >>> +                     kfree(sysfs_metadata);
> > >>> +             } else {
> > >>> +                     /* queue teardown work. */
> > >>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> > >>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > >>> +                     deferred_kobject_create(kobj_work);
> > >>> +             }
> > >>> +
> > >>> +             return;
> > >>> +     }
> > >>> +
> > >>> +     /*
> > >>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
> > >>> +      * metadata.
> > >>> +      */
> > >>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>> +     kfree(dmabuf->sysfs_entry_metadata);
> > >>> +     dmabuf->sysfs_entry_metadata = NULL;
> > >>> +}
> > >>>
> > >>>    static struct kset *dma_buf_stats_kset;
> > >>>    static struct kset *dma_buf_per_buffer_stats_kset;
> > >>>    int dma_buf_init_sysfs_statistics(void)
> > >>>    {
> > >>> +     int ret;
> > >>> +
> > >>> +     ret = kobject_worklist_init();
> > >>> +     if (ret)
> > >>> +             return ret;
> > >>> +
> > >>>        dma_buf_stats_kset = kset_create_and_add("dmabuf",
> > >>>                                                 &dmabuf_sysfs_no_uevent_ops,
> > >>>                                                 kernel_kobj);
> > >>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
> > >>>    int dma_buf_stats_setup(struct dma_buf *dmabuf)
> > >>>    {
> > >>>        struct dma_buf_sysfs_entry *sysfs_entry;
> > >>> -     int ret;
> > >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>> +     struct dmabuf_kobj_work *kobj_work;
> > >>>
> > >>>        if (!dmabuf || !dmabuf->file)
> > >>>                return -EINVAL;
> > >>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
> > >>>        sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
> > >>>        sysfs_entry->dmabuf = dmabuf;
> > >>>
> > >>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
> > >>> +                              GFP_KERNEL);
> > >>> +     if (!sysfs_metadata) {
> > >>> +             kfree(sysfs_entry);
> > >>> +             return -ENOMEM;
> > >>> +     }
> > >>> +
> > >>>        dmabuf->sysfs_entry = sysfs_entry;
> > >>>
> > >>> -     /* create the directory for buffer stats */
> > >>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> > >>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
> > >>> -     if (ret)
> > >>> -             goto err_sysfs_dmabuf;
> > >>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
> > >>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
> > >>>
> > >>> -     return 0;
> > >>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
> > >>>
> > >>> -err_sysfs_dmabuf:
> > >>> -     kobject_put(&sysfs_entry->kobj);
> > >>> -     dmabuf->sysfs_entry = NULL;
> > >>> -     return ret;
> > >>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> > >>> +     if (!kobj_work) {
> > >>> +             kfree(sysfs_entry);
> > >>> +             kfree(sysfs_metadata);
> > >>> +             return -ENOMEM;
> > >>> +     }
> > >>> +
> > >>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> > >>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > >>> +     /*
> > >>> +      * stash the inode number in struct dmabuf_kobj_work since setup
> > >>> +      * might race with DMA-BUF teardown.
> > >>> +      */
> > >>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
> > >>> +
> > >>> +     deferred_kobject_create(kobj_work);
> > >>> +     return 0;
> > >>>    }
> > >>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > >>> index 7ab50076e7a6..0597690023a0 100644
> > >>> --- a/include/linux/dma-buf.h
> > >>> +++ b/include/linux/dma-buf.h
> > >>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
> > >>>        void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > >>>    };
> > >>>
> > >>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
> > >>> +enum sysfs_entry_status {
> > >>> +     SYSFS_ENTRY_UNINITIALIZED,
> > >>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
> > >>> +     SYSFS_ENTRY_ERROR,
> > >>> +     SYSFS_ENTRY_INIT_ABORTED,
> > >>> +     SYSFS_ENTRY_INITIALIZED,
> > >>> +};
> > >>> +
> > >>> +/*
> > >>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
> > >>> + * DMA-BUF sysfs entry.
> > >>> + *
> > >>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
> > >>> + * the sysfs entry has the following path.
> > >>> + *
> > >>> + *                   SYSFS_ENTRY_UNINITIALIZED
> > >>> + *            __________________|____________________
> > >>> + *           |                                       |
> > >>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
> > >>> + *           |                                                     gets freed
> > >>> + *           |                                                     before
> > >>> + *           |                                                     init)
> > >>> + *   ________|_____________________________________
> > >>> + *   |                         |                   |
> > >>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
> > >>> + *                             |               (DMA-BUF gets freed during kobject
> > >>> + *                             |               init)
> > >>> + *                             |
> > >>> + *                             |
> > >>> + *                 SYSFS_ENTRY_ERROR
> > >>> + *                 (error during kobject init)
> > >>> + *
> > >>> + * @sysfs_entry_lock: protects access to @status.
> > >>> + */
> > >>> +struct dma_buf_sysfs_entry_metadata {
> > >>> +     enum sysfs_entry_status status;
> > >>> +     /*
> > >>> +      * Protects sysfs_entry_metadata->status
> > >>> +      */
> > >>> +     spinlock_t sysfs_entry_lock;
> > >>> +};
> > >>> +#endif
> > >>> +
> > >>>    /**
> > >>>     * struct dma_buf - shared buffer object
> > >>>     *
> > >>> @@ -452,6 +496,8 @@ struct dma_buf {
> > >>>                struct kobject kobj;
> > >>>                struct dma_buf *dmabuf;
> > >>>        } *sysfs_entry;
> > >>> +
> > >>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
> > >>>    #endif
> > >>>    };
> > >>>
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-07 21:25         ` Hridya Valsaraju
@ 2022-01-10  7:28           ` Christian König
  2022-01-11  6:01             ` Hridya Valsaraju
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-01-10  7:28 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Sumit Semwal, Daniel Vetter, Greg Kroah-Hartman, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

Am 07.01.22 um 22:25 schrieb Hridya Valsaraju:
> On Fri, Jan 7, 2022 at 10:17 AM Hridya Valsaraju <hridya@google.com> wrote:
>> On Fri, Jan 7, 2022 at 2:22 AM Christian König <christian.koenig@amd.com> wrote:
>>> Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
>>>> On Thu, Jan 6, 2022 at 12:59 AM Christian König
>>>> <christian.koenig@amd.com> wrote:
>>>>> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
>>>>>> Recently, we noticed an issue where a process went into direct reclaim
>>>>>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
>>>>>> mode. This caused processes who were doing DMA-BUF exports and releases
>>>>>> to go into uninterruptible sleep since they needed to acquire the same
>>>>>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
>>>>>> blocking DMA-BUF export/release for an indeterminate amount of time
>>>>>> while another process is holding the sysfs rw semaphore in exclusive
>>>>>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
>>>>>> a kthread.
>>>>> Well I absolutely don't think that this is justified.
>>>>>
>>>>> You adding tons of complexity here just to avoid the overhead of
>>>>> creating the sysfs files while exporting DMA-bufs which is an operation
>>>>> which should be done exactly once in the lifecycle for the most common
>>>>> use case.
>>>>>
>>>>> Please explain further why that should be necessary.
>>>> Hi Christian,
>>>>
>>>> We noticed that the issue sometimes causes the exporting process to go
>>>> to the uninterrupted sleep state for tens of milliseconds which
>>>> unfortunately leads to user-perceptible UI jank. This is the reason
>>>> why we are trying to move the sysfs entry creation and deletion out of
>>>> the DMA-BUF export/release path. I will update the commit message to
>>>> include the same in the next revision.
>>> That is still not a justification for this change. The question is why
>>> do you need that in the first place?
>>>
>>> Exporting a DMA-buf should be something would should be very rarely,
>>> e.g. only at the start of an application.
>> Hi Christian,
>>
>> Yes, that is correct. Noticeable jank caused by the issue is not
>> present at all times and happens on UI transitions(for example during
>> app launches and exits) when there are several DMA-BUFs being exported
>> and released. This is especially true in the case of the camera app
>> since it exports and releases a relatively larger number of DMA-BUFs
>> during launch and exit respectively.

Well, that sounds at least better than before.

>>
>> Regards,
>> Hridya
>>
>>> So this strongly looks like you are trying to optimize for an use case
>>> where we should probably rethink what userspace is doing here instead.
> Hello Christian,
>
> If you don't mind, could you please elaborate on the above statement?

The purpose of DMA-buf is to share a rather low number of buffers 
between different drivers and/or applications.

For example with triple buffering we have three buffers shared between 
the camera driver and the display driver, same thing for use cases 
between rendering and display.

So even when you have ~100 applications open your should not share more 
than ~300 DMA-buf handles and doing that should absolutely not cause any 
problems like you described above.

Long story short when this affects your user experience then your user 
space is exporting *much* more buffers than expected. Especially since 
the sysfs overhead is completely negligible compared to the overhead 
drivers have when they export buffers.

I think in that light adding sysfs was rather questionable to begin 
with, but that change here is a complete no-go if you ask me. You are 
adding complexity to the kernel for something which should probably be 
optimized in userspace.

Regards,
Christian.

> Thanks in advance for the guidance!
>
> Regards,
> Hridya
>
>>> If we would want to go down this route you would need to change all the
>>> drivers implementing the DMA-buf export functionality to avoid
>>> uninterruptible sleep as well and that is certainly something I would NAK.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Thanks,
>>>> Hridya
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
>>>>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
>>>>>> ---
>>>>>>     drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
>>>>>>     include/linux/dma-buf.h               |  46 ++++
>>>>>>     2 files changed, 366 insertions(+), 23 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
>>>>>> index 053baadcada9..3251fdf2f05f 100644
>>>>>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
>>>>>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
>>>>>> @@ -7,13 +7,39 @@
>>>>>>
>>>>>>     #include <linux/dma-buf.h>
>>>>>>     #include <linux/dma-resv.h>
>>>>>> +#include <linux/freezer.h>
>>>>>>     #include <linux/kobject.h>
>>>>>> +#include <linux/kthread.h>
>>>>>> +#include <linux/list.h>
>>>>>>     #include <linux/printk.h>
>>>>>> +#include <linux/sched/signal.h>
>>>>>>     #include <linux/slab.h>
>>>>>>     #include <linux/sysfs.h>
>>>>>>
>>>>>>     #include "dma-buf-sysfs-stats.h"
>>>>>>
>>>>>> +struct dmabuf_kobj_work {
>>>>>> +     struct list_head list;
>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>> +     unsigned long uid;
>>>>>> +};
>>>>>> +
>>>>>> +/* Both kobject setup and teardown work gets queued on the list. */
>>>>>> +static LIST_HEAD(dmabuf_kobj_work_list);
>>>>>> +
>>>>>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
>>>>>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
>>>>>> +
>>>>>> +/*
>>>>>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
>>>>>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
>>>>>> + */
>>>>>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
>>>>>> +
>>>>>> +static struct task_struct *dmabuf_kobject_task;
>>>>>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
>>>>>> +
>>>>>>     #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
>>>>>>
>>>>>>     /**
>>>>>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
>>>>>>         struct dma_buf_stats_attribute *attribute;
>>>>>>         struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>>         struct dma_buf *dmabuf;
>>>>>> +     int ret;
>>>>>>
>>>>>>         attribute = to_dma_buf_stats_attr(attr);
>>>>>>         sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
>>>>>> +
>>>>>> +     /*
>>>>>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
>>>>>> +      * being freed while sysfs_entry->dmabuf is being accessed.
>>>>>> +      */
>>>>>> +     spin_lock(&dmabuf_sysfs_show_lock);
>>>>>>         dmabuf = sysfs_entry->dmabuf;
>>>>>>
>>>>>> -     if (!dmabuf || !attribute->show)
>>>>>> +     if (!dmabuf || !attribute->show) {
>>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>>                 return -EIO;
>>>>>> +     }
>>>>>>
>>>>>> -     return attribute->show(dmabuf, attribute, buf);
>>>>>> +     ret = attribute->show(dmabuf, attribute, buf);
>>>>>> +     spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>> +     return ret;
>>>>>>     }
>>>>>>
>>>>>>     static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
>>>>>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
>>>>>>         .default_groups = dma_buf_stats_default_groups,
>>>>>>     };
>>>>>>
>>>>>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>>>>>> +/* Statistics files do not need to send uevents. */
>>>>>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>>>>>>     {
>>>>>> -     struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>> +     return 0;
>>>>>> +}
>>>>>>
>>>>>> -     sysfs_entry = dmabuf->sysfs_entry;
>>>>>> -     if (!sysfs_entry)
>>>>>> -             return;
>>>>>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
>>>>>> +     .filter = dmabuf_sysfs_uevent_filter,
>>>>>> +};
>>>>>> +
>>>>>> +/* setup of sysfs entries done asynchronously in the worker thread. */
>>>>>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
>>>>>> +{
>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
>>>>>> +                     kobject_work->sysfs_metadata;
>>>>>> +     bool free_metadata = false;
>>>>>> +
>>>>>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
>>>>>> +                                    "%lu", kobject_work->uid);
>>>>>> +     if (ret) {
>>>>>> +             kobject_put(&sysfs_entry->kobj);
>>>>>> +
>>>>>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
>>>>>> +                     /*
>>>>>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
>>>>>> +                      * been freed. At this point, its safe to free the memory for
>>>>>> +                      * the sysfs_metadata;
>>>>>> +                      */
>>>>>> +                     free_metadata = true;
>>>>>> +             } else {
>>>>>> +                     /*
>>>>>> +                      * The DMA-BUF has not yet been freed, set the status to
>>>>>> +                      * sysfs_entry_error so that when the DMA-BUF gets
>>>>>> +                      * freed, we know there is no need to teardown the sysfs
>>>>>> +                      * entry.
>>>>>> +                      */
>>>>>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
>>>>>> +             }
>>>>>> +             goto unlock;
>>>>>> +     }
>>>>>> +
>>>>>> +     /*
>>>>>> +      * If the DMA-BUF has not yet been released, status would still be
>>>>>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
>>>>>> +      */
>>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
>>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
>>>>>> +             goto unlock;
>>>>>> +     }
>>>>>>
>>>>>> +     /*
>>>>>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
>>>>>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
>>>>>> +      * sysfs_entry and its metadata since neither of them are needed
>>>>>> +      * anymore.
>>>>>> +      */
>>>>>> +     free_metadata = true;
>>>>>>         kobject_del(&sysfs_entry->kobj);
>>>>>>         kobject_put(&sysfs_entry->kobj);
>>>>>> +
>>>>>> +unlock:
>>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +     if (free_metadata) {
>>>>>> +             kfree(kobject_work->sysfs_metadata);
>>>>>> +             kobject_work->sysfs_metadata = NULL;
>>>>>> +     }
>>>>>>     }
>>>>>>
>>>>>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
>>>>>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
>>>>>> +{
>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>>>>>>
>>>>>> -/* Statistics files do not need to send uevents. */
>>>>>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>>>>>> +     kobject_del(&sysfs_entry->kobj);
>>>>>> +     kobject_put(&sysfs_entry->kobj);
>>>>>> +
>>>>>> +     kfree(kobject_work->sysfs_metadata);
>>>>>> +     kobject_work->sysfs_metadata = NULL;
>>>>>> +}
>>>>>> +
>>>>>> +/* do setup or teardown of sysfs entries as required */
>>>>>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
>>>>>>     {
>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>> +     bool setup_needed = false;
>>>>>> +     bool teardown_needed = false;
>>>>>> +
>>>>>> +     sysfs_metadata = kobject_work->sysfs_metadata;
>>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
>>>>>> +             setup_needed = true;
>>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
>>>>>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
>>>>>> +             teardown_needed = true;
>>>>>> +     }
>>>>>> +
>>>>>> +     /*
>>>>>> +      * It is ok to release the sysfs_entry_lock here.
>>>>>> +      *
>>>>>> +      * If setup_needed is true, we check the status again after the kobject
>>>>>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
>>>>>> +      * and if so teardown the kobject.
>>>>>> +      *
>>>>>> +      * If teardown_needed is true, there are no more changes expected to the
>>>>>> +      * status.
>>>>>> +      *
>>>>>> +      * If neither setup_needed nor teardown needed are true, it
>>>>>> +      * means the DMA-BUF was freed before we got around to setting up the
>>>>>> +      * sysfs entry and hence we just need to release the metadata and
>>>>>> +      * return.
>>>>>> +      */
>>>>>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
>>>>>> +
>>>>>> +     if (setup_needed)
>>>>>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
>>>>>> +     else if (teardown_needed)
>>>>>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
>>>>>> +     else
>>>>>> +             kfree(kobject_work->sysfs_metadata);
>>>>>> +
>>>>>> +     kfree(kobject_work);
>>>>>> +}
>>>>>> +
>>>>>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
>>>>>> +{
>>>>>> +     struct dmabuf_kobj_work *kobject_work;
>>>>>> +
>>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
>>>>>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
>>>>>> +                                             struct dmabuf_kobj_work, list);
>>>>>> +     if (kobject_work)
>>>>>> +             list_del(&kobject_work->list);
>>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
>>>>>> +     return kobject_work;
>>>>>> +}
>>>>>> +
>>>>>> +static int kobject_work_thread(void *data)
>>>>>> +{
>>>>>> +     struct dmabuf_kobj_work *kobject_work;
>>>>>> +
>>>>>> +     while (1) {
>>>>>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
>>>>>> +                                  (kobject_work = get_next_kobj_work()));
>>>>>> +             do_kobject_work(kobject_work);
>>>>>> +     }
>>>>>> +
>>>>>>         return 0;
>>>>>>     }
>>>>>>
>>>>>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
>>>>>> -     .filter = dmabuf_sysfs_uevent_filter,
>>>>>> -};
>>>>>> +static int kobject_worklist_init(void)
>>>>>> +{
>>>>>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
>>>>>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
>>>>>> +                                       "%s", "dmabuf-kobject-worker");
>>>>>> +     if (IS_ERR(dmabuf_kobject_task)) {
>>>>>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
>>>>>> +             return PTR_ERR(dmabuf_kobject_task);
>>>>>> +     }
>>>>>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
>>>>>> +
>>>>>> +     return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
>>>>>> +{
>>>>>> +     INIT_LIST_HEAD(&kobject_work->list);
>>>>>> +
>>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
>>>>>> +
>>>>>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
>>>>>> +
>>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
>>>>>> +
>>>>>> +     wake_up(&dmabuf_kobject_waitqueue);
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>>>>>> +{
>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>> +     struct dmabuf_kobj_work *kobj_work;
>>>>>> +
>>>>>> +     sysfs_entry = dmabuf->sysfs_entry;
>>>>>> +     if (!sysfs_entry)
>>>>>> +             return;
>>>>>> +
>>>>>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>>>>> +     if (!sysfs_metadata)
>>>>>> +             return;
>>>>>> +
>>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +
>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
>>>>>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
>>>>>> +             /*
>>>>>> +              * The sysfs entry for this buffer has not yet been initialized,
>>>>>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
>>>>>> +              * initialization.
>>>>>> +              */
>>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
>>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +
>>>>>> +             /*
>>>>>> +              * In case kobject initialization completes right as we release
>>>>>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
>>>>>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
>>>>>> +              */
>>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
>>>>>> +             sysfs_entry->dmabuf = NULL;
>>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>> +
>>>>>> +             return;
>>>>>> +     }
>>>>>> +
>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
>>>>>> +             /*
>>>>>> +              * queue teardown work only if sysfs_entry is fully inititalized.
>>>>>> +              * It is ok to release the sysfs_entry_lock here since the
>>>>>> +              * status can no longer change.
>>>>>> +              */
>>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +
>>>>>> +             /*
>>>>>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
>>>>>> +              * between teardown and show().
>>>>>> +              */
>>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
>>>>>> +             sysfs_entry->dmabuf = NULL;
>>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>> +
>>>>>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
>>>>>> +             if (!kobj_work) {
>>>>>> +                     /* do the teardown immediately. */
>>>>>> +                     kobject_del(&sysfs_entry->kobj);
>>>>>> +                     kobject_put(&sysfs_entry->kobj);
>>>>>> +                     kfree(sysfs_metadata);
>>>>>> +             } else {
>>>>>> +                     /* queue teardown work. */
>>>>>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
>>>>>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>>>>> +                     deferred_kobject_create(kobj_work);
>>>>>> +             }
>>>>>> +
>>>>>> +             return;
>>>>>> +     }
>>>>>> +
>>>>>> +     /*
>>>>>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
>>>>>> +      * metadata.
>>>>>> +      */
>>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>> +     kfree(dmabuf->sysfs_entry_metadata);
>>>>>> +     dmabuf->sysfs_entry_metadata = NULL;
>>>>>> +}
>>>>>>
>>>>>>     static struct kset *dma_buf_stats_kset;
>>>>>>     static struct kset *dma_buf_per_buffer_stats_kset;
>>>>>>     int dma_buf_init_sysfs_statistics(void)
>>>>>>     {
>>>>>> +     int ret;
>>>>>> +
>>>>>> +     ret = kobject_worklist_init();
>>>>>> +     if (ret)
>>>>>> +             return ret;
>>>>>> +
>>>>>>         dma_buf_stats_kset = kset_create_and_add("dmabuf",
>>>>>>                                                  &dmabuf_sysfs_no_uevent_ops,
>>>>>>                                                  kernel_kobj);
>>>>>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
>>>>>>     int dma_buf_stats_setup(struct dma_buf *dmabuf)
>>>>>>     {
>>>>>>         struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>> -     int ret;
>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>> +     struct dmabuf_kobj_work *kobj_work;
>>>>>>
>>>>>>         if (!dmabuf || !dmabuf->file)
>>>>>>                 return -EINVAL;
>>>>>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
>>>>>>         sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
>>>>>>         sysfs_entry->dmabuf = dmabuf;
>>>>>>
>>>>>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
>>>>>> +                              GFP_KERNEL);
>>>>>> +     if (!sysfs_metadata) {
>>>>>> +             kfree(sysfs_entry);
>>>>>> +             return -ENOMEM;
>>>>>> +     }
>>>>>> +
>>>>>>         dmabuf->sysfs_entry = sysfs_entry;
>>>>>>
>>>>>> -     /* create the directory for buffer stats */
>>>>>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
>>>>>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
>>>>>> -     if (ret)
>>>>>> -             goto err_sysfs_dmabuf;
>>>>>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
>>>>>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
>>>>>>
>>>>>> -     return 0;
>>>>>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
>>>>>>
>>>>>> -err_sysfs_dmabuf:
>>>>>> -     kobject_put(&sysfs_entry->kobj);
>>>>>> -     dmabuf->sysfs_entry = NULL;
>>>>>> -     return ret;
>>>>>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
>>>>>> +     if (!kobj_work) {
>>>>>> +             kfree(sysfs_entry);
>>>>>> +             kfree(sysfs_metadata);
>>>>>> +             return -ENOMEM;
>>>>>> +     }
>>>>>> +
>>>>>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
>>>>>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>>>>> +     /*
>>>>>> +      * stash the inode number in struct dmabuf_kobj_work since setup
>>>>>> +      * might race with DMA-BUF teardown.
>>>>>> +      */
>>>>>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
>>>>>> +
>>>>>> +     deferred_kobject_create(kobj_work);
>>>>>> +     return 0;
>>>>>>     }
>>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>>>> index 7ab50076e7a6..0597690023a0 100644
>>>>>> --- a/include/linux/dma-buf.h
>>>>>> +++ b/include/linux/dma-buf.h
>>>>>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
>>>>>>         void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>>>>     };
>>>>>>
>>>>>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
>>>>>> +enum sysfs_entry_status {
>>>>>> +     SYSFS_ENTRY_UNINITIALIZED,
>>>>>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
>>>>>> +     SYSFS_ENTRY_ERROR,
>>>>>> +     SYSFS_ENTRY_INIT_ABORTED,
>>>>>> +     SYSFS_ENTRY_INITIALIZED,
>>>>>> +};
>>>>>> +
>>>>>> +/*
>>>>>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
>>>>>> + * DMA-BUF sysfs entry.
>>>>>> + *
>>>>>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
>>>>>> + * the sysfs entry has the following path.
>>>>>> + *
>>>>>> + *                   SYSFS_ENTRY_UNINITIALIZED
>>>>>> + *            __________________|____________________
>>>>>> + *           |                                       |
>>>>>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
>>>>>> + *           |                                                     gets freed
>>>>>> + *           |                                                     before
>>>>>> + *           |                                                     init)
>>>>>> + *   ________|_____________________________________
>>>>>> + *   |                         |                   |
>>>>>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
>>>>>> + *                             |               (DMA-BUF gets freed during kobject
>>>>>> + *                             |               init)
>>>>>> + *                             |
>>>>>> + *                             |
>>>>>> + *                 SYSFS_ENTRY_ERROR
>>>>>> + *                 (error during kobject init)
>>>>>> + *
>>>>>> + * @sysfs_entry_lock: protects access to @status.
>>>>>> + */
>>>>>> +struct dma_buf_sysfs_entry_metadata {
>>>>>> +     enum sysfs_entry_status status;
>>>>>> +     /*
>>>>>> +      * Protects sysfs_entry_metadata->status
>>>>>> +      */
>>>>>> +     spinlock_t sysfs_entry_lock;
>>>>>> +};
>>>>>> +#endif
>>>>>> +
>>>>>>     /**
>>>>>>      * struct dma_buf - shared buffer object
>>>>>>      *
>>>>>> @@ -452,6 +496,8 @@ struct dma_buf {
>>>>>>                 struct kobject kobj;
>>>>>>                 struct dma_buf *dmabuf;
>>>>>>         } *sysfs_entry;
>>>>>> +
>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
>>>>>>     #endif
>>>>>>     };
>>>>>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-10  7:28           ` Christian König
@ 2022-01-11  6:01             ` Hridya Valsaraju
  2022-01-11  8:35               ` Daniel Vetter
  0 siblings, 1 reply; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-11  6:01 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Daniel Vetter, Greg Kroah-Hartman, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Sun, Jan 9, 2022 at 11:28 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 07.01.22 um 22:25 schrieb Hridya Valsaraju:
> > On Fri, Jan 7, 2022 at 10:17 AM Hridya Valsaraju <hridya@google.com> wrote:
> >> On Fri, Jan 7, 2022 at 2:22 AM Christian König <christian.koenig@amd.com> wrote:
> >>> Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
> >>>> On Thu, Jan 6, 2022 at 12:59 AM Christian König
> >>>> <christian.koenig@amd.com> wrote:
> >>>>> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
> >>>>>> Recently, we noticed an issue where a process went into direct reclaim
> >>>>>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
> >>>>>> mode. This caused processes who were doing DMA-BUF exports and releases
> >>>>>> to go into uninterruptible sleep since they needed to acquire the same
> >>>>>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> >>>>>> blocking DMA-BUF export/release for an indeterminate amount of time
> >>>>>> while another process is holding the sysfs rw semaphore in exclusive
> >>>>>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
> >>>>>> a kthread.
> >>>>> Well I absolutely don't think that this is justified.
> >>>>>
> >>>>> You adding tons of complexity here just to avoid the overhead of
> >>>>> creating the sysfs files while exporting DMA-bufs which is an operation
> >>>>> which should be done exactly once in the lifecycle for the most common
> >>>>> use case.
> >>>>>
> >>>>> Please explain further why that should be necessary.
> >>>> Hi Christian,
> >>>>
> >>>> We noticed that the issue sometimes causes the exporting process to go
> >>>> to the uninterrupted sleep state for tens of milliseconds which
> >>>> unfortunately leads to user-perceptible UI jank. This is the reason
> >>>> why we are trying to move the sysfs entry creation and deletion out of
> >>>> the DMA-BUF export/release path. I will update the commit message to
> >>>> include the same in the next revision.
> >>> That is still not a justification for this change. The question is why
> >>> do you need that in the first place?
> >>>
> >>> Exporting a DMA-buf should be something would should be very rarely,
> >>> e.g. only at the start of an application.
> >> Hi Christian,
> >>
> >> Yes, that is correct. Noticeable jank caused by the issue is not
> >> present at all times and happens on UI transitions(for example during
> >> app launches and exits) when there are several DMA-BUFs being exported
> >> and released. This is especially true in the case of the camera app
> >> since it exports and releases a relatively larger number of DMA-BUFs
> >> during launch and exit respectively.
>
> Well, that sounds at least better than before.
>
> >>
> >> Regards,
> >> Hridya
> >>
> >>> So this strongly looks like you are trying to optimize for an use case
> >>> where we should probably rethink what userspace is doing here instead.
> > Hello Christian,
> >
> > If you don't mind, could you please elaborate on the above statement?
>
> The purpose of DMA-buf is to share a rather low number of buffers
> between different drivers and/or applications.
>
> For example with triple buffering we have three buffers shared between
> the camera driver and the display driver, same thing for use cases
> between rendering and display.
>
> So even when you have ~100 applications open your should not share more
> than ~300 DMA-buf handles and doing that should absolutely not cause any
> problems like you described above.
>
> Long story short when this affects your user experience then your user
> space is exporting *much* more buffers than expected. Especially since
> the sysfs overhead is completely negligible compared to the overhead
> drivers have when they export buffers.



I do not think we can solve this issue from userspace since the
problem is not due to the overhead of sysfs creation/teardown itself.
The problem is that the duration of time for which the exporting
process would need to sleep waiting for the kernfs_rwsem semaphore is
undefined when the holder of the semaphore goes under direct reclaim.
Fsnotify events for sysfs files are also generated while holding the
same semaphore.

This is also not a problem due to the high number of DMA-BUF
exports during launch time, as even a single export can be delayed for
an unpredictable amount of time. We cannot eliminate DMA-BUF exports
completely during app-launches and we are unfortunately seeing reports
of the exporting process occasionally sleeping long enough to cause
user-visible jankiness :(

We also looked at whether any optimizations are possible from the
kernfs implementation side[1] but the semaphore is used quite extensively
and it looks like the best way forward would be to remove sysfs
creation/teardown from the DMA-BUF export/release path altogether. We
have some ideas on how we can reduce the code-complexity in the
current patch. If we manage to
simplify it considerably, would the approach of offloading sysfs
creation and teardown into a separate thread be acceptable Christian?
Thank you for the guidance!

Regards,
Hridya

[1]: https://lore.kernel.org/all/20211118230008.2679780-1-minchan@kernel.org/



>
> I think in that light adding sysfs was rather questionable to begin
> with, but that change here is a complete no-go if you ask me. You are
> adding complexity to the kernel for something which should probably be
> optimized in userspace.
>
> Regards,
> Christian.
>
> > Thanks in advance for the guidance!
> >
> > Regards,
> > Hridya
> >
> >>> If we would want to go down this route you would need to change all the
> >>> drivers implementing the DMA-buf export functionality to avoid
> >>> uninterruptible sleep as well and that is certainly something I would NAK.
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>> Thanks,
> >>>> Hridya
> >>>>
> >>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> >>>>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> >>>>>> ---
> >>>>>>     drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
> >>>>>>     include/linux/dma-buf.h               |  46 ++++
> >>>>>>     2 files changed, 366 insertions(+), 23 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> >>>>>> index 053baadcada9..3251fdf2f05f 100644
> >>>>>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> >>>>>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> >>>>>> @@ -7,13 +7,39 @@
> >>>>>>
> >>>>>>     #include <linux/dma-buf.h>
> >>>>>>     #include <linux/dma-resv.h>
> >>>>>> +#include <linux/freezer.h>
> >>>>>>     #include <linux/kobject.h>
> >>>>>> +#include <linux/kthread.h>
> >>>>>> +#include <linux/list.h>
> >>>>>>     #include <linux/printk.h>
> >>>>>> +#include <linux/sched/signal.h>
> >>>>>>     #include <linux/slab.h>
> >>>>>>     #include <linux/sysfs.h>
> >>>>>>
> >>>>>>     #include "dma-buf-sysfs-stats.h"
> >>>>>>
> >>>>>> +struct dmabuf_kobj_work {
> >>>>>> +     struct list_head list;
> >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>>>>> +     unsigned long uid;
> >>>>>> +};
> >>>>>> +
> >>>>>> +/* Both kobject setup and teardown work gets queued on the list. */
> >>>>>> +static LIST_HEAD(dmabuf_kobj_work_list);
> >>>>>> +
> >>>>>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
> >>>>>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
> >>>>>> +
> >>>>>> +/*
> >>>>>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
> >>>>>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
> >>>>>> + */
> >>>>>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
> >>>>>> +
> >>>>>> +static struct task_struct *dmabuf_kobject_task;
> >>>>>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
> >>>>>> +
> >>>>>>     #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
> >>>>>>
> >>>>>>     /**
> >>>>>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
> >>>>>>         struct dma_buf_stats_attribute *attribute;
> >>>>>>         struct dma_buf_sysfs_entry *sysfs_entry;
> >>>>>>         struct dma_buf *dmabuf;
> >>>>>> +     int ret;
> >>>>>>
> >>>>>>         attribute = to_dma_buf_stats_attr(attr);
> >>>>>>         sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
> >>>>>> +
> >>>>>> +     /*
> >>>>>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
> >>>>>> +      * being freed while sysfs_entry->dmabuf is being accessed.
> >>>>>> +      */
> >>>>>> +     spin_lock(&dmabuf_sysfs_show_lock);
> >>>>>>         dmabuf = sysfs_entry->dmabuf;
> >>>>>>
> >>>>>> -     if (!dmabuf || !attribute->show)
> >>>>>> +     if (!dmabuf || !attribute->show) {
> >>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> >>>>>>                 return -EIO;
> >>>>>> +     }
> >>>>>>
> >>>>>> -     return attribute->show(dmabuf, attribute, buf);
> >>>>>> +     ret = attribute->show(dmabuf, attribute, buf);
> >>>>>> +     spin_unlock(&dmabuf_sysfs_show_lock);
> >>>>>> +     return ret;
> >>>>>>     }
> >>>>>>
> >>>>>>     static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
> >>>>>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
> >>>>>>         .default_groups = dma_buf_stats_default_groups,
> >>>>>>     };
> >>>>>>
> >>>>>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> >>>>>> +/* Statistics files do not need to send uevents. */
> >>>>>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> >>>>>>     {
> >>>>>> -     struct dma_buf_sysfs_entry *sysfs_entry;
> >>>>>> +     return 0;
> >>>>>> +}
> >>>>>>
> >>>>>> -     sysfs_entry = dmabuf->sysfs_entry;
> >>>>>> -     if (!sysfs_entry)
> >>>>>> -             return;
> >>>>>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> >>>>>> +     .filter = dmabuf_sysfs_uevent_filter,
> >>>>>> +};
> >>>>>> +
> >>>>>> +/* setup of sysfs entries done asynchronously in the worker thread. */
> >>>>>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
> >>>>>> +{
> >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
> >>>>>> +                     kobject_work->sysfs_metadata;
> >>>>>> +     bool free_metadata = false;
> >>>>>> +
> >>>>>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> >>>>>> +                                    "%lu", kobject_work->uid);
> >>>>>> +     if (ret) {
> >>>>>> +             kobject_put(&sysfs_entry->kobj);
> >>>>>> +
> >>>>>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
> >>>>>> +                     /*
> >>>>>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
> >>>>>> +                      * been freed. At this point, its safe to free the memory for
> >>>>>> +                      * the sysfs_metadata;
> >>>>>> +                      */
> >>>>>> +                     free_metadata = true;
> >>>>>> +             } else {
> >>>>>> +                     /*
> >>>>>> +                      * The DMA-BUF has not yet been freed, set the status to
> >>>>>> +                      * sysfs_entry_error so that when the DMA-BUF gets
> >>>>>> +                      * freed, we know there is no need to teardown the sysfs
> >>>>>> +                      * entry.
> >>>>>> +                      */
> >>>>>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
> >>>>>> +             }
> >>>>>> +             goto unlock;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     /*
> >>>>>> +      * If the DMA-BUF has not yet been released, status would still be
> >>>>>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
> >>>>>> +      */
> >>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> >>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
> >>>>>> +             goto unlock;
> >>>>>> +     }
> >>>>>>
> >>>>>> +     /*
> >>>>>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
> >>>>>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
> >>>>>> +      * sysfs_entry and its metadata since neither of them are needed
> >>>>>> +      * anymore.
> >>>>>> +      */
> >>>>>> +     free_metadata = true;
> >>>>>>         kobject_del(&sysfs_entry->kobj);
> >>>>>>         kobject_put(&sysfs_entry->kobj);
> >>>>>> +
> >>>>>> +unlock:
> >>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +     if (free_metadata) {
> >>>>>> +             kfree(kobject_work->sysfs_metadata);
> >>>>>> +             kobject_work->sysfs_metadata = NULL;
> >>>>>> +     }
> >>>>>>     }
> >>>>>>
> >>>>>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
> >>>>>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
> >>>>>> +{
> >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> >>>>>>
> >>>>>> -/* Statistics files do not need to send uevents. */
> >>>>>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> >>>>>> +     kobject_del(&sysfs_entry->kobj);
> >>>>>> +     kobject_put(&sysfs_entry->kobj);
> >>>>>> +
> >>>>>> +     kfree(kobject_work->sysfs_metadata);
> >>>>>> +     kobject_work->sysfs_metadata = NULL;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/* do setup or teardown of sysfs entries as required */
> >>>>>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
> >>>>>>     {
> >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>>>>> +     bool setup_needed = false;
> >>>>>> +     bool teardown_needed = false;
> >>>>>> +
> >>>>>> +     sysfs_metadata = kobject_work->sysfs_metadata;
> >>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
> >>>>>> +             setup_needed = true;
> >>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
> >>>>>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> >>>>>> +             teardown_needed = true;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     /*
> >>>>>> +      * It is ok to release the sysfs_entry_lock here.
> >>>>>> +      *
> >>>>>> +      * If setup_needed is true, we check the status again after the kobject
> >>>>>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
> >>>>>> +      * and if so teardown the kobject.
> >>>>>> +      *
> >>>>>> +      * If teardown_needed is true, there are no more changes expected to the
> >>>>>> +      * status.
> >>>>>> +      *
> >>>>>> +      * If neither setup_needed nor teardown needed are true, it
> >>>>>> +      * means the DMA-BUF was freed before we got around to setting up the
> >>>>>> +      * sysfs entry and hence we just need to release the metadata and
> >>>>>> +      * return.
> >>>>>> +      */
> >>>>>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
> >>>>>> +
> >>>>>> +     if (setup_needed)
> >>>>>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
> >>>>>> +     else if (teardown_needed)
> >>>>>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
> >>>>>> +     else
> >>>>>> +             kfree(kobject_work->sysfs_metadata);
> >>>>>> +
> >>>>>> +     kfree(kobject_work);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
> >>>>>> +{
> >>>>>> +     struct dmabuf_kobj_work *kobject_work;
> >>>>>> +
> >>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
> >>>>>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
> >>>>>> +                                             struct dmabuf_kobj_work, list);
> >>>>>> +     if (kobject_work)
> >>>>>> +             list_del(&kobject_work->list);
> >>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
> >>>>>> +     return kobject_work;
> >>>>>> +}
> >>>>>> +
> >>>>>> +static int kobject_work_thread(void *data)
> >>>>>> +{
> >>>>>> +     struct dmabuf_kobj_work *kobject_work;
> >>>>>> +
> >>>>>> +     while (1) {
> >>>>>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
> >>>>>> +                                  (kobject_work = get_next_kobj_work()));
> >>>>>> +             do_kobject_work(kobject_work);
> >>>>>> +     }
> >>>>>> +
> >>>>>>         return 0;
> >>>>>>     }
> >>>>>>
> >>>>>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> >>>>>> -     .filter = dmabuf_sysfs_uevent_filter,
> >>>>>> -};
> >>>>>> +static int kobject_worklist_init(void)
> >>>>>> +{
> >>>>>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
> >>>>>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
> >>>>>> +                                       "%s", "dmabuf-kobject-worker");
> >>>>>> +     if (IS_ERR(dmabuf_kobject_task)) {
> >>>>>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
> >>>>>> +             return PTR_ERR(dmabuf_kobject_task);
> >>>>>> +     }
> >>>>>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
> >>>>>> +
> >>>>>> +     return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
> >>>>>> +{
> >>>>>> +     INIT_LIST_HEAD(&kobject_work->list);
> >>>>>> +
> >>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
> >>>>>> +
> >>>>>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
> >>>>>> +
> >>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
> >>>>>> +
> >>>>>> +     wake_up(&dmabuf_kobject_waitqueue);
> >>>>>> +}
> >>>>>> +
> >>>>>> +
> >>>>>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> >>>>>> +{
> >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>>>>> +     struct dmabuf_kobj_work *kobj_work;
> >>>>>> +
> >>>>>> +     sysfs_entry = dmabuf->sysfs_entry;
> >>>>>> +     if (!sysfs_entry)
> >>>>>> +             return;
> >>>>>> +
> >>>>>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
> >>>>>> +     if (!sysfs_metadata)
> >>>>>> +             return;
> >>>>>> +
> >>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +
> >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
> >>>>>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> >>>>>> +             /*
> >>>>>> +              * The sysfs entry for this buffer has not yet been initialized,
> >>>>>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
> >>>>>> +              * initialization.
> >>>>>> +              */
> >>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
> >>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +
> >>>>>> +             /*
> >>>>>> +              * In case kobject initialization completes right as we release
> >>>>>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
> >>>>>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
> >>>>>> +              */
> >>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
> >>>>>> +             sysfs_entry->dmabuf = NULL;
> >>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> >>>>>> +
> >>>>>> +             return;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> >>>>>> +             /*
> >>>>>> +              * queue teardown work only if sysfs_entry is fully inititalized.
> >>>>>> +              * It is ok to release the sysfs_entry_lock here since the
> >>>>>> +              * status can no longer change.
> >>>>>> +              */
> >>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +
> >>>>>> +             /*
> >>>>>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
> >>>>>> +              * between teardown and show().
> >>>>>> +              */
> >>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
> >>>>>> +             sysfs_entry->dmabuf = NULL;
> >>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> >>>>>> +
> >>>>>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> >>>>>> +             if (!kobj_work) {
> >>>>>> +                     /* do the teardown immediately. */
> >>>>>> +                     kobject_del(&sysfs_entry->kobj);
> >>>>>> +                     kobject_put(&sysfs_entry->kobj);
> >>>>>> +                     kfree(sysfs_metadata);
> >>>>>> +             } else {
> >>>>>> +                     /* queue teardown work. */
> >>>>>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> >>>>>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> >>>>>> +                     deferred_kobject_create(kobj_work);
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             return;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     /*
> >>>>>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
> >>>>>> +      * metadata.
> >>>>>> +      */
> >>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> >>>>>> +     kfree(dmabuf->sysfs_entry_metadata);
> >>>>>> +     dmabuf->sysfs_entry_metadata = NULL;
> >>>>>> +}
> >>>>>>
> >>>>>>     static struct kset *dma_buf_stats_kset;
> >>>>>>     static struct kset *dma_buf_per_buffer_stats_kset;
> >>>>>>     int dma_buf_init_sysfs_statistics(void)
> >>>>>>     {
> >>>>>> +     int ret;
> >>>>>> +
> >>>>>> +     ret = kobject_worklist_init();
> >>>>>> +     if (ret)
> >>>>>> +             return ret;
> >>>>>> +
> >>>>>>         dma_buf_stats_kset = kset_create_and_add("dmabuf",
> >>>>>>                                                  &dmabuf_sysfs_no_uevent_ops,
> >>>>>>                                                  kernel_kobj);
> >>>>>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
> >>>>>>     int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >>>>>>     {
> >>>>>>         struct dma_buf_sysfs_entry *sysfs_entry;
> >>>>>> -     int ret;
> >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> >>>>>> +     struct dmabuf_kobj_work *kobj_work;
> >>>>>>
> >>>>>>         if (!dmabuf || !dmabuf->file)
> >>>>>>                 return -EINVAL;
> >>>>>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
> >>>>>>         sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
> >>>>>>         sysfs_entry->dmabuf = dmabuf;
> >>>>>>
> >>>>>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
> >>>>>> +                              GFP_KERNEL);
> >>>>>> +     if (!sysfs_metadata) {
> >>>>>> +             kfree(sysfs_entry);
> >>>>>> +             return -ENOMEM;
> >>>>>> +     }
> >>>>>> +
> >>>>>>         dmabuf->sysfs_entry = sysfs_entry;
> >>>>>>
> >>>>>> -     /* create the directory for buffer stats */
> >>>>>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> >>>>>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
> >>>>>> -     if (ret)
> >>>>>> -             goto err_sysfs_dmabuf;
> >>>>>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
> >>>>>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
> >>>>>>
> >>>>>> -     return 0;
> >>>>>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
> >>>>>>
> >>>>>> -err_sysfs_dmabuf:
> >>>>>> -     kobject_put(&sysfs_entry->kobj);
> >>>>>> -     dmabuf->sysfs_entry = NULL;
> >>>>>> -     return ret;
> >>>>>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> >>>>>> +     if (!kobj_work) {
> >>>>>> +             kfree(sysfs_entry);
> >>>>>> +             kfree(sysfs_metadata);
> >>>>>> +             return -ENOMEM;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> >>>>>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> >>>>>> +     /*
> >>>>>> +      * stash the inode number in struct dmabuf_kobj_work since setup
> >>>>>> +      * might race with DMA-BUF teardown.
> >>>>>> +      */
> >>>>>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
> >>>>>> +
> >>>>>> +     deferred_kobject_create(kobj_work);
> >>>>>> +     return 0;
> >>>>>>     }
> >>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>>>> index 7ab50076e7a6..0597690023a0 100644
> >>>>>> --- a/include/linux/dma-buf.h
> >>>>>> +++ b/include/linux/dma-buf.h
> >>>>>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
> >>>>>>         void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> >>>>>>     };
> >>>>>>
> >>>>>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
> >>>>>> +enum sysfs_entry_status {
> >>>>>> +     SYSFS_ENTRY_UNINITIALIZED,
> >>>>>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
> >>>>>> +     SYSFS_ENTRY_ERROR,
> >>>>>> +     SYSFS_ENTRY_INIT_ABORTED,
> >>>>>> +     SYSFS_ENTRY_INITIALIZED,
> >>>>>> +};
> >>>>>> +
> >>>>>> +/*
> >>>>>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
> >>>>>> + * DMA-BUF sysfs entry.
> >>>>>> + *
> >>>>>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
> >>>>>> + * the sysfs entry has the following path.
> >>>>>> + *
> >>>>>> + *                   SYSFS_ENTRY_UNINITIALIZED
> >>>>>> + *            __________________|____________________
> >>>>>> + *           |                                       |
> >>>>>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
> >>>>>> + *           |                                                     gets freed
> >>>>>> + *           |                                                     before
> >>>>>> + *           |                                                     init)
> >>>>>> + *   ________|_____________________________________
> >>>>>> + *   |                         |                   |
> >>>>>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
> >>>>>> + *                             |               (DMA-BUF gets freed during kobject
> >>>>>> + *                             |               init)
> >>>>>> + *                             |
> >>>>>> + *                             |
> >>>>>> + *                 SYSFS_ENTRY_ERROR
> >>>>>> + *                 (error during kobject init)
> >>>>>> + *
> >>>>>> + * @sysfs_entry_lock: protects access to @status.
> >>>>>> + */
> >>>>>> +struct dma_buf_sysfs_entry_metadata {
> >>>>>> +     enum sysfs_entry_status status;
> >>>>>> +     /*
> >>>>>> +      * Protects sysfs_entry_metadata->status
> >>>>>> +      */
> >>>>>> +     spinlock_t sysfs_entry_lock;
> >>>>>> +};
> >>>>>> +#endif
> >>>>>> +
> >>>>>>     /**
> >>>>>>      * struct dma_buf - shared buffer object
> >>>>>>      *
> >>>>>> @@ -452,6 +496,8 @@ struct dma_buf {
> >>>>>>                 struct kobject kobj;
> >>>>>>                 struct dma_buf *dmabuf;
> >>>>>>         } *sysfs_entry;
> >>>>>> +
> >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
> >>>>>>     #endif
> >>>>>>     };
> >>>>>>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-11  6:01             ` Hridya Valsaraju
@ 2022-01-11  8:35               ` Daniel Vetter
  2022-01-11 10:58                 ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Vetter @ 2022-01-11  8:35 UTC (permalink / raw)
  To: Hridya Valsaraju
  Cc: Christian König, Sumit Semwal, Greg Kroah-Hartman,
	linux-media, dri-devel, linaro-mm-sig, linux-kernel, john.stultz,
	surenb, kaleshsingh, tjmercier, keescook

On Tue, Jan 11, 2022 at 7:02 AM Hridya Valsaraju <hridya@google.com> wrote:
>
> On Sun, Jan 9, 2022 at 11:28 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 07.01.22 um 22:25 schrieb Hridya Valsaraju:
> > > On Fri, Jan 7, 2022 at 10:17 AM Hridya Valsaraju <hridya@google.com> wrote:
> > >> On Fri, Jan 7, 2022 at 2:22 AM Christian König <christian.koenig@amd.com> wrote:
> > >>> Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
> > >>>> On Thu, Jan 6, 2022 at 12:59 AM Christian König
> > >>>> <christian.koenig@amd.com> wrote:
> > >>>>> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
> > >>>>>> Recently, we noticed an issue where a process went into direct reclaim
> > >>>>>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
> > >>>>>> mode. This caused processes who were doing DMA-BUF exports and releases
> > >>>>>> to go into uninterruptible sleep since they needed to acquire the same
> > >>>>>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
> > >>>>>> blocking DMA-BUF export/release for an indeterminate amount of time
> > >>>>>> while another process is holding the sysfs rw semaphore in exclusive
> > >>>>>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
> > >>>>>> a kthread.
> > >>>>> Well I absolutely don't think that this is justified.
> > >>>>>
> > >>>>> You adding tons of complexity here just to avoid the overhead of
> > >>>>> creating the sysfs files while exporting DMA-bufs which is an operation
> > >>>>> which should be done exactly once in the lifecycle for the most common
> > >>>>> use case.
> > >>>>>
> > >>>>> Please explain further why that should be necessary.
> > >>>> Hi Christian,
> > >>>>
> > >>>> We noticed that the issue sometimes causes the exporting process to go
> > >>>> to the uninterrupted sleep state for tens of milliseconds which
> > >>>> unfortunately leads to user-perceptible UI jank. This is the reason
> > >>>> why we are trying to move the sysfs entry creation and deletion out of
> > >>>> the DMA-BUF export/release path. I will update the commit message to
> > >>>> include the same in the next revision.
> > >>> That is still not a justification for this change. The question is why
> > >>> do you need that in the first place?
> > >>>
> > >>> Exporting a DMA-buf should be something would should be very rarely,
> > >>> e.g. only at the start of an application.
> > >> Hi Christian,
> > >>
> > >> Yes, that is correct. Noticeable jank caused by the issue is not
> > >> present at all times and happens on UI transitions(for example during
> > >> app launches and exits) when there are several DMA-BUFs being exported
> > >> and released. This is especially true in the case of the camera app
> > >> since it exports and releases a relatively larger number of DMA-BUFs
> > >> during launch and exit respectively.
> >
> > Well, that sounds at least better than before.
> >
> > >>
> > >> Regards,
> > >> Hridya
> > >>
> > >>> So this strongly looks like you are trying to optimize for an use case
> > >>> where we should probably rethink what userspace is doing here instead.
> > > Hello Christian,
> > >
> > > If you don't mind, could you please elaborate on the above statement?
> >
> > The purpose of DMA-buf is to share a rather low number of buffers
> > between different drivers and/or applications.
> >
> > For example with triple buffering we have three buffers shared between
> > the camera driver and the display driver, same thing for use cases
> > between rendering and display.
> >
> > So even when you have ~100 applications open your should not share more
> > than ~300 DMA-buf handles and doing that should absolutely not cause any
> > problems like you described above.
> >
> > Long story short when this affects your user experience then your user
> > space is exporting *much* more buffers than expected. Especially since
> > the sysfs overhead is completely negligible compared to the overhead
> > drivers have when they export buffers.
>
>
>
> I do not think we can solve this issue from userspace since the
> problem is not due to the overhead of sysfs creation/teardown itself.
> The problem is that the duration of time for which the exporting
> process would need to sleep waiting for the kernfs_rwsem semaphore is
> undefined when the holder of the semaphore goes under direct reclaim.
> Fsnotify events for sysfs files are also generated while holding the
> same semaphore.
>
> This is also not a problem due to the high number of DMA-BUF
> exports during launch time, as even a single export can be delayed for
> an unpredictable amount of time. We cannot eliminate DMA-BUF exports
> completely during app-launches and we are unfortunately seeing reports
> of the exporting process occasionally sleeping long enough to cause
> user-visible jankiness :(
>
> We also looked at whether any optimizations are possible from the
> kernfs implementation side[1] but the semaphore is used quite extensively
> and it looks like the best way forward would be to remove sysfs
> creation/teardown from the DMA-BUF export/release path altogether. We
> have some ideas on how we can reduce the code-complexity in the
> current patch. If we manage to
> simplify it considerably, would the approach of offloading sysfs
> creation and teardown into a separate thread be acceptable Christian?
> Thank you for the guidance!

One worry I have here with doing this async that now userspace might
have a dma-buf, but the sysfs entry does not yet exist, or the dma-buf
is gone, but the sysfs entry still exists. That's a bit awkward wrt
semantics.

Also I'm pretty sure that if we can hit this, then other subsystems
using kernfs have similar problems, so trying to fix this in kernfs
with slightly more fine-grained locking sounds like a much more solid
approach. The linked patch talks about how the big delays happen due
to direct reclaim, and that might be limited to specific code paths
that we need to look at? As-is this feels a bit much like papering
over kernfs issues in hackish ways in sysfs users, instead of tackling
the problem at its root.
-Daniel

> Regards,
> Hridya
>
> [1]: https://lore.kernel.org/all/20211118230008.2679780-1-minchan@kernel.org/
>
>
>
> >
> > I think in that light adding sysfs was rather questionable to begin
> > with, but that change here is a complete no-go if you ask me. You are
> > adding complexity to the kernel for something which should probably be
> > optimized in userspace.
> >
> > Regards,
> > Christian.
> >
> > > Thanks in advance for the guidance!
> > >
> > > Regards,
> > > Hridya
> > >
> > >>> If we would want to go down this route you would need to change all the
> > >>> drivers implementing the DMA-buf export functionality to avoid
> > >>> uninterruptible sleep as well and that is certainly something I would NAK.
> > >>>
> > >>> Regards,
> > >>> Christian.
> > >>>
> > >>>> Thanks,
> > >>>> Hridya
> > >>>>
> > >>>>
> > >>>>> Regards,
> > >>>>> Christian.
> > >>>>>
> > >>>>>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
> > >>>>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
> > >>>>>> ---
> > >>>>>>     drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
> > >>>>>>     include/linux/dma-buf.h               |  46 ++++
> > >>>>>>     2 files changed, 366 insertions(+), 23 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > >>>>>> index 053baadcada9..3251fdf2f05f 100644
> > >>>>>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
> > >>>>>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
> > >>>>>> @@ -7,13 +7,39 @@
> > >>>>>>
> > >>>>>>     #include <linux/dma-buf.h>
> > >>>>>>     #include <linux/dma-resv.h>
> > >>>>>> +#include <linux/freezer.h>
> > >>>>>>     #include <linux/kobject.h>
> > >>>>>> +#include <linux/kthread.h>
> > >>>>>> +#include <linux/list.h>
> > >>>>>>     #include <linux/printk.h>
> > >>>>>> +#include <linux/sched/signal.h>
> > >>>>>>     #include <linux/slab.h>
> > >>>>>>     #include <linux/sysfs.h>
> > >>>>>>
> > >>>>>>     #include "dma-buf-sysfs-stats.h"
> > >>>>>>
> > >>>>>> +struct dmabuf_kobj_work {
> > >>>>>> +     struct list_head list;
> > >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> > >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>>>>> +     unsigned long uid;
> > >>>>>> +};
> > >>>>>> +
> > >>>>>> +/* Both kobject setup and teardown work gets queued on the list. */
> > >>>>>> +static LIST_HEAD(dmabuf_kobj_work_list);
> > >>>>>> +
> > >>>>>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
> > >>>>>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
> > >>>>>> +
> > >>>>>> +/*
> > >>>>>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
> > >>>>>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
> > >>>>>> + */
> > >>>>>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
> > >>>>>> +
> > >>>>>> +static struct task_struct *dmabuf_kobject_task;
> > >>>>>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
> > >>>>>> +
> > >>>>>>     #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
> > >>>>>>
> > >>>>>>     /**
> > >>>>>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
> > >>>>>>         struct dma_buf_stats_attribute *attribute;
> > >>>>>>         struct dma_buf_sysfs_entry *sysfs_entry;
> > >>>>>>         struct dma_buf *dmabuf;
> > >>>>>> +     int ret;
> > >>>>>>
> > >>>>>>         attribute = to_dma_buf_stats_attr(attr);
> > >>>>>>         sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
> > >>>>>> +
> > >>>>>> +     /*
> > >>>>>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
> > >>>>>> +      * being freed while sysfs_entry->dmabuf is being accessed.
> > >>>>>> +      */
> > >>>>>> +     spin_lock(&dmabuf_sysfs_show_lock);
> > >>>>>>         dmabuf = sysfs_entry->dmabuf;
> > >>>>>>
> > >>>>>> -     if (!dmabuf || !attribute->show)
> > >>>>>> +     if (!dmabuf || !attribute->show) {
> > >>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> > >>>>>>                 return -EIO;
> > >>>>>> +     }
> > >>>>>>
> > >>>>>> -     return attribute->show(dmabuf, attribute, buf);
> > >>>>>> +     ret = attribute->show(dmabuf, attribute, buf);
> > >>>>>> +     spin_unlock(&dmabuf_sysfs_show_lock);
> > >>>>>> +     return ret;
> > >>>>>>     }
> > >>>>>>
> > >>>>>>     static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
> > >>>>>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
> > >>>>>>         .default_groups = dma_buf_stats_default_groups,
> > >>>>>>     };
> > >>>>>>
> > >>>>>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> > >>>>>> +/* Statistics files do not need to send uevents. */
> > >>>>>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> > >>>>>>     {
> > >>>>>> -     struct dma_buf_sysfs_entry *sysfs_entry;
> > >>>>>> +     return 0;
> > >>>>>> +}
> > >>>>>>
> > >>>>>> -     sysfs_entry = dmabuf->sysfs_entry;
> > >>>>>> -     if (!sysfs_entry)
> > >>>>>> -             return;
> > >>>>>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> > >>>>>> +     .filter = dmabuf_sysfs_uevent_filter,
> > >>>>>> +};
> > >>>>>> +
> > >>>>>> +/* setup of sysfs entries done asynchronously in the worker thread. */
> > >>>>>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
> > >>>>>> +{
> > >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> > >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
> > >>>>>> +                     kobject_work->sysfs_metadata;
> > >>>>>> +     bool free_metadata = false;
> > >>>>>> +
> > >>>>>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> > >>>>>> +                                    "%lu", kobject_work->uid);
> > >>>>>> +     if (ret) {
> > >>>>>> +             kobject_put(&sysfs_entry->kobj);
> > >>>>>> +
> > >>>>>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
> > >>>>>> +                     /*
> > >>>>>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
> > >>>>>> +                      * been freed. At this point, its safe to free the memory for
> > >>>>>> +                      * the sysfs_metadata;
> > >>>>>> +                      */
> > >>>>>> +                     free_metadata = true;
> > >>>>>> +             } else {
> > >>>>>> +                     /*
> > >>>>>> +                      * The DMA-BUF has not yet been freed, set the status to
> > >>>>>> +                      * sysfs_entry_error so that when the DMA-BUF gets
> > >>>>>> +                      * freed, we know there is no need to teardown the sysfs
> > >>>>>> +                      * entry.
> > >>>>>> +                      */
> > >>>>>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
> > >>>>>> +             }
> > >>>>>> +             goto unlock;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>> +     /*
> > >>>>>> +      * If the DMA-BUF has not yet been released, status would still be
> > >>>>>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
> > >>>>>> +      */
> > >>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> > >>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
> > >>>>>> +             goto unlock;
> > >>>>>> +     }
> > >>>>>>
> > >>>>>> +     /*
> > >>>>>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
> > >>>>>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
> > >>>>>> +      * sysfs_entry and its metadata since neither of them are needed
> > >>>>>> +      * anymore.
> > >>>>>> +      */
> > >>>>>> +     free_metadata = true;
> > >>>>>>         kobject_del(&sysfs_entry->kobj);
> > >>>>>>         kobject_put(&sysfs_entry->kobj);
> > >>>>>> +
> > >>>>>> +unlock:
> > >>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +     if (free_metadata) {
> > >>>>>> +             kfree(kobject_work->sysfs_metadata);
> > >>>>>> +             kobject_work->sysfs_metadata = NULL;
> > >>>>>> +     }
> > >>>>>>     }
> > >>>>>>
> > >>>>>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
> > >>>>>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
> > >>>>>> +{
> > >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
> > >>>>>>
> > >>>>>> -/* Statistics files do not need to send uevents. */
> > >>>>>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
> > >>>>>> +     kobject_del(&sysfs_entry->kobj);
> > >>>>>> +     kobject_put(&sysfs_entry->kobj);
> > >>>>>> +
> > >>>>>> +     kfree(kobject_work->sysfs_metadata);
> > >>>>>> +     kobject_work->sysfs_metadata = NULL;
> > >>>>>> +}
> > >>>>>> +
> > >>>>>> +/* do setup or teardown of sysfs entries as required */
> > >>>>>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
> > >>>>>>     {
> > >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>>>>> +     bool setup_needed = false;
> > >>>>>> +     bool teardown_needed = false;
> > >>>>>> +
> > >>>>>> +     sysfs_metadata = kobject_work->sysfs_metadata;
> > >>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
> > >>>>>> +             setup_needed = true;
> > >>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
> > >>>>>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> > >>>>>> +             teardown_needed = true;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>> +     /*
> > >>>>>> +      * It is ok to release the sysfs_entry_lock here.
> > >>>>>> +      *
> > >>>>>> +      * If setup_needed is true, we check the status again after the kobject
> > >>>>>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
> > >>>>>> +      * and if so teardown the kobject.
> > >>>>>> +      *
> > >>>>>> +      * If teardown_needed is true, there are no more changes expected to the
> > >>>>>> +      * status.
> > >>>>>> +      *
> > >>>>>> +      * If neither setup_needed nor teardown needed are true, it
> > >>>>>> +      * means the DMA-BUF was freed before we got around to setting up the
> > >>>>>> +      * sysfs entry and hence we just need to release the metadata and
> > >>>>>> +      * return.
> > >>>>>> +      */
> > >>>>>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +
> > >>>>>> +     if (setup_needed)
> > >>>>>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
> > >>>>>> +     else if (teardown_needed)
> > >>>>>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
> > >>>>>> +     else
> > >>>>>> +             kfree(kobject_work->sysfs_metadata);
> > >>>>>> +
> > >>>>>> +     kfree(kobject_work);
> > >>>>>> +}
> > >>>>>> +
> > >>>>>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
> > >>>>>> +{
> > >>>>>> +     struct dmabuf_kobj_work *kobject_work;
> > >>>>>> +
> > >>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
> > >>>>>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
> > >>>>>> +                                             struct dmabuf_kobj_work, list);
> > >>>>>> +     if (kobject_work)
> > >>>>>> +             list_del(&kobject_work->list);
> > >>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
> > >>>>>> +     return kobject_work;
> > >>>>>> +}
> > >>>>>> +
> > >>>>>> +static int kobject_work_thread(void *data)
> > >>>>>> +{
> > >>>>>> +     struct dmabuf_kobj_work *kobject_work;
> > >>>>>> +
> > >>>>>> +     while (1) {
> > >>>>>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
> > >>>>>> +                                  (kobject_work = get_next_kobj_work()));
> > >>>>>> +             do_kobject_work(kobject_work);
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>>         return 0;
> > >>>>>>     }
> > >>>>>>
> > >>>>>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
> > >>>>>> -     .filter = dmabuf_sysfs_uevent_filter,
> > >>>>>> -};
> > >>>>>> +static int kobject_worklist_init(void)
> > >>>>>> +{
> > >>>>>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
> > >>>>>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
> > >>>>>> +                                       "%s", "dmabuf-kobject-worker");
> > >>>>>> +     if (IS_ERR(dmabuf_kobject_task)) {
> > >>>>>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
> > >>>>>> +             return PTR_ERR(dmabuf_kobject_task);
> > >>>>>> +     }
> > >>>>>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
> > >>>>>> +
> > >>>>>> +     return 0;
> > >>>>>> +}
> > >>>>>> +
> > >>>>>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
> > >>>>>> +{
> > >>>>>> +     INIT_LIST_HEAD(&kobject_work->list);
> > >>>>>> +
> > >>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
> > >>>>>> +
> > >>>>>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
> > >>>>>> +
> > >>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
> > >>>>>> +
> > >>>>>> +     wake_up(&dmabuf_kobject_waitqueue);
> > >>>>>> +}
> > >>>>>> +
> > >>>>>> +
> > >>>>>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
> > >>>>>> +{
> > >>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
> > >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>>>>> +     struct dmabuf_kobj_work *kobj_work;
> > >>>>>> +
> > >>>>>> +     sysfs_entry = dmabuf->sysfs_entry;
> > >>>>>> +     if (!sysfs_entry)
> > >>>>>> +             return;
> > >>>>>> +
> > >>>>>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > >>>>>> +     if (!sysfs_metadata)
> > >>>>>> +             return;
> > >>>>>> +
> > >>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +
> > >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
> > >>>>>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
> > >>>>>> +             /*
> > >>>>>> +              * The sysfs entry for this buffer has not yet been initialized,
> > >>>>>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
> > >>>>>> +              * initialization.
> > >>>>>> +              */
> > >>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
> > >>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +
> > >>>>>> +             /*
> > >>>>>> +              * In case kobject initialization completes right as we release
> > >>>>>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
> > >>>>>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
> > >>>>>> +              */
> > >>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
> > >>>>>> +             sysfs_entry->dmabuf = NULL;
> > >>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> > >>>>>> +
> > >>>>>> +             return;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
> > >>>>>> +             /*
> > >>>>>> +              * queue teardown work only if sysfs_entry is fully inititalized.
> > >>>>>> +              * It is ok to release the sysfs_entry_lock here since the
> > >>>>>> +              * status can no longer change.
> > >>>>>> +              */
> > >>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +
> > >>>>>> +             /*
> > >>>>>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
> > >>>>>> +              * between teardown and show().
> > >>>>>> +              */
> > >>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
> > >>>>>> +             sysfs_entry->dmabuf = NULL;
> > >>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
> > >>>>>> +
> > >>>>>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> > >>>>>> +             if (!kobj_work) {
> > >>>>>> +                     /* do the teardown immediately. */
> > >>>>>> +                     kobject_del(&sysfs_entry->kobj);
> > >>>>>> +                     kobject_put(&sysfs_entry->kobj);
> > >>>>>> +                     kfree(sysfs_metadata);
> > >>>>>> +             } else {
> > >>>>>> +                     /* queue teardown work. */
> > >>>>>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> > >>>>>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > >>>>>> +                     deferred_kobject_create(kobj_work);
> > >>>>>> +             }
> > >>>>>> +
> > >>>>>> +             return;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>> +     /*
> > >>>>>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
> > >>>>>> +      * metadata.
> > >>>>>> +      */
> > >>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>> +     kfree(dmabuf->sysfs_entry_metadata);
> > >>>>>> +     dmabuf->sysfs_entry_metadata = NULL;
> > >>>>>> +}
> > >>>>>>
> > >>>>>>     static struct kset *dma_buf_stats_kset;
> > >>>>>>     static struct kset *dma_buf_per_buffer_stats_kset;
> > >>>>>>     int dma_buf_init_sysfs_statistics(void)
> > >>>>>>     {
> > >>>>>> +     int ret;
> > >>>>>> +
> > >>>>>> +     ret = kobject_worklist_init();
> > >>>>>> +     if (ret)
> > >>>>>> +             return ret;
> > >>>>>> +
> > >>>>>>         dma_buf_stats_kset = kset_create_and_add("dmabuf",
> > >>>>>>                                                  &dmabuf_sysfs_no_uevent_ops,
> > >>>>>>                                                  kernel_kobj);
> > >>>>>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
> > >>>>>>     int dma_buf_stats_setup(struct dma_buf *dmabuf)
> > >>>>>>     {
> > >>>>>>         struct dma_buf_sysfs_entry *sysfs_entry;
> > >>>>>> -     int ret;
> > >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
> > >>>>>> +     struct dmabuf_kobj_work *kobj_work;
> > >>>>>>
> > >>>>>>         if (!dmabuf || !dmabuf->file)
> > >>>>>>                 return -EINVAL;
> > >>>>>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
> > >>>>>>         sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
> > >>>>>>         sysfs_entry->dmabuf = dmabuf;
> > >>>>>>
> > >>>>>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
> > >>>>>> +                              GFP_KERNEL);
> > >>>>>> +     if (!sysfs_metadata) {
> > >>>>>> +             kfree(sysfs_entry);
> > >>>>>> +             return -ENOMEM;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>>         dmabuf->sysfs_entry = sysfs_entry;
> > >>>>>>
> > >>>>>> -     /* create the directory for buffer stats */
> > >>>>>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
> > >>>>>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
> > >>>>>> -     if (ret)
> > >>>>>> -             goto err_sysfs_dmabuf;
> > >>>>>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
> > >>>>>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
> > >>>>>>
> > >>>>>> -     return 0;
> > >>>>>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
> > >>>>>>
> > >>>>>> -err_sysfs_dmabuf:
> > >>>>>> -     kobject_put(&sysfs_entry->kobj);
> > >>>>>> -     dmabuf->sysfs_entry = NULL;
> > >>>>>> -     return ret;
> > >>>>>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
> > >>>>>> +     if (!kobj_work) {
> > >>>>>> +             kfree(sysfs_entry);
> > >>>>>> +             kfree(sysfs_metadata);
> > >>>>>> +             return -ENOMEM;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
> > >>>>>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
> > >>>>>> +     /*
> > >>>>>> +      * stash the inode number in struct dmabuf_kobj_work since setup
> > >>>>>> +      * might race with DMA-BUF teardown.
> > >>>>>> +      */
> > >>>>>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
> > >>>>>> +
> > >>>>>> +     deferred_kobject_create(kobj_work);
> > >>>>>> +     return 0;
> > >>>>>>     }
> > >>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > >>>>>> index 7ab50076e7a6..0597690023a0 100644
> > >>>>>> --- a/include/linux/dma-buf.h
> > >>>>>> +++ b/include/linux/dma-buf.h
> > >>>>>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
> > >>>>>>         void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
> > >>>>>>     };
> > >>>>>>
> > >>>>>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
> > >>>>>> +enum sysfs_entry_status {
> > >>>>>> +     SYSFS_ENTRY_UNINITIALIZED,
> > >>>>>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
> > >>>>>> +     SYSFS_ENTRY_ERROR,
> > >>>>>> +     SYSFS_ENTRY_INIT_ABORTED,
> > >>>>>> +     SYSFS_ENTRY_INITIALIZED,
> > >>>>>> +};
> > >>>>>> +
> > >>>>>> +/*
> > >>>>>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
> > >>>>>> + * DMA-BUF sysfs entry.
> > >>>>>> + *
> > >>>>>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
> > >>>>>> + * the sysfs entry has the following path.
> > >>>>>> + *
> > >>>>>> + *                   SYSFS_ENTRY_UNINITIALIZED
> > >>>>>> + *            __________________|____________________
> > >>>>>> + *           |                                       |
> > >>>>>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
> > >>>>>> + *           |                                                     gets freed
> > >>>>>> + *           |                                                     before
> > >>>>>> + *           |                                                     init)
> > >>>>>> + *   ________|_____________________________________
> > >>>>>> + *   |                         |                   |
> > >>>>>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
> > >>>>>> + *                             |               (DMA-BUF gets freed during kobject
> > >>>>>> + *                             |               init)
> > >>>>>> + *                             |
> > >>>>>> + *                             |
> > >>>>>> + *                 SYSFS_ENTRY_ERROR
> > >>>>>> + *                 (error during kobject init)
> > >>>>>> + *
> > >>>>>> + * @sysfs_entry_lock: protects access to @status.
> > >>>>>> + */
> > >>>>>> +struct dma_buf_sysfs_entry_metadata {
> > >>>>>> +     enum sysfs_entry_status status;
> > >>>>>> +     /*
> > >>>>>> +      * Protects sysfs_entry_metadata->status
> > >>>>>> +      */
> > >>>>>> +     spinlock_t sysfs_entry_lock;
> > >>>>>> +};
> > >>>>>> +#endif
> > >>>>>> +
> > >>>>>>     /**
> > >>>>>>      * struct dma_buf - shared buffer object
> > >>>>>>      *
> > >>>>>> @@ -452,6 +496,8 @@ struct dma_buf {
> > >>>>>>                 struct kobject kobj;
> > >>>>>>                 struct dma_buf *dmabuf;
> > >>>>>>         } *sysfs_entry;
> > >>>>>> +
> > >>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
> > >>>>>>     #endif
> > >>>>>>     };
> > >>>>>>
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-11  8:35               ` Daniel Vetter
@ 2022-01-11 10:58                 ` Christian König
  2022-01-11 11:16                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-01-11 10:58 UTC (permalink / raw)
  To: Daniel Vetter, Hridya Valsaraju
  Cc: Sumit Semwal, Greg Kroah-Hartman, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel, john.stultz, surenb, kaleshsingh,
	tjmercier, keescook

Am 11.01.22 um 09:35 schrieb Daniel Vetter:
> On Tue, Jan 11, 2022 at 7:02 AM Hridya Valsaraju <hridya@google.com> wrote:
>> On Sun, Jan 9, 2022 at 11:28 PM Christian König
>> <christian.koenig@amd.com> wrote:
>>> Am 07.01.22 um 22:25 schrieb Hridya Valsaraju:
>>>> On Fri, Jan 7, 2022 at 10:17 AM Hridya Valsaraju <hridya@google.com> wrote:
>>>>> On Fri, Jan 7, 2022 at 2:22 AM Christian König <christian.koenig@amd.com> wrote:
>>>>>> Am 06.01.22 um 20:04 schrieb Hridya Valsaraju:
>>>>>>> On Thu, Jan 6, 2022 at 12:59 AM Christian König
>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>> Am 05.01.22 um 00:51 schrieb Hridya Valsaraju:
>>>>>>>>> Recently, we noticed an issue where a process went into direct reclaim
>>>>>>>>> while holding the kernfs rw semaphore for sysfs in write(exclusive)
>>>>>>>>> mode. This caused processes who were doing DMA-BUF exports and releases
>>>>>>>>> to go into uninterruptible sleep since they needed to acquire the same
>>>>>>>>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid
>>>>>>>>> blocking DMA-BUF export/release for an indeterminate amount of time
>>>>>>>>> while another process is holding the sysfs rw semaphore in exclusive
>>>>>>>>> mode, this patch moves the per-buffer sysfs file creation/deleteion to
>>>>>>>>> a kthread.
>>>>>>>> Well I absolutely don't think that this is justified.
>>>>>>>>
>>>>>>>> You adding tons of complexity here just to avoid the overhead of
>>>>>>>> creating the sysfs files while exporting DMA-bufs which is an operation
>>>>>>>> which should be done exactly once in the lifecycle for the most common
>>>>>>>> use case.
>>>>>>>>
>>>>>>>> Please explain further why that should be necessary.
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> We noticed that the issue sometimes causes the exporting process to go
>>>>>>> to the uninterrupted sleep state for tens of milliseconds which
>>>>>>> unfortunately leads to user-perceptible UI jank. This is the reason
>>>>>>> why we are trying to move the sysfs entry creation and deletion out of
>>>>>>> the DMA-BUF export/release path. I will update the commit message to
>>>>>>> include the same in the next revision.
>>>>>> That is still not a justification for this change. The question is why
>>>>>> do you need that in the first place?
>>>>>>
>>>>>> Exporting a DMA-buf should be something would should be very rarely,
>>>>>> e.g. only at the start of an application.
>>>>> Hi Christian,
>>>>>
>>>>> Yes, that is correct. Noticeable jank caused by the issue is not
>>>>> present at all times and happens on UI transitions(for example during
>>>>> app launches and exits) when there are several DMA-BUFs being exported
>>>>> and released. This is especially true in the case of the camera app
>>>>> since it exports and releases a relatively larger number of DMA-BUFs
>>>>> during launch and exit respectively.
>>> Well, that sounds at least better than before.
>>>
>>>>> Regards,
>>>>> Hridya
>>>>>
>>>>>> So this strongly looks like you are trying to optimize for an use case
>>>>>> where we should probably rethink what userspace is doing here instead.
>>>> Hello Christian,
>>>>
>>>> If you don't mind, could you please elaborate on the above statement?
>>> The purpose of DMA-buf is to share a rather low number of buffers
>>> between different drivers and/or applications.
>>>
>>> For example with triple buffering we have three buffers shared between
>>> the camera driver and the display driver, same thing for use cases
>>> between rendering and display.
>>>
>>> So even when you have ~100 applications open your should not share more
>>> than ~300 DMA-buf handles and doing that should absolutely not cause any
>>> problems like you described above.
>>>
>>> Long story short when this affects your user experience then your user
>>> space is exporting *much* more buffers than expected. Especially since
>>> the sysfs overhead is completely negligible compared to the overhead
>>> drivers have when they export buffers.
>>
>>
>> I do not think we can solve this issue from userspace since the
>> problem is not due to the overhead of sysfs creation/teardown itself.

Yes, thanks. That's the important information I was looking for.

>> The problem is that the duration of time for which the exporting
>> process would need to sleep waiting for the kernfs_rwsem semaphore is
>> undefined when the holder of the semaphore goes under direct reclaim.
>> Fsnotify events for sysfs files are also generated while holding the
>> same semaphore.
>>
>> This is also not a problem due to the high number of DMA-BUF
>> exports during launch time, as even a single export can be delayed for
>> an unpredictable amount of time. We cannot eliminate DMA-BUF exports
>> completely during app-launches and we are unfortunately seeing reports
>> of the exporting process occasionally sleeping long enough to cause
>> user-visible jankiness :(
>>
>> We also looked at whether any optimizations are possible from the
>> kernfs implementation side[1] but the semaphore is used quite extensively
>> and it looks like the best way forward would be to remove sysfs
>> creation/teardown from the DMA-BUF export/release path altogether. We
>> have some ideas on how we can reduce the code-complexity in the
>> current patch. If we manage to
>> simplify it considerably, would the approach of offloading sysfs
>> creation and teardown into a separate thread be acceptable Christian?

At bare minimum I suggest to use a work_struct instead of re-inventing 
that with kthread.

And then only put the exporting of buffers into the background and not 
the teardown.

>> Thank you for the guidance!
> One worry I have here with doing this async that now userspace might
> have a dma-buf, but the sysfs entry does not yet exist, or the dma-buf
> is gone, but the sysfs entry still exists. That's a bit awkward wrt
> semantics.
>
> Also I'm pretty sure that if we can hit this, then other subsystems
> using kernfs have similar problems, so trying to fix this in kernfs
> with slightly more fine-grained locking sounds like a much more solid
> approach. The linked patch talks about how the big delays happen due
> to direct reclaim, and that might be limited to specific code paths
> that we need to look at? As-is this feels a bit much like papering
> over kernfs issues in hackish ways in sysfs users, instead of tackling
> the problem at its root.

Which is exactly my feeling as well, yes.

Regards,
Christian.

> -Daniel
>
>> Regards,
>> Hridya
>>
>> [1]: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20211118230008.2679780-1-minchan%40kernel.org%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C261f37c211bc4abe857b08d9d4dd4fd7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637774869282278924%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=tqWiJAZxpOpk5QrzuisMwt1e9esI1q0npaMHevsQCwI%3D&amp;reserved=0
>>
>>
>>
>>> I think in that light adding sysfs was rather questionable to begin
>>> with, but that change here is a complete no-go if you ask me. You are
>>> adding complexity to the kernel for something which should probably be
>>> optimized in userspace.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Thanks in advance for the guidance!
>>>>
>>>> Regards,
>>>> Hridya
>>>>
>>>>>> If we would want to go down this route you would need to change all the
>>>>>> drivers implementing the DMA-buf export functionality to avoid
>>>>>> uninterruptible sleep as well and that is certainly something I would NAK.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Hridya
>>>>>>>
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
>>>>>>>>> Signed-off-by: Hridya Valsaraju <hridya@google.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/dma-buf/dma-buf-sysfs-stats.c | 343 ++++++++++++++++++++++++--
>>>>>>>>>      include/linux/dma-buf.h               |  46 ++++
>>>>>>>>>      2 files changed, 366 insertions(+), 23 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
>>>>>>>>> index 053baadcada9..3251fdf2f05f 100644
>>>>>>>>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c
>>>>>>>>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
>>>>>>>>> @@ -7,13 +7,39 @@
>>>>>>>>>
>>>>>>>>>      #include <linux/dma-buf.h>
>>>>>>>>>      #include <linux/dma-resv.h>
>>>>>>>>> +#include <linux/freezer.h>
>>>>>>>>>      #include <linux/kobject.h>
>>>>>>>>> +#include <linux/kthread.h>
>>>>>>>>> +#include <linux/list.h>
>>>>>>>>>      #include <linux/printk.h>
>>>>>>>>> +#include <linux/sched/signal.h>
>>>>>>>>>      #include <linux/slab.h>
>>>>>>>>>      #include <linux/sysfs.h>
>>>>>>>>>
>>>>>>>>>      #include "dma-buf-sysfs-stats.h"
>>>>>>>>>
>>>>>>>>> +struct dmabuf_kobj_work {
>>>>>>>>> +     struct list_head list;
>>>>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>>>>> +     unsigned long uid;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/* Both kobject setup and teardown work gets queued on the list. */
>>>>>>>>> +static LIST_HEAD(dmabuf_kobj_work_list);
>>>>>>>>> +
>>>>>>>>> +/* dmabuf_kobj_list_lock protects dmabuf_kobj_work_list. */
>>>>>>>>> +static DEFINE_SPINLOCK(dmabuf_kobj_list_lock);
>>>>>>>>> +
>>>>>>>>> +/*
>>>>>>>>> + * dmabuf_sysfs_show_lock prevents a race between a DMA-BUF sysfs file being
>>>>>>>>> + * read and the DMA-BUF being freed by protecting sysfs_entry->dmabuf.
>>>>>>>>> + */
>>>>>>>>> +static DEFINE_SPINLOCK(dmabuf_sysfs_show_lock);
>>>>>>>>> +
>>>>>>>>> +static struct task_struct *dmabuf_kobject_task;
>>>>>>>>> +static wait_queue_head_t dmabuf_kobject_waitqueue;
>>>>>>>>> +
>>>>>>>>>      #define to_dma_buf_entry_from_kobj(x) container_of(x, struct dma_buf_sysfs_entry, kobj)
>>>>>>>>>
>>>>>>>>>      /**
>>>>>>>>> @@ -64,15 +90,26 @@ static ssize_t dma_buf_stats_attribute_show(struct kobject *kobj,
>>>>>>>>>          struct dma_buf_stats_attribute *attribute;
>>>>>>>>>          struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>>>>>          struct dma_buf *dmabuf;
>>>>>>>>> +     int ret;
>>>>>>>>>
>>>>>>>>>          attribute = to_dma_buf_stats_attr(attr);
>>>>>>>>>          sysfs_entry = to_dma_buf_entry_from_kobj(kobj);
>>>>>>>>> +
>>>>>>>>> +     /*
>>>>>>>>> +      * acquire dmabuf_sysfs_show_lock to prevent a race with the DMA-BUF
>>>>>>>>> +      * being freed while sysfs_entry->dmabuf is being accessed.
>>>>>>>>> +      */
>>>>>>>>> +     spin_lock(&dmabuf_sysfs_show_lock);
>>>>>>>>>          dmabuf = sysfs_entry->dmabuf;
>>>>>>>>>
>>>>>>>>> -     if (!dmabuf || !attribute->show)
>>>>>>>>> +     if (!dmabuf || !attribute->show) {
>>>>>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>>>>>                  return -EIO;
>>>>>>>>> +     }
>>>>>>>>>
>>>>>>>>> -     return attribute->show(dmabuf, attribute, buf);
>>>>>>>>> +     ret = attribute->show(dmabuf, attribute, buf);
>>>>>>>>> +     spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>>>>> +     return ret;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>>      static const struct sysfs_ops dma_buf_stats_sysfs_ops = {
>>>>>>>>> @@ -118,33 +155,275 @@ static struct kobj_type dma_buf_ktype = {
>>>>>>>>>          .default_groups = dma_buf_stats_default_groups,
>>>>>>>>>      };
>>>>>>>>>
>>>>>>>>> -void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>>>>>>>>> +/* Statistics files do not need to send uevents. */
>>>>>>>>> +static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>>>>>>>>>      {
>>>>>>>>> -     struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>>>>> +     return 0;
>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> -     sysfs_entry = dmabuf->sysfs_entry;
>>>>>>>>> -     if (!sysfs_entry)
>>>>>>>>> -             return;
>>>>>>>>> +static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
>>>>>>>>> +     .filter = dmabuf_sysfs_uevent_filter,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/* setup of sysfs entries done asynchronously in the worker thread. */
>>>>>>>>> +static void dma_buf_sysfs_stats_setup_work(struct dmabuf_kobj_work *kobject_work)
>>>>>>>>> +{
>>>>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>>>>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata =
>>>>>>>>> +                     kobject_work->sysfs_metadata;
>>>>>>>>> +     bool free_metadata = false;
>>>>>>>>> +
>>>>>>>>> +     int ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
>>>>>>>>> +                                    "%lu", kobject_work->uid);
>>>>>>>>> +     if (ret) {
>>>>>>>>> +             kobject_put(&sysfs_entry->kobj);
>>>>>>>>> +
>>>>>>>>> +             spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +             if (sysfs_metadata->status == SYSFS_ENTRY_INIT_ABORTED) {
>>>>>>>>> +                     /*
>>>>>>>>> +                      * SYSFS_ENTRY_INIT_ABORTED means that the DMA-BUF has already
>>>>>>>>> +                      * been freed. At this point, its safe to free the memory for
>>>>>>>>> +                      * the sysfs_metadata;
>>>>>>>>> +                      */
>>>>>>>>> +                     free_metadata = true;
>>>>>>>>> +             } else {
>>>>>>>>> +                     /*
>>>>>>>>> +                      * The DMA-BUF has not yet been freed, set the status to
>>>>>>>>> +                      * sysfs_entry_error so that when the DMA-BUF gets
>>>>>>>>> +                      * freed, we know there is no need to teardown the sysfs
>>>>>>>>> +                      * entry.
>>>>>>>>> +                      */
>>>>>>>>> +                     sysfs_metadata->status = SYSFS_ENTRY_ERROR;
>>>>>>>>> +             }
>>>>>>>>> +             goto unlock;
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     /*
>>>>>>>>> +      * If the DMA-BUF has not yet been released, status would still be
>>>>>>>>> +      * SYSFS_ENTRY_INIT_IN_PROGRESS. We set the status as initialized.
>>>>>>>>> +      */
>>>>>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
>>>>>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INITIALIZED;
>>>>>>>>> +             goto unlock;
>>>>>>>>> +     }
>>>>>>>>>
>>>>>>>>> +     /*
>>>>>>>>> +      * At this point the status is SYSFS_ENTRY_INIT_ABORTED which means
>>>>>>>>> +      * that the DMA-BUF has already been freed. Hence, we cleanup the
>>>>>>>>> +      * sysfs_entry and its metadata since neither of them are needed
>>>>>>>>> +      * anymore.
>>>>>>>>> +      */
>>>>>>>>> +     free_metadata = true;
>>>>>>>>>          kobject_del(&sysfs_entry->kobj);
>>>>>>>>>          kobject_put(&sysfs_entry->kobj);
>>>>>>>>> +
>>>>>>>>> +unlock:
>>>>>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +     if (free_metadata) {
>>>>>>>>> +             kfree(kobject_work->sysfs_metadata);
>>>>>>>>> +             kobject_work->sysfs_metadata = NULL;
>>>>>>>>> +     }
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> +/* teardown of sysfs entries done asynchronously in the worker thread. */
>>>>>>>>> +static void dma_buf_sysfs_stats_teardown_work(struct dmabuf_kobj_work *kobject_work)
>>>>>>>>> +{
>>>>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry = kobject_work->sysfs_entry;
>>>>>>>>>
>>>>>>>>> -/* Statistics files do not need to send uevents. */
>>>>>>>>> -static int dmabuf_sysfs_uevent_filter(struct kset *kset, struct kobject *kobj)
>>>>>>>>> +     kobject_del(&sysfs_entry->kobj);
>>>>>>>>> +     kobject_put(&sysfs_entry->kobj);
>>>>>>>>> +
>>>>>>>>> +     kfree(kobject_work->sysfs_metadata);
>>>>>>>>> +     kobject_work->sysfs_metadata = NULL;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/* do setup or teardown of sysfs entries as required */
>>>>>>>>> +static void do_kobject_work(struct dmabuf_kobj_work *kobject_work)
>>>>>>>>>      {
>>>>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>>>>> +     bool setup_needed = false;
>>>>>>>>> +     bool teardown_needed = false;
>>>>>>>>> +
>>>>>>>>> +     sysfs_metadata = kobject_work->sysfs_metadata;
>>>>>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED) {
>>>>>>>>> +             setup_needed = true;
>>>>>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_IN_PROGRESS;
>>>>>>>>> +     } else if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
>>>>>>>>> +             teardown_needed = true;
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     /*
>>>>>>>>> +      * It is ok to release the sysfs_entry_lock here.
>>>>>>>>> +      *
>>>>>>>>> +      * If setup_needed is true, we check the status again after the kobject
>>>>>>>>> +      * initialization to see if it has been set to SYSFS_ENTRY_INIT_ABORTED
>>>>>>>>> +      * and if so teardown the kobject.
>>>>>>>>> +      *
>>>>>>>>> +      * If teardown_needed is true, there are no more changes expected to the
>>>>>>>>> +      * status.
>>>>>>>>> +      *
>>>>>>>>> +      * If neither setup_needed nor teardown needed are true, it
>>>>>>>>> +      * means the DMA-BUF was freed before we got around to setting up the
>>>>>>>>> +      * sysfs entry and hence we just need to release the metadata and
>>>>>>>>> +      * return.
>>>>>>>>> +      */
>>>>>>>>> +     spin_unlock(&kobject_work->sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +
>>>>>>>>> +     if (setup_needed)
>>>>>>>>> +             dma_buf_sysfs_stats_setup_work(kobject_work);
>>>>>>>>> +     else if (teardown_needed)
>>>>>>>>> +             dma_buf_sysfs_stats_teardown_work(kobject_work);
>>>>>>>>> +     else
>>>>>>>>> +             kfree(kobject_work->sysfs_metadata);
>>>>>>>>> +
>>>>>>>>> +     kfree(kobject_work);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static struct dmabuf_kobj_work *get_next_kobj_work(void)
>>>>>>>>> +{
>>>>>>>>> +     struct dmabuf_kobj_work *kobject_work;
>>>>>>>>> +
>>>>>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
>>>>>>>>> +     kobject_work = list_first_entry_or_null(&dmabuf_kobj_work_list,
>>>>>>>>> +                                             struct dmabuf_kobj_work, list);
>>>>>>>>> +     if (kobject_work)
>>>>>>>>> +             list_del(&kobject_work->list);
>>>>>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
>>>>>>>>> +     return kobject_work;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int kobject_work_thread(void *data)
>>>>>>>>> +{
>>>>>>>>> +     struct dmabuf_kobj_work *kobject_work;
>>>>>>>>> +
>>>>>>>>> +     while (1) {
>>>>>>>>> +             wait_event_freezable(dmabuf_kobject_waitqueue,
>>>>>>>>> +                                  (kobject_work = get_next_kobj_work()));
>>>>>>>>> +             do_kobject_work(kobject_work);
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>>          return 0;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -static const struct kset_uevent_ops dmabuf_sysfs_no_uevent_ops = {
>>>>>>>>> -     .filter = dmabuf_sysfs_uevent_filter,
>>>>>>>>> -};
>>>>>>>>> +static int kobject_worklist_init(void)
>>>>>>>>> +{
>>>>>>>>> +     init_waitqueue_head(&dmabuf_kobject_waitqueue);
>>>>>>>>> +     dmabuf_kobject_task = kthread_run(kobject_work_thread, NULL,
>>>>>>>>> +                                       "%s", "dmabuf-kobject-worker");
>>>>>>>>> +     if (IS_ERR(dmabuf_kobject_task)) {
>>>>>>>>> +             pr_err("Creating thread for deferred sysfs entry creation/deletion failed\n");
>>>>>>>>> +             return PTR_ERR(dmabuf_kobject_task);
>>>>>>>>> +     }
>>>>>>>>> +     sched_set_normal(dmabuf_kobject_task, MAX_NICE);
>>>>>>>>> +
>>>>>>>>> +     return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void deferred_kobject_create(struct dmabuf_kobj_work *kobject_work)
>>>>>>>>> +{
>>>>>>>>> +     INIT_LIST_HEAD(&kobject_work->list);
>>>>>>>>> +
>>>>>>>>> +     spin_lock(&dmabuf_kobj_list_lock);
>>>>>>>>> +
>>>>>>>>> +     list_add_tail(&kobject_work->list, &dmabuf_kobj_work_list);
>>>>>>>>> +
>>>>>>>>> +     spin_unlock(&dmabuf_kobj_list_lock);
>>>>>>>>> +
>>>>>>>>> +     wake_up(&dmabuf_kobject_waitqueue);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +
>>>>>>>>> +void dma_buf_stats_teardown(struct dma_buf *dmabuf)
>>>>>>>>> +{
>>>>>>>>> +     struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>>>>> +     struct dmabuf_kobj_work *kobj_work;
>>>>>>>>> +
>>>>>>>>> +     sysfs_entry = dmabuf->sysfs_entry;
>>>>>>>>> +     if (!sysfs_entry)
>>>>>>>>> +             return;
>>>>>>>>> +
>>>>>>>>> +     sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>>>>>>>> +     if (!sysfs_metadata)
>>>>>>>>> +             return;
>>>>>>>>> +
>>>>>>>>> +     spin_lock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +
>>>>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_UNINITIALIZED ||
>>>>>>>>> +         sysfs_metadata->status == SYSFS_ENTRY_INIT_IN_PROGRESS) {
>>>>>>>>> +             /*
>>>>>>>>> +              * The sysfs entry for this buffer has not yet been initialized,
>>>>>>>>> +              * we set the status to SYSFS_ENTRY_INIT_ABORTED to abort the
>>>>>>>>> +              * initialization.
>>>>>>>>> +              */
>>>>>>>>> +             sysfs_metadata->status = SYSFS_ENTRY_INIT_ABORTED;
>>>>>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +
>>>>>>>>> +             /*
>>>>>>>>> +              * In case kobject initialization completes right as we release
>>>>>>>>> +              * the sysfs_entry_lock, disable show() for the sysfs entry by
>>>>>>>>> +              * setting sysfs_entry->dmabuf to NULL to prevent a race.
>>>>>>>>> +              */
>>>>>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
>>>>>>>>> +             sysfs_entry->dmabuf = NULL;
>>>>>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>>>>> +
>>>>>>>>> +             return;
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     if (sysfs_metadata->status == SYSFS_ENTRY_INITIALIZED) {
>>>>>>>>> +             /*
>>>>>>>>> +              * queue teardown work only if sysfs_entry is fully inititalized.
>>>>>>>>> +              * It is ok to release the sysfs_entry_lock here since the
>>>>>>>>> +              * status can no longer change.
>>>>>>>>> +              */
>>>>>>>>> +             spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +
>>>>>>>>> +             /*
>>>>>>>>> +              * Meanwhile disable show() for the sysfs entry to avoid a race
>>>>>>>>> +              * between teardown and show().
>>>>>>>>> +              */
>>>>>>>>> +             spin_lock(&dmabuf_sysfs_show_lock);
>>>>>>>>> +             sysfs_entry->dmabuf = NULL;
>>>>>>>>> +             spin_unlock(&dmabuf_sysfs_show_lock);
>>>>>>>>> +
>>>>>>>>> +             kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
>>>>>>>>> +             if (!kobj_work) {
>>>>>>>>> +                     /* do the teardown immediately. */
>>>>>>>>> +                     kobject_del(&sysfs_entry->kobj);
>>>>>>>>> +                     kobject_put(&sysfs_entry->kobj);
>>>>>>>>> +                     kfree(sysfs_metadata);
>>>>>>>>> +             } else {
>>>>>>>>> +                     /* queue teardown work. */
>>>>>>>>> +                     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
>>>>>>>>> +                     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>>>>>>>> +                     deferred_kobject_create(kobj_work);
>>>>>>>>> +             }
>>>>>>>>> +
>>>>>>>>> +             return;
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     /*
>>>>>>>>> +      * status is SYSFS_ENTRY_INIT_ERROR so we only need to free the
>>>>>>>>> +      * metadata.
>>>>>>>>> +      */
>>>>>>>>> +     spin_unlock(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>> +     kfree(dmabuf->sysfs_entry_metadata);
>>>>>>>>> +     dmabuf->sysfs_entry_metadata = NULL;
>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>>      static struct kset *dma_buf_stats_kset;
>>>>>>>>>      static struct kset *dma_buf_per_buffer_stats_kset;
>>>>>>>>>      int dma_buf_init_sysfs_statistics(void)
>>>>>>>>>      {
>>>>>>>>> +     int ret;
>>>>>>>>> +
>>>>>>>>> +     ret = kobject_worklist_init();
>>>>>>>>> +     if (ret)
>>>>>>>>> +             return ret;
>>>>>>>>> +
>>>>>>>>>          dma_buf_stats_kset = kset_create_and_add("dmabuf",
>>>>>>>>>                                                   &dmabuf_sysfs_no_uevent_ops,
>>>>>>>>>                                                   kernel_kobj);
>>>>>>>>> @@ -171,7 +450,8 @@ void dma_buf_uninit_sysfs_statistics(void)
>>>>>>>>>      int dma_buf_stats_setup(struct dma_buf *dmabuf)
>>>>>>>>>      {
>>>>>>>>>          struct dma_buf_sysfs_entry *sysfs_entry;
>>>>>>>>> -     int ret;
>>>>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_metadata;
>>>>>>>>> +     struct dmabuf_kobj_work *kobj_work;
>>>>>>>>>
>>>>>>>>>          if (!dmabuf || !dmabuf->file)
>>>>>>>>>                  return -EINVAL;
>>>>>>>>> @@ -188,18 +468,35 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
>>>>>>>>>          sysfs_entry->kobj.kset = dma_buf_per_buffer_stats_kset;
>>>>>>>>>          sysfs_entry->dmabuf = dmabuf;
>>>>>>>>>
>>>>>>>>> +     sysfs_metadata = kzalloc(sizeof(struct dma_buf_sysfs_entry_metadata),
>>>>>>>>> +                              GFP_KERNEL);
>>>>>>>>> +     if (!sysfs_metadata) {
>>>>>>>>> +             kfree(sysfs_entry);
>>>>>>>>> +             return -ENOMEM;
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>>          dmabuf->sysfs_entry = sysfs_entry;
>>>>>>>>>
>>>>>>>>> -     /* create the directory for buffer stats */
>>>>>>>>> -     ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
>>>>>>>>> -                                "%lu", file_inode(dmabuf->file)->i_ino);
>>>>>>>>> -     if (ret)
>>>>>>>>> -             goto err_sysfs_dmabuf;
>>>>>>>>> +     sysfs_metadata->status = SYSFS_ENTRY_UNINITIALIZED;
>>>>>>>>> +     spin_lock_init(&sysfs_metadata->sysfs_entry_lock);
>>>>>>>>>
>>>>>>>>> -     return 0;
>>>>>>>>> +     dmabuf->sysfs_entry_metadata = sysfs_metadata;
>>>>>>>>>
>>>>>>>>> -err_sysfs_dmabuf:
>>>>>>>>> -     kobject_put(&sysfs_entry->kobj);
>>>>>>>>> -     dmabuf->sysfs_entry = NULL;
>>>>>>>>> -     return ret;
>>>>>>>>> +     kobj_work = kzalloc(sizeof(struct dmabuf_kobj_work), GFP_KERNEL);
>>>>>>>>> +     if (!kobj_work) {
>>>>>>>>> +             kfree(sysfs_entry);
>>>>>>>>> +             kfree(sysfs_metadata);
>>>>>>>>> +             return -ENOMEM;
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     kobj_work->sysfs_entry = dmabuf->sysfs_entry;
>>>>>>>>> +     kobj_work->sysfs_metadata = dmabuf->sysfs_entry_metadata;
>>>>>>>>> +     /*
>>>>>>>>> +      * stash the inode number in struct dmabuf_kobj_work since setup
>>>>>>>>> +      * might race with DMA-BUF teardown.
>>>>>>>>> +      */
>>>>>>>>> +     kobj_work->uid = file_inode(dmabuf->file)->i_ino;
>>>>>>>>> +
>>>>>>>>> +     deferred_kobject_create(kobj_work);
>>>>>>>>> +     return 0;
>>>>>>>>>      }
>>>>>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>>>>>>> index 7ab50076e7a6..0597690023a0 100644
>>>>>>>>> --- a/include/linux/dma-buf.h
>>>>>>>>> +++ b/include/linux/dma-buf.h
>>>>>>>>> @@ -287,6 +287,50 @@ struct dma_buf_ops {
>>>>>>>>>          void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
>>>>>>>>>      };
>>>>>>>>>
>>>>>>>>> +#ifdef CONFIG_DMABUF_SYSFS_STATS
>>>>>>>>> +enum sysfs_entry_status {
>>>>>>>>> +     SYSFS_ENTRY_UNINITIALIZED,
>>>>>>>>> +     SYSFS_ENTRY_INIT_IN_PROGRESS,
>>>>>>>>> +     SYSFS_ENTRY_ERROR,
>>>>>>>>> +     SYSFS_ENTRY_INIT_ABORTED,
>>>>>>>>> +     SYSFS_ENTRY_INITIALIZED,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/*
>>>>>>>>> + * struct dma_buf_sysfs_entry_metadata - Holds the current status for the
>>>>>>>>> + * DMA-BUF sysfs entry.
>>>>>>>>> + *
>>>>>>>>> + * @status: holds the current status for the DMA-BUF sysfs entry. The status of
>>>>>>>>> + * the sysfs entry has the following path.
>>>>>>>>> + *
>>>>>>>>> + *                   SYSFS_ENTRY_UNINITIALIZED
>>>>>>>>> + *            __________________|____________________
>>>>>>>>> + *           |                                       |
>>>>>>>>> + *     SYSFS_ENTRY_INIT_IN_PROGRESS      SYSFS_ENTRY_INIT_ABORTED (DMA-BUF
>>>>>>>>> + *           |                                                     gets freed
>>>>>>>>> + *           |                                                     before
>>>>>>>>> + *           |                                                     init)
>>>>>>>>> + *   ________|_____________________________________
>>>>>>>>> + *   |                         |                   |
>>>>>>>>> + * SYSFS_ENTRY_INITIALIZED     |       SYSFS_ENTRY_INIT_ABORTED
>>>>>>>>> + *                             |               (DMA-BUF gets freed during kobject
>>>>>>>>> + *                             |               init)
>>>>>>>>> + *                             |
>>>>>>>>> + *                             |
>>>>>>>>> + *                 SYSFS_ENTRY_ERROR
>>>>>>>>> + *                 (error during kobject init)
>>>>>>>>> + *
>>>>>>>>> + * @sysfs_entry_lock: protects access to @status.
>>>>>>>>> + */
>>>>>>>>> +struct dma_buf_sysfs_entry_metadata {
>>>>>>>>> +     enum sysfs_entry_status status;
>>>>>>>>> +     /*
>>>>>>>>> +      * Protects sysfs_entry_metadata->status
>>>>>>>>> +      */
>>>>>>>>> +     spinlock_t sysfs_entry_lock;
>>>>>>>>> +};
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>>      /**
>>>>>>>>>       * struct dma_buf - shared buffer object
>>>>>>>>>       *
>>>>>>>>> @@ -452,6 +496,8 @@ struct dma_buf {
>>>>>>>>>                  struct kobject kobj;
>>>>>>>>>                  struct dma_buf *dmabuf;
>>>>>>>>>          } *sysfs_entry;
>>>>>>>>> +
>>>>>>>>> +     struct dma_buf_sysfs_entry_metadata *sysfs_entry_metadata;
>>>>>>>>>      #endif
>>>>>>>>>      };
>>>>>>>>>
>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-11 10:58                 ` Christian König
@ 2022-01-11 11:16                   ` Greg Kroah-Hartman
  2022-01-11 11:43                     ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2022-01-11 11:16 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, Hridya Valsaraju, Sumit Semwal, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Tue, Jan 11, 2022 at 11:58:07AM +0100, Christian König wrote:
> > > This is also not a problem due to the high number of DMA-BUF
> > > exports during launch time, as even a single export can be delayed for
> > > an unpredictable amount of time. We cannot eliminate DMA-BUF exports
> > > completely during app-launches and we are unfortunately seeing reports
> > > of the exporting process occasionally sleeping long enough to cause
> > > user-visible jankiness :(
> > > 
> > > We also looked at whether any optimizations are possible from the
> > > kernfs implementation side[1] but the semaphore is used quite extensively
> > > and it looks like the best way forward would be to remove sysfs
> > > creation/teardown from the DMA-BUF export/release path altogether. We
> > > have some ideas on how we can reduce the code-complexity in the
> > > current patch. If we manage to
> > > simplify it considerably, would the approach of offloading sysfs
> > > creation and teardown into a separate thread be acceptable Christian?
> 
> At bare minimum I suggest to use a work_struct instead of re-inventing that
> with kthread.
> 
> And then only put the exporting of buffers into the background and not the
> teardown.
> 
> > > Thank you for the guidance!
> > One worry I have here with doing this async that now userspace might
> > have a dma-buf, but the sysfs entry does not yet exist, or the dma-buf
> > is gone, but the sysfs entry still exists. That's a bit awkward wrt
> > semantics.
> > 
> > Also I'm pretty sure that if we can hit this, then other subsystems
> > using kernfs have similar problems, so trying to fix this in kernfs
> > with slightly more fine-grained locking sounds like a much more solid
> > approach. The linked patch talks about how the big delays happen due
> > to direct reclaim, and that might be limited to specific code paths
> > that we need to look at? As-is this feels a bit much like papering
> > over kernfs issues in hackish ways in sysfs users, instead of tackling
> > the problem at its root.
> 
> Which is exactly my feeling as well, yes.

More and more people are using sysfs/kernfs now for things that it was
never designed for (i.e. high-speed statistic gathering).  That's not
the fault of kernfs, it's the fault of people thinking it can be used
for stuff like that :)

But delays like this is odd, tearing down sysfs attributes should
normally _never_ be a fast-path that matters to system throughput.  So
offloading it to a workqueue makes sense as the attributes here are for
objects that are on the fast-path.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-11 11:16                   ` Greg Kroah-Hartman
@ 2022-01-11 11:43                     ` Christian König
  2022-01-12  1:11                       ` Hridya Valsaraju
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-01-11 11:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Daniel Vetter, Hridya Valsaraju, Sumit Semwal, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook


Am 11.01.22 um 12:16 schrieb Greg Kroah-Hartman:
> On Tue, Jan 11, 2022 at 11:58:07AM +0100, Christian König wrote:
>>>> This is also not a problem due to the high number of DMA-BUF
>>>> exports during launch time, as even a single export can be delayed for
>>>> an unpredictable amount of time. We cannot eliminate DMA-BUF exports
>>>> completely during app-launches and we are unfortunately seeing reports
>>>> of the exporting process occasionally sleeping long enough to cause
>>>> user-visible jankiness :(
>>>>
>>>> We also looked at whether any optimizations are possible from the
>>>> kernfs implementation side[1] but the semaphore is used quite extensively
>>>> and it looks like the best way forward would be to remove sysfs
>>>> creation/teardown from the DMA-BUF export/release path altogether. We
>>>> have some ideas on how we can reduce the code-complexity in the
>>>> current patch. If we manage to
>>>> simplify it considerably, would the approach of offloading sysfs
>>>> creation and teardown into a separate thread be acceptable Christian?
>> At bare minimum I suggest to use a work_struct instead of re-inventing that
>> with kthread.
>>
>> And then only put the exporting of buffers into the background and not the
>> teardown.
>>
>>>> Thank you for the guidance!
>>> One worry I have here with doing this async that now userspace might
>>> have a dma-buf, but the sysfs entry does not yet exist, or the dma-buf
>>> is gone, but the sysfs entry still exists. That's a bit awkward wrt
>>> semantics.
>>>
>>> Also I'm pretty sure that if we can hit this, then other subsystems
>>> using kernfs have similar problems, so trying to fix this in kernfs
>>> with slightly more fine-grained locking sounds like a much more solid
>>> approach. The linked patch talks about how the big delays happen due
>>> to direct reclaim, and that might be limited to specific code paths
>>> that we need to look at? As-is this feels a bit much like papering
>>> over kernfs issues in hackish ways in sysfs users, instead of tackling
>>> the problem at its root.
>> Which is exactly my feeling as well, yes.
> More and more people are using sysfs/kernfs now for things that it was
> never designed for (i.e. high-speed statistic gathering).  That's not
> the fault of kernfs, it's the fault of people thinking it can be used
> for stuff like that :)

I'm starting to get the feeling that we should maybe have questioned 
adding sysfs files for each exported DMA-buf a bit more. Anyway, to late 
for that. We have to live with the consequences.

> But delays like this is odd, tearing down sysfs attributes should
> normally _never_ be a fast-path that matters to system throughput.  So
> offloading it to a workqueue makes sense as the attributes here are for
> objects that are on the fast-path.

That's what is puzzling me as well. As far as I understood Hridya 
tearing down things is not the problem, because during teardown we 
usually have a dying task where it's usually not much of a problem if 
the corpse is around for another few milliseconds until everything is 
cleaned up.

The issue happens during creation of the sysfs attribute and that's 
extremely odd because if this waits for reclaim then drivers will 
certainly wait for reclaim as well. See we need a few bytes for the 
sysfs attribute, but drivers usually need a few megabytes for the 
DMA-buf backing store before they can even export the DMA-buf.

So something doesn't add up in the rational for this problem.

Regards,
Christian.

>
> thanks,
>
> greg k-h


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path
  2022-01-11 11:43                     ` Christian König
@ 2022-01-12  1:11                       ` Hridya Valsaraju
  0 siblings, 0 replies; 15+ messages in thread
From: Hridya Valsaraju @ 2022-01-12  1:11 UTC (permalink / raw)
  To: Christian König
  Cc: Greg Kroah-Hartman, Daniel Vetter, Sumit Semwal, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, john.stultz, surenb,
	kaleshsingh, tjmercier, keescook

On Tue, Jan 11, 2022 at 3:43 AM Christian König
<christian.koenig@amd.com> wrote:
>
>
> Am 11.01.22 um 12:16 schrieb Greg Kroah-Hartman:
> > On Tue, Jan 11, 2022 at 11:58:07AM +0100, Christian König wrote:
> >>>> This is also not a problem due to the high number of DMA-BUF
> >>>> exports during launch time, as even a single export can be delayed for
> >>>> an unpredictable amount of time. We cannot eliminate DMA-BUF exports
> >>>> completely during app-launches and we are unfortunately seeing reports
> >>>> of the exporting process occasionally sleeping long enough to cause
> >>>> user-visible jankiness :(
> >>>>
> >>>> We also looked at whether any optimizations are possible from the
> >>>> kernfs implementation side[1] but the semaphore is used quite extensively
> >>>> and it looks like the best way forward would be to remove sysfs
> >>>> creation/teardown from the DMA-BUF export/release path altogether. We
> >>>> have some ideas on how we can reduce the code-complexity in the
> >>>> current patch. If we manage to
> >>>> simplify it considerably, would the approach of offloading sysfs
> >>>> creation and teardown into a separate thread be acceptable Christian?
> >> At bare minimum I suggest to use a work_struct instead of re-inventing that
> >> with kthread.
> >>
> >> And then only put the exporting of buffers into the background and not the
> >> teardown.
> >>
> >>>> Thank you for the guidance!
> >>> One worry I have here with doing this async that now userspace might
> >>> have a dma-buf, but the sysfs entry does not yet exist, or the dma-buf
> >>> is gone, but the sysfs entry still exists. That's a bit awkward wrt
> >>> semantics.


Thank you all for your thoughts and guidance. You are correct that we
will be trading accuracy for performance here. One precedence we could
find was in the case of RSS accounting where SPLIT_RSS_COUNTING caused
the accounting to have less overhead but also made it less accurate.
If you would prefer that it not be the default case, we can make it
configurable by putting it behind a config instead.


> >>>
> >>> Also I'm pretty sure that if we can hit this, then other subsystems
> >>> using kernfs have similar problems, so trying to fix this in kernfs
> >>> with slightly more fine-grained locking sounds like a much more solid
> >>> approach. The linked patch talks about how the big delays happen due
> >>> to direct reclaim, and that might be limited to specific code paths
> >>> that we need to look at? As-is this feels a bit much like papering
> >>> over kernfs issues in hackish ways in sysfs users, instead of tackling
> >>> the problem at its root.
> >> Which is exactly my feeling as well, yes.
> > More and more people are using sysfs/kernfs now for things that it was
> > never designed for (i.e. high-speed statistic gathering).  That's not
> > the fault of kernfs, it's the fault of people thinking it can be used
> > for stuff like that :)
>
> I'm starting to get the feeling that we should maybe have questioned
> adding sysfs files for each exported DMA-buf a bit more. Anyway, to late
> for that. We have to live with the consequences.
>
> > But delays like this is odd, tearing down sysfs attributes should
> > normally _never_ be a fast-path that matters to system throughput.  So
> > offloading it to a workqueue makes sense as the attributes here are for
> > objects that are on the fast-path.
>
> That's what is puzzling me as well. As far as I understood Hridya
> tearing down things is not the problem, because during teardown we
> usually have a dying task where it's usually not much of a problem if
> the corpse is around for another few milliseconds until everything is
> cleaned up.


We have seen instances where the last reference to the buffer is not
dropped by the dying process but by Surfaceflinger[1].


>
> The issue happens during creation of the sysfs attribute and that's
> extremely odd because if this waits for reclaim then drivers will
> certainly wait for reclaim as well. See we need a few bytes for the
> sysfs attribute, but drivers usually need a few megabytes for the
> DMA-buf backing store before they can even export the DMA-buf.


We have been working off of traces collected from the devices of end
users to analyze the issue and currently don't have sufficient
information to understand why exactly direct reclaim affects sysfs the
way we are seeing it on the traces. We are actively trying to
reproduce the issue consistently to perform more experiments to
understand it. The DMA-BUF system heap on the Android Common Kernel
keeps a pool of pre-allocated pages in reserve and we are guessing
that it could possibly be the reason why we have not seen similar
issues with direct reclaim earlier. We will update the thread once we
have more information.

We are also working on a leaner version of the patch that uses
work_struct instead.

Regards,
Hridya

[1] : https://source.android.com/devices/graphics/surfaceflinger-windowmanager


>
> So something doesn't add up in the rational for this problem.
>
> Regards,
> Christian.
>
> >
> > thanks,
> >
> > greg k-h
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-01-12  1:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-04 23:51 [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path Hridya Valsaraju
2022-01-05 15:13 ` Greg Kroah-Hartman
2022-01-05 18:38   ` Hridya Valsaraju
2022-01-06  8:59 ` Christian König
2022-01-06 19:04   ` Hridya Valsaraju
2022-01-07 10:22     ` Christian König
2022-01-07 18:17       ` Hridya Valsaraju
2022-01-07 21:25         ` Hridya Valsaraju
2022-01-10  7:28           ` Christian König
2022-01-11  6:01             ` Hridya Valsaraju
2022-01-11  8:35               ` Daniel Vetter
2022-01-11 10:58                 ` Christian König
2022-01-11 11:16                   ` Greg Kroah-Hartman
2022-01-11 11:43                     ` Christian König
2022-01-12  1:11                       ` Hridya Valsaraju

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).