linux-kernel.vger.kernel.org archive mirror
* [PATCH] android: fix warning when releasing active sync point
@ 2015-12-15  1:29 Dmitry Torokhov
  2015-12-15  9:26 ` Daniel Vetter
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dmitry Torokhov @ 2015-12-15  1:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sumit Semwal, Arve Hjønnevåg, Riley Andrews,
	Andrew Bresticker, Maarten Lankhorst, linux-media, dri-devel,
	linux-kernel, devel

Userspace can close the sync device while there are still active fence
points, in which case kernel produces the following warning:

[   43.853176] ------------[ cut here ]------------
[   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
[   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
[   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
[   43.885834] Call trace:
[   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
[   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
[   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
[   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
[   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
[   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
[   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
[   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
[   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
[   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
[   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
[   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
[   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
[   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---

Let's fix it by introducing a new optional callback (disable_signaling)
to fence operations so that drivers can do proper clean ups when we
remove last callback for given fence.

Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
---
 drivers/dma-buf/fence.c        | 6 +++++-
 drivers/staging/android/sync.c | 8 ++++++++
 include/linux/fence.h          | 2 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
index 7b05dbe..0ed73ad 100644
--- a/drivers/dma-buf/fence.c
+++ b/drivers/dma-buf/fence.c
@@ -304,8 +304,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb)
 	spin_lock_irqsave(fence->lock, flags);
 
 	ret = !list_empty(&cb->node);
-	if (ret)
+	if (ret) {
 		list_del_init(&cb->node);
+		if (list_empty(&fence->cb_list))
+			if (fence->ops->disable_signaling)
+				fence->ops->disable_signaling(fence);
+	}
 
 	spin_unlock_irqrestore(fence->lock, flags);
 
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index e0c1acb..f8566c1 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -465,6 +465,13 @@ static bool android_fence_enable_signaling(struct fence *fence)
 	return true;
 }
 
+static void android_fence_disable_signaling(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+
+	list_del_init(&pt->active_list);
+}
+
 static int android_fence_fill_driver_data(struct fence *fence,
 					  void *data, int size)
 {
@@ -508,6 +515,7 @@ static const struct fence_ops android_fence_ops = {
 	.get_driver_name = android_fence_get_driver_name,
 	.get_timeline_name = android_fence_get_timeline_name,
 	.enable_signaling = android_fence_enable_signaling,
+	.disable_signaling = android_fence_disable_signaling,
 	.signaled = android_fence_signaled,
 	.wait = fence_default_wait,
 	.release = android_fence_release,
diff --git a/include/linux/fence.h b/include/linux/fence.h
index bb52201..ce44348 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -107,6 +107,7 @@ struct fence_cb {
  * @get_driver_name: returns the driver name.
  * @get_timeline_name: return the name of the context this fence belongs to.
  * @enable_signaling: enable software signaling of fence.
+ * @disable_signaling: disable software signaling of fence (optional).
  * @signaled: [optional] peek whether the fence is signaled, can be null.
  * @wait: custom wait implementation, or fence_default_wait.
  * @release: [optional] called on destruction of fence, can be null
@@ -166,6 +167,7 @@ struct fence_ops {
 	const char * (*get_driver_name)(struct fence *fence);
 	const char * (*get_timeline_name)(struct fence *fence);
 	bool (*enable_signaling)(struct fence *fence);
+	void (*disable_signaling)(struct fence *fence);
 	bool (*signaled)(struct fence *fence);
 	signed long (*wait)(struct fence *fence, bool intr, signed long timeout);
 	void (*release)(struct fence *fence);
-- 
2.6.0.rc2.230.g3dd15c0


-- 
Dmitry


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15  1:29 [PATCH] android: fix warning when releasing active sync point Dmitry Torokhov
@ 2015-12-15  9:26 ` Daniel Vetter
  2015-12-15 17:17   ` Dmitry Torokhov
  2015-12-15 19:00   ` Gustavo Padovan
  2015-12-15 10:01 ` Maarten Lankhorst
  2015-12-15 13:30 ` Gustavo Padovan
  2 siblings, 2 replies; 14+ messages in thread
From: Daniel Vetter @ 2015-12-15  9:26 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Greg Kroah-Hartman, devel, Andrew Bresticker,
	Arve Hjønnevåg, dri-devel, linux-kernel, Riley Andrews,
	linux-media

On Mon, Dec 14, 2015 at 05:29:55PM -0800, Dmitry Torokhov wrote:
> Userspace can close the sync device while there are still active fence
> points, in which case kernel produces the following warning:
> 
> [   43.853176] ------------[ cut here ]------------
> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
> [   43.885834] Call trace:
> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
> 
> Let's fix it by introducing a new optional callback (disable_signaling)
> to fence operations so that drivers can do proper clean ups when we
> remove last callback for given fence.
> 
> Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
> Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
> ---
>  drivers/dma-buf/fence.c        | 6 +++++-
>  drivers/staging/android/sync.c | 8 ++++++++
>  include/linux/fence.h          | 2 ++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> index 7b05dbe..0ed73ad 100644
> --- a/drivers/dma-buf/fence.c
> +++ b/drivers/dma-buf/fence.c
> @@ -304,8 +304,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb)
>  	spin_lock_irqsave(fence->lock, flags);
>  
>  	ret = !list_empty(&cb->node);
> -	if (ret)
> +	if (ret) {
>  		list_del_init(&cb->node);
> +		if (list_empty(&fence->cb_list))
> +			if (fence->ops->disable_signaling)
> +				fence->ops->disable_signaling(fence);

What exactly is the bug here? A fence with no callbacks registered any
more shouldn't have any problem. Why exactly does this blow up?

I guess I don't really understand the bug ... we do seem to remove the
callback already.

Thanks, Daniel


> +	}
>  
>  	spin_unlock_irqrestore(fence->lock, flags);
>  
> diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
> index e0c1acb..f8566c1 100644
> --- a/drivers/staging/android/sync.c
> +++ b/drivers/staging/android/sync.c
> @@ -465,6 +465,13 @@ static bool android_fence_enable_signaling(struct fence *fence)
>  	return true;
>  }
>  
> +static void android_fence_disable_signaling(struct fence *fence)
> +{
> +	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
> +
> +	list_del_init(&pt->active_list);
> +}
> +
>  static int android_fence_fill_driver_data(struct fence *fence,
>  					  void *data, int size)
>  {
> @@ -508,6 +515,7 @@ static const struct fence_ops android_fence_ops = {
>  	.get_driver_name = android_fence_get_driver_name,
>  	.get_timeline_name = android_fence_get_timeline_name,
>  	.enable_signaling = android_fence_enable_signaling,
> +	.disable_signaling = android_fence_disable_signaling,
>  	.signaled = android_fence_signaled,
>  	.wait = fence_default_wait,
>  	.release = android_fence_release,
> diff --git a/include/linux/fence.h b/include/linux/fence.h
> index bb52201..ce44348 100644
> --- a/include/linux/fence.h
> +++ b/include/linux/fence.h
> @@ -107,6 +107,7 @@ struct fence_cb {
>   * @get_driver_name: returns the driver name.
>   * @get_timeline_name: return the name of the context this fence belongs to.
>   * @enable_signaling: enable software signaling of fence.
> + * @disable_signaling: disable software signaling of fence (optional).
>   * @signaled: [optional] peek whether the fence is signaled, can be null.
>   * @wait: custom wait implementation, or fence_default_wait.
>   * @release: [optional] called on destruction of fence, can be null
> @@ -166,6 +167,7 @@ struct fence_ops {
>  	const char * (*get_driver_name)(struct fence *fence);
>  	const char * (*get_timeline_name)(struct fence *fence);
>  	bool (*enable_signaling)(struct fence *fence);
> +	void (*disable_signaling)(struct fence *fence);
>  	bool (*signaled)(struct fence *fence);
>  	signed long (*wait)(struct fence *fence, bool intr, signed long timeout);
>  	void (*release)(struct fence *fence);
> -- 
> 2.6.0.rc2.230.g3dd15c0
> 
> 
> -- 
> Dmitry
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15  1:29 [PATCH] android: fix warning when releasing active sync point Dmitry Torokhov
  2015-12-15  9:26 ` Daniel Vetter
@ 2015-12-15 10:01 ` Maarten Lankhorst
  2015-12-15 17:19   ` Dmitry Torokhov
  2015-12-15 13:30 ` Gustavo Padovan
  2 siblings, 1 reply; 14+ messages in thread
From: Maarten Lankhorst @ 2015-12-15 10:01 UTC (permalink / raw)
  To: Dmitry Torokhov, Greg Kroah-Hartman
  Cc: Sumit Semwal, Arve Hjønnevåg, Riley Andrews,
	Andrew Bresticker, linux-media, dri-devel, linux-kernel, devel

Op 15-12-15 om 02:29 schreef Dmitry Torokhov:
> Userspace can close the sync device while there are still active fence
> points, in which case kernel produces the following warning:
>
> [   43.853176] ------------[ cut here ]------------
> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
> [   43.885834] Call trace:
> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
>
> Let's fix it by introducing a new optional callback (disable_signaling)
> to fence operations so that drivers can do proper clean ups when we
> remove last callback for given fence.
>
> Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
> Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
>
NACK! There's no way to do this race free.
The driver should hold a refcount until fence is signaled.
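
For illustration, the refcount approach suggested above can be sketched as a
small userspace model. This is a sketch under stated assumptions, not the
sync driver's code: struct rc_fence and every rc_* helper are hypothetical
names, and a plain int stands in for the kernel's kref.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a fence; this is not the kernel's struct fence. */
struct rc_fence {
	int refcount;		/* plain int here; the kernel uses a kref */
	bool on_active_list;	/* true while the driver tracks the fence */
	bool released;		/* set once the last reference is dropped */
};

static inline void rc_fence_init(struct rc_fence *f)
{
	f->refcount = 1;	/* reference owned by the creator */
	f->on_active_list = false;
	f->released = false;
}

static inline void rc_fence_get(struct rc_fence *f)
{
	f->refcount++;
}

static inline void rc_fence_put(struct rc_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		/* Same sanity check as the warning: must not be tracked here. */
		assert(!f->on_active_list);
		f->released = true;
	}
}

/* The driver starts tracking the fence and pins it until it signals. */
static inline void rc_enable_signaling(struct rc_fence *f)
{
	if (!f->on_active_list) {
		rc_fence_get(f);
		f->on_active_list = true;
	}
}

/* The driver signals: stop tracking first, then drop the driver's pin. */
static inline void rc_signal(struct rc_fence *f)
{
	if (f->on_active_list) {
		f->on_active_list = false;
		rc_fence_put(f);
	}
}
```

In this model a userspace close only drops the creator's reference, so the
release path cannot run while the fence is still tracked.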


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15  1:29 [PATCH] android: fix warning when releasing active sync point Dmitry Torokhov
  2015-12-15  9:26 ` Daniel Vetter
  2015-12-15 10:01 ` Maarten Lankhorst
@ 2015-12-15 13:30 ` Gustavo Padovan
  2015-12-15 13:50   ` Frank Binns
  2015-12-15 17:22   ` Dmitry Torokhov
  2 siblings, 2 replies; 14+ messages in thread
From: Gustavo Padovan @ 2015-12-15 13:30 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Greg Kroah-Hartman, devel, Andrew Bresticker,
	Arve Hjønnevåg, dri-devel, linux-kernel, Riley Andrews,
	linux-media

2015-12-14 Dmitry Torokhov <dtor@chromium.org>:

> Userspace can close the sync device while there are still active fence
> points, in which case kernel produces the following warning:
> 
> [   43.853176] ------------[ cut here ]------------
> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
> [   43.885834] Call trace:
> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---

This crash report seems to be for a 3.18 kernel. Can you reproduce it
on upstream kernel as well?

	Gustavo


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 13:30 ` Gustavo Padovan
@ 2015-12-15 13:50   ` Frank Binns
  2015-12-15 17:21     ` Dmitry Torokhov
  2015-12-15 17:22   ` Dmitry Torokhov
  1 sibling, 1 reply; 14+ messages in thread
From: Frank Binns @ 2015-12-15 13:50 UTC (permalink / raw)
  To: Gustavo Padovan, Dmitry Torokhov, Greg Kroah-Hartman, devel,
	Andrew Bresticker, Arve Hjønnevåg, dri-devel,
	linux-kernel, Riley Andrews, linux-media

Is this not the issue fixed by 8e43c9c75?

Thanks
Frank

On 15/12/15 13:30, Gustavo Padovan wrote:
> 2015-12-14 Dmitry Torokhov <dtor@chromium.org>:
>
>> Userspace can close the sync device while there are still active fence
>> points, in which case kernel produces the following warning:
>>
>> [   43.853176] ------------[ cut here ]------------
>> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
>> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
>> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
>> [   43.885834] Call trace:
>> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
>> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
>> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
>> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
>> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
>> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
>> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
>> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
>> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
>> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
>> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
>> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
>> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
>> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
> This crash report seems to be for a 3.18 kernel. Can you reproduce it
> on upstream kernel as well?
>
> 	Gustavo



* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15  9:26 ` Daniel Vetter
@ 2015-12-15 17:17   ` Dmitry Torokhov
  2015-12-15 19:00   ` Gustavo Padovan
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Torokhov @ 2015-12-15 17:17 UTC (permalink / raw)
  To: Dmitry Torokhov, Greg Kroah-Hartman, devel, Andrew Bresticker,
	Arve Hjønnevåg, dri-devel, linux-kernel, Riley Andrews,
	linux-media

On Tue, Dec 15, 2015 at 1:26 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Dec 14, 2015 at 05:29:55PM -0800, Dmitry Torokhov wrote:
>> Userspace can close the sync device while there are still active fence
>> points, in which case kernel produces the following warning:
>>
>> [   43.853176] ------------[ cut here ]------------
>> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
>> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
>> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
>> [   43.885834] Call trace:
>> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
>> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
>> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
>> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
>> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
>> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
>> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
>> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
>> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
>> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
>> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
>> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
>> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
>> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
>>
>> Let's fix it by introducing a new optional callback (disable_signaling)
>> to fence operations so that drivers can do proper clean ups when we
>> remove last callback for given fence.
>>
>> Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
>> Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
>> ---
>>  drivers/dma-buf/fence.c        | 6 +++++-
>>  drivers/staging/android/sync.c | 8 ++++++++
>>  include/linux/fence.h          | 2 ++
>>  3 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
>> index 7b05dbe..0ed73ad 100644
>> --- a/drivers/dma-buf/fence.c
>> +++ b/drivers/dma-buf/fence.c
>> @@ -304,8 +304,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb)
>>       spin_lock_irqsave(fence->lock, flags);
>>
>>       ret = !list_empty(&cb->node);
>> -     if (ret)
>> +     if (ret) {
>>               list_del_init(&cb->node);
>> +             if (list_empty(&fence->cb_list))
>> +                     if (fence->ops->disable_signaling)
>> +                             fence->ops->disable_signaling(fence);
>
> What exactly is the bug here? A fence with no callbacks registered any
> more shouldn't have any problem. Why exactly does this blow up?
>
> I guess I don't really understand the bug ... we do seem to remove the
> callback already.
>

The issue is that when enabling signalling in the sync driver we put the
fence on an internal list in the driver, and there is no way of taking it
off this list except when it is signalled. When destroying the fence, the
driver checks (as a sanity measure) that the fence is no longer on this
list; if it still is, that check produces the backtrace in the commit log.

IOW, for some drivers we need an "undo" for the enable_signaling() callback
so that drivers can maintain consistent internal state.
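
The enable/undo pairing described above can be sketched as a minimal
userspace model. This is only a sketch under stated assumptions: the model_*
types and helpers are hypothetical stand-ins for the sync driver's sync_pt
and active_list, and the list helpers merely mimic the kernel's list.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail_node(struct list_head *node,
				      struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink and re-initialise, in the spirit of the kernel's list_del_init(). */
static inline void list_del_init_node(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}

/* Hypothetical, simplified model of a timeline and one sync point. */
struct model_timeline {
	struct list_head active_list_head;
};

struct model_pt {
	struct model_timeline *parent;
	struct list_head active_list;
};

static inline void model_pt_init(struct model_pt *pt, struct model_timeline *tl)
{
	assert(tl != NULL);
	pt->parent = tl;
	list_init(&pt->active_list);
}

/* enable_signaling: the driver starts tracking the point. */
static inline bool model_enable_signaling(struct model_pt *pt)
{
	if (list_is_empty(&pt->active_list))
		list_add_tail_node(&pt->active_list,
				   &pt->parent->active_list_head);
	return true;
}

/* The proposed "undo": stop tracking once nobody is waiting any more. */
static inline void model_disable_signaling(struct model_pt *pt)
{
	list_del_init_node(&pt->active_list);
}

/* Returns true when the release-time sanity check would have warned. */
static inline bool model_release_would_warn(const struct model_pt *pt)
{
	return !list_is_empty(&pt->active_list);
}
```

Without the undo step, a point released after model_enable_signaling() is
still linked and trips the check; with it, the list is consistent again
before release.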

Thanks.

-- 
Dmitry


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 10:01 ` Maarten Lankhorst
@ 2015-12-15 17:19   ` Dmitry Torokhov
  2015-12-16  8:36     ` Maarten Lankhorst
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2015-12-15 17:19 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Dmitry Torokhov, Greg Kroah-Hartman, Sumit Semwal,
	Arve Hjønnevåg,
	Riley Andrews, Andrew Bresticker, linux-media, dri-devel,
	linux-kernel, devel

On Tue, Dec 15, 2015 at 2:01 AM, Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
> Op 15-12-15 om 02:29 schreef Dmitry Torokhov:
>> Userspace can close the sync device while there are still active fence
>> points, in which case kernel produces the following warning:
>>
>> [   43.853176] ------------[ cut here ]------------
>> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
>> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
>> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
>> [   43.885834] Call trace:
>> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
>> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
>> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
>> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
>> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
>> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
>> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
>> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
>> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
>> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
>> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
>> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
>> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
>> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
>>
>> Let's fix it by introducing a new optional callback (disable_signaling)
>> to fence operations so that drivers can do proper clean ups when we
>> remove last callback for given fence.
>>
>> Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
>> Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
>>
> NACK! There's no way to do this race free.

Can you please explain the race? As far as I can see, there is none.

> The driver should hold a refcount until fence is signaled.

If we are no longer interested in the fence, why do we need to wait for
it to be signaled?

Thanks.

-- 
Dmitry


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 13:50   ` Frank Binns
@ 2015-12-15 17:21     ` Dmitry Torokhov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Torokhov @ 2015-12-15 17:21 UTC (permalink / raw)
  To: Frank Binns
  Cc: Gustavo Padovan, Dmitry Torokhov, Greg Kroah-Hartman, devel,
	Andrew Bresticker, Arve Hjønnevåg, dri-devel,
	linux-kernel, Riley Andrews, linux-media

On Tue, Dec 15, 2015 at 5:50 AM, Frank Binns <frank.binns@imgtec.com> wrote:
> Is this not the issue fixed by 8e43c9c75?

No, because if we start teardown without waiting for the fence to be
signaled, it will still be on the active_list.

Thanks.

-- 
Dmitry


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 13:30 ` Gustavo Padovan
  2015-12-15 13:50   ` Frank Binns
@ 2015-12-15 17:22   ` Dmitry Torokhov
  2015-12-16 15:37     ` Daniel Vetter
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2015-12-15 17:22 UTC (permalink / raw)
  To: Gustavo Padovan, Dmitry Torokhov, Greg Kroah-Hartman, devel,
	Andrew Bresticker, Arve Hjønnevåg, dri-devel,
	linux-kernel, Riley Andrews, linux-media

On Tue, Dec 15, 2015 at 5:30 AM, Gustavo Padovan <gustavo@padovan.org> wrote:
> 2015-12-14 Dmitry Torokhov <dtor@chromium.org>:
>
>> Userspace can close the sync device while there are still active fence
>> points, in which case kernel produces the following warning:
>>
>> [   43.853176] ------------[ cut here ]------------
>> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
>> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
>> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
>> [   43.885834] Call trace:
>> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
>> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
>> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
>> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
>> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
>> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
>> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
>> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
>> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
>> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
>> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
>> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
>> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
>> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
>
> This crash report seems to be for a 3.18 kernel. Can you reproduce it
> on upstream kernel as well?

Unfortunately this board does not run upstream just yet, but looking
at the sync driver and fence code, we are pretty much in sync with
upstream.

Thanks.

-- 
Dmitry


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15  9:26 ` Daniel Vetter
  2015-12-15 17:17   ` Dmitry Torokhov
@ 2015-12-15 19:00   ` Gustavo Padovan
  2015-12-15 19:08     ` Dmitry Torokhov
  1 sibling, 1 reply; 14+ messages in thread
From: Gustavo Padovan @ 2015-12-15 19:00 UTC (permalink / raw)
  To: Dmitry Torokhov, Greg Kroah-Hartman, devel, Andrew Bresticker,
	Arve Hjønnevåg, dri-devel, linux-kernel, Riley Andrews,
	linux-media

2015-12-15 Daniel Vetter <daniel@ffwll.ch>:

> On Mon, Dec 14, 2015 at 05:29:55PM -0800, Dmitry Torokhov wrote:
> > Userspace can close the sync device while there are still active fence
> > points, in which case kernel produces the following warning:
> > 
> > [   43.853176] ------------[ cut here ]------------
> > [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
> > [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
> > [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
> > [   43.885834] Call trace:
> > [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
> > [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
> > [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
> > [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
> > [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
> > [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
> > [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
> > [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
> > [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
> > [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
> > [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
> > [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
> > [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
> > [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
> > 
> > Let's fix it by introducing a new optional callback (disable_signaling)
> > to fence operations so that drivers can do proper clean ups when we
> > remove last callback for given fence.
> > 
> > Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
> > Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
> > ---
> >  drivers/dma-buf/fence.c        | 6 +++++-
> >  drivers/staging/android/sync.c | 8 ++++++++
> >  include/linux/fence.h          | 2 ++
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> > index 7b05dbe..0ed73ad 100644
> > --- a/drivers/dma-buf/fence.c
> > +++ b/drivers/dma-buf/fence.c
> > @@ -304,8 +304,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb)
> >  	spin_lock_irqsave(fence->lock, flags);
> >  
> >  	ret = !list_empty(&cb->node);
> > -	if (ret)
> > +	if (ret) {
> >  		list_del_init(&cb->node);
> > +		if (list_empty(&fence->cb_list))
> > +			if (fence->ops->disable_signaling)
> > +				fence->ops->disable_signaling(fence);
> 
> What exactly is the bug here? A fence with no callbacks registered any
> more shouldn't have any problem. Why exactly does this blow up?

The WARN_ON is probably this one:
https://android.googlesource.com/kernel/common/+/android-3.18/drivers/staging/android/sync.c#433

I've been wondering over the last few days whether this warning is really
necessary. If the user is closing a sync_timeline that has unsignalled
fences, it should probably be aware of that already. So I think it is
okay to remove the sync_pt from the active_list at release time.
In fact I've already prepared a patch doing that. Thoughts?
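
A rough userspace sketch of that idea follows, assuming a hypothetical
struct rel_pt and rel_* list helpers; it is not the prepared patch. A real
driver would presumably also need to hold the timeline's lock around the
unlink, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a kernel list node (circular, doubly linked). */
struct rel_node {
	struct rel_node *next, *prev;
};

static inline void rel_node_init(struct rel_node *n)
{
	n->next = n;
	n->prev = n;
}

static inline bool rel_node_unlinked(const struct rel_node *n)
{
	return n->next == n;
}

static inline void rel_node_add(struct rel_node *n, struct rel_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static inline void rel_node_del_init(struct rel_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	rel_node_init(n);
}

/* Hypothetical model of a sync point. */
struct rel_pt {
	struct rel_node active_list;
	bool freed;
};

/*
 * Release path: instead of warning when the point is still on the
 * active list, quietly unlink it before freeing.
 */
static inline void rel_pt_release(struct rel_pt *pt)
{
	if (!rel_node_unlinked(&pt->active_list))
		rel_node_del_init(&pt->active_list);
	pt->freed = true;
}
```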

	Gustavo


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 19:00   ` Gustavo Padovan
@ 2015-12-15 19:08     ` Dmitry Torokhov
  2015-12-16 15:36       ` Daniel Vetter
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2015-12-15 19:08 UTC (permalink / raw)
  To: Gustavo Padovan, Dmitry Torokhov, Greg Kroah-Hartman, devel,
	Andrew Bresticker, Arve Hjønnevåg, dri-devel,
	linux-kernel, Riley Andrews, linux-media

On Tue, Dec 15, 2015 at 11:00 AM, Gustavo Padovan <gustavo@padovan.org> wrote:
> 2015-12-15 Daniel Vetter <daniel@ffwll.ch>:
>
>> On Mon, Dec 14, 2015 at 05:29:55PM -0800, Dmitry Torokhov wrote:
>> > Userspace can close the sync device while there are still active fence
>> > points, in which case kernel produces the following warning:
>> >
>> > [   43.853176] ------------[ cut here ]------------
>> > [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
>> > [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
>> > [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
>> > [   43.885834] Call trace:
>> > [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
>> > [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
>> > [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
>> > [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
>> > [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
>> > [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
>> > [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
>> > [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
>> > [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
>> > [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
>> > [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
>> > [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
>> > [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
>> > [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
>> >
>> > Let's fix it by introducing a new optional callback (disable_signaling)
>> > to fence operations so that drivers can do proper clean ups when we
>> > remove last callback for given fence.
>> >
>> > Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
>> > Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
>> > ---
>> >  drivers/dma-buf/fence.c        | 6 +++++-
>> >  drivers/staging/android/sync.c | 8 ++++++++
>> >  include/linux/fence.h          | 2 ++
>> >  3 files changed, 15 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
>> > index 7b05dbe..0ed73ad 100644
>> > --- a/drivers/dma-buf/fence.c
>> > +++ b/drivers/dma-buf/fence.c
>> > @@ -304,8 +304,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb)
>> >     spin_lock_irqsave(fence->lock, flags);
>> >
>> >     ret = !list_empty(&cb->node);
>> > -   if (ret)
>> > +   if (ret) {
>> >             list_del_init(&cb->node);
>> > +           if (list_empty(&fence->cb_list))
>> > +                   if (fence->ops->disable_signaling)
>> > +                           fence->ops->disable_signaling(fence);
>>
>> What exactly is the bug here? A fence with no callbacks registered any
>> more shouldn't have any problem. Why exactly does this blow up?
>
> The WARN_ON is probably this one:
> https://android.googlesource.com/kernel/common/+/android-3.18/drivers/staging/android/sync.c#433
>
> I've been wondering in the last few days if this warning is really
> necessary. If the user is closing a sync_timeline that has unsignalled
> fences it should probably be aware of that already. Then I think it is
> okay to remove the sync_pt from the active_list at release time.
> In fact I've already prepared a patch doing that. Thoughts?
>

Maybe, but you need to make sure that you are only affecting your fences.

My main objection is that this still leaves fence_remove_callback() not
being a mirror image of fence_add_callback().

-- 
Dmitry


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 17:19   ` Dmitry Torokhov
@ 2015-12-16  8:36     ` Maarten Lankhorst
  0 siblings, 0 replies; 14+ messages in thread
From: Maarten Lankhorst @ 2015-12-16  8:36 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Greg Kroah-Hartman, Sumit Semwal, Arve Hjønnevåg,
	Riley Andrews, Andrew Bresticker, linux-media, dri-devel,
	linux-kernel, devel

Op 15-12-15 om 18:19 schreef Dmitry Torokhov:
> On Tue, Dec 15, 2015 at 2:01 AM, Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com> wrote:
>> Op 15-12-15 om 02:29 schreef Dmitry Torokhov:
>>> Userspace can close the sync device while there are still active fence
>>> points, in which case kernel produces the following warning:
>>>
>>> [   43.853176] ------------[ cut here ]------------
>>> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
>>> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
>>> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
>>> [   43.885834] Call trace:
>>> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
>>> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
>>> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
>>> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
>>> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
>>> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
>>> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
>>> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
>>> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
>>> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
>>> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
>>> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
>>> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
>>> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
>>>
>>> Let's fix it by introducing a new optional callback (disable_signaling)
>>> to fence operations so that drivers can do proper clean ups when we
>>> remove last callback for given fence.
>>>
>>> Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
>>> Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
>>>
>> NACK! There's no way to do this race free.
> Can you please explain the race because as far as I can see there is not one.
The entire code in fence.c assumes that a fence can only go from signaling disabled to signaling enabled. .enable_signaling is not refcounted, so 2 calls to .enable_signaling and 1 to .disable_signaling would mess things up.
Furthermore we try to make sure that fence_signal doesn't take locks if it's unneeded. With a disable_signaling callback you would always need locks.
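
A toy illustration of the refcounting point, assuming the "signaling enabled" state is a single sticky flag rather than a counter. All `toy_` names are hypothetical; this is a simplified model of the argument, not code from fence.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: "signaling enabled" is one flag, not a count. */
struct toy_fence {
	bool signaling_enabled; /* models a one-way flag bit */
	int users;              /* parties that still expect a signal */
};

/* Each user asks for signaling; the flag cannot record how many asked. */
static void toy_enable_signaling(struct toy_fence *f)
{
	f->users++;
	f->signaling_enabled = true;
}

/* One user leaves and turns signaling off for everybody. */
static void toy_disable_signaling(struct toy_fence *f)
{
	f->users--;
	f->signaling_enabled = false;
}

/* True when someone still expects a signal that will no longer come. */
static bool toy_user_stranded(const struct toy_fence *f)
{
	return f->users > 0 && !f->signaling_enabled;
}
```

With two enables followed by one disable, the remaining user is left waiting on a fence whose signaling has been switched off.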

To get rid of these warnings make sure that there's a refcount on the fence until it's signaled.
>> The driver should hold a refcount until fence is signaled.
> If we are no longer interested in fence why do we need to wait for the
> fence to be signaled?
It's part of the design. A driver tracks its outstanding requests and submissions, and every submission has its own fence. Before the driver releases its final ref, the request should be completed or aborted. In either case it must call fence_signal.
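
A small sketch of that rule, assuming a plain integer reference count: the driver keeps its own reference until the request completes or aborts, so the fence cannot be freed unsignaled even if userspace drops its reference early. The `toy_` names are hypothetical and the model ignores locking and atomics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fence with a plain (non-atomic) reference count. */
struct toy_req_fence {
	int refcount;
	bool signaled;
	bool freed;
	bool freed_unsignaled; /* the condition the WARN_ON complains about */
};

/* The driver creates the fence and owns the initial reference. */
static void toy_req_fence_init(struct toy_req_fence *f)
{
	f->refcount = 1;
	f->signaled = false;
	f->freed = false;
	f->freed_unsignaled = false;
}

static void toy_req_fence_get(struct toy_req_fence *f)
{
	f->refcount++;
}

static void toy_req_fence_put(struct toy_req_fence *f)
{
	if (--f->refcount == 0) {
		f->freed = true;
		f->freed_unsignaled = !f->signaled;
	}
}

/* Completion or abort: signal first, only then drop the driver's ref. */
static void toy_driver_finish_request(struct toy_req_fence *f)
{
	f->signaled = true;
	toy_req_fence_put(f);
}
```

Here a userspace reference taken with `toy_req_fence_get()` can be dropped at any time; the final release only happens after the driver has signaled.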


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 19:08     ` Dmitry Torokhov
@ 2015-12-16 15:36       ` Daniel Vetter
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vetter @ 2015-12-16 15:36 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Gustavo Padovan, Greg Kroah-Hartman, devel, Andrew Bresticker,
	Arve Hjønnevåg, dri-devel, linux-kernel, Riley Andrews,
	linux-media

On Tue, Dec 15, 2015 at 11:08:01AM -0800, Dmitry Torokhov wrote:
> On Tue, Dec 15, 2015 at 11:00 AM, Gustavo Padovan <gustavo@padovan.org> wrote:
> > 2015-12-15 Daniel Vetter <daniel@ffwll.ch>:
> >
> >> On Mon, Dec 14, 2015 at 05:29:55PM -0800, Dmitry Torokhov wrote:
> >> > Userspace can close the sync device while there are still active fence
> >> > points, in which case kernel produces the following warning:
> >> >
> >> > [   43.853176] ------------[ cut here ]------------
> >> > [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
> >> > [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
> >> > [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
> >> > [   43.885834] Call trace:
> >> > [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
> >> > [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
> >> > [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
> >> > [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
> >> > [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
> >> > [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
> >> > [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
> >> > [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
> >> > [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
> >> > [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
> >> > [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
> >> > [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
> >> > [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
> >> > [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
> >> >
> >> > Let's fix it by introducing a new optional callback (disable_signaling)
> >> > to fence operations so that drivers can do proper clean ups when we
> >> > remove last callback for given fence.
> >> >
> >> > Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
> >> > Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
> >> > ---
> >> >  drivers/dma-buf/fence.c        | 6 +++++-
> >> >  drivers/staging/android/sync.c | 8 ++++++++
> >> >  include/linux/fence.h          | 2 ++
> >> >  3 files changed, 15 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
> >> > index 7b05dbe..0ed73ad 100644
> >> > --- a/drivers/dma-buf/fence.c
> >> > +++ b/drivers/dma-buf/fence.c
> >> > @@ -304,8 +304,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb)
> >> >     spin_lock_irqsave(fence->lock, flags);
> >> >
> >> >     ret = !list_empty(&cb->node);
> >> > -   if (ret)
> >> > +   if (ret) {
> >> >             list_del_init(&cb->node);
> >> > +           if (list_empty(&fence->cb_list))
> >> > +                   if (fence->ops->disable_signaling)
> >> > +                           fence->ops->disable_signaling(fence);
> >>
> >> What exactly is the bug here? A fence with no callbacks registered any
> >> more shouldn't have any problem. Why exactly does this blow up?
> >
> > The WARN_ON is probably this one:
> > https://android.googlesource.com/kernel/common/+/android-3.18/drivers/staging/android/sync.c#433
> >
> > I've been wondering in the last few days if this warning is really
> > necessary. If the user is closing a sync_timeline that has unsignalled
> > fences it should probably be aware of that already. Then I think it is
> > okay to remove the sync_pt from the active_list at release time.
> > In fact I've already prepared a patch doing that. Thoughts?
> >
> 
> Maybe, but you need to make sure that you are only affecting your fences.
> 
> My main objection is that this still leaves fence_remove_callback() not
> being a mirror image of fence_add_callback().

That's 100% intentional. I looked at the sync.c code a bit more and it
duplicates a bunch of the fence stuff still. We need to either merge that
code into the mainline struct fence logic, or remove it. There shouldn't
really be any need for the userspace ABI layer to keep track of active
fences at all. Worse, it means that you must use the sync_pt struct to be
able to export it to userspace, and can't just export any normal struct
fence object. That breaks the abstraction we're aiming for.

Imo just remove that WARN_ON for now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH] android: fix warning when releasing active sync point
  2015-12-15 17:22   ` Dmitry Torokhov
@ 2015-12-16 15:37     ` Daniel Vetter
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vetter @ 2015-12-16 15:37 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Gustavo Padovan, Greg Kroah-Hartman, devel, Andrew Bresticker,
	Arve Hjønnevåg, dri-devel, linux-kernel, Riley Andrews,
	linux-media

On Tue, Dec 15, 2015 at 09:22:56AM -0800, Dmitry Torokhov wrote:
> On Tue, Dec 15, 2015 at 5:30 AM, Gustavo Padovan <gustavo@padovan.org> wrote:
> > 2015-12-14 Dmitry Torokhov <dtor@chromium.org>:
> >
> >> Userspace can close the sync device while there are still active fence
> >> points, in which case kernel produces the following warning:
> >>
> >> [   43.853176] ------------[ cut here ]------------
> >> [   43.857834] WARNING: CPU: 0 PID: 892 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/staging/android/sync.c:439 android_fence_release+0x88/0x104()
> >> [   43.871741] CPU: 0 PID: 892 Comm: Binder_5 Tainted: G     U 3.18.0-07661-g0550ce9 #1
> >> [   43.880176] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
> >> [   43.885834] Call trace:
> >> [   43.888294] [<ffffffc000207464>] dump_backtrace+0x0/0x10c
> >> [   43.893697] [<ffffffc000207580>] show_stack+0x10/0x1c
> >> [   43.898756] [<ffffffc000ab1258>] dump_stack+0x74/0xb8
> >> [   43.903814] [<ffffffc00021d414>] warn_slowpath_common+0x84/0xb0
> >> [   43.909736] [<ffffffc00021d530>] warn_slowpath_null+0x14/0x20
> >> [   43.915482] [<ffffffc00088aefc>] android_fence_release+0x84/0x104
> >> [   43.921582] [<ffffffc000671cc4>] fence_release+0x104/0x134
> >> [   43.927066] [<ffffffc00088b0cc>] sync_fence_free+0x74/0x9c
> >> [   43.932552] [<ffffffc00088b128>] sync_fence_release+0x34/0x48
> >> [   43.938304] [<ffffffc000317bbc>] __fput+0x100/0x1b8
> >> [   43.943185] [<ffffffc000317cc8>] ____fput+0x8/0x14
> >> [   43.947982] [<ffffffc000237f38>] task_work_run+0xb0/0xe4
> >> [   43.953297] [<ffffffc000207074>] do_notify_resume+0x44/0x5c
> >> [   43.958867] ---[ end trace 5a2aa4027cc5d171 ]---
> >
> > This crash report seems to be for a 3.18 kernel. Can you reproduce it
> > on upstream kernel as well?
> 
> Unfortunately this board does not run upstream just yet, but looking
> at the sync driver and fence code we are pretty much in sync with
> upstream.

Just to check: Is that with a proper hw driver, or using SW_SYNC? The
latter will get removed in upstream since it's a debug/validation-only
interface. Well, removed for drivers and production systems; the
kselftests will use it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


end of thread, other threads:[~2015-12-16 15:37 UTC | newest]

Thread overview: 14+ messages
2015-12-15  1:29 [PATCH] android: fix warning when releasing active sync point Dmitry Torokhov
2015-12-15  9:26 ` Daniel Vetter
2015-12-15 17:17   ` Dmitry Torokhov
2015-12-15 19:00   ` Gustavo Padovan
2015-12-15 19:08     ` Dmitry Torokhov
2015-12-16 15:36       ` Daniel Vetter
2015-12-15 10:01 ` Maarten Lankhorst
2015-12-15 17:19   ` Dmitry Torokhov
2015-12-16  8:36     ` Maarten Lankhorst
2015-12-15 13:30 ` Gustavo Padovan
2015-12-15 13:50   ` Frank Binns
2015-12-15 17:21     ` Dmitry Torokhov
2015-12-15 17:22   ` Dmitry Torokhov
2015-12-16 15:37     ` Daniel Vetter
