* sw_sync deadlock avoidance, take 3 @ 2020-07-15 10:04 Chris Wilson 2020-07-15 10:04 ` [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal Chris Wilson ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Chris Wilson @ 2020-07-15 10:04 UTC (permalink / raw) To: dri-devel; +Cc: intel-gfx dma_fence_release() objects to a fence being freed before it is signaled, so instead of playing fancy tricks to avoid handling dying requests, let's keep the syncpt alive until signaled. This neatly removes the issue with having to decouple the syncpt from the timeline upon fence release. -Chris _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal 2020-07-15 10:04 sw_sync deadlock avoidance, take 3 Chris Wilson @ 2020-07-15 10:04 ` Chris Wilson 2020-07-15 11:26 ` Bas Nieuwenhuizen 2020-07-15 10:04 ` [PATCH 2/2] dma-buf/selftests: Add locking selftests for sw_sync Chris Wilson 2020-07-15 10:23 ` sw_sync deadlock avoidance, take 3 Bas Nieuwenhuizen 2 siblings, 1 reply; 11+ messages in thread From: Chris Wilson @ 2020-07-15 10:04 UTC (permalink / raw) To: dri-devel Cc: Gustavo Padovan, intel-gfx, stable, Chris Wilson, Christian König If a signal callback releases the sw_sync fence, that will trigger a deadlock as the timeline_fence_release recurses onto the fence->lock (used both for signaling and the the timeline tree). If we always hold a reference for an unsignaled fence held by the timeline, we no longer need to detach the fence from the timeline upon release. This is only possible since commit ea4d5a270b57 ("dma-buf/sw_sync: force signal all unsignaled fences on dying timeline") where we introduced decoupling of the fences from the timeline upon release. Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Christian König <christian.koenig@amd.com> Cc: <stable@vger.kernel.org> --- drivers/dma-buf/sw_sync.c | 32 +++++++------------------------- 1 file changed, 7 insertions(+), 25 deletions(-) diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index 348b3a9170fa..4cc2ac03a84a 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -130,16 +130,7 @@ static const char *timeline_fence_get_timeline_name(struct dma_fence *fence) static void timeline_fence_release(struct dma_fence *fence) { - struct sync_pt *pt = dma_fence_to_sync_pt(fence); struct sync_timeline *parent = dma_fence_parent(fence); - unsigned long flags; - - spin_lock_irqsave(fence->lock, flags); - if (!list_empty(&pt->link)) { - list_del(&pt->link); - rb_erase(&pt->node, &parent->pt_tree); - } - spin_unlock_irqrestore(fence->lock, flags); sync_timeline_put(parent); dma_fence_free(fence); @@ -203,18 +194,11 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) if (!timeline_fence_signaled(&pt->base)) break; - list_del_init(&pt->link); + list_del(&pt->link); rb_erase(&pt->node, &obj->pt_tree); - /* - * A signal callback may release the last reference to this - * fence, causing it to be freed. That operation has to be - * last to avoid a use after free inside this loop, and must - * be after we remove the fence from the timeline in order to - * prevent deadlocking on timeline->lock inside - * timeline_fence_release(). - */ dma_fence_signal_locked(&pt->base); + dma_fence_put(&pt->base); } spin_unlock_irq(&obj->lock); @@ -261,13 +245,9 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, } else if (cmp < 0) { p = &parent->rb_left; } else { - if (dma_fence_get_rcu(&other->base)) { - sync_timeline_put(obj); - kfree(pt); - pt = other; - goto unlock; - } - p = &parent->rb_left; + dma_fence_put(&pt->base); + pt = other; + goto unlock; } } rb_link_node(&pt->node, parent, p); @@ -278,6 +258,7 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, parent ? &rb_entry(parent, typeof(*pt), node)->link : &obj->pt_list); } unlock: + dma_fence_get(&pt->base); /* keep a ref for the timeline */ spin_unlock_irq(&obj->lock); return pt; @@ -316,6 +297,7 @@ static int sw_sync_debugfs_release(struct inode *inode, struct file *file) list_for_each_entry_safe(pt, next, &obj->pt_list, link) { dma_fence_set_error(&pt->base, -ENOENT); dma_fence_signal_locked(&pt->base); + dma_fence_put(&pt->base); } spin_unlock_irq(&obj->lock); -- 2.20.1 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal 2020-07-15 10:04 ` [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal Chris Wilson @ 2020-07-15 11:26 ` Bas Nieuwenhuizen 0 siblings, 0 replies; 11+ messages in thread From: Bas Nieuwenhuizen @ 2020-07-15 11:26 UTC (permalink / raw) To: Chris Wilson Cc: Gustavo Padovan, intel-gfx, stable, Christian König, ML dri-devel Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> On Wed, Jul 15, 2020 at 12:04 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > If a signal callback releases the sw_sync fence, that will trigger a > deadlock as the timeline_fence_release recurses onto the fence->lock > (used both for signaling and the the timeline tree). > > If we always hold a reference for an unsignaled fence held by the > timeline, we no longer need to detach the fence from the timeline upon > release. This is only possible since commit ea4d5a270b57 > ("dma-buf/sw_sync: force signal all unsignaled fences on dying timeline") > where we introduced decoupling of the fences from the timeline upon release. > > Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Sumit Semwal <sumit.semwal@linaro.org> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Gustavo Padovan <gustavo@padovan.org> > Cc: Christian König <christian.koenig@amd.com> > Cc: <stable@vger.kernel.org> > --- > drivers/dma-buf/sw_sync.c | 32 +++++++------------------------- > 1 file changed, 7 insertions(+), 25 deletions(-) > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > index 348b3a9170fa..4cc2ac03a84a 100644 > --- a/drivers/dma-buf/sw_sync.c > +++ b/drivers/dma-buf/sw_sync.c > @@ -130,16 +130,7 @@ static const char *timeline_fence_get_timeline_name(struct dma_fence *fence) > > static void timeline_fence_release(struct dma_fence *fence) > { > - struct sync_pt *pt = dma_fence_to_sync_pt(fence); > struct sync_timeline *parent = dma_fence_parent(fence); > - unsigned long flags; > - > - spin_lock_irqsave(fence->lock, flags); > - if (!list_empty(&pt->link)) { > - list_del(&pt->link); > - rb_erase(&pt->node, &parent->pt_tree); > - } > - spin_unlock_irqrestore(fence->lock, flags); > > sync_timeline_put(parent); > dma_fence_free(fence); > @@ -203,18 +194,11 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) > if (!timeline_fence_signaled(&pt->base)) > break; > > - list_del_init(&pt->link); > + list_del(&pt->link); > rb_erase(&pt->node, &obj->pt_tree); > > - /* > - * A signal callback may release the last reference to this > - * fence, causing it to be freed. That operation has to be > - * last to avoid a use after free inside this loop, and must > - * be after we remove the fence from the timeline in order to > - * prevent deadlocking on timeline->lock inside > - * timeline_fence_release(). > - */ > dma_fence_signal_locked(&pt->base); > + dma_fence_put(&pt->base); > } > > spin_unlock_irq(&obj->lock); > @@ -261,13 +245,9 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, > } else if (cmp < 0) { > p = &parent->rb_left; > } else { > - if (dma_fence_get_rcu(&other->base)) { > - sync_timeline_put(obj); > - kfree(pt); > - pt = other; > - goto unlock; > - } > - p = &parent->rb_left; > + dma_fence_put(&pt->base); > + pt = other; > + goto unlock; > } > } > rb_link_node(&pt->node, parent, p); > @@ -278,6 +258,7 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, > parent ? &rb_entry(parent, typeof(*pt), node)->link : &obj->pt_list); > } > unlock: > + dma_fence_get(&pt->base); /* keep a ref for the timeline */ > spin_unlock_irq(&obj->lock); > > return pt; > @@ -316,6 +297,7 @@ static int sw_sync_debugfs_release(struct inode *inode, struct file *file) > list_for_each_entry_safe(pt, next, &obj->pt_list, link) { > dma_fence_set_error(&pt->base, -ENOENT); > dma_fence_signal_locked(&pt->base); > + dma_fence_put(&pt->base); > } > > spin_unlock_irq(&obj->lock); > -- > 2.20.1 > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] dma-buf/selftests: Add locking selftests for sw_sync 2020-07-15 10:04 sw_sync deadlock avoidance, take 3 Chris Wilson 2020-07-15 10:04 ` [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal Chris Wilson @ 2020-07-15 10:04 ` Chris Wilson 2020-07-15 10:23 ` sw_sync deadlock avoidance, take 3 Bas Nieuwenhuizen 2 siblings, 0 replies; 11+ messages in thread From: Chris Wilson @ 2020-07-15 10:04 UTC (permalink / raw) To: dri-devel; +Cc: intel-gfx, Chris Wilson While sw_sync is purely a debug facility for userspace to create fences and timelines it can control, nevertheless it has some tricky locking semantics of its own. In particular, Bas Nieuwenhuizen reported that we had reintroduced a deadlock if a signal callback attempted to destroy the fence. So let's add a few trivial selftests to make sure that once fixed again, it stays fixed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> --- drivers/dma-buf/Makefile | 3 +- drivers/dma-buf/selftests.h | 1 + drivers/dma-buf/st-sw_sync.c | 279 +++++++++++++++++++++++++++++++++++ drivers/dma-buf/sw_sync.c | 39 +++++ drivers/dma-buf/sync_debug.h | 8 + 5 files changed, 329 insertions(+), 1 deletion(-) create mode 100644 drivers/dma-buf/st-sw_sync.c diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index 995e05f609ff..9be4d4611609 100644 --- a/drivers/dma-buf/Makefile +++ b/drivers/dma-buf/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_UDMABUF) += udmabuf.o dmabuf_selftests-y := \ selftest.o \ st-dma-fence.o \ - st-dma-fence-chain.o + st-dma-fence-chain.o \ + st-sw_sync.o obj-$(CONFIG_DMABUF_SELFTESTS) += dmabuf_selftests.o diff --git a/drivers/dma-buf/selftests.h b/drivers/dma-buf/selftests.h index bc8cea67bf1e..232499a24872 100644 --- a/drivers/dma-buf/selftests.h +++ b/drivers/dma-buf/selftests.h @@ -12,3 +12,4 @@ selftest(sanitycheck, __sanitycheck__) /* keep first (igt selfcheck) */ selftest(dma_fence, dma_fence) selftest(dma_fence_chain, dma_fence_chain) +selftest(sw_sync, sw_sync) diff --git a/drivers/dma-buf/st-sw_sync.c b/drivers/dma-buf/st-sw_sync.c new file mode 100644 index 000000000000..145fd330f1c6 --- /dev/null +++ b/drivers/dma-buf/st-sw_sync.c @@ -0,0 +1,279 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +#include <linux/delay.h> +#include <linux/dma-fence.h> +#include <linux/kernel.h> +#include <linux/kthread.h> +#include <linux/sched/signal.h> +#include <linux/slab.h> +#include <linux/spinlock.h> + +#include "sync_debug.h" +#include "selftest.h" + +static int sanitycheck(void *arg) +{ + struct sync_timeline *tl; + struct dma_fence *f; + int err = -ENOMEM; + + /* Quick check we can create the timeline and syncpt */ + + tl = st_sync_timeline_create("mock"); + if (!tl) + return -ENOMEM; + + f = st_sync_pt_create(tl, 1); + if (!f) + goto out; + + dma_fence_signal(f); + dma_fence_put(f); + + err = 0; +out: + st_sync_timeline_put(tl); + return err; +} + +static int signal(void *arg) +{ + struct sync_timeline *tl; + struct dma_fence *f; + int err = -EINVAL; + + /* Check that the syncpt fence is signaled when the timeline advances */ + + tl = st_sync_timeline_create("mock"); + if (!tl) + return -ENOMEM; + + f = st_sync_pt_create(tl, 1); + if (!f) { + err = -ENOMEM; + goto out; + } + + if (dma_fence_is_signaled(f)) { + pr_err("syncpt:%lld signaled too early\n", f->seqno); + goto out_fence; + } + + st_sync_timeline_signal(tl, 1); + + if (!dma_fence_is_signaled(f)) { + pr_err("syncpt:%lld not signaled after increment\n", f->seqno); + goto out_fence; + } + + err = 0; +out_fence: + dma_fence_signal(f); + dma_fence_put(f); +out: + st_sync_timeline_put(tl); + return err; +} + +struct cb_destroy { + struct dma_fence_cb cb; + struct dma_fence *f; +}; + +static void cb_destroy(struct dma_fence *fence, struct dma_fence_cb *_cb) +{ + struct cb_destroy *cb = container_of(_cb, typeof(*cb), cb); + + pr_info("syncpt:%llx destroying syncpt:%llx\n", + fence->seqno, cb->f->seqno); + dma_fence_put(cb->f); + cb->f = NULL; +} + +static int cb_autodestroy(void *arg) +{ + struct sync_timeline *tl; + struct cb_destroy cb; + int err = -EINVAL; + + /* Check that we can drop the final syncpt reference from a callback */ + + tl = st_sync_timeline_create("mock"); + if (!tl) + return -ENOMEM; + + cb.f = st_sync_pt_create(tl, 1); + if (!cb.f) { + err = -ENOMEM; + goto out; + } + + if (dma_fence_add_callback(cb.f, &cb.cb, cb_destroy)) { + pr_err("syncpt:%lld signaled before increment\n", cb.f->seqno); + goto out; + } + + st_sync_timeline_signal(tl, 1); + if (cb.f) { + pr_err("syncpt:%lld callback not run\n", cb.f->seqno); + dma_fence_put(cb.f); + goto out; + } + + err = 0; +out: + st_sync_timeline_put(tl); + return err; +} + +static int cb_destroy_12(void *arg) +{ + struct sync_timeline *tl; + struct cb_destroy cb; + struct dma_fence *f; + int err = -EINVAL; + + /* Check that we can drop some other syncpt reference from a callback */ + + tl = st_sync_timeline_create("mock"); + if (!tl) + return -ENOMEM; + + f = st_sync_pt_create(tl, 1); + if (!f) { + err = -ENOMEM; + goto out; + } + + cb.f = st_sync_pt_create(tl, 2); + if (!cb.f) { + err = -ENOMEM; + goto out; + } + + if (dma_fence_add_callback(f, &cb.cb, cb_destroy)) { + pr_err("syncpt:%lld signaled before increment\n", f->seqno); + goto out; + } + + st_sync_timeline_signal(tl, 1); + if (cb.f) { + pr_err("syncpt:%lld callback not run\n", f->seqno); + dma_fence_put(cb.f); + goto out_fence; + } + + err = 0; +out_fence: + dma_fence_put(f); +out: + st_sync_timeline_put(tl); + return err; +} + +static int cb_destroy_21(void *arg) +{ + struct sync_timeline *tl; + struct cb_destroy cb; + struct dma_fence *f; + int err = -EINVAL; + + /* Check that we can drop an earlier syncpt reference from a callback */ + + tl = st_sync_timeline_create("mock"); + if (!tl) + return -ENOMEM; + + cb.f = st_sync_pt_create(tl, 1); + if (!cb.f) { + err = -ENOMEM; + goto out; + } + + f = st_sync_pt_create(tl, 2); + if (!f) { + err = -ENOMEM; + goto out; + } + + if (dma_fence_add_callback(f, &cb.cb, cb_destroy)) { + pr_err("syncpt:%lld signaled before increment\n", f->seqno); + goto out; + } + + st_sync_timeline_signal(tl, 2); + if (cb.f) { + pr_err("syncpt:%lld callback not run\n", f->seqno); + dma_fence_put(cb.f); + goto out_fence; + } + + err = 0; +out_fence: + dma_fence_put(f); +out: + st_sync_timeline_put(tl); + return err; +} + +static int cb_destroy_22(void *arg) +{ + struct sync_timeline *tl; + struct cb_destroy cb; + struct dma_fence *f; + int err = -EINVAL; + + /* Check that we can drop the later syncpt reference from a callback */ + + tl = st_sync_timeline_create("mock"); + if (!tl) + return -ENOMEM; + + f = st_sync_pt_create(tl, 1); + if (!f) { + err = -ENOMEM; + goto out; + } + + cb.f = st_sync_pt_create(tl, 2); + if (!cb.f) { + err = -ENOMEM; + goto out; + } + + if (dma_fence_add_callback(f, &cb.cb, cb_destroy)) { + pr_err("syncpt:%lld signaled before increment\n", f->seqno); + goto out; + } + + st_sync_timeline_signal(tl, 2); + if (cb.f) { + pr_err("syncpt:%lld callback not run\n", f->seqno); + dma_fence_put(cb.f); + goto out_fence; + } + + err = 0; +out_fence: + dma_fence_put(f); +out: + st_sync_timeline_put(tl); + return err; +} + +int sw_sync(void) +{ + static const struct subtest tests[] = { + SUBTEST(sanitycheck), + SUBTEST(signal), + SUBTEST(cb_autodestroy), + SUBTEST(cb_destroy_12), + SUBTEST(cb_destroy_21), + SUBTEST(cb_destroy_22), + }; + + return subtests(tests, NULL); +} diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index 4cc2ac03a84a..72de93211c0c 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -392,3 +392,42 @@ const struct file_operations sw_sync_debugfs_fops = { .unlocked_ioctl = sw_sync_ioctl, .compat_ioctl = compat_ptr_ioctl, }; + +#if IS_ENABLED(CONFIG_DMABUF_SELFTESTS) +struct sync_timeline *st_sync_timeline_create(const char *name) +{ + return sync_timeline_create(name); +} +EXPORT_SYMBOL_GPL(st_sync_timeline_create); + +void st_sync_timeline_get(struct sync_timeline *tl) +{ + sync_timeline_get(tl); +} +EXPORT_SYMBOL_GPL(st_sync_timeline_get); + +void st_sync_timeline_put(struct sync_timeline *tl) +{ + sync_timeline_put(tl); +} +EXPORT_SYMBOL_GPL(st_sync_timeline_put); + +void st_sync_timeline_signal(struct sync_timeline *tl, unsigned int inc) +{ + sync_timeline_signal(tl, inc); +} +EXPORT_SYMBOL_GPL(st_sync_timeline_signal); + +struct dma_fence * +st_sync_pt_create(struct sync_timeline *tl, unsigned int seqno) +{ + struct sync_pt *pt; + + pt = sync_pt_create(tl, seqno); + if (!pt) + return NULL; + + return &pt->base; +} +EXPORT_SYMBOL_GPL(st_sync_pt_create); +#endif diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h index 6176e52ba2d7..b3e2dbaedf7e 100644 --- a/drivers/dma-buf/sync_debug.h +++ b/drivers/dma-buf/sync_debug.h @@ -69,4 +69,12 @@ void sync_timeline_debug_remove(struct sync_timeline *obj); void sync_file_debug_add(struct sync_file *fence); void sync_file_debug_remove(struct sync_file *fence); +struct sync_timeline *st_sync_timeline_create(const char *name); +void st_sync_timeline_get(struct sync_timeline *tl); +void st_sync_timeline_put(struct sync_timeline *tl); +void st_sync_timeline_signal(struct sync_timeline *tl, unsigned int inc); + +struct dma_fence * +st_sync_pt_create(struct sync_timeline *tl, unsigned int seqno); + #endif /* _LINUX_SYNC_H */ -- 2.20.1 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: sw_sync deadlock avoidance, take 3 2020-07-15 10:04 sw_sync deadlock avoidance, take 3 Chris Wilson 2020-07-15 10:04 ` [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal Chris Wilson 2020-07-15 10:04 ` [PATCH 2/2] dma-buf/selftests: Add locking selftests for sw_sync Chris Wilson @ 2020-07-15 10:23 ` Bas Nieuwenhuizen 2020-07-15 10:29 ` Daniel Stone 2020-07-15 10:34 ` Chris Wilson 2 siblings, 2 replies; 11+ messages in thread From: Bas Nieuwenhuizen @ 2020-07-15 10:23 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx, ML dri-devel Hi Chris, My concern with going in this direction was that we potentially allow an application to allocate a lot of kernel memory but not a lot of fds by creating lots of fences and then closing the fds but never signaling them. Is that not an issue? - Bas On Wed, Jul 15, 2020 at 12:04 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > dma_fence_release() objects to a fence being freed before it is > signaled, so instead of playing fancy tricks to avoid handling dying > requests, let's keep the syncpt alive until signaled. This neatly > removes the issue with having to decouple the syncpt from the timeline > upon fence release. > -Chris > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sw_sync deadlock avoidance, take 3 2020-07-15 10:23 ` sw_sync deadlock avoidance, take 3 Bas Nieuwenhuizen @ 2020-07-15 10:29 ` Daniel Stone 2020-07-15 10:34 ` Chris Wilson 1 sibling, 0 replies; 11+ messages in thread From: Daniel Stone @ 2020-07-15 10:29 UTC (permalink / raw) To: Bas Nieuwenhuizen; +Cc: intel-gfx, ML dri-devel, Chris Wilson Hi, On Wed, 15 Jul 2020 at 11:23, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > My concern with going in this direction was that we potentially allow > an application to allocate a lot of kernel memory but not a lot of fds > by creating lots of fences and then closing the fds but never > signaling them. Is that not an issue? sw_sync is a userspace DoS mechanism by design - if someone wants to enable and use it, they have bigger problems than unbounded memory allocations. Cheers, Daniel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sw_sync deadlock avoidance, take 3 2020-07-15 10:23 ` sw_sync deadlock avoidance, take 3 Bas Nieuwenhuizen 2020-07-15 10:29 ` Daniel Stone @ 2020-07-15 10:34 ` Chris Wilson 2020-07-15 11:05 ` Bas Nieuwenhuizen 1 sibling, 1 reply; 11+ messages in thread From: Chris Wilson @ 2020-07-15 10:34 UTC (permalink / raw) To: Bas Nieuwenhuizen; +Cc: intel-gfx, ML dri-devel Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35) > Hi Chris, > > My concern with going in this direction was that we potentially allow > an application to allocate a lot of kernel memory but not a lot of fds > by creating lots of fences and then closing the fds but never > signaling them. Is that not an issue? I did look to see if there was a quick way we could couple into the sync_file release itself to remove the syncpt from the timeline, but decided that for a debug feature, it wasn't a pressing concern. Maybe now is the time to ask: are you using sw_sync outside of validation? -Chris _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: sw_sync deadlock avoidance, take 3 2020-07-15 10:34 ` Chris Wilson @ 2020-07-15 11:05 ` Bas Nieuwenhuizen 2020-07-15 11:47 ` [Intel-gfx] " Daniel Stone 0 siblings, 1 reply; 11+ messages in thread From: Bas Nieuwenhuizen @ 2020-07-15 11:05 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx, ML dri-devel On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Quoting Bas Nieuwenhuizen (2020-07-15 11:23:35) > > Hi Chris, > > > > My concern with going in this direction was that we potentially allow > > an application to allocate a lot of kernel memory but not a lot of fds > > by creating lots of fences and then closing the fds but never > > signaling them. Is that not an issue? > > I did look to see if there was a quick way we could couple into the > sync_file release itself to remove the syncpt from the timeline, but > decided that for a debug feature, it wasn't a pressing concern. > > Maybe now is the time to ask: are you using sw_sync outside of > validation? Yes, this is used as part of the Android stack on Chrome OS (need to see if ChromeOS specific, but https://source.android.com/devices/graphics/sync#sync_timeline suggests not) > -Chris _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] sw_sync deadlock avoidance, take 3 2020-07-15 11:05 ` Bas Nieuwenhuizen @ 2020-07-15 11:47 ` Daniel Stone 2020-07-15 11:57 ` Daniel Vetter 0 siblings, 1 reply; 11+ messages in thread From: Daniel Stone @ 2020-07-15 11:47 UTC (permalink / raw) To: Bas Nieuwenhuizen; +Cc: intel-gfx, ML dri-devel, Chris Wilson Hi, On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Maybe now is the time to ask: are you using sw_sync outside of > > validation? > > Yes, this is used as part of the Android stack on Chrome OS (need to > see if ChromeOS specific, but > https://source.android.com/devices/graphics/sync#sync_timeline > suggests not) Android used to mandate it for their earlier iteration of release fences, which was an empty/future fence having no guarantee of eventual forward progress until someone committed work later on. For example, when you committed a buffer to SF, it would give you an empty 'release fence' for that buffer which would only be tied to work to signal it when you committed your _next_ buffer, which might never happen. They removed that because a) future fences were a bad idea, and b) it was only ever useful if you assumed strictly FIFO/round-robin return order which wasn't always true. So now it's been watered down to 'use this if you don't have a hardware timeline', but why don't we work with Android people to get that removed entirely? Cheers, Daniel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] sw_sync deadlock avoidance, take 3 2020-07-15 11:47 ` [Intel-gfx] " Daniel Stone @ 2020-07-15 11:57 ` Daniel Vetter 2020-07-17 0:24 ` Daniel Stone 0 siblings, 1 reply; 11+ messages in thread From: Daniel Vetter @ 2020-07-15 11:57 UTC (permalink / raw) To: Daniel Stone, Rob Herring, John Stultz Cc: intel-gfx, Chris Wilson, ML dri-devel On Wed, Jul 15, 2020 at 1:47 PM Daniel Stone <daniel@fooishbar.org> wrote: > > Hi, > > On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > On Wed, Jul 15, 2020 at 12:34 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > Maybe now is the time to ask: are you using sw_sync outside of > > > validation? > > > > Yes, this is used as part of the Android stack on Chrome OS (need to > > see if ChromeOS specific, but > > https://source.android.com/devices/graphics/sync#sync_timeline > > suggests not) > > Android used to mandate it for their earlier iteration of release > fences, which was an empty/future fence having no guarantee of > eventual forward progress until someone committed work later on. For > example, when you committed a buffer to SF, it would give you an empty > 'release fence' for that buffer which would only be tied to work to > signal it when you committed your _next_ buffer, which might never > happen. They removed that because a) future fences were a bad idea, > and b) it was only ever useful if you assumed strictly > FIFO/round-robin return order which wasn't always true. > > So now it's been watered down to 'use this if you don't have a > hardware timeline', but why don't we work with Android people to get > that removed entirely? I think there's some testcases still using these, but most real fence testcases use vgem nowadays. So from an upstream pov there's indeed not much if anything holding us back from just deleting this all. And would probably be a good idea. Adding Rob and John for more of the android pov. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] sw_sync deadlock avoidance, take 3 2020-07-15 11:57 ` Daniel Vetter @ 2020-07-17 0:24 ` Daniel Stone 0 siblings, 0 replies; 11+ messages in thread From: Daniel Stone @ 2020-07-17 0:24 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx, ML dri-devel, Chris Wilson Hi all, On Wed, 15 Jul 2020 at 12:57, Daniel Vetter <daniel@ffwll.ch> wrote: > On Wed, Jul 15, 2020 at 1:47 PM Daniel Stone <daniel@fooishbar.org> wrote: > > On Wed, 15 Jul 2020 at 12:05, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > > Yes, this is used as part of the Android stack on Chrome OS (need to > > > see if ChromeOS specific, but > > > https://source.android.com/devices/graphics/sync#sync_timeline > > > suggests not) > > > > Android used to mandate it for their earlier iteration of release > > fences, which was an empty/future fence having no guarantee of > > eventual forward progress until someone committed work later on. For > > example, when you committed a buffer to SF, it would give you an empty > > 'release fence' for that buffer which would only be tied to work to > > signal it when you committed your _next_ buffer, which might never > > happen. They removed that because a) future fences were a bad idea, > > and b) it was only ever useful if you assumed strictly > > FIFO/round-robin return order which wasn't always true. > > > > So now it's been watered down to 'use this if you don't have a > > hardware timeline', but why don't we work with Android people to get > > that removed entirely? > > I think there's some testcases still using these, but most real fence > testcases use vgem nowadays. So from an upstream pov there's indeed > not much if anything holding us back from just deleting this all. And > would probably be a good idea. It looks like this is just a docs hangover which can be fixed; sw_sync is no longer part of the unified Android kernel image, so it can no longer be relied on post-Treble. So let's just continue on the assumption that sw_sync is not anything anyone can rely on. Cheers, Daniel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-07-17 0:24 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-15 10:04 sw_sync deadlock avoidance, take 3 Chris Wilson 2020-07-15 10:04 ` [PATCH 1/2] dma-buf/sw_sync: Avoid recursive lock during fence signal Chris Wilson 2020-07-15 11:26 ` Bas Nieuwenhuizen 2020-07-15 10:04 ` [PATCH 2/2] dma-buf/selftests: Add locking selftests for sw_sync Chris Wilson 2020-07-15 10:23 ` sw_sync deadlock avoidance, take 3 Bas Nieuwenhuizen 2020-07-15 10:29 ` Daniel Stone 2020-07-15 10:34 ` Chris Wilson 2020-07-15 11:05 ` Bas Nieuwenhuizen 2020-07-15 11:47 ` [Intel-gfx] " Daniel Stone 2020-07-15 11:57 ` Daniel Vetter 2020-07-17 0:24 ` Daniel Stone
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).