From: Jordan Crouse <jcrouse@codeaurora.org>
To: linux-arm-msm@vger.kernel.org
Cc: "Christian König" <christian.koenig@amd.com>,
    "Gustavo Padovan" <gustavo@padovan.org>,
    "Sumit Semwal" <sumit.semwal@linaro.org>,
    dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
    linux-kernel@vger.kernel.org, linux-media@vger.kernel.org
Subject: [RFC PATCH v1] dma-fence-array: Deal with sub-fences that are signaled late
Date: Wed, 12 Aug 2020 17:55:44 -0600
Message-ID: <20200812235544.2289895-1-jcrouse@codeaurora.org>

This is an RFC because I'm still trying to grok the correct behavior.

Consider a dma_fence_array created with two fences and signal_on_any set to true. A reference to the dma_fence_array is taken for each fence it waits on. When the client calls dma_fence_wait(), only one of the fences is signaled. The client returns successfully from the wait and puts its reference to the array fence, but the array fence stays alive because of the reference held for the remaining unsignaled fence.

Now consider that the unsignaled fence is signaled much later, while the timeline is being destroyed. The timeline destroy path calls dma_fence_signal_locked(), and the following sequence occurs:

1) dma_fence_array_cb_func() is called.
2) array->num_pending is already 0 (it was set to 1 because of signal_on_any, and the first signaled fence decremented it), so the callback calls dma_fence_put() instead of queuing the irq work.
3) The array fence is released, which in turn puts the lingering fence, which is then released.
4) Deadlock with the timeline.

I think we can fix this with the attached patch. Once the fence is signaled, signaling it again in the irq worker shouldn't hurt anything. The only gotcha might be how the error is propagated; I wasn't quite sure of the intent of clearing it only after getting to the irq worker.
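The num_pending and refcount bookkeeping in the sequence described above can be sketched as a small userspace model. This is a hypothetical C11 illustration, not kernel code: struct array_model, model_cb(), model_worker() and simulate_late_signal() are invented names, the array's refcount is assumed to start at 3 (the client's reference plus one per waiting fence), and only the counting logic of the pre-patch dma_fence_array_cb_func() is mirrored. It records whether the final put lands inside a sub-fence callback, i.e. inside dma_fence_signal_locked().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a signal_on_any dma_fence_array with two
 * sub-fences. It mirrors only the num_pending/refcount logic of the
 * pre-patch dma_fence_array_cb_func(); it is not kernel code.
 */
struct array_model {
	atomic_int num_pending;  /* starts at 1 because signal_on_any is set */
	atomic_int refcount;     /* references held on the array fence */
	bool in_signal_locked;   /* true while a sub-fence callback runs */
	bool released_in_cb;     /* the final put happened inside a callback */
	int work_queued;         /* times the irq work would have been queued */
};

struct late_signal_result {
	int work_queued;
	bool released_in_cb;
	int final_refcount;
};

static void model_put(struct array_model *a)
{
	int old = atomic_fetch_sub(&a->refcount, 1);

	assert(old > 0); /* a put without a matching get would be a bug */
	if (old == 1 && a->in_signal_locked)
		a->released_in_cb = true;
}

/* Pre-patch callback: only the decrement that reaches 0 queues the worker. */
static void model_cb(struct array_model *a)
{
	a->in_signal_locked = true;
	/* atomic_dec_and_test() is true only when the new value is 0 */
	if (atomic_fetch_sub(&a->num_pending, 1) == 1)
		a->work_queued++;
	else
		model_put(a);
	a->in_signal_locked = false;
}

/* Stand-in for irq_dma_fence_array_work(); only its put is modeled. */
static void model_worker(struct array_model *a)
{
	model_put(a);
}

/* Replays the scenario described in the mail. */
struct late_signal_result simulate_late_signal(void)
{
	struct array_model a = { .in_signal_locked = false };

	atomic_init(&a.num_pending, 1);
	atomic_init(&a.refcount, 3); /* assumed: client ref + one per sub-fence */

	model_cb(&a);     /* first sub-fence signals: worker is queued */
	model_worker(&a); /* worker runs (signal not modeled), drops its ref */
	model_put(&a);    /* client returns from dma_fence_wait() and puts */
	model_cb(&a);     /* late signal while the timeline is destroyed */

	return (struct late_signal_result){
		.work_queued = a.work_queued,
		.released_in_cb = a.released_in_cb,
		.final_refcount = atomic_load(&a.refcount),
	};
}
```

In this replay, simulate_late_signal() reports a single queued work item, and the last reference is dropped from inside the late callback, which is the point where step 3 starts releasing fences.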
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/dma-buf/dma-fence-array.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index d3fbd950be94..b8829b024255 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -46,8 +46,6 @@ static void irq_dma_fence_array_work(struct irq_work *wrk)
 {
 	struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
 
-	dma_fence_array_clear_pending_error(array);
-
 	dma_fence_signal(&array->base);
 	dma_fence_put(&array->base);
 }
@@ -61,10 +59,10 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
 
 	dma_fence_array_set_pending_error(array, f->error);
 
-	if (atomic_dec_and_test(&array->num_pending))
-		irq_work_queue(&array->work);
-	else
-		dma_fence_put(&array->base);
+	if (!atomic_dec_and_test(&array->num_pending))
+		dma_fence_array_set_pending_error(array, f->error);
+
+	irq_work_queue(&array->work);
 }
 
 static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
-- 
2.25.1
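A similar hypothetical model can illustrate the patched control flow, where every sub-fence callback queues the worker and the worker signals the array and then puts one reference. The names (struct patched_model, pm_cb(), pm_worker(), simulate_patched_late_signal()) are invented for illustration. The model leaves out error propagation, assumes the same starting refcount of 3, and assumes each queued work item runs before the next callback; the real irq_work_queue() does not requeue a work item that is still pending.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the patched flow: every sub-fence
 * callback queues the worker, and the worker signals the array (a no-op
 * after the first time) and then drops one reference. Error propagation
 * is not modeled, and each queued work item is assumed to run before the
 * next callback. This is not kernel code.
 */
struct patched_model {
	atomic_int num_pending;  /* starts at 1 because signal_on_any is set */
	atomic_int refcount;     /* references held on the array fence */
	bool signaled;           /* the array fence's signaled state */
	bool in_signal_locked;   /* true while a sub-fence callback runs */
	bool released_in_cb;     /* the final put happened inside a callback */
	int work_queued;         /* times the callback queued the worker */
	int effective_signals;   /* signal calls that changed the state */
};

struct patched_result {
	int work_queued;
	int effective_signals;
	bool released_in_cb;
	int final_refcount;
};

static void pm_put(struct patched_model *m)
{
	int old = atomic_fetch_sub(&m->refcount, 1);

	assert(old > 0); /* a put without a matching get would be a bug */
	if (old == 1 && m->in_signal_locked)
		m->released_in_cb = true;
}

/* Returns true only for the call that actually signals the fence. */
static bool pm_signal(struct patched_model *m)
{
	if (m->signaled)
		return false;
	m->signaled = true;
	return true;
}

/* Patched callback: decrement (its result only affects error handling), then always queue. */
static void pm_cb(struct patched_model *m)
{
	m->in_signal_locked = true;
	atomic_fetch_sub(&m->num_pending, 1);
	m->work_queued++;
	m->in_signal_locked = false;
}

/* Patched worker: signal (idempotent in this model), then drop one reference. */
static void pm_worker(struct patched_model *m)
{
	if (pm_signal(m))
		m->effective_signals++;
	pm_put(m);
}

struct patched_result simulate_patched_late_signal(void)
{
	struct patched_model m = { .signaled = false };

	atomic_init(&m.num_pending, 1);
	atomic_init(&m.refcount, 3); /* assumed: client ref + one per sub-fence */

	pm_cb(&m);     /* first sub-fence signals */
	pm_worker(&m); /* array is signaled, callback ref dropped */
	pm_put(&m);    /* client returns from dma_fence_wait() and puts */
	pm_cb(&m);     /* late signal while the timeline is destroyed */
	pm_worker(&m); /* second signal is a no-op; last ref dropped here */

	return (struct patched_result){
		.work_queued = m.work_queued,
		.effective_signals = m.effective_signals,
		.released_in_cb = m.released_in_cb,
		.final_refcount = atomic_load(&m.refcount),
	};
}
```

Under these assumptions the second signal attempt changes nothing, and the last reference is dropped from the worker rather than from inside the late callback.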
Thread overview: 9+ messages
2020-08-12 23:55 Jordan Crouse [this message]
2020-08-12 23:55 ` [RFC PATCH v1] dma-fence-array: Deal with sub-fences that are signaled late Jordan Crouse
2020-08-13  6:49 ` Chris Wilson
2020-08-17 16:24 ` Jordan Crouse
2020-08-17 16:24 ` Jordan Crouse
2020-08-13  6:52 ` Christian König
2020-08-13  6:52 ` Christian König
2020-09-01  8:03 ` [dma] ee7499cf7d: igt.Subtest_busy-hang-all.fail kernel test robot
2020-09-01  8:03 ` kernel test robot