All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amd/display: move the nonblocking commit helpers appropriately
@ 2018-09-24 13:31 Shirish S
       [not found] ` <1537795907-1168-1-git-send-email-shirish.s-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Shirish S @ 2018-09-24 13:31 UTC (permalink / raw)
  To: harry.wentland-5C7GfCeVMHo, sunpeng.li-5C7GfCeVMHo
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Shirish S

[Why]
In multi-monitor scenario, if first crtc's flip done event occurs delayed
(but within timeout), due to non-blocking design of commit_tail(), there
are more than one commit's scheduled by the time the second crtc's
wait_for_completion_timeout() is called in drm_atomic_helper_wait_for_flip_done.

Due to these in-between additions and deletions in the atomic state, it is
found that the system fails while accessing common data structures of the
second crtc in drm_atomic_helper_wait_for_flip_done(), leading to crash as
below:

	BUG: unable to handle kernel paging request at 000000010000001c
	IP: do_raw_spin_lock+0xf/0x94
	PGD 0 P4D 0
	Oops: 0000 [#1] PREEMPT SMP NOPTI
	Call Trace:
	 __wait_for_common+0x36/0x60
	 drm_atomic_helper_wait_for_flip_done+0x47/0x89
	 amdgpu_dm_atomic_commit_tail+0xf4b/0xf84
	 ? drm_atomic_helper_wait_for_dependencies+0x1cd/0x217
	 commit_tail+0x41/0x5f

[How]
Move drm_atomic_helper_commit_hw_done() post wait_for_flip_done(),
which cleans up the atomic state's commit and completes pending hw_done and
flip_done works as a result there wont be dangling flip waits on random commits.

Signed-off-by: Shirish S <shirish.s@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0f10d92..41a1958 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4626,12 +4626,17 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
 	}
 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
 
-	/* Signal HW programming completion */
-	drm_atomic_helper_commit_hw_done(state);
 
 	if (wait_for_vblank)
 		drm_atomic_helper_wait_for_flip_done(dev, state);
 
+	/* Atomic state pointer gets corrupted in case of frequent
+	 * modesets operations like changing resolutions.
+	 * Hence discard state->commit before signalling to user
+	 * space.
+	 */
+	drm_atomic_helper_commit_hw_done(state);
+
 	drm_atomic_helper_cleanup_planes(dev, state);
 
 	/*
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH] drm/amd/display: Work around race beetween hw_done and wait_for_flip
       [not found] ` <1537795907-1168-1-git-send-email-shirish.s-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-28 13:16   ` Harry Wentland
       [not found]     ` <20180928131606.15931-1-harry.wentland-5C7GfCeVMHo@public.gmane.org>
  2018-10-01 23:26   ` [PATCH v2] drm/amd/display: Signal hw_done() after waiting for flip_done() sunpeng.li-5C7GfCeVMHo
  1 sibling, 1 reply; 5+ messages in thread
From: Harry Wentland @ 2018-09-28 13:16 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, shirish.s-5C7GfCeVMHo,
	sunpeng.li-5C7GfCeVMHo
  Cc: Harry Wentland

[Why]
There is a race that between drm_crtc_commit_hw_done and
drm_atomic_helper_wait_for_flip where the possibilty exist for the
crtc->commit to be cleared after the null check in wait_for_flip_done
but before the call to wait_for_completion_timeout on commit->flip_done.

[How]
Take a reference to all commits in the state before
drm_crtc_commit_hw_done is called and release those after
drm_atomic_helper_wait_for_flip has finished.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
---

Would something like this work? I get the strong sense that this happens
because Intel and IMX use the helpers in the other order, hence the
dependency between hw_done and wait_for_flip was missed.

I'd rather make it obvious that there's (a) no reason to reorder these
two calls on AMD HW (other than this unexpected dependency) and (b) this
is something we'll probably want to fix in DRM.

Sorry it took me a while to understand what was happening here. Been
busy at XDC until Jordan and Leo reminded me to take another look.

Harry

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0f10d920a785..ed9a7d680b63 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4626,12 +4626,27 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
 	}
 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
 
+	/*
+	 * WORKAROUND: Take a ref for all crtc_state commits to avoid
+	 * a race where the commit gets freed before commit_hw_done had
+	 * a chance to look for commit->flip_done
+	 */
+	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i)
+		drm_crtc_commit_get(new_crtc_state->commit);
+
 	/* Signal HW programming completion */
 	drm_atomic_helper_commit_hw_done(state);
 
 	if (wait_for_vblank)
 		drm_atomic_helper_wait_for_flip_done(dev, state);
 
+	/*
+	 * WORKAROUND: put the commit refs from above (see comment on
+	 * the drm_crtc_commit_get call above)
+	 */
+	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i)
+		drm_crtc_commit_put(new_crtc_state->commit);
+
 	drm_atomic_helper_cleanup_planes(dev, state);
 
 	/*
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH] drm/amd/display: Work around race beetween hw_done and wait_for_flip
       [not found]     ` <20180928131606.15931-1-harry.wentland-5C7GfCeVMHo@public.gmane.org>
@ 2018-10-01  9:06       ` S, Shirish
  0 siblings, 0 replies; 5+ messages in thread
From: S, Shirish @ 2018-10-01  9:06 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Li, Sun peng (Leo)
  Cc: Wentland, Harry

This workaround is not fixing the issue.

Regards,
Shirish S

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Harry Wentland
Sent: Friday, September 28, 2018 6:46 PM
To: amd-gfx@lists.freedesktop.org; S, Shirish <Shirish.S@amd.com>; Li, Sun peng (Leo) <Sunpeng.Li@amd.com>
Cc: Wentland, Harry <Harry.Wentland@amd.com>
Subject: [PATCH] drm/amd/display: Work around race beetween hw_done and wait_for_flip

[Why]
There is a race that between drm_crtc_commit_hw_done and drm_atomic_helper_wait_for_flip where the possibilty exist for the
crtc->commit to be cleared after the null check in wait_for_flip_done
but before the call to wait_for_completion_timeout on commit->flip_done.

[How]
Take a reference to all commits in the state before drm_crtc_commit_hw_done is called and release those after drm_atomic_helper_wait_for_flip has finished.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
---

Would something like this work? I get the strong sense that this happens because Intel and IMX use the helpers in the other order, hence the dependency between hw_done and wait_for_flip was missed.

I'd rather make it obvious that there's (a) no reason to reorder these two calls on AMD HW (other than this unexpected dependency) and (b) this is something we'll probably want to fix in DRM.

Sorry it took me a while to understand what was happening here. Been busy at XDC until Jordan and Leo reminded me to take another look.

Harry

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 0f10d920a785..ed9a7d680b63 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4626,12 +4626,27 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
 	}
 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
 
+	/*
+	 * WORKAROUND: Take a ref for all crtc_state commits to avoid
+	 * a race where the commit gets freed before commit_hw_done had
+	 * a chance to look for commit->flip_done
+	 */
+	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i)
+		drm_crtc_commit_get(new_crtc_state->commit);
+
 	/* Signal HW programming completion */
 	drm_atomic_helper_commit_hw_done(state);
 
 	if (wait_for_vblank)
 		drm_atomic_helper_wait_for_flip_done(dev, state);
 
+	/*
+	 * WORKAROUND: put the commit refs from above (see comment on
+	 * the drm_crtc_commit_get call above)
+	 */
+	for_each_new_crtc_in_state(state, crtc, new_crtc_state, i)
+		drm_crtc_commit_put(new_crtc_state->commit);
+
 	drm_atomic_helper_cleanup_planes(dev, state);
 
 	/*
--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2] drm/amd/display: Signal hw_done() after waiting for flip_done()
       [not found] ` <1537795907-1168-1-git-send-email-shirish.s-5C7GfCeVMHo@public.gmane.org>
  2018-09-28 13:16   ` [PATCH] drm/amd/display: Work around race beetween hw_done and wait_for_flip Harry Wentland
@ 2018-10-01 23:26   ` sunpeng.li-5C7GfCeVMHo
       [not found]     ` <1538436399-28084-1-git-send-email-sunpeng.li-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 5+ messages in thread
From: sunpeng.li-5C7GfCeVMHo @ 2018-10-01 23:26 UTC (permalink / raw)
  To: shirish.s-5C7GfCeVMHo
  Cc: Leo Li, harry.wentland-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

From: Shirish S <shirish.s@amd.com>

In amdgpu_dm_commit_tail(), wait until flip_done() is signaled before
we signal hw_done().

[Why]

This is to temporary address a paging error that occurs when a
nonblocking commit contends with another commit, particularly in a
mirrored multi-display configuration. The error occurs in
drm_atomic_helper_wait_for_flip_done(), when we attempt to access the
contents of new_crtc_state->commit.

Here's the sequence for a mirrored 2 display setup (irrelevant steps
left out for clarity):

**THREAD 1**                        | **THREAD 2**
                                    |
Initialize atomic state for flip    |
                                    |
Queue worker                        |
                                   ...

                                    | Do work for flip
                                    |
                                    | Signal hw_done() on CRTC 1
                                    | Signal hw_done() on CRTC 2
                                    |
                                    | Wait for flip_done() on CRTC 1

                                <---- **PREEMPTED BY THREAD 1**

Initialize atomic state for cursor
update (1)                          |
                                    |
Do cursor update worker             |
                                    |
Clear atomic state (2)              |
**DONE**                            |
                                   ...
                                    |
                                    | Wait for flip_done() on CRTC 2
                                    | *ERROR*
                                    |

The issue starts with (1). When the atomic state is initialized, the
current CRTC states are duplicated to be the new_crtc_states, and
referenced to be the old_crtc_states. (The new_crtc_states are to be
filled with update data.)

Some things to note:

* Due to the mirrored configuration, the cursor updates on both CRTCs.

* At this point, the pflip IRQ has already been handled, and flip_done
  signaled on all CRTCs. The cursor commit can therefore continue.

* The old_crtc_states used by the cursor update are the **same states**
  as the new_crtc_states used by the flip worker.

At (2), the old_crtc_state is freed (*), and the cursor commit
completes. We then context switch back to the flip worker, where we
attempt to access the new_crtc_state->commit object. This is
problematic, as this state has already been freed.

(*) Technically, 'state->crtcs[i].state' is freed, which was made to
    reference old_crtc_state in drm_atomic_helper_swap_state()

[How]

By moving hw_done() after wait_for_flip_done(), we're guaranteed that
the new_crtc_state (from the flip worker's perspective) still exists.
This is because any other commit will be blocked, waiting for the
hw_done() signal.

Note that both the i915 and imx drivers have this sequence flipped
already, masking this problem.

Signed-off-by: Shirish S <shirish.s@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
---

Hi Shirish,

Sorry for the late reply. I took some time to dig at this issue, and found
that a proper fix will need more thought. I'm OK with this fix for now, until
we can address it with a proper fix.

I've updated the commit message and comments below to reflect the underlying
issue.

Thanks,
Leo

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 774db34..bbfffaf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4755,12 +4755,18 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
 	}
 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
 
-	/* Signal HW programming completion */
-	drm_atomic_helper_commit_hw_done(state);
 
 	if (wait_for_vblank)
 		drm_atomic_helper_wait_for_flip_done(dev, state);
 
+	/*
+	 * FIXME:
+	 * Delay hw_done() until flip_done() is signaled. This is to block
+	 * another commit from freeing the CRTC state while we're still
+	 * waiting on flip_done.
+	 */
+	drm_atomic_helper_commit_hw_done(state);
+
 	drm_atomic_helper_cleanup_planes(dev, state);
 
 	/*
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] drm/amd/display: Signal hw_done() after waiting for flip_done()
       [not found]     ` <1538436399-28084-1-git-send-email-sunpeng.li-5C7GfCeVMHo@public.gmane.org>
@ 2018-10-02 14:41       ` Harry Wentland
  0 siblings, 0 replies; 5+ messages in thread
From: Harry Wentland @ 2018-10-02 14:41 UTC (permalink / raw)
  To: sunpeng.li-5C7GfCeVMHo, shirish.s-5C7GfCeVMHo
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2018-10-01 07:26 PM, sunpeng.li@amd.com wrote:
> From: Shirish S <shirish.s@amd.com>
> 
> In amdgpu_dm_commit_tail(), wait until flip_done() is signaled before
> we signal hw_done().
> 
> [Why]
> 
> This is to temporary address a paging error that occurs when a

s/temporary/temporarily/g

> nonblocking commit contends with another commit, particularly in a
> mirrored multi-display configuration. The error occurs in
> drm_atomic_helper_wait_for_flip_done(), when we attempt to access the
> contents of new_crtc_state->commit.
> 
> Here's the sequence for a mirrored 2 display setup (irrelevant steps
> left out for clarity):
> 
> **THREAD 1**                        | **THREAD 2**
>                                     |
> Initialize atomic state for flip    |
>                                     |
> Queue worker                        |
>                                    ...
> 
>                                     | Do work for flip
>                                     |
>                                     | Signal hw_done() on CRTC 1
>                                     | Signal hw_done() on CRTC 2
>                                     |
>                                     | Wait for flip_done() on CRTC 1
> 
>                                 <---- **PREEMPTED BY THREAD 1**
> 
> Initialize atomic state for cursor
> update (1)                          |
>                                     |
> Do cursor update worker             |
>                                     |
> Clear atomic state (2)              |
> **DONE**                            |
>                                    ...
>                                     |
>                                     | Wait for flip_done() on CRTC 2
>                                     | *ERROR*
>                                     |
> 
> The issue starts with (1). When the atomic state is initialized, the
> current CRTC states are duplicated to be the new_crtc_states, and
> referenced to be the old_crtc_states. (The new_crtc_states are to be
> filled with update data.)
> 
> Some things to note:
> 
> * Due to the mirrored configuration, the cursor updates on both CRTCs.
> 
> * At this point, the pflip IRQ has already been handled, and flip_done
>   signaled on all CRTCs. The cursor commit can therefore continue.
> 
> * The old_crtc_states used by the cursor update are the **same states**
>   as the new_crtc_states used by the flip worker.
> 
> At (2), the old_crtc_state is freed (*), and the cursor commit
> completes. We then context switch back to the flip worker, where we
> attempt to access the new_crtc_state->commit object. This is
> problematic, as this state has already been freed.
> 
> (*) Technically, 'state->crtcs[i].state' is freed, which was made to
>     reference old_crtc_state in drm_atomic_helper_swap_state()
> 
> [How]
> 
> By moving hw_done() after wait_for_flip_done(), we're guaranteed that
> the new_crtc_state (from the flip worker's perspective) still exists.
> This is because any other commit will be blocked, waiting for the
> hw_done() signal.
> 
> Note that both the i915 and imx drivers have this sequence flipped
> already, masking this problem.
> 
> Signed-off-by: Shirish S <shirish.s@amd.com>
> Signed-off-by: Leo Li <sunpeng.li@amd.com>

Thanks for the additional explanation. Took me a while to understand what was happening here. Let's keep pursuing a proper solution that allows us to flip hw_done and wait_for_flip_done again but since that might be non-trivial this change is

Reviewed-by: Harry Wentland <harry.wentland@amd.com>

Harry

> ---
> 
> Hi Shirish,
> 
> Sorry for the late reply. I took some time to dig at this issue, and found
> that a proper fix will need more thought. I'm OK with this fix for now, until
> we can address it with a proper fix.
> 
> I've updated the commit message and comments below to reflect the underlying
> issue.
> 
> Thanks,
> Leo
> 
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 774db34..bbfffaf 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4755,12 +4755,18 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
>  	}
>  	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>  
> -	/* Signal HW programming completion */
> -	drm_atomic_helper_commit_hw_done(state);
>  
>  	if (wait_for_vblank)
>  		drm_atomic_helper_wait_for_flip_done(dev, state);
>  
> +	/*
> +	 * FIXME:
> +	 * Delay hw_done() until flip_done() is signaled. This is to block
> +	 * another commit from freeing the CRTC state while we're still
> +	 * waiting on flip_done.
> +	 */
> +	drm_atomic_helper_commit_hw_done(state);
> +
>  	drm_atomic_helper_cleanup_planes(dev, state);
>  
>  	/*
> 
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-10-02 14:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-24 13:31 [PATCH] drm/amd/display: move the nonblocking commit helpers appropriately Shirish S
     [not found] ` <1537795907-1168-1-git-send-email-shirish.s-5C7GfCeVMHo@public.gmane.org>
2018-09-28 13:16   ` [PATCH] drm/amd/display: Work around race beetween hw_done and wait_for_flip Harry Wentland
     [not found]     ` <20180928131606.15931-1-harry.wentland-5C7GfCeVMHo@public.gmane.org>
2018-10-01  9:06       ` S, Shirish
2018-10-01 23:26   ` [PATCH v2] drm/amd/display: Signal hw_done() after waiting for flip_done() sunpeng.li-5C7GfCeVMHo
     [not found]     ` <1538436399-28084-1-git-send-email-sunpeng.li-5C7GfCeVMHo@public.gmane.org>
2018-10-02 14:41       ` Harry Wentland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.