* [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Dmitry Baryshkov @ 2024-02-08 14:50 UTC
  To: Rob Clark, Abhinav Kumar, Sean Paul, Marijn Suijten,
	David Airlie, Daniel Vetter
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel

We have several reports of vblank timeout messages. However, after some
debugging it was found that they can have different causes. To allow us
to identify the DPU block that gets stuck, include the actual CTL_FLUSH
value in the timeout message and trigger the devcore snapshot capture.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
Changes in v2:
- Added a call to msm_disp_snapshot_state() to trigger devcore dump
  (Abhinav)
- Link to v1: https://lore.kernel.org/r/20240106-fd-dpu-debug-timeout-v1-1-6d9762884641@linaro.org
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
index d0f56c5c4cce..a8d6165b3c0a 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
@@ -489,7 +489,8 @@ static int dpu_encoder_phys_vid_wait_for_commit_done(
 		(hw_ctl->ops.get_flush_register(hw_ctl) == 0),
 		msecs_to_jiffies(50));
 	if (ret <= 0) {
-		DPU_ERROR("vblank timeout\n");
+		DPU_ERROR("vblank timeout: %x\n", hw_ctl->ops.get_flush_register(hw_ctl));
+		msm_disp_snapshot_state(phys_enc->parent->dev);
 		return -ETIMEDOUT;
 	}
 

---
base-commit: 39676dfe52331dba909c617f213fdb21015c8d10
change-id: 20240106-fd-dpu-debug-timeout-e917f0bc8063

Best regards,
-- 
Dmitry Baryshkov <dmitry.baryshkov@linaro.org>



* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Abhinav Kumar @ 2024-02-14 18:02 UTC
  To: Dmitry Baryshkov, Rob Clark, Sean Paul, Marijn Suijten,
	David Airlie, Daniel Vetter
  Cc: linux-arm-msm, dri-devel, freedreno, linux-kernel



On 2/8/2024 6:50 AM, Dmitry Baryshkov wrote:
> [...]
> @@ -489,7 +489,8 @@ static int dpu_encoder_phys_vid_wait_for_commit_done(
>   		(hw_ctl->ops.get_flush_register(hw_ctl) == 0),
>   		msecs_to_jiffies(50));
>   	if (ret <= 0) {
> -		DPU_ERROR("vblank timeout\n");
> +		DPU_ERROR("vblank timeout: %x\n", hw_ctl->ops.get_flush_register(hw_ctl));
> +		msm_disp_snapshot_state(phys_enc->parent->dev);


Unfortunately there is no rate limiting in this piece of code, so this 
will flood us with snapshots.

A short-term solution is to add a vblank_timeout_cnt and reset it in 
enable(), like the other similar error counters; see the sketches below.

The long-term solution is to centralize these error paths into a single 
dpu_encoder_error_handler() with a single counter; the error handler 
would print out the error code along with the snapshot, instead of the 
snapshot being triggered from all over the place.
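
Roughly what I have in mind, as a minimal sketch only; the field, the
counter reset point, and the handler below are hypothetical names, not
the current driver API:

	if (ret <= 0) {
		DPU_ERROR("vblank timeout: %x\n",
			  hw_ctl->ops.get_flush_register(hw_ctl));
		/* Hypothetical per-encoder counter, reset to 0 in the
		 * enable() path: only the first timeout after enable
		 * triggers a snapshot.
		 */
		if (phys_enc->vblank_timeout_cnt++ == 0)
			msm_disp_snapshot_state(phys_enc->parent->dev);
		return -ETIMEDOUT;
	}

And the long-term shape, again with hypothetical names:

	/* Central handler: a single counter, one snapshot per episode. */
	static void dpu_encoder_error_handler(struct dpu_encoder_phys *phys_enc,
					      int err)
	{
		if (atomic_inc_return(&phys_enc->error_cnt) == 1) {
			DPU_ERROR("encoder error %d\n", err);
			msm_disp_snapshot_state(phys_enc->parent->dev);
		}
	}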




* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Dmitry Baryshkov @ 2024-02-14 19:20 UTC
  To: Abhinav Kumar
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel

On Wed, 14 Feb 2024 at 20:02, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
> [...]
>
> Unfortunately there is no rate limiting in this piece of code, so this
> will flood us with snapshots.

Well... Yes and no. The devcoredump will destroy other snapshots if
there is already a pending one. So only the console will be flooded,
and only when MSM_DISP_SNAPSHOT_DUMP_IN_CONSOLE is enabled.

-- 
With best wishes
Dmitry


* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Abhinav Kumar @ 2024-02-14 20:35 UTC
  To: Dmitry Baryshkov
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel



On 2/14/2024 11:20 AM, Dmitry Baryshkov wrote:
> On Wed, 14 Feb 2024 at 20:02, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>> [...]
>>
>> Unfortunately there is no rate limiting in this piece of code, so this
>> will flood us with snapshots.
> 
> Well... Yes and no. The devcoredump will destroy other snapshots if
> there is already a pending one. So only the console will be flooded,
> and only when MSM_DISP_SNAPSHOT_DUMP_IN_CONSOLE is enabled.
> 

Yes, true, but at the same time this makes it hard to capture a good 
dump: potentially every vblank could time out, so this destroy/create 
cycle won't end.



* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Dmitry Baryshkov @ 2024-02-19 11:52 UTC
  To: Abhinav Kumar
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel

On Wed, 14 Feb 2024 at 22:36, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
> [...]
>
> Yes, true, but at the same time this makes it hard to capture a good
> dump: potentially every vblank could time out, so this destroy/create
> cycle won't end.

Excuse me, maybe I'm missing something. On the first timeout the
snapshot is created. It is held by the kernel until it is fully read
out from userspace. Other snapshots will not interfere with this snapshot.

Or are you worried that snapshotting takes time, so taking a snapshot
will also interfere with the vblank timings for the next vblank?

-- 
With best wishes
Dmitry


* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Abhinav Kumar @ 2024-02-20 22:40 UTC
  To: Dmitry Baryshkov
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel



On 2/19/2024 3:52 AM, Dmitry Baryshkov wrote:
> [...]
> 
> Excuse me, maybe I'm missing something. On the first timeout the
> snapshot is created. It is held by the kernel until it is fully read
> out from userspace. Other snapshots will not interfere with this snapshot.
> 

For every new snapshot, a new devcoredump device will be created, and 
it should remain until it has been read. But now one will be created on 
every vblank. IMO, this is really too much data for no reason.

Subsequent vblank timeouts are not going to give any new information 
compared to the existing snapshot of the first vblank timeout; that's 
why we should just create the snapshot when the first error happens and 
then stop.

As for other frame-done timeouts, subsequent timeouts without any sort 
of recovery in between are in fact quite misleading, because the 
hardware was already unable to fetch the previous frame, so it will 
most likely not fetch the next one either until it has recovered. That 
is typically why these vblank timeouts happen in a flurry: the hardware 
never really recovered from the first timeout.

> Or are you worried that snapshotting takes time, so taking a snapshot
> will also interfere with the vblank timings for the next vblank?
> 

Yes, this is another point.


* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Dmitry Baryshkov @ 2024-02-20 22:42 UTC
  To: Abhinav Kumar
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel

On Wed, 21 Feb 2024 at 00:40, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
> [...]
>
> For every new snapshot, a new devcoredump device will be created, and
> it should remain until it has been read. But now one will be created on
> every vblank. IMO, this is really too much data for no reason.

No-no-no. If there is already a devcoredump for a device, the next one
will not be created. See dev_coredumpm(). So all the subsequent
snapshots will be created and then destroyed immediately.
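
Paraphrased, the relevant guard in dev_coredumpm() looks roughly like
this (simplified from my reading of drivers/base/devcoredump.c, not the
verbatim kernel code):

	/* If a devcoredump already exists for this failing device, the
	 * new dump data is freed immediately instead of creating a
	 * second devcoredump instance.
	 */
	existing = class_find_device(&devcd_class, NULL, dev,
				     devcd_match_failing_device);
	if (existing) {
		put_device(existing);
		goto free;	/* frees the new data, no new devcd */
	}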


-- 
With best wishes
Dmitry


* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Abhinav Kumar @ 2024-02-20 23:04 UTC
  To: Dmitry Baryshkov
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel



On 2/20/2024 2:42 PM, Dmitry Baryshkov wrote:
> [...]
> 
> No-no-no. If there is already a devcoredump for a device, the next one
> will not be created. See dev_coredumpm(). So all the subsequent
> snapshots will be created and then destroyed immediately.
> 

hmm ... I have certainly seen devcd_count go higher than one (but not 
more than 2). I am wondering whether this was because of some race 
condition between the previous destroy and the new create.

But anyway, this part is clear now, thanks.

>>> Or are you worried that snapshotting takes time, so taking a snapshot
>>> will also interfere with the vblank timings for the next vblank?
>>>
>>
>> Yes, this is another point.
> 

Snapshots will still be captured on every vblank timeout, and reading 
through the entire DPU register space on every vblank timeout is 
certainly something we can avoid.


* Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful
From: Dmitry Baryshkov @ 2024-02-20 23:07 UTC
  To: Abhinav Kumar
  Cc: Rob Clark, Sean Paul, Marijn Suijten, David Airlie,
	Daniel Vetter, linux-arm-msm, dri-devel, freedreno, linux-kernel

On Wed, 21 Feb 2024 at 01:04, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
> [...]
>
> Snapshots will still be captured on every vblank timeout, and reading
> through the entire DPU register space on every vblank timeout is
> certainly something we can avoid.

Ack.

-- 
With best wishes
Dmitry
