* linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
@ 2016-01-15 10:34 Vlastimil Babka
  2016-01-15 12:26 ` Ville Syrjälä
  0 siblings, 1 reply; 59+ messages in thread
From: Vlastimil Babka @ 2016-01-15 10:34 UTC (permalink / raw)
  To: Alex Deucher, Christian König, Ville Syrjälä,
	Daniel Vetter, mgraesslin
  Cc: David Airlie, dri-devel, LKML, Mario Kleiner, kwin

Hi,

since kernel 4.4 I'm unable to log in to the kde5 desktop (on openSUSE
Tumbleweed). There's a screen with progressbar showing the startup,
which normally fades away after reaching 100%. But with kernel 4.4, the
progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
Top shows that kwin is using a few % of CPU but mostly sleeps in poll().
When I kill it from another console, I see that everything has actually
started up, just the progressbar screen was obscuring it. The windows
obviously don't have decorations etc. Starting kwin manually again shows
me the progressbar screen again at the same position.

I have suspected that kwin is waiting for some event, but nevertheless
tried bisecting the kernel between 4.3 and 4.4, which led to:

# first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use 
vblank timestamps to guesstimate how many vblanks were missed

I can confirm that 4.4 works if I revert the following commits:
63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank 
counter/ts for new drm_update_vblank_count() (v3)"

d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank 
counter/ts for new drm_update_vblank_count() (v2)"

31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank 
timestamps from the irq handler"

ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"

4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to 
guesstimate how many vblanks were missed"

All clean reverts, just needs some fixup on top to use abs() instead of 
abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.

Unfortunately I don't know if this is a kernel problem or a kwin problem.
I tried to CC maintainers of both; advice on what to try or what info to
provide is welcome. The card is "CAICOS" with 1GB memory.

Thanks,
Vlastimil


* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-15 10:34 linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon Vlastimil Babka
@ 2016-01-15 12:26 ` Ville Syrjälä
  2016-01-15 12:40   ` Vlastimil Babka
  2016-01-16  4:24     ` Mario Kleiner
  0 siblings, 2 replies; 59+ messages in thread
From: Ville Syrjälä @ 2016-01-15 12:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, Mario Kleiner, kwin

On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
> Hi,
> 
> since kernel 4.4 I'm unable to log in to the kde5 desktop (on openSUSE
> Tumbleweed). There's a screen with progressbar showing the startup,
> which normally fades away after reaching 100%. But with kernel 4.4, the
> progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
> Top shows that kwin is using a few % of CPU but mostly sleeps in poll().
> When I kill it from another console, I see that everything has actually
> started up, just the progressbar screen was obscuring it. The windows
> obviously don't have decorations etc. Starting kwin manually again shows
> me the progressbar screen again at the same position.

Hmm. Sounds like it could then be waiting for a vblank in the distant
future. There's that 1<<23 limit in the code though, but even with that
we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.
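
As a quick sanity check of the ~38 hour figure, here is the arithmetic
as a small sketch. It assumes the limit is exactly 1<<23 vblanks and the
refresh rate is exactly 60Hz; the helper name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: hours needed to count `vblanks` frames at `hz` Hz. */
static double vblank_wait_hours(uint32_t vblanks, double hz)
{
	double seconds = (double)vblanks / hz;

	return seconds / 3600.0;
}
```

vblank_wait_hours(1u << 23, 60.0) comes out at roughly 38.8 hours.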

Stuff to try might include enabling drm.debug=0x2f, though that'll
generate a lot of stuff. Another option would be to use the drm vblank
tracepoints to try and catch what seq number it's waiting for and
where we're at currently. Or I suppose you could just hack
up drm_wait_vblank() to print an error message or something if the
requested seq number is in the future by, say, more than a few seconds,
and if that's the case then we could try to figure out why that happens.
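
The kind of check meant in the last suggestion could look roughly like
the sketch below. This is a standalone userspace model, not a patch
against drm_wait_vblank(); the function name, its parameters and the
idea of passing the refresh rate in directly are all assumptions made
for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, not actual DRM code: returns true if the requested
 * vblank sequence number lies more than `max_secs` seconds ahead of the
 * current one at the given refresh rate.
 */
static bool vblank_request_too_far(uint32_t requested_seq, uint32_t current_seq,
				   unsigned int refresh_hz, unsigned int max_secs)
{
	/*
	 * Signed difference so that a 32-bit counter wraparound is handled
	 * (relies on the usual two's complement conversion).
	 */
	int32_t ahead = (int32_t)(requested_seq - current_seq);

	if (ahead <= 0)
		return false;	/* already reached, or in the past */

	return (uint64_t)ahead > (uint64_t)refresh_hz * max_secs;
}
```

In a debugging hack, a true result is where the error message would be
printed, together with both sequence numbers.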

> 
> I have suspected that kwin is waiting for some event, but nevertheless 
> tried bisecting the kernel between 4.3 and 4.4, which led to:
> 
> # first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use 
> vblank timestamps to guesstimate how many vblanks were missed
> 
> I can confirm that 4.4 works if I revert the following commits:
> 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank 
> counter/ts for new drm_update_vblank_count() (v3)"
> 
> d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank 
> counter/ts for new drm_update_vblank_count() (v2)"

The sha1s don't seem to match what I have, so not sure which kernel tree
you have, but looking at the radeon commit at least one thing
immediately caught my attention;

+                       /* Bump counter if we are at >= leading edge of vblank,
+                        * but before vsync where vpos would turn negative and
+                        * the hw counter really increments.
+                        */
+                       if (vpos >= 0)
+                               count++;

It's rather hard to see what it's really doing since the custom flags to
the get_scanout_position now cause it to return non-standard things. But if
I'm reading things correctly it should really say something like:

if (vpos >= 0 && vpos < (vsync_start - vblank_start))
	count++;

Hmm. Actually even that might not be correct since it could be using the
"fake" vblank start here, so might be it'd need to be something like:

if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines))
	count++;

Also might be worth a shot to just ignore the hw frame counter. Eg.:

index e266ffc520d2..db728580549a 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -492,7 +492,6 @@ static struct drm_driver kms_driver = {
        .lastclose = radeon_driver_lastclose_kms,
        .set_busid = drm_pci_set_busid,
        .unload = radeon_driver_unload_kms,
-       .get_vblank_counter = radeon_get_vblank_counter_kms,
        .enable_vblank = radeon_enable_vblank_kms,
        .disable_vblank = radeon_disable_vblank_kms,
        .get_vblank_timestamp = radeon_get_vblank_timestamp_kms,
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 979f3bf65f2c..3c5fcab74152 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -152,11 +152,6 @@ int radeon_driver_irq_postinstall_kms(struct drm_device *dev)
 {
        struct radeon_device *rdev = dev->dev_private;
 
-       if (ASIC_IS_AVIVO(rdev))
-               dev->max_vblank_count = 0x00ffffff;
-       else
-               dev->max_vblank_count = 0x001fffff;
-
        return 0;
 }

assuming I'm reading the code correctly.
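
For reference, the idea named in the title of the bisected commit
(guessing the number of missed vblanks from timestamps instead of
trusting a hw frame counter) can be modelled roughly as below. This is
a simplified userspace sketch with made-up names, not the kernel's
actual code, and it assumes both timestamps and the frame duration are
available in nanoseconds.

```c
#include <stdint.h>

/*
 * Simplified model (names are illustrative): estimate how many vblanks
 * passed between two vblank timestamps, rounding to the nearest frame.
 */
static uint32_t guess_missed_vblanks(uint64_t old_ts_ns, uint64_t new_ts_ns,
				     uint64_t frame_duration_ns)
{
	uint64_t elapsed_ns;

	if (frame_duration_ns == 0 || new_ts_ns <= old_ts_ns)
		return 0;	/* no usable information */

	elapsed_ns = new_ts_ns - old_ts_ns;

	/* Round to nearest rather than truncating. */
	return (uint32_t)((elapsed_ns + frame_duration_ns / 2) / frame_duration_ns);
}
```

With a 60Hz mode (frame duration of about 16.67 ms), two timestamps
50 ms apart give an estimate of 3 vblanks.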

> 
> 31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank 
> timestamps from the irq handler"
> 
> ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"
> 
> 4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to 
> guesstimate how many vblanks were missed"
> 
> All clean reverts, just needs some fixup on top to use abs() instead of 
> abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.
> 
> Unfortunately I don't know if this is a kernel problem or a kwin problem. 
> I tried to CC maintainers of both; advice on what to try or what info to 
> provide is welcome. The card is "CAICOS" with 1GB memory.
> 
> Thanks,
> Vlastimil

-- 
Ville Syrjälä
Intel OTC


* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-15 12:26 ` Ville Syrjälä
@ 2016-01-15 12:40   ` Vlastimil Babka
  2016-01-16  4:24     ` Mario Kleiner
  1 sibling, 0 replies; 59+ messages in thread
From: Vlastimil Babka @ 2016-01-15 12:40 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, Mario Kleiner, kwin

On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>>
>> I have suspected that kwin is waiting for some event, but nevertheless
>> tried bisecting the kernel between 4.3 and 4.4, which led to:
>>
>> # first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use
>> vblank timestamps to guesstimate how many vblanks were missed
>>
>> I can confirm that 4.4 works if I revert the following commits:
>> 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v3)"
>>
>> d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v2)"
>
> The sha1s don't seem to match what I have, so not sure which kernel tree

Hm sorry, I pasted the sha1s of the reverts by mistake.
The correct sha1s are:
5b5561b3660db734652fbd02b4b6cbe00434d96b "drm/radeon: Fixup hw vblank 
counter/ts for new drm_update_vblank_count() (v2)"
fa4270d8e0257b4b76f11baa2866f4313d29aaf5 "drm: Don't zero vblank 
timestamps from the irq handler"
235fabe09b46469adad2c9e4cb0563758155187c "drm: Add DRM_DEBUG_VBL()"
4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to 
guesstimate how many vblanks were missed"
8e36f9d33c134d5c6448ad65b423a9fd94e045cf "drm/amdgpu: Fixup hw vblank 
counter/ts for new drm_update_vblank_count() (v3)"

Also, it turns out that the process actually showing the progress is
"ksplashqml", not kwin. It survives killing kwin, and restarting kwin
just makes it show up on top again, or something. If I force kill
ksplashqml instead of kwin, the desktop works including decorations
and everything. ksplashqml itself also waits in the kernel in poll().

I'll try some of your suggestions, thanks!


* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-15 12:26 ` Ville Syrjälä
@ 2016-01-16  4:24     ` Mario Kleiner
  2016-01-16  4:24     ` Mario Kleiner
  1 sibling, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-16  4:24 UTC (permalink / raw)
  To: Ville Syrjälä, Vlastimil Babka
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, kwin



On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> since kernel 4.4 I'm unable to log in to the kde5 desktop (on openSUSE
>> Tumbleweed). There's a screen with progressbar showing the startup,
>> which normally fades away after reaching 100%. But with kernel 4.4, the
>> progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
>> Top shows that kwin is using a few % of CPU but mostly sleeps in poll().
>> When I kill it from another console, I see that everything has actually
>> started up, just the progressbar screen was obscuring it. The windows
>> obviously don't have decorations etc. Starting kwin manually again shows
>> me the progressbar screen again at the same position.
>

Depressing. I was stress-testing those patches with Linux 4.4 for days
on 2 AMD GPUs (HD-4000 RV 730 and HD-5770) under KDE 5 Plasma 5.4.2
(KUbuntu 15.10, XOrg 1.17), and just retested Linux 4.4 on
nouveau/radeon/intel (also with XOrg 1.18 and XOrg master) a few days
ago, and never encountered such a hang or other vblank-related problem
on KDE-5 or GNOME-3.

I'm currently running...

while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done

... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770 and so far I
can't trigger a hang after hundreds of runs.

Does this also hang for you?

> Hmm. Sounds like it could then be waiting for a vblank in the distant
> future. There's that 1<<23 limit in the code though, but even with that
> we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.
>

xtrace suggests that ksplashqml seems to use classic OpenGL  + 
glXSwapBuffers under DRI2. So no clever swap scheduling based on vblank 
counter values.

> Stuff to try might include enabling drm.debug=0x2f, though that'll
> generate a lot of stuff. Another option would be to use the drm vblank
> tracepoints to try and catch what seq number it's waiting for and
> where we're at currently. Or I suppose you could just hack
> up drm_wait_vblank() to print an error message or something if the
> requested seq number is in the future by, say, more than a few seconds,
> and if that's the case then we could try to figure out why that happens.
>
>>
>> I have suspected that kwin is waiting for some event, but nevertheless
>> tried bisecting the kernel between 4.3 and 4.4, which led to:
>>
>> # first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use
>> vblank timestamps to guesstimate how many vblanks were missed
>>
>> I can confirm that 4.4 works if I revert the following commits:
>> 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v3)"
>>
>> d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v2)"
>
> The sha1s don't seem to match what I have, so not sure which kernel tree
> you have, but looking at the radeon commit at least one thing
> immediately caught my attention;
>
> +                       /* Bump counter if we are at >= leading edge of vblank,
> +                        * but before vsync where vpos would turn negative and
> +                        * the hw counter really increments.
> +                        */
> +                       if (vpos >= 0)
> +                               count++;
>
> It's rather hard to see what it's really doing since the custom flags to
> the get_scanout_position now cause it to return non-standard things. But if
> I'm reading things correctly it should really say something like:
>
> if (vpos >= 0 && vpos < (vsync_start - vblank_start))
> 	count++;
>
> Hmm. Actually even that might not be correct since it could be using the
> "fake" vblank start here, so might be it'd need to be something like:
>
> if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines))
> 	count++;
>

The current code should be correct. vpos here returns the distance of the
hw vertical scanout position to the start of vblank. According to Alex and
Harry Wentland of AMD's display team, and my testing of my two cards, the
hw vertical scanout position resets to zero at the start line of vsync,
therefore the "vpos" in that code becomes negative at start of vsync. At
the same time the hw frame counter increments by one, so the "count++"
to bump the returned count by +1 is no longer necessary.

If the reset of the hw vertical scanout pos to zero and the increment of
the hw frame counter didn't happen at exactly the same time at start of
vsync, I could see how two successive queries of
driver->get_vblank_counter() could report a count of N+1 and then N if
the timing of both calls were just perfectly right. That would cause
the DRM code to falsely detect counter wraparound and jump the vblank
counter forward by 2^24.
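
To illustrate the false wraparound, here is a simplified model of a
counter increment computed from two successive hw counter reads and
masked to the counter width. It is a sketch under the assumption that
the difference is taken modulo a 24-bit counter (max_vblank_count of
0x00ffffff, as set for AVIVO in radeon); the function name is made up
and this is not the kernel's code.

```c
#include <stdint.h>

/*
 * Simplified model, not the kernel's code: increment computed from two
 * successive hw counter reads, masked to the counter width.
 */
static uint32_t hw_counter_diff(uint32_t cur, uint32_t last, uint32_t max_count)
{
	return (cur - last) & max_count;
}
```

Reading N and then N+1 gives a diff of 1 as expected, but reading N+1
and then N gives 0x00ffffff, i.e. 2^24 - 1, which is the forward jump
described above.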

My tested GPUs had DCE-3 or DCE-4 display engines; Caicos has DCE-5, so
could this be some hw quirk of DCE-5?


> Also might be worth a shot to just ignore the hw frame counter. Eg.:
>
> index e266ffc520d2..db728580549a 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -492,7 +492,6 @@ static struct drm_driver kms_driver = {
>          .lastclose = radeon_driver_lastclose_kms,
>          .set_busid = drm_pci_set_busid,
>          .unload = radeon_driver_unload_kms,
> -       .get_vblank_counter = radeon_get_vblank_counter_kms,
>          .enable_vblank = radeon_enable_vblank_kms,
>          .disable_vblank = radeon_disable_vblank_kms,
>          .get_vblank_timestamp = radeon_get_vblank_timestamp_kms,
> diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> index 979f3bf65f2c..3c5fcab74152 100644
> --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> @@ -152,11 +152,6 @@ int radeon_driver_irq_postinstall_kms(struct drm_device *dev)
>   {
>          struct radeon_device *rdev = dev->dev_private;
>
> -       if (ASIC_IS_AVIVO(rdev))
> -               dev->max_vblank_count = 0x00ffffff;
> -       else
> -               dev->max_vblank_count = 0x001fffff;
> -
>          return 0;
>   }
>
> assuming I'm reading the code correctly.
>
>>
>> 31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank
>> timestamps from the irq handler"
>>
>> ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"
>>
>> 4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to
>> guesstimate how many vblanks were missed"
>>
>> All clean reverts, just needs some fixup on top to use abs() instead of
>> abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.
>>
>> Unfortunately I don't know if this is a kernel problem or a kwin problem.
>> I tried to CC maintainers of both; advice on what to try or what info to
>> provide is welcome. The card is "CAICOS" with 1GB memory.
>>

I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank" 
should probably give useful info around the time of the hang.

Maybe also check XOrg.0.log for (WW) warnings related to flip.

thanks,
-mario


>> Thanks,
>> Vlastimil
>



* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-16  4:24     ` Mario Kleiner
  (?)
@ 2016-01-18 10:49     ` Vlastimil Babka
  2016-01-18 14:06       ` Vlastimil Babka
                         ` (2 more replies)
  -1 siblings, 3 replies; 59+ messages in thread
From: Vlastimil Babka @ 2016-01-18 10:49 UTC (permalink / raw)
  To: Mario Kleiner, Ville Syrjälä
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, kwin

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>
>
> On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
>> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>
> I'm currently running...
>
> while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done
>
> ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770 and so far I
> can't trigger a hang after hundreds of runs.
>
> Does this also hang for you?

No, test mode seems to be fine.

> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
> should probably give useful info around the time of the hang.

Attached. Captured by having kdm running, switching to console, running 
"dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see 
frozen splashscreen, switch back, terminate dmesg. So somewhere around 
the middle there should be where ksplashscreen starts...

> Maybe also check XOrg.0.log for (WW) warnings related to flip.

No such warnings there.

> thanks,
> -mario
>
>
>>> Thanks,
>>> Vlastimil
>>


[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 77399 bytes --]


* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-18 10:49     ` Vlastimil Babka
@ 2016-01-18 14:06       ` Vlastimil Babka
  2016-01-18 14:14           ` Christian König
  2016-01-20 20:25         ` Vlastimil Babka
  2016-01-20 20:32         ` Mario Kleiner
  2 siblings, 1 reply; 59+ messages in thread
From: Vlastimil Babka @ 2016-01-18 14:06 UTC (permalink / raw)
  To: Mario Kleiner, Ville Syrjälä
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, kwin, Thomas Lübking

On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>> should probably give useful info around the time of the hang.
> 
> Attached. Captured by having kdm running, switching to console, running
> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
> frozen splashscreen, switch back, terminate dmesg. So somewhere around
> the middle there should be where ksplashscreen starts...
> 
>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
> 
> No such warnings there.

This is what the gdb backtraces look like for the 4 threads of the stuck ksplashqml.
Thread 3 seems to be waiting on some response to radeon's ioctl?

(gdb) info threads
  Id   Target Id         Frame
  4    Thread 0x7feb296f5700 (LWP 3643) "QXcbEventReader" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  3    Thread 0x7feb199f8700 (LWP 3644) "ksplashqml" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7feb18ff2700 (LWP 3645) "QQmlThread" 0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
* 1    Thread 0x7feb3b79f8c0 (LWP 3642) "ksplashqml" 0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
(gdb) bt
#0  0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007feb32509432 in poll (__timeout=-1, __nfds=1, __fds=0x7ffce30ffb50) at /usr/include/bits/poll2.h:46
#2  _xcb_conn_wait (c=c@entry=0x17e25c0, cond=cond@entry=0x7ffce30ffc70, vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:459
#3  0x00007feb3250ad57 in wait_for_reply (c=c@entry=0x17e25c0, request=request@entry=883, e=e@entry=0x7ffce30ffd48) at xcb_in.c:516
#4  0x00007feb3250aec1 in xcb_wait_for_reply64 (c=c@entry=0x17e25c0, request=883, e=e@entry=0x7ffce30ffd48) at xcb_in.c:560
#5  0x00007feb32b80300 in _XReply (dpy=dpy@entry=0x17e12c0, rep=rep@entry=0x7ffce30ffdc0, extra=extra@entry=0, discard=discard@entry=0) at xcb_io.c:596
#6  0x00007feb36eda712 in DRI2GetBuffersWithFormat (dpy=0x17e12c0, drawable=12582924, width=width@entry=0x181e528, height=height@entry=0x181e52c,
    attachments=0x7ffce30fff10, count=1, outCount=0x7ffce30ffef0) at dri2.c:491
#7  0x00007feb36edaa17 in dri2GetBuffersWithFormat (driDrawable=<optimized out>, width=0x181e528, height=0x181e52c, attachments=<optimized out>,
    count=<optimized out>, out_count=0x7ffce30ffef0, loaderPrivate=0x1fb1290) at dri2_glx.c:900
#8  0x00007feb20132618 in dri2_drawable_get_buffers (count=<synthetic pointer>, atts=0x1817da0, drawable=0x1816d20) at dri2.c:213
#9  dri2_allocate_textures (ctx=0x1a453d0, drawable=0x1816d20, statts=0x1817da0, statts_count=2) at dri2.c:407
#10 0x00007feb2012f17c in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, statts=0x1817da0, count=2, out=0x7ffce3100050)
    at dri_drawable.c:83
#11 0x00007feb2005b5fe in st_framebuffer_validate (stfb=0x1817940, st=st@entry=0x1b11f20) at state_tracker/st_manager.c:200
#12 0x00007feb2005c88e in st_api_make_current (stapi=<optimized out>, stctxi=0x1b11f20, stdrawi=0x1816d20, streadi=0x1816d20) at state_tracker/st_manager.c:831
#13 0x00007feb2012ecd1 in dri_make_current (cPriv=<optimized out>, driDrawPriv=0x181e500, driReadPriv=0x181e500) at dri_context.c:245
#14 0x00007feb2012dcb6 in driBindContext (pcp=<optimized out>, pdp=<optimized out>, prp=<optimized out>) at dri_util.c:531
#15 0x00007feb36edc38b in dri2_bind_context (context=0x1a70960, old=<optimized out>, draw=12582924, read=12582924) at dri2_glx.c:160
#16 0x00007feb36eb99b7 in MakeContextCurrent (dpy=0x17e12c0, draw=draw@entry=12582924, read=read@entry=12582924, gc_user=0x1a70960) at glxcurrent.c:228
#17 0x00007feb36eb9b3b in glXMakeCurrent (dpy=<optimized out>, draw=draw@entry=12582924, gc=<optimized out>) at glxcurrent.c:262
#18 0x00007feb288d9a2d in QGLXContext::makeCurrent (this=0x1a48760, surface=0x1a0ac40) at qglxintegration.cpp:476
#19 0x00007feb3a0f8750 in QOpenGLContext::makeCurrent (this=0x18401e0, surface=0x1842d90) at kernel/qopenglcontext.cpp:936
#20 0x00007feb3af63aef in QSGGuiThreadRenderLoop::renderWindow (this=this@entry=0x1913f50, window=0x1842d80)
    at /usr/src/debug/qtdeclarative-opensource-src-5.5.1/src/quick/scenegraph/qsgrenderloop.cpp:341
#21 0x00007feb3af64d11 in QSGGuiThreadRenderLoop::event (this=0x1913f50, e=<optimized out>)
    at /usr/src/debug/qtdeclarative-opensource-src-5.5.1/src/quick/scenegraph/qsgrenderloop.cpp:474
#22 0x00007feb39b7fbd9 in QCoreApplication::notify (this=<optimized out>, receiver=<optimized out>, event=<optimized out>) at kernel/qcoreapplication.cpp:1038
#23 0x00007feb39b7fcf3 in QCoreApplication::notifyInternal (this=0x7ffce3100740, receiver=0x1913f50, event=event@entry=0x7ffce31004c0)
    at kernel/qcoreapplication.cpp:965
#24 0x00007feb39bd23bd in sendEvent (event=0x7ffce31004c0, receiver=<optimized out>) at ../../src/corelib/kernel/qcoreapplication.h:224
#25 QTimerInfoList::activateTimers (this=0x183e220) at kernel/qtimerinfo_unix.cpp:637
#26 0x00007feb39bd2909 in timerSourceDispatch (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:177
#27 idleTimerSourceDispatch (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:224
#28 0x00007feb35f4c097 in g_main_dispatch (context=0x7feb240016f0) at gmain.c:3154
#29 g_main_context_dispatch (context=context@entry=0x7feb240016f0) at gmain.c:3769
#30 0x00007feb35f4c2c8 in g_main_context_iterate (context=context@entry=0x7feb240016f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3840
#31 0x00007feb35f4c36c in g_main_context_iteration (context=0x7feb240016f0, may_block=may_block@entry=1) at gmain.c:3901
#32 0x00007feb39bd350f in QEventDispatcherGlib::processEvents (this=0x183f080, flags=...) at kernel/qeventdispatcher_glib.cpp:418
#33 0x00007feb39b7d63a in QEventLoop::exec (this=this@entry=0x7ffce31006e0, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#34 0x00007feb39b852fd in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1229
#35 0x00007feb3a0bb53c in QGuiApplication::exec () at kernel/qguiapplication.cpp:1527
#36 0x0000000000405ce1 in main (argc=3, argv=0x7ffce3100878) at /usr/src/debug/plasma-workspace-5.5.2/ksplash/ksplashqml/main.cpp:98

(gdb) thread 2
[Switching to thread 2 (Thread 0x7feb18ff2700 (LWP 3645))]
#0  0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007feb35f4c264 in g_main_context_poll (priority=2147483647, n_fds=1, fds=0x7feb14003070, timeout=<optimized out>, context=0x7feb14000990)
    at gmain.c:4135
#2  g_main_context_iterate (context=context@entry=0x7feb14000990, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3835
#3  0x00007feb35f4c36c in g_main_context_iteration (context=0x7feb14000990, may_block=may_block@entry=1) at gmain.c:3901
#4  0x00007feb39bd350f in QEventDispatcherGlib::processEvents (this=0x7feb140008c0, flags=...) at kernel/qeventdispatcher_glib.cpp:418
#5  0x00007feb39b7d63a in QEventLoop::exec (this=this@entry=0x7feb18ff1cf0, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#6  0x00007feb399a9b1c in QThread::exec (this=this@entry=0x184dc00) at thread/qthread.cpp:503
#7  0x00007feb38c799a5 in QQmlThreadPrivate::run (this=0x184dc00) at /usr/src/debug/qtdeclarative-opensource-src-5.5.1/src/qml/qml/ftw/qqmlthread.cpp:141
#8  0x00007feb399ae94f in QThreadPrivate::start (arg=0x184dc00) at thread/qthread_unix.cpp:331
#9  0x00007feb37a3d4a4 in start_thread (arg=0x7feb18ff2700) at pthread_create.c:334
#10 0x00007feb392c5bdd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) thread 3
[Switching to thread 3 (Thread 0x7feb199f8700 (LWP 3644))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007feb20496e63 in cnd_wait (mtx=0x18abc90, cond=0x18abcb8) at ../../../../../include/c11/threads_posix.h:159
#2  pipe_semaphore_wait (sema=0x18abc90) at ../../../../../src/gallium/auxiliary/os/os_thread.h:259
#3  radeon_drm_cs_emit_ioctl (param=param@entry=0x18ab940) at radeon_drm_winsys.c:653
#4  0x00007feb204966a7 in impl_thrd_routine (p=<optimized out>) at ../../../../../include/c11/threads_posix.h:87
#5  0x00007feb37a3d4a4 in start_thread (arg=0x7feb199f8700) at pthread_create.c:334
#6  0x00007feb392c5bdd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) thread 4
[Switching to thread 4 (Thread 0x7feb296f5700 (LWP 3643))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007feb32509689 in _xcb_conn_wait (c=c@entry=0x17e25c0, cond=cond@entry=0x17e2600, vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:427
#2  0x00007feb3250b007 in xcb_wait_for_event (c=0x17e25c0) at xcb_in.c:693
#3  0x00007feb2ba48e29 in QXcbEventReader::run (this=0x17f55d0) at qxcbconnection.cpp:1229
#4  0x00007feb399ae94f in QThreadPrivate::start (arg=0x17f55d0) at thread/qthread_unix.cpp:331
#5  0x00007feb37a3d4a4 in start_thread (arg=0x7feb296f5700) at pthread_create.c:334
#6  0x00007feb392c5bdd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-18 14:06       ` Vlastimil Babka
@ 2016-01-18 14:14           ` Christian König
  0 siblings, 0 replies; 59+ messages in thread
From: Christian König @ 2016-01-18 14:14 UTC (permalink / raw)
  To: Vlastimil Babka, Mario Kleiner, Ville Syrjälä
  Cc: Alex Deucher, Daniel Vetter, mgraesslin, David Airlie, dri-devel,
	LKML, kwin, Thomas Lübking

> Thread 3 seems to be waiting on some response to radeon's ioctl?
That's just the worker thread waiting for something to do. At least for 
this case you can ignore it.

The interesting one is calling DRI2GetBuffersWithFormat and waiting for 
a reply. For some reason the X server seems to be stuck waiting (most 
likely) for a page flip.

Regards,
Christian.

On 18.01.2016 at 15:06, Vlastimil Babka wrote:
> On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
>> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>>> should probably give useful info around the time of the hang.
>> Attached. Captured by having kdm running, switching to console, running
>> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
>> frozen splashscreen, switch back, terminate dmesg. So somewhere around
>> the middle there should be where ksplashscreen starts...
>>
>>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
>> No such warnings there.
> This is what the gdb backtraces look like for the 4 threads of the stuck ksplashqml.
> Thread 3 seems to be waiting on some response to radeon's ioctl?
>
> (gdb) info threads
>    Id   Target Id         Frame
>    4    Thread 0x7feb296f5700 (LWP 3643) "QXcbEventReader" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>    3    Thread 0x7feb199f8700 (LWP 3644) "ksplashqml" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>    2    Thread 0x7feb18ff2700 (LWP 3645) "QQmlThread" 0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
> * 1    Thread 0x7feb3b79f8c0 (LWP 3642) "ksplashqml" 0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
> (gdb) bt
> #0  0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007feb32509432 in poll (__timeout=-1, __nfds=1, __fds=0x7ffce30ffb50) at /usr/include/bits/poll2.h:46
> #2  _xcb_conn_wait (c=c@entry=0x17e25c0, cond=cond@entry=0x7ffce30ffc70, vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:459
> #3  0x00007feb3250ad57 in wait_for_reply (c=c@entry=0x17e25c0, request=request@entry=883, e=e@entry=0x7ffce30ffd48) at xcb_in.c:516
> #4  0x00007feb3250aec1 in xcb_wait_for_reply64 (c=c@entry=0x17e25c0, request=883, e=e@entry=0x7ffce30ffd48) at xcb_in.c:560
> #5  0x00007feb32b80300 in _XReply (dpy=dpy@entry=0x17e12c0, rep=rep@entry=0x7ffce30ffdc0, extra=extra@entry=0, discard=discard@entry=0) at xcb_io.c:596
> #6  0x00007feb36eda712 in DRI2GetBuffersWithFormat (dpy=0x17e12c0, drawable=12582924, width=width@entry=0x181e528, height=height@entry=0x181e52c,
>      attachments=0x7ffce30fff10, count=1, outCount=0x7ffce30ffef0) at dri2.c:491
> #7  0x00007feb36edaa17 in dri2GetBuffersWithFormat (driDrawable=<optimized out>, width=0x181e528, height=0x181e52c, attachments=<optimized out>,
>      count=<optimized out>, out_count=0x7ffce30ffef0, loaderPrivate=0x1fb1290) at dri2_glx.c:900
> #8  0x00007feb20132618 in dri2_drawable_get_buffers (count=<synthetic pointer>, atts=0x1817da0, drawable=0x1816d20) at dri2.c:213
> #9  dri2_allocate_textures (ctx=0x1a453d0, drawable=0x1816d20, statts=0x1817da0, statts_count=2) at dri2.c:407
> #10 0x00007feb2012f17c in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, statts=0x1817da0, count=2, out=0x7ffce3100050)
>      at dri_drawable.c:83
> #11 0x00007feb2005b5fe in st_framebuffer_validate (stfb=0x1817940, st=st@entry=0x1b11f20) at state_tracker/st_manager.c:200
> #12 0x00007feb2005c88e in st_api_make_current (stapi=<optimized out>, stctxi=0x1b11f20, stdrawi=0x1816d20, streadi=0x1816d20) at state_tracker/st_manager.c:831
> #13 0x00007feb2012ecd1 in dri_make_current (cPriv=<optimized out>, driDrawPriv=0x181e500, driReadPriv=0x181e500) at dri_context.c:245
> #14 0x00007feb2012dcb6 in driBindContext (pcp=<optimized out>, pdp=<optimized out>, prp=<optimized out>) at dri_util.c:531
> #15 0x00007feb36edc38b in dri2_bind_context (context=0x1a70960, old=<optimized out>, draw=12582924, read=12582924) at dri2_glx.c:160
> #16 0x00007feb36eb99b7 in MakeContextCurrent (dpy=0x17e12c0, draw=draw@entry=12582924, read=read@entry=12582924, gc_user=0x1a70960) at glxcurrent.c:228
> #17 0x00007feb36eb9b3b in glXMakeCurrent (dpy=<optimized out>, draw=draw@entry=12582924, gc=<optimized out>) at glxcurrent.c:262
> #18 0x00007feb288d9a2d in QGLXContext::makeCurrent (this=0x1a48760, surface=0x1a0ac40) at qglxintegration.cpp:476
> #19 0x00007feb3a0f8750 in QOpenGLContext::makeCurrent (this=0x18401e0, surface=0x1842d90) at kernel/qopenglcontext.cpp:936
> #20 0x00007feb3af63aef in QSGGuiThreadRenderLoop::renderWindow (this=this@entry=0x1913f50, window=0x1842d80)
>      at /usr/src/debug/qtdeclarative-opensource-src-5.5.1/src/quick/scenegraph/qsgrenderloop.cpp:341
> #21 0x00007feb3af64d11 in QSGGuiThreadRenderLoop::event (this=0x1913f50, e=<optimized out>)
>      at /usr/src/debug/qtdeclarative-opensource-src-5.5.1/src/quick/scenegraph/qsgrenderloop.cpp:474
> #22 0x00007feb39b7fbd9 in QCoreApplication::notify (this=<optimized out>, receiver=<optimized out>, event=<optimized out>) at kernel/qcoreapplication.cpp:1038
> #23 0x00007feb39b7fcf3 in QCoreApplication::notifyInternal (this=0x7ffce3100740, receiver=0x1913f50, event=event@entry=0x7ffce31004c0)
>      at kernel/qcoreapplication.cpp:965
> #24 0x00007feb39bd23bd in sendEvent (event=0x7ffce31004c0, receiver=<optimized out>) at ../../src/corelib/kernel/qcoreapplication.h:224
> #25 QTimerInfoList::activateTimers (this=0x183e220) at kernel/qtimerinfo_unix.cpp:637
> #26 0x00007feb39bd2909 in timerSourceDispatch (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:177
> #27 idleTimerSourceDispatch (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:224
> #28 0x00007feb35f4c097 in g_main_dispatch (context=0x7feb240016f0) at gmain.c:3154
> #29 g_main_context_dispatch (context=context@entry=0x7feb240016f0) at gmain.c:3769
> #30 0x00007feb35f4c2c8 in g_main_context_iterate (context=context@entry=0x7feb240016f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
>      at gmain.c:3840
> #31 0x00007feb35f4c36c in g_main_context_iteration (context=0x7feb240016f0, may_block=may_block@entry=1) at gmain.c:3901
> #32 0x00007feb39bd350f in QEventDispatcherGlib::processEvents (this=0x183f080, flags=...) at kernel/qeventdispatcher_glib.cpp:418
> #33 0x00007feb39b7d63a in QEventLoop::exec (this=this@entry=0x7ffce31006e0, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
> #34 0x00007feb39b852fd in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1229
> #35 0x00007feb3a0bb53c in QGuiApplication::exec () at kernel/qguiapplication.cpp:1527
> #36 0x0000000000405ce1 in main (argc=3, argv=0x7ffce3100878) at /usr/src/debug/plasma-workspace-5.5.2/ksplash/ksplashqml/main.cpp:98
>
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7feb18ff2700 (LWP 3645))]
> #0  0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
> 84      in ../sysdeps/unix/syscall-template.S
> (gdb) bt
> #0  0x00007feb392bd24d in poll () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007feb35f4c264 in g_main_context_poll (priority=2147483647, n_fds=1, fds=0x7feb14003070, timeout=<optimized out>, context=0x7feb14000990)
>      at gmain.c:4135
> #2  g_main_context_iterate (context=context@entry=0x7feb14000990, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3835
> #3  0x00007feb35f4c36c in g_main_context_iteration (context=0x7feb14000990, may_block=may_block@entry=1) at gmain.c:3901
> #4  0x00007feb39bd350f in QEventDispatcherGlib::processEvents (this=0x7feb140008c0, flags=...) at kernel/qeventdispatcher_glib.cpp:418
> #5  0x00007feb39b7d63a in QEventLoop::exec (this=this@entry=0x7feb18ff1cf0, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
> #6  0x00007feb399a9b1c in QThread::exec (this=this@entry=0x184dc00) at thread/qthread.cpp:503
> #7  0x00007feb38c799a5 in QQmlThreadPrivate::run (this=0x184dc00) at /usr/src/debug/qtdeclarative-opensource-src-5.5.1/src/qml/qml/ftw/qqmlthread.cpp:141
> #8  0x00007feb399ae94f in QThreadPrivate::start (arg=0x184dc00) at thread/qthread_unix.cpp:331
> #9  0x00007feb37a3d4a4 in start_thread (arg=0x7feb18ff2700) at pthread_create.c:334
> #10 0x00007feb392c5bdd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7feb199f8700 (LWP 3644))]
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> 185     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
> (gdb) bt
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007feb20496e63 in cnd_wait (mtx=0x18abc90, cond=0x18abcb8) at ../../../../../include/c11/threads_posix.h:159
> #2  pipe_semaphore_wait (sema=0x18abc90) at ../../../../../src/gallium/auxiliary/os/os_thread.h:259
> #3  radeon_drm_cs_emit_ioctl (param=param@entry=0x18ab940) at radeon_drm_winsys.c:653
> #4  0x00007feb204966a7 in impl_thrd_routine (p=<optimized out>) at ../../../../../include/c11/threads_posix.h:87
> #5  0x00007feb37a3d4a4 in start_thread (arg=0x7feb199f8700) at pthread_create.c:334
> #6  0x00007feb392c5bdd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7feb296f5700 (LWP 3643))]
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> 185     in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
> (gdb) bt
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00007feb32509689 in _xcb_conn_wait (c=c@entry=0x17e25c0, cond=cond@entry=0x17e2600, vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:427
> #2  0x00007feb3250b007 in xcb_wait_for_event (c=0x17e25c0) at xcb_in.c:693
> #3  0x00007feb2ba48e29 in QXcbEventReader::run (this=0x17f55d0) at qxcbconnection.cpp:1229
> #4  0x00007feb399ae94f in QThreadPrivate::start (arg=0x17f55d0) at thread/qthread_unix.cpp:331
> #5  0x00007feb37a3d4a4 in start_thread (arg=0x7feb296f5700) at pthread_create.c:334
> #6  0x00007feb392c5bdd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-18 10:49     ` Vlastimil Babka
@ 2016-01-20 20:25         ` Vlastimil Babka
  2016-01-20 20:25         ` Vlastimil Babka
  2016-01-20 20:32         ` Mario Kleiner
  2 siblings, 0 replies; 59+ messages in thread
From: Vlastimil Babka @ 2016-01-20 20:25 UTC (permalink / raw)
  To: Mario Kleiner, Ville Syrjälä
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, kwin

On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>>
>>
>> On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
>>> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>>
>> I'm currently running...
>>
>> while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done
>>
>> ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i
>> can't trigger a hang after hundreds of runs.
>>
>> Does this also hang for you?
> 
> No, test mode seems to be fine.
> 
>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>> should probably give useful info around the time of the hang.
> 
> Attached. Captured by having kdm running, switching to console, running 
> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see 
> frozen splashscreen, switch back, terminate dmesg. So somewhere around 
> the middle there should be where ksplashscreen starts...

Hmm, this looks suspicious? (The lines marked with !!! are my annotation.)

[  538.918990] [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=385876589, diff=1, hw=622 hw_last=621
[  538.918991] [drm:evergreen_irq_process] IH: D2 vblank
[  538.935035] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[  538.935040] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[  538.935041] [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=16808100, diff=1, hw=30885 hw_last=30884
[  538.935042] [drm:evergreen_irq_process] IH: D1 vblank
[  538.939702] [drm:drm_wait_vblank] waiting on vblank count 385876590, crtc 1
[  538.939704] [drm:drm_wait_vblank] returning 385876590 to client
[  538.939709] [drm:drm_wait_vblank] waiting on vblank count 385876590, crtc 1
[  538.939710] [drm:drm_wait_vblank] returning 385876590 to client
!!!538.939715] [drm:drm_queue_vblank_event] event on vblank count 385876591, current 385876590, crtc 1
[  538.944452] [drm:drm_wait_vblank] waiting on vblank count 16808101, crtc 0
[  538.944453] [drm:drm_wait_vblank] returning 16808101 to client
[  538.944458] [drm:drm_wait_vblank] waiting on vblank count 16808101, crtc 0
[  538.944460] [drm:drm_wait_vblank] returning 16808101 to client
[  538.944465] [drm:drm_queue_vblank_event] event on vblank count 16808102, current 16808101, crtc 0
[  538.948210] [drm:drm_wait_vblank] waiting on vblank count 16808101, crtc 0
[  538.948212] [drm:drm_wait_vblank] returning 16808101 to client
[  538.948222] [drm:drm_wait_vblank] waiting on vblank count 16808101, crtc 0
[  538.948224] [drm:drm_wait_vblank] returning 16808101 to client
[  538.949589] [drm:drm_wait_vblank] waiting on vblank count 16808101, crtc 0
[  538.949591] [drm:drm_wait_vblank] returning 16808101 to client
[  538.951238] [drm:radeon_get_vblank_counter_kms] crtc 1: dist from vblank start 6
[  538.951245] [drm:radeon_get_vblank_counter_kms] crtc 1: dist from vblank start 7
!!!538.951246] [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=385876590, diff=16776597, hw=3 hw_last=622
[  538.951247] [drm:evergreen_irq_process] IH: D2 vblank
[  538.951746] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 4
[  538.951752] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 4
[  538.951753] [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=16808101, diff=1, hw=30886 hw_last=30885
[  538.951754] [drm:drm_handle_vblank_events] vblank event on 16808102, current 16808102
[  538.951756] [drm:evergreen_irq_process] IH: D1 vblank
[  538.964570] [drm:radeon_get_vblank_counter_kms] crtc 1: dist from vblank start 7
[  538.964581] [drm:radeon_get_vblank_counter_kms] crtc 1: dist from vblank start -1058
[  538.964583] [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=402653187, diff=1, hw=4 hw_last=3

Could it be that the underflow caused some signed logic to misbehave and fail to detect that we passed 385876591?
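
A minimal sketch of how such a miss could happen. It rests on an assumption that is not confirmed anywhere in this thread: that event delivery compares the unsigned 32-bit difference between the current count and the requested count against a window of 1 << 23, and treats anything larger as "not reached yet". The function name `event_is_due` and the window value are illustrative, not taken from the kernel source.

```python
U32_MASK = 0xFFFFFFFF

def event_is_due(seq, wanted, window=1 << 23):
    """Return True if the current count `seq` counts as having reached `wanted`.

    Hypothetical model: the difference is taken modulo 2**32 and only
    differences inside the window are treated as "reached".
    """
    return ((seq - wanted) & U32_MASK) <= window

wanted = 385876591           # count the queued event on crtc 1 waits for

# Normal case: the counter advances by one and lands on the requested value.
print(event_is_due(385876591, wanted))   # True

# Logged case: the counter jumps to 402653187, far outside the window.
print(event_is_due(402653187, wanted))   # False
```

Under that assumption the jump carries the counter 16776596 past the requested value, which is more than 1 << 23 (8388608), so the event would look like it lies in the future and would stay queued.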

Later there is another big skip like this (though I guess nothing is waiting on it this time):

[  541.337813] [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=402653363, diff=16777040, hw=3 hw_last=179
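
Both large diff values can be reproduced with simple arithmetic, assuming the hardware counter difference is masked to 24 bits (0xFFFFFF). That mask is an assumption chosen because it matches the logged numbers; the helper names below are illustrative only.

```python
HW_COUNTER_MASK = 0xFFFFFF   # assumed 24-bit hardware frame counter
U32_MASK = 0xFFFFFFFF        # software vblank count modelled as 32-bit

def masked_diff(hw, hw_last, mask=HW_COUNTER_MASK):
    """Wrapping difference between two hardware counter samples."""
    return (hw - hw_last) & mask

def advance(current, diff):
    """Add diff to the software vblank count, wrapping at 32 bits."""
    return (current + diff) & U32_MASK

# Normal step from the log: hw=622, hw_last=621
print(masked_diff(622, 621))                      # 1

# First skip: hw=3, hw_last=622
print(masked_diff(3, 622))                        # 16776597

# Second skip: hw=3, hw_last=179
print(masked_diff(3, 179))                        # 16777040

# Software count after the first skip, starting from current=385876590
print(advance(385876590, masked_diff(3, 622)))    # 402653187
```

The last value matches the current=402653187 reported in the following update for crtc 1, which suggests that the hardware counter going backwards (622 to 3) was interpreted as a nearly full 24-bit wraparound.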



>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
> 
> No such warnings there.
> 
>> thanks,
>> -mario
>>
>>
>>>> Thanks,
>>>> Vlastimil
>>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-18 10:49     ` Vlastimil Babka
@ 2016-01-20 20:32         ` Mario Kleiner
  2016-01-20 20:25         ` Vlastimil Babka
  2016-01-20 20:32         ` Mario Kleiner
  2 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-20 20:32 UTC (permalink / raw)
  To: Vlastimil Babka, Ville Syrjälä
  Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin,
	David Airlie, dri-devel, LKML, kwin

On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>>
>>
>> On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
>>> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>>
>> I'm currently running...
>>
>> while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done
>>
>> ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i
>> can't trigger a hang after hundreds of runs.
>>
>> Does this also hang for you?
>
> No, test mode seems to be fine.
>
>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>> should probably give useful info around the time of the hang.
>
> Attached. Captured by having kdm running, switching to console, running
> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
> frozen splashscreen, switch back, terminate dmesg. So somewhere around
> the middle there should be where ksplashscreen starts...
>
>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
>
> No such warnings there.
>
>> thanks,
>> -mario
>>
>>
>>>> Thanks,
>>>> Vlastimil
>>>
>

Thanks. So the problem is that AMD's hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing the vblank
count by one on each vblank irq. I think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.

The new code in drm_update_vblank_count() breaks this. The reset of the
hw counter to zero is treated as a counter wraparound, so our software
vblank counter jumps forward by up to 2^24 counts in response (AMD's hw
counters are 24 bits wide). The vblank event handling code in
drm_handle_vblank_events() and other places then sees the counter being
more than 2^23 counts ahead of the queued vblank events and, as part of
its own wraparound handling for the 32-bit software counter, doesn't
deliver these queued events for a long time -> no vblank swap trigger
event -> no swap -> client hangs waiting for swap completion.
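
To make the second half of that concrete, here is a small Python sketch of a wraparound-aware "is this event due?" test, fed with the numbers from Vlastimil's log. The 2^23 threshold comes from the description above; the function names and the exact comparison are assumptions for illustration and are not copied from drm_handle_vblank_events():

```python
U32_MASK = 0xFFFFFFFF
HALF_WINDOW = 1 << 23   # events further "behind" than this are treated as not yet due


def event_is_due(current_seq: int, event_seq: int) -> bool:
    """True if the counter has reached the event without looking wrapped."""
    ahead = (current_seq - event_seq) & U32_MASK
    return ahead <= HALF_WINDOW


def vblanks_until_due(current_seq: int, event_seq: int) -> int:
    """How many more counts until the event would be delivered in this model."""
    if event_is_due(current_seq, event_seq):
        return 0
    return (event_seq - current_seq) & U32_MASK


if __name__ == "__main__":
    # crtc 0: event queued for 16808102, counter reaches 16808102 -> delivered
    print(event_is_due(16808102, 16808102))            # True
    # crtc 1: event queued for 385876591, counter jumps to 402653187
    print(event_is_due(402653187, 385876591))          # False
    print(vblanks_until_due(402653187, 385876591))     # 4278190700
```

In this model the crtc 1 event only becomes due again after the 32-bit counter has gone almost all the way around, which at an assumed 60 Hz refresh is on the order of 825 days.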

I think I remember seeing the ksplash progress screen occasionally
blanking halfway through login. I guess that's when kwin triggers a
modeset in parallel to ksplash doing its OpenGL animations. So depending
on the hw vblank count at the time of login, ksplash would or wouldn't
hang; apparently I got "lucky" with my counts at login.

-mario

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-20 20:32         ` Mario Kleiner
@ 2016-01-21  3:43           ` Michel Dänzer
  -1 siblings, 0 replies; 59+ messages in thread
From: Michel Dänzer @ 2016-01-21  3:43 UTC (permalink / raw)
  To: Mario Kleiner, Vlastimil Babka, Ville Syrjälä
  Cc: Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 21.01.2016 05:32, Mario Kleiner wrote:
>
> So the problem is that AMD's hardware frame counters reset to
> zero during a modeset. The old DRM code dealt with drivers doing that by
> keeping vblank irqs enabled during modesets and incrementing vblank
> count by one during each vblank irq, I think that's what
> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.

Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?


BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  3:43           ` Michel Dänzer
@ 2016-01-21  5:31             ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-21  5:31 UTC (permalink / raw)
  To: Michel Dänzer, Vlastimil Babka, Ville Syrjälä
  Cc: Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> On 21.01.2016 05:32, Mario Kleiner wrote:
>>
>> So the problem is that AMD's hardware frame counters reset to
>> zero during a modeset. The old DRM code dealt with drivers doing that by
>> keeping vblank irqs enabled during modesets and incrementing vblank
>> count by one during each vblank irq, I think that's what
>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>
> Right, looks like there's been a regression breaking this. I suspect the
> problem is that vblank->last isn't getting updated from
> drm_vblank_post_modeset. Not sure which change broke that though, or how
> to fix it. Ville?
>

The whole logic has changed and the software counter updates are now 
driven all the time by the hw counter.

>
> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> vblank counters"). I've been meaning to track that down since then; one
> of these days hopefully, but if anybody has any ideas offhand...
>
>

I spent the last few hours reading through the drm and radeon code, and I
think what should probably work is to replace the
drm_vblank_pre/post_modeset calls in radeon/amdgpu with drm_vblank_off/on
calls. These are apparently meant for drivers whose hw counters reset
during modeset, and seem to reinitialize stuff properly and release
clients' queued vblank events to avoid blocking. Not tested so far, I
just looked at the code.

Once drm_vblank_off is called, drm_vblank_get will no-op and return an
error, so clients can't enable vblank irqs during the modeset. The
pageflip ioctl and waitvblank ioctl would fail while a modeset happens;
hopefully userspace handles this correctly everywhere.
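
As a toy model of the behaviour I'd expect from that (Python, purely illustrative: the class, its fields and the choice of -EINVAL as the error value are assumptions, this is not how the DRM core is structured):

```python
import errno


class ToyCrtcVblank:
    """Toy model: vblank references are refused between off() and on()."""

    def __init__(self):
        self.off = False        # True while a modeset is in progress
        self.pending = []       # sequence numbers clients are waiting on

    def vblank_get(self) -> int:
        # While "off", refuse new references instead of enabling the irq.
        return -errno.EINVAL if self.off else 0

    def queue_event(self, seq: int) -> int:
        ret = self.vblank_get()
        if ret:
            return ret          # what a waitvblank/pageflip caller would see
        self.pending.append(seq)
        return 0

    def vblank_off(self) -> list:
        # Release everything queued so no client stays blocked across the modeset.
        released, self.pending = self.pending, []
        self.off = True
        return released

    def vblank_on(self) -> None:
        self.off = False


if __name__ == "__main__":
    crtc = ToyCrtcVblank()
    crtc.queue_event(385876591)
    print(crtc.vblank_off())            # [385876591], released instead of stuck
    print(crtc.queue_event(385876592))  # negative errno while the modeset runs
    crtc.vblank_on()
    print(crtc.queue_event(4))          # 0, accepted again
```

The point is only that a waiter queued before the modeset gets woken up rather than left pointing at a sequence number the counter may have jumped past.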

It would also cause radeon's power management to not sync its actions to
vblank if it got invoked during a modeset, but that seems to be
handled by a 200 msec timeout and would hopefully only cause visual
glitches - or invisible glitches while the crtc is blanked during modeset?

There could be another tiny race with the new "vblank counter bumping"
logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts
..."). It would trigger if drm_update_vblank_count() were called multiple
times in quick succession within the "radeon_crtc->lb_vblank_lead_lines"
scanlines before the start of real vblank, iff at the same time a modeset
happened and set radeon_crtc->lb_vblank_lead_lines to a smaller value due
to a change in horizontal mode resolution. That needs a modeset to a
higher horizontal resolution to happen just when the scanout is in
exactly the right 5 or so scanlines, while some client is calling
drm_vblank_get() to enable vblank irqs at the same time. It would cause
the same hang if it happened. Not that likely to happen often, but still
not nice, also Murphy's law... If we could switch to drm_vblank_off/on
instead of drm_vblank_pre/post_modeset, we could remove that race as well
by forbidding any vblank irq related activity during a modeset.

I'll hack up a patch for demonstration now.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  5:31             ` Mario Kleiner
@ 2016-01-21  6:38               ` Michel Dänzer
  -1 siblings, 0 replies; 59+ messages in thread
From: Michel Dänzer @ 2016-01-21  6:38 UTC (permalink / raw)
  To: Mario Kleiner, Vlastimil Babka, Ville Syrjälä
  Cc: Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 21.01.2016 14:31, Mario Kleiner wrote:
> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>> On 21.01.2016 05:32, Mario Kleiner wrote:
>>>
>>> So the problem is that AMD's hardware frame counters reset to
>>> zero during a modeset. The old DRM code dealt with drivers doing that by
>>> keeping vblank irqs enabled during modesets and incrementing vblank
>>> count by one during each vblank irq, I think that's what
>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>
>> Right, looks like there's been a regression breaking this. I suspect the
>> problem is that vblank->last isn't getting updated from
>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>> to fix it. Ville?
>>
> 
> The whole logic has changed and the software counter updates are now
> driven all the time by the hw counter.
> 
>>
>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>> vblank counters"). I've been meaning to track that down since then; one
>> of these days hopefully, but if anybody has any ideas offhand...
> 
> I spent the last few hours reading through the drm and radeon code and I
> think what should probably work is to replace the
> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
> calls. These are apparently meant for drivers whose hw counters reset
> during modeset, [...]

... just like drm_vblank_pre/post_modeset. That those were broken is a
regression which needs to be fixed anyway. I don't think switching to
drm_vblank_on/off is suitable for stable trees.

Looking at Vlastimil's original post again, I'd say the most likely
culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
vblanks were missed").


> Once drm_vblank_off is called, drm_vblank_get will no-op and return an
> error, so clients can't enable vblank irqs during the modeset - pageflip
> ioctl and waitvblank ioctl would fail while a modeset happens -
> hopefully userspace handles this correctly everywhere.

We've fixed xf86-video-ati for this.


> I'll hack up a patch for demonstration now.

You're a bit late to that party. :)

http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  6:38               ` Michel Dänzer
@ 2016-01-21  6:41                 ` Michel Dänzer
  -1 siblings, 0 replies; 59+ messages in thread
From: Michel Dänzer @ 2016-01-21  6:41 UTC (permalink / raw)
  To: Mario Kleiner, Vlastimil Babka, Ville Syrjälä
  Cc: Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 21.01.2016 15:38, Michel Dänzer wrote:
> On 21.01.2016 14:31, Mario Kleiner wrote:
>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>>> On 21.01.2016 05:32, Mario Kleiner wrote:
>>>>
>>>> So the problem is that AMD's hardware frame counters reset to
>>>> zero during a modeset. The old DRM code dealt with drivers doing that by
>>>> keeping vblank irqs enabled during modesets and incrementing vblank
>>>> count by one during each vblank irq, I think that's what
>>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>>
>>> Right, looks like there's been a regression breaking this. I suspect the
>>> problem is that vblank->last isn't getting updated from
>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>>> to fix it. Ville?
>>>
>>
>> The whole logic has changed and the software counter updates are now
>> driven all the time by the hw counter.
>>
>>>
>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>>> vblank counters"). I've been meaning to track that down since then; one
>>> of these days hopefully, but if anybody has any ideas offhand...
>>
>> I spent the last few hours reading through the drm and radeon code and I
>> think what should probably work is to replace the
>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
>> calls. These are apparently meant for drivers whose hw counters reset
>> during modeset, [...]
> 
> ... just like drm_vblank_pre/post_modeset. That those were broken is a
> regression which needs to be fixed anyway. I don't think switching to
> drm_vblank_on/off is suitable for stable trees.

Even more so since as I mentioned, there is (has been since at least
about half a year ago) a counter jumping bug with drm_vblank_on/off as well.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  6:41                 ` Michel Dänzer
@ 2016-01-21  7:58                   ` Daniel Vetter
  -1 siblings, 0 replies; 59+ messages in thread
From: Daniel Vetter @ 2016-01-21  7:58 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Mario Kleiner, Vlastimil Babka, Ville Syrjälä,
	Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
> On 21.01.2016 15:38, Michel Dänzer wrote:
> > On 21.01.2016 14:31, Mario Kleiner wrote:
> >> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> >>> On 21.01.2016 05:32, Mario Kleiner wrote:
> >>>>
> >>>> So the problem is that AMDs hardware frame counters reset to
> >>>> zero during a modeset. The old DRM code dealt with drivers doing that by
> >>>> keeping vblank irqs enabled during modesets and incrementing vblank
> >>>> count by one during each vblank irq, i think that's what
> >>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
> >>>
> >>> Right, looks like there's been a regression breaking this. I suspect the
> >>> problem is that vblank->last isn't getting updated from
> >>> drm_vblank_post_modeset. Not sure which change broke that though, or how
> >>> to fix it. Ville?
> >>>
> >>
> >> The whole logic has changed and the software counter updates are now
> >> driven all the time by the hw counter.
> >>
> >>>
> >>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> >>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> >>> vblank counters"). I've been meaning to track that down since then; one
> >>> of these days hopefully, but if anybody has any ideas offhand...
> >>
> >> I spent the last few hours reading through the drm and radeon code and i
> >> think what should probably work is to replace the
> >> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
> >> calls. These are apparently meant for drivers whose hw counters reset
> >> during modeset, [...]
> > 
> > ... just like drm_vblank_pre/post_modeset. That those were broken is a
> > regression which needs to be fixed anyway. I don't think switching to
> > drm_vblank_on/off is suitable for stable trees.
> 
> Even more so since as I mentioned, there is (has been since at least
> about half a year ago) a counter jumping bug with drm_vblank_on/off as well.

Hm, never noticed you reported that. I thought the reason for not picking
up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace
where it tried to use vblank waits on a disabled pipe?

Can you please point me at the vblank on/off jump bug?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  6:38               ` Michel Dänzer
@ 2016-01-21  8:28                 ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-21  8:28 UTC (permalink / raw)
  To: Michel Dänzer, Vlastimil Babka, Ville Syrjälä
  Cc: Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 01/21/2016 07:38 AM, Michel Dänzer wrote:
> On 21.01.2016 14:31, Mario Kleiner wrote:
>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>>> On 21.01.2016 05:32, Mario Kleiner wrote:
>>>>
>>>> So the problem is that AMDs hardware frame counters reset to
>>>> zero during a modeset. The old DRM code dealt with drivers doing that by
>>>> keeping vblank irqs enabled during modesets and incrementing vblank
>>>> count by one during each vblank irq, i think that's what
>>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>>
>>> Right, looks like there's been a regression breaking this. I suspect the
>>> problem is that vblank->last isn't getting updated from
>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>>> to fix it. Ville?
>>>
>>
>> The whole logic has changed and the software counter updates are now
>> driven all the time by the hw counter.
>>
>>>
>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>>> vblank counters"). I've been meaning to track that down since then; one
>>> of these days hopefully, but if anybody has any ideas offhand...
>>
>> I spent the last few hours reading through the drm and radeon code and i
>> think what should probably work is to replace the
>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
>> calls. These are apparently meant for drivers whose hw counters reset
>> during modeset, [...]
>
> ... just like drm_vblank_pre/post_modeset. That those were broken is a
> regression which needs to be fixed anyway. I don't think switching to
> drm_vblank_on/off is suitable for stable trees.
>
> Looking at Vlastimil's original post again, I'd say the most likely
> culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
> vblanks were missed").
>

Yes, I think reverting that one alone would likely fix it by restoring
the old vblank update logic.
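
To make the failure mode concrete, here is a minimal sketch (not the actual
kernel code) of what happens when a software vblank count is advanced by the
wrapped difference between two hardware counter reads, and the hardware
counter resets to zero in between. The helper name vblank_diff() and the
24-bit counter width are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the hardware frame counter (24 bits, for illustration). */
#define HW_COUNTER_MASK 0x00ffffffu

/*
 * Hypothetical helper: number of vblanks taken to have elapsed between two
 * hardware counter reads, allowing for normal wraparound of the counter.
 */
static uint32_t vblank_diff(uint32_t hw_now, uint32_t hw_last)
{
	/* Both reads must fit in the assumed counter width. */
	assert(hw_now <= HW_COUNTER_MASK && hw_last <= HW_COUNTER_MASK);

	/* Unsigned subtraction wraps; the mask folds it into counter range. */
	return (hw_now - hw_last) & HW_COUNTER_MASK;
}
```

With hw_last = 916 and hw_now = 917 this yields 1, as expected. If the
hardware counter is reset to 0 while the last value seen was 916, the same
arithmetic yields 16776300, so the reset is indistinguishable from a
legitimate wraparound and the software count leaps forward by that amount.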

>
>> Once drm_vblank_off is called, drm_vblank_get will no-op and return an
>> error, so clients can't enable vblank irqs during the modeset - pageflip
>> ioctl and waitvblank ioctl would fail while a modeset happens -
>> hopefully userspace handles this correctly everywhere.
>
> We've fixed xf86-video-ati for this.
>
>
>> I'll hack up a patch for demonstration now.
>
> You're a bit late to that party. :)
>
> http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
> http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
>
>

Oops. Just sent out my little (so far untested) creations. Yes, they are
essentially the same as Daniel's patches. The only additions are a fix for
that other potential small race I describe, made by slightly moving the
xxx_pm_compute_clocks() calls around, and a fix for a drm_vblank_get/put
imbalance in radeon_pm that would occur if vblank_on/off were used.

-mario

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  7:58                   ` Daniel Vetter
@ 2016-01-21  8:36                     ` Michel Dänzer
  -1 siblings, 0 replies; 59+ messages in thread
From: Michel Dänzer @ 2016-01-21  8:36 UTC (permalink / raw)
  To: Mario Kleiner, Vlastimil Babka, Ville Syrjälä,
	LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 21.01.2016 16:58, Daniel Vetter wrote:
> On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
>> On 21.01.2016 15:38, Michel Dänzer wrote:
>>> On 21.01.2016 14:31, Mario Kleiner wrote:
>>>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>>>>> On 21.01.2016 05:32, Mario Kleiner wrote:
>>>>>>
>>>>>> So the problem is that AMDs hardware frame counters reset to
>>>>>> zero during a modeset. The old DRM code dealt with drivers doing that by
>>>>>> keeping vblank irqs enabled during modesets and incrementing vblank
>>>>>> count by one during each vblank irq, i think that's what
>>>>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>>>>
>>>>> Right, looks like there's been a regression breaking this. I suspect the
>>>>> problem is that vblank->last isn't getting updated from
>>>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>>>>> to fix it. Ville?
>>>>>
>>>>
>>>> The whole logic has changed and the software counter updates are now
>>>> driven all the time by the hw counter.
>>>>
>>>>>
>>>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>>>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>>>>> vblank counters"). I've been meaning to track that down since then; one
>>>>> of these days hopefully, but if anybody has any ideas offhand...
>>>>
>>>> I spent the last few hours reading through the drm and radeon code and i
>>>> think what should probably work is to replace the
>>>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
>>>> calls. These are apparently meant for drivers whose hw counters reset
>>>> during modeset, [...]
>>>
>>> ... just like drm_vblank_pre/post_modeset. That those were broken is a
>>> regression which needs to be fixed anyway. I don't think switching to
>>> drm_vblank_on/off is suitable for stable trees.
>>
>> Even more so since as I mentioned, there is (has been since at least
>> about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
> 
> Hm, never noticed you reported that. I thought the reason for not picking
> up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace
> where it tried to use vblank waits on a disabled pipe?

http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html

I don't know why it didn't get picked up.


> Can you please point me at the vblank on/off jump bug please?

AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.
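
A rough back-of-the-envelope sketch of why a forward jump on every DPMS off
reaches the 32-bit boundary so quickly. The jump size used in the note below
is an assumption (nearly the full range of a 24-bit hardware counter); the
thread does not state the actual value, and jumps_until_wrap() is a
hypothetical helper, not a DRM function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many forward jumps of size `jump` it takes for a
 * 32-bit count that starts at `start` to reach or pass 2^32 and wrap.
 */
static uint64_t jumps_until_wrap(uint32_t start, uint32_t jump)
{
	assert(jump != 0);

	/* Distance from the starting value to the 32-bit boundary. */
	uint64_t remaining = (UINT64_C(1) << 32) - start;

	/* Ceiling division: the last jump only needs to reach the boundary. */
	return (remaining + jump - 1) / jump;
}
```

If each DPMS off advanced the count by a full 2^24, jumps_until_wrap(0, 1u << 24)
gives 256 cycles; with a jump of 16776300 it gives 257. By contrast, counting
one vblank at a time at 60 Hz would need 2^32 increments, roughly 828 days.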


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  8:28                 ` Mario Kleiner
@ 2016-01-21  9:15                   ` Vlastimil Babka
  -1 siblings, 0 replies; 59+ messages in thread
From: Vlastimil Babka @ 2016-01-21  9:15 UTC (permalink / raw)
  To: Mario Kleiner, Michel Dänzer, Ville Syrjälä
  Cc: Daniel Vetter, LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On 01/21/2016 09:28 AM, Mario Kleiner wrote:
>> ... just like drm_vblank_pre/post_modeset. That those were broken is a
>> regression which needs to be fixed anyway. I don't think switching to
>> drm_vblank_on/off is suitable for stable trees.
>>
>> Looking at Vlastimil's original post again, I'd say the most likely
>> culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
>> vblanks were missed").
>>

Yeah, this is what I bisected to.

> Yes, i think reverting that one alone would likely fix it by reverting
> to the old vblank update logic.

Yep, I said in the original mail that reverting on top of 4.4 fixed it.
Well, not just this single commit, but also some patches on top (e.g. the
radeon and amdgpu adaptations to that commit; IIRC it wouldn't have
compiled otherwise).

>>
>>> Once drm_vblank_off is called, drm_vblank_get will no-op and return an
>>> error, so clients can't enable vblank irqs during the modeset - pageflip
>>> ioctl and waitvblank ioctl would fail while a modeset happens -
>>> hopefully userspace handles this correctly everywhere.
>>
>> We've fixed xf86-video-ati for this.
>>
>>
>>> I'll hack up a patch for demonstration now.
>>
>> You're a bit late to that party. :)
>>
>> http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
>> http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
>>
>>
>
> Oops. Just sent out my little (so far untested) creations. Yes, they are
> essentially the same as Daniel's patches. The only addition is to also
> fix that other potential small race i describe by slightly moving the
> xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put
> imbalance in radeon_pm if vblank_on/off would be used.

Thanks, I'll test.

>
> -mario
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21  8:36                     ` Michel Dänzer
@ 2016-01-21 10:09                       ` Daniel Vetter
  -1 siblings, 0 replies; 59+ messages in thread
From: Daniel Vetter @ 2016-01-21 10:09 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Mario Kleiner, Vlastimil Babka, Ville Syrjälä,
	LKML, dri-devel, mgraesslin, kwin, Alex Deucher,
	Christian König

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> On 21.01.2016 16:58, Daniel Vetter wrote:
> > On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
> >> On 21.01.2016 15:38, Michel Dänzer wrote:
> >>> On 21.01.2016 14:31, Mario Kleiner wrote:
> >>>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> >>>>> On 21.01.2016 05:32, Mario Kleiner wrote:
> >>>>>>
> >>>>>> So the problem is that AMDs hardware frame counters reset to
> >>>>>> zero during a modeset. The old DRM code dealt with drivers doing that by
> >>>>>> keeping vblank irqs enabled during modesets and incrementing vblank
> >>>>>> count by one during each vblank irq, i think that's what
> >>>>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
> >>>>>
> >>>>> Right, looks like there's been a regression breaking this. I suspect the
> >>>>> problem is that vblank->last isn't getting updated from
> >>>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
> >>>>> to fix it. Ville?
> >>>>>
> >>>>
> >>>> The whole logic has changed and the software counter updates are now
> >>>> driven all the time by the hw counter.
> >>>>
> >>>>>
> >>>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> >>>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> >>>>> vblank counters"). I've been meaning to track that down since then; one
> >>>>> of these days hopefully, but if anybody has any ideas offhand...
> >>>>
> >>>> I spent the last few hours reading through the drm and radeon code and i
> >>>> think what should probably work is to replace the
> >>>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
> >>>> calls. These are apparently meant for drivers whose hw counters reset
> >>>> during modeset, [...]
> >>>
> >>> ... just like drm_vblank_pre/post_modeset. That those were broken is a
> >>> regression which needs to be fixed anyway. I don't think switching to
> >>> drm_vblank_on/off is suitable for stable trees.
> >>
> >> Even more so since as I mentioned, there is (has been since at least
> >> about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
> > 
> > Hm, never noticed you reported that. I thought the reason for not picking
> > up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace
> > where it tried to use vblank waits on a disabled pipe?
> 
> http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
> 
> I don't know why it didn't get picked up.

Yeah, checking my tree your ack is indeed in there. I think I'll resend
them.

> > Can you please point me at the vblank on/off jump bug please?
> 
> AFAIR I originally reported it in response to
> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> , but I can't find that in the archives, so maybe that was just on IRC.
> See
> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> . Basically, I ran into the bug fixed by your patch because the counter
> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> just a few days.

Ok, so just uncovered the overflow bug.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-21 10:09                       ` Daniel Vetter
@ 2016-01-22  3:06                         ` Michel Dänzer
  -1 siblings, 0 replies; 59+ messages in thread
From: Michel Dänzer @ 2016-01-22  3:06 UTC (permalink / raw)
  To: Mario Kleiner, Ville Syrjälä
  Cc: Vlastimil Babka, LKML, dri-devel, Alex Deucher, Christian König


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:
> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>> 
>>> Can you please point me at the vblank on/off jump bug please?
>>
>> AFAIR I originally reported it in response to
>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>> , but I can't find that in the archives, so maybe that was just on IRC.
>> See
>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>> . Basically, I ran into the bug fixed by your patch because the counter
>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>> just a few days.
> 
> Ok, so just uncovered the overflow bug.

Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1

I suspect this may not be evident with current Intel hardware because
dev->max_vblank_count = 0xffffffff, which makes the wraparound code in
drm_update_vblank_count a no-op. Maybe you can reproduce it if you
artificially set a lower max_vblank_count in the driver.
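
For reference, the diff values in the log above are consistent with a masked
difference between two hw counter samples. Below is a minimal sketch in
Python, not the kernel code itself. MAX_VBLANK_COUNT = 0xffffff is an
assumption (a 24-bit hw counter, which is what the logged diffs imply), and
vblank_diff / update_count are hypothetical helper names.

```python
# Sketch of the wraparound arithmetic, assuming the difference is computed
# as (cur - last) & max_vblank_count. All names here are hypothetical.

MAX_VBLANK_COUNT = 0xFFFFFF   # assumed 24-bit hw counter
U32_MASK = 0xFFFFFFFF         # the software counter is treated as a u32


def vblank_diff(hw, hw_last, max_count=MAX_VBLANK_COUNT):
    """Masked difference between two hw counter samples."""
    return (hw - hw_last) & max_count


def update_count(current, hw, hw_last, max_count=MAX_VBLANK_COUNT):
    """Advance the software counter by the masked difference."""
    return (current + vblank_diff(hw, hw_last, max_count)) & U32_MASK


# First jump in the log: hw counter goes from 916 back to 1.
print(vblank_diff(1, 916))               # 16776301
# Second jump: hw counter goes from 1 back to 0.
print(vblank_diff(0, 1))                 # 16777215
# Software counter after the first jump.
print(update_count(218104694, 1, 916))   # 234880995
```

With max_count=0xffffffff the same backwards step would wrap the u32 sum to
current - 915 instead of jumping forward by about 2^24, which may be why the
problem is much less visible there.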


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-22  3:06                         ` Michel Dänzer
@ 2016-01-22 15:18                           ` Ville Syrjälä
  -1 siblings, 0 replies; 59+ messages in thread
From: Ville Syrjälä @ 2016-01-22 15:18 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Mario Kleiner, Vlastimil Babka, LKML, dri-devel, Alex Deucher,
	Christian König

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> 
> [ Trimming KDE folks from Cc ]
> 
> On 21.01.2016 19:09, Daniel Vetter wrote:
> > On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >> On 21.01.2016 16:58, Daniel Vetter wrote:
> >>> 
> >>> Can you please point me at the vblank on/off jump bug please?
> >>
> >> AFAIR I originally reported it in response to
> >> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >> , but I can't find that in the archives, so maybe that was just on IRC.
> >> See
> >> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >> . Basically, I ran into the bug fixed by your patch because the counter
> >> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >> just a few days.
> > 
> > Ok, so just uncovered the overflow bug.
> 
> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> counter jumping bug (similar to the bug this thread is about), which
> exposed the overflow bug, is still alive and kicking in 4.5. It seems
> to happen when turning off the CRTC:
> 
> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916

Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.

> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1

Same here.

These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.
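
As a rough illustration of why bracketing a hw counter reset with
drm_vblank_off()/drm_vblank_on() can avoid the jump, here is a toy model in
Python. It assumes the "on" step re-samples the hw counter as the new baseline
before any further difference is accumulated. The class and method names are
hypothetical and this is not the DRM implementation.

```python
class VblankModel:
    """Toy model of a software vblank counter driven by a hw counter."""

    def __init__(self, max_count=0xFFFFFF):
        self.max_count = max_count  # assumed 24-bit hw counter
        self.count = 0              # software counter (u32)
        self.last = 0               # last hw counter sample

    def update(self, hw):
        """Accumulate the masked difference since the last sample."""
        diff = (hw - self.last) & self.max_count
        self.count = (self.count + diff) & 0xFFFFFFFF
        self.last = hw
        return diff

    def resync(self, hw):
        """Take hw as the new baseline without accumulating a difference."""
        self.last = hw


# Without a resync, a hw reset from 916 to 1 looks like a huge forward step.
a = VblankModel()
a.update(916)
print(a.update(1))   # 16776301

# With a resync after the reset, only real increments are counted.
b = VblankModel()
b.update(916)
b.resync(1)          # hw counter was reset while vblank handling was off
print(b.update(2))   # 1
print(b.count)       # 917
```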

> dev->max_vblank_count = 0xffffffff, which makes the wraparound code in
> drm_update_vblank_count a no-op. Maybe you can reproduce it if you
> artificially set a lower max_vblank_count in the driver.
> 
> 
> -- 
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-22 15:18                           ` Ville Syrjälä
@ 2016-01-22 18:29                             ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-22 18:29 UTC (permalink / raw)
  To: Ville Syrjälä, Michel Dänzer
  Cc: Vlastimil Babka, LKML, dri-devel, Alex Deucher, Christian König



On 01/22/2016 04:18 PM, Ville Syrjälä wrote:
> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>
>> [ Trimming KDE folks from Cc ]
>>
>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>
>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>
>>>> AFAIR I originally reported it in response to
>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>> See
>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>> just a few days.
>>>
>>> Ok, so just uncovered the overflow bug.
>>
>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>> counter jumping bug (similar to the bug this thread is about), which
>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>> to happen when turning off the CRTC:
>>
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>
> Not sure what bug we're talking about here, but here the hw counter
> clearly jumps backwards.
>
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>
> Same here.
>
> These things just don't happen on i915 because drm_vblank_off() and
> drm_vblank_on() are always called around the times when the hw counter
> might get reset. Or at least that's how it should be.
>

Fwiw, testing the HD-57570 single display with my patch that uses 
drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show 
hardware counter reset to zero as expected, but no jumps of software 
vblank counter. So with that vblank_off/on placement it seems to work 
nicely here.

-mario

>> dev->max_vblank_count = 0xffffffff, which makes the wraparound code in
>> drm_update_vblank_count a no-op. Maybe you can reproduce it if you
>> artificially set a lower max_vblank_count in the driver.
>>
>>
>> --
>> Earthling Michel Dänzer               |               http://www.amd.com
>> Libre software enthusiast             |             Mesa and X developer
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-22 18:29                             ` Mario Kleiner
@ 2016-01-23 18:23                               ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-23 18:23 UTC (permalink / raw)
  To: Ville Syrjälä, Michel Dänzer
  Cc: Vlastimil Babka, LKML, dri-devel, Alex Deucher, Christian König

On 01/22/2016 07:29 PM, Mario Kleiner wrote:
>
>
> On 01/22/2016 04:18 PM, Ville Syrjälä wrote:
>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>
>>> [ Trimming KDE folks from Cc ]
>>>
>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>
>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>
>>>>> AFAIR I originally reported it in response to
>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>> See
>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>> just a few days.
>>>>
>>>> Ok, so just uncovered the overflow bug.
>>>
>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>> counter jumping bug (similar to the bug this thread is about), which
>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>> to happen when turning off the CRTC:
>>>
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>
>> Not sure what bug we're talking about here, but here the hw counter
>> clearly jumps backwards.
>>
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>
>> Same here.
>>
>> These things just don't happen on i915 because drm_vblank_off() and
>> drm_vblank_on() are always called around the times when the hw counter
>> might get reset. Or at least that's how it should be.
>>
>
> Fwiw, testing the HD-57570 single display with my patch that uses
> drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show
> hardware counter reset to zero as expected, but no jumps of software
> vblank counter. So with that vblank_off/on placement it seems to work
> nicely here.
>
> -mario
>

I spoke too soon. The jump doesn't happen when I change video modes
(video resolution, refresh rate, etc.), despite the hw counter reset. But if I
just disable and then re-enable a display, the software counter jumps.

-mario

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-22 15:18                           ` Ville Syrjälä
@ 2016-01-25  4:15                             ` Michel Dänzer
  -1 siblings, 0 replies; 59+ messages in thread
From: Michel Dänzer @ 2016-01-25  4:15 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: LKML, dri-devel, Alex Deucher, Christian König, Vlastimil Babka

On 23.01.2016 00:18, Ville Syrjälä wrote:
> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>
>> [ Trimming KDE folks from Cc ]
>>
>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>
>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>
>>>> AFAIR I originally reported it in response to
>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>> See
>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>> just a few days.
>>>
>>> Ok, so just uncovered the overflow bug.
>>
>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>> counter jumping bug (similar to the bug this thread is about), which
>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>> to happen when turning off the CRTC:
>>
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> 
> Not sure what bug we're talking about here, but here the hw counter
> clearly jumps backwards.
> 
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> 
> Same here.

At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.
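
For reference, the diff values in the quoted log match a plain modular
subtraction on a 24-bit hardware counter. A minimal sketch, assuming the
core computes (hw - hw_last) masked to the counter width and that the
counter is 24 bits wide (both inferred from the logged numbers;
vblank_diff() and HW_COUNTER_MASK are hypothetical names, not kernel
symbols):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width: 24 bits, inferred from the logged diff values. */
#define HW_COUNTER_MASK 0xffffffu

/*
 * Hypothetical helper: number of vblanks elapsed between two reads of a
 * wrapping hardware counter. A counter that went backwards (e.g. after a
 * reset) is indistinguishable from one that almost wrapped around.
 */
static uint32_t vblank_diff(uint32_t hw, uint32_t hw_last)
{
	assert(hw <= HW_COUNTER_MASK && hw_last <= HW_COUNTER_MASK);
	return (hw - hw_last) & HW_COUNTER_MASK;
}
```

With the logged values, vblank_diff(1, 916) gives 16776301 and
vblank_diff(0, 1) gives 16777215. Adding the first to current=218104694
yields 234880995, which is the "current" value in the last log line.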


> These things just don't happen on i915 because drm_vblank_off() and
> drm_vblank_on() are always called around the times when the hw counter
> might get reset. Or at least that's how it should be.

Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25  4:15                             ` Michel Dänzer
@ 2016-01-25 13:16                               ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-25 13:16 UTC (permalink / raw)
  To: Michel Dänzer, Ville Syrjälä
  Cc: Alex Deucher, Vlastimil Babka, LKML, dri-devel, Christian König



On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> On 23.01.2016 00:18, Ville Syrjälä wrote:
>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>
>>> [ Trimming KDE folks from Cc ]
>>>
>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>
>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>
>>>>> AFAIR I originally reported it in response to
>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>> See
>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>> just a few days.
>>>>
>>>> Ok, so just uncovered the overflow bug.
>>>
>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>> counter jumping bug (similar to the bug this thread is about), which
>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>> to happen when turning off the CRTC:
>>>
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>
>> Not sure what bug we're talking about here, but here the hw counter
>> clearly jumps backwards.
>>
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>
>> Same here.
>
> At least one of the jumps is expected, because this is around turning
> off the CRTC for DPMS off. Don't know yet why there are two jumps back
> though.
>
>
>> These things just don't happen on i915 because drm_vblank_off() and
>> drm_vblank_on() are always called around the times when the hw counter
>> might get reset. Or at least that's how it should be.
>
> Which is of course the idea of Daniel's patch (which is what I'm getting
> the above with) or Mario's patch as well, but clearly something's still
> wrong. It's certainly possible that it's something in the driver, but
> since calling drm_vblank_pre/post_modeset from the same places seems to
> work fine (ignoring the regression discussed in this thread)... Do
> drm_vblank_on/off require something else to handle this correctly?
>
>

I suspect it is because vblank_disable_and_save() calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> re-enable you get something like:

At disable:

Call to dpms-off -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off() ->
vblank_disable_and_save() -> irqs off, drm_update_vblank_count() computes
the final count.

Then the crtc is shut down and its hw counter resets to zero.

At re-enable:

Modesetting -> drm_crtc_helper_set_mode() -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off() -> vblank_disable_and_save() -> a pointless
drm_update_vblank_count() while the hw counter is already reset to zero
-> unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video 
resolution/refresh rate, as then we only have one call into 
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to change vblank_disable_and_save() so that it only
calls drm_update_vblank_count() if vblank irqs actually get disabled, not
on no-op calls. I will try that now.

Otherwise kms drivers would have to be careful to never call 
drm_vblank_off multiple times before calling drm_vblank_on, but the help 
text to drm_vblank_on() claims that unbalanced calls to these functions 
are perfectly fine.
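
The double drm_vblank_off() sequence described above can be sketched as a
toy model. This is only an illustration under assumptions: struct
toy_vblank and the toy_* functions are hypothetical stand-ins rather than
the DRM core code, the hw counter is assumed to be 24 bits wide, and the
only_if_enabled flag models the change proposed above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HW_COUNTER_MASK 0xffffffu /* assumed 24-bit hw counter */

/* Hypothetical, simplified per-crtc vblank state. */
struct toy_vblank {
	uint32_t count;    /* software vblank count */
	uint32_t hw_last;  /* hw counter value at the last update */
	bool irq_enabled;  /* is the vblank irq currently on? */
};

/* Advance the software count by the modular hw counter difference. */
static void toy_update_count(struct toy_vblank *vb, uint32_t hw_now)
{
	assert(hw_now <= HW_COUNTER_MASK);
	vb->count += (hw_now - vb->hw_last) & HW_COUNTER_MASK;
	vb->hw_last = hw_now;
}

/*
 * Models the off path. With only_if_enabled == false the count is
 * updated on every call; with true, a call while irqs are already off
 * is a no-op.
 */
static void toy_vblank_off(struct toy_vblank *vb, uint32_t hw_now,
			   bool only_if_enabled)
{
	if (only_if_enabled && !vb->irq_enabled)
		return;
	vb->irq_enabled = false;
	toy_update_count(vb, hw_now);
}
```

Starting from count=1000, hw_last=916 with irqs enabled, a first
toy_vblank_off() at hw=917 gives count=1001. A second call after the hw
counter has reset to 0 adds 16776299 without the guard, and leaves the
count at 1001 with it.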

-mario

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 13:16                               ` Mario Kleiner
  (?)
@ 2016-01-25 13:23                               ` Ville Syrjälä
  2016-01-25 13:44                                   ` Mario Kleiner
  -1 siblings, 1 reply; 59+ messages in thread
From: Ville Syrjälä @ 2016-01-25 13:23 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
> 
> 
> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> > On 23.01.2016 00:18, Ville Syrjälä wrote:
> >> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> >>>
> >>> [ Trimming KDE folks from Cc ]
> >>>
> >>> On 21.01.2016 19:09, Daniel Vetter wrote:
> >>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
> >>>>>>
> >>>>>> Can you please point me at the vblank on/off jump bug please?
> >>>>>
> >>>>> AFAIR I originally reported it in response to
> >>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >>>>> , but I can't find that in the archives, so maybe that was just on IRC.
> >>>>> See
> >>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >>>>> . Basically, I ran into the bug fixed by your patch because the counter
> >>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >>>>> just a few days.
> >>>>
> >>>> Ok, so just uncovered the overflow bug.
> >>>
> >>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> >>> counter jumping bug (similar to the bug this thread is about), which
> >>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
> >>> to happen when turning off the CRTC:
> >>>
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> >>
> >> Not sure what bug we're talking about here, but here the hw counter
> >> clearly jumps backwards.
> >>
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> >>
> >> Same here.
> >
> > At least one of the jumps is expected, because this is around turning
> > off the CRTC for DPMS off. Don't know yet why there are two jumps back
> > though.
> >
> >
> >> These things just don't happen on i915 because drm_vblank_off() and
> >> drm_vblank_on() are always called around the times when the hw counter
> >> might get reset. Or at least that's how it should be.
> >
> > Which is of course the idea of Daniel's patch (which is what I'm getting
> > the above with) or Mario's patch as well, but clearly something's still
> > wrong. It's certainly possible that it's something in the driver, but
> > since calling drm_vblank_pre/post_modeset from the same places seems to
> > work fine (ignoring the regression discussed in this thread)... Do
> > drm_vblank_on/off require something else to handle this correctly?
> >
> >
> 
> I suspect it is because vblank_disable_and_save calls 
> drm_update_vblank_count() unconditionally, even if vblank irqs are 
> already off.
> 
> So on a manual display disable -> reenable you get something like
> 
> At disable:
> 
> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> 
> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes 
> final count.
> 
> Then the crtc is shut down and its hw counter resets to zero.
> 
> At reenable:
> 
> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> 
> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> 
> drm_vblank_off -> vblank_disable_and_save -> A pointless 
> drm_update_vblank_count() while the hw counter is already reset to zero 
> --> Unwanted counter jump.
> 
> 
> The problem doesn't happen on a pure modeset to a different video 
> resolution/refresh rate, as then we only have one call into 
> atombios_crtc_dpms(DPMS_OFF).
> 
> I think the fix is to fix vblank_disable_and_save() to only call 
> drm_update_vblank_count() if vblank irqs get actually disabled, not on 
> no-op calls. I will try that now.

It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.
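
To illustrate the stall, here is a rough sketch. It assumes the hw
counter keeps running while the vblank interrupt is disabled and that it
is 24 bits wide; count_at_off() is a hypothetical helper, not a DRM core
function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HW_COUNTER_MASK 0xffffffu /* assumed 24-bit hw counter */

/*
 * Hypothetical: software count reported at off time, after the irq has
 * been disabled for a while. Without a final update the frames that
 * elapsed with the irq off are never accounted for.
 */
static uint32_t count_at_off(uint32_t count, uint32_t hw_last,
			     uint32_t hw_now, bool final_update)
{
	assert(hw_now <= HW_COUNTER_MASK && hw_last <= HW_COUNTER_MASK);
	if (!final_update)
		return count; /* looks stalled since the irq went off */
	return count + ((hw_now - hw_last) & HW_COUNTER_MASK);
}
```

For example, with count=1000 and hw_last=916 when the irq went off, and
hw_now=1016 at off time, the result is 1000 without the final update and
1100 with it.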

> 
> Otherwise kms drivers would have to be careful to never call 
> drm_vblank_off multiple times before calling drm_vblank_on, but the help 
> text to drm_vblank_on() claims that unbalanced calls to these functions 
> are perfectly fine.
> 
> -mario
> 
> 
> 
> 
> 
> 
> 

-- 
Ville Syrjälä
Intel OTC

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 13:23                               ` Ville Syrjälä
@ 2016-01-25 13:44                                   ` Mario Kleiner
  0 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-25 13:44 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>
>>
>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>
>>>>> [ Trimming KDE folks from Cc ]
>>>>>
>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>
>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>
>>>>>>> AFAIR I originally reported it in response to
>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>>>> See
>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>> just a few days.
>>>>>>
>>>>>> Ok, so just uncovered the overflow bug.
>>>>>
>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>> to happen when turning off the CRTC:
>>>>>
>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>>>
>>>> Not sure what bug we're talking about here, but here the hw counter
>>>> clearly jumps backwards.
>>>>
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>>>
>>>> Same here.
>>>
>>> At least one of the jumps is expected, because this is around turning
>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
>>> though.
>>>
>>>
>>>> These things just don't happen on i915 because drm_vblank_off() and
>>>> drm_vblank_on() are always called around the times when the hw counter
>>>> might get reset. Or at least that's how it should be.
>>>
>>> Which is of course the idea of Daniel's patch (which is what I'm getting
>>> the above with) or Mario's patch as well, but clearly something's still
>>> wrong. It's certainly possible that it's something in the driver, but
>>> since calling drm_vblank_pre/post_modeset from the same places seems to
>>> work fine (ignoring the regression discussed in this thread)... Do
>>> drm_vblank_on/off require something else to handle this correctly?
>>>
>>>
>>
>> I suspect it is because vblank_disable_and_save calls
>> drm_update_vblank_count() unconditionally, even if vblank irqs are
>> already off.
>>
>> So on a manual display disable -> reenable you get something like
>>
>> At disable:
>>
>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
>> final count.
>>
>> Then the crtc is shut down and its hw counter resets to zero.
>>
>> At reenable:
>>
>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
>> drm_vblank_off -> vblank_disable_and_save -> A pointless
>> drm_update_vblank_count() while the hw counter is already reset to zero
>> --> Unwanted counter jump.
>>
>>
>> The problem doesn't happen on a pure modeset to a different video
>> resolution/refresh rate, as then we only have one call into
>> atombios_crtc_dpms(DPMS_OFF).
>>
>> I think the fix is to fix vblank_disable_and_save() to only call
>> drm_update_vblank_count() if vblank irqs get actually disabled, not on
>> no-op calls. I will try that now.
>
> It does that on purpose. Otherwise the vblank counter would appear to
> have stalled while the interrupt was off.
>

OK, that's what the comments there say, although I don't see at the
moment why that perceived stall would be a big problem. I checked all
callers of vblank_disable_and_save(). They are all careful not to call
that function if vblanks are already disabled. The only exception is
drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
drivers that have resetting hw counters or other problematic behaviour
during modesets etc., then this breaks that protection. E.g., calling the
vblank timestamping stuff is also not safe/well-defined during modesets,
when the timestamping constants are not (yet) updated to reflect the new
mode timing of the modeset in progress.

-mario


>>
>> Otherwise kms drivers would have to be careful to never call
>> drm_vblank_off multiple times before calling drm_vblank_on, but the help
>> text to drm_vblank_on() claims that unbalanced calls to these functions
>> are perfectly fine.
>>
>> -mario
>>
>>
>>
>>
>>
>>
>>
>

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 13:44                                   ` Mario Kleiner
@ 2016-01-25 14:53                                     ` Ville Syrjälä
  -1 siblings, 0 replies; 59+ messages in thread
From: Ville Syrjälä @ 2016-01-25 14:53 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
> 
> 
> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
> > On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
> >>
> >>
> >> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> >>> On 23.01.2016 00:18, Ville Syrjälä wrote:
> >>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> >>>>>
> >>>>> [ Trimming KDE folks from Cc ]
> >>>>>
> >>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
> >>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
> >>>>>>>>
> >>>>>>>> Can you please point me at the vblank on/off jump bug please?
> >>>>>>>
> >>>>>>> AFAIR I originally reported it in response to
> >>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
> >>>>>>> See
> >>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
> >>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >>>>>>> just a few days.
> >>>>>>
> >>>>>> Ok, so just uncovered the overflow bug.
> >>>>>
> >>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> >>>>> counter jumping bug (similar to the bug this thread is about), which
> >>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
> >>>>> to happen when turning off the CRTC:
> >>>>>
> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> >>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> >>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> >>>>
> >>>> Not sure what bug we're talking about here, but here the hw counter
> >>>> clearly jumps backwards.
> >>>>
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> >>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> >>>>
> >>>> Same here.
> >>>
> >>> At least one of the jumps is expected, because this is around turning
> >>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
> >>> though.
> >>>
> >>>
> >>>> These things just don't happen on i915 because drm_vblank_off() and
> >>>> drm_vblank_on() are always called around the times when the hw counter
> >>>> might get reset. Or at least that's how it should be.
> >>>
> >>> Which is of course the idea of Daniel's patch (which is what I'm getting
> >>> the above with) or Mario's patch as well, but clearly something's still
> >>> wrong. It's certainly possible that it's something in the driver, but
> >>> since calling drm_vblank_pre/post_modeset from the same places seems to
> >>> work fine (ignoring the regression discussed in this thread)... Do
> >>> drm_vblank_on/off require something else to handle this correctly?
> >>>
> >>>
> >>
> >> I suspect it is because vblank_disable_and_save calls
> >> drm_update_vblank_count() unconditionally, even if vblank irqs are
> >> already off.
> >>
> >> So on a manual display disable -> reenable you get something like
> >>
> >> At disable:
> >>
> >> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
> >> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
> >> final count.
> >>
> >> Then the crtc is shut down and its hw counter resets to zero.
> >>
> >> At reenable:
> >>
> >> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
> >> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
> >> drm_vblank_off -> vblank_disable_and_save -> A pointless
> >> drm_update_vblank_count() while the hw counter is already reset to zero
> >> --> Unwanted counter jump.
> >>
> >>
> >> The problem doesn't happen on a pure modeset to a different video
> >> resolution/refresh rate, as then we only have one call into
> >> atombios_crtc_dpms(DPMS_OFF).
> >>
> >> I think the fix is to fix vblank_disable_and_save() to only call
> >> drm_update_vblank_count() if vblank irqs get actually disabled, not on
> >> no-op calls. I will try that now.
> >
> > It does that on purpose. Otherwise the vblank counter would appear to
> > have stalled while the interrupt was off.
> >
> 
> Ok, that's what the comments there say, although i don't see atm. why 
> that perceived stall would be a big problem. I checked all callers of 
> vblank_disable_and_save(). They are all careful to not call that 
> function if vblanks are already disabled. The only exception is 
> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms 
> drivers which have resetting hw counters or other problematic behaviour 
> during modesets etc. then this will break. E.g., calling the vblank 
> timestamping stuff is also not safe/well-defined during modesets when 
> the timestamping constants are not (yet) updated to reflect the new mode 
> timing of the modeset in progress.

The idea is to maintain the appearance that the counter ticks all the
time as long as the crtc is active. While that may not really be
required if no one is currently interested in the vblank
counter, I think it's a nice thing to have just to make the behaviour
of the counter consistent.

As far as calling drm_vblank_off() after the hw counter got reset, well,
that's not correct. It should be called before the reset.
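
For reference, the huge diff values in the debug log quoted above are
what a wrapping subtraction produces when the hw counter steps
backwards. A minimal sketch, assuming a 24-bit hw counter (the mask
0xffffff is inferred from the logged numbers, not read out of the
driver) and a hypothetical helper name toy_vblank_diff():

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 24-bit hw counter, inferred from the logged diff values. */
#define TOY_MAX_VBLANK_COUNT 0xffffffu

/*
 * Hypothetical helper: vblanks elapsed between two hw counter reads.
 * Any backwards step is interpreted as a wraparound.
 */
static uint32_t toy_vblank_diff(uint32_t cur_hw, uint32_t last_hw)
{
	return (cur_hw - last_hw) & TOY_MAX_VBLANK_COUNT;
}

/* Self-check against the values printed in the quoted log. */
static void toy_check_log_values(void)
{
	/* hw=916 hw_last=916: nothing elapsed */
	assert(toy_vblank_diff(916, 916) == 0);
	/* hw=1 hw_last=916: the reset looks like an almost full wrap */
	assert(toy_vblank_diff(1, 916) == 16776301u);
	/* hw=0 hw_last=1 */
	assert(toy_vblank_diff(0, 1) == 16777215u);
	/* first jump added to current=218104694 gives the next current= */
	assert(218104694u + toy_vblank_diff(1, 916) == 234880995u);
}
```

So a reset from 916 to 1 is indistinguishable from roughly 16.7 million
elapsed vblanks, which is why the update has to happen before the reset
and not after it.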

> 
> -mario
> 
> 
> >>
> >> Otherwise kms drivers would have to be careful to never call
> >> drm_vblank_off multiple times before calling drm_vblank_on, but the help
> >> text to drm_vblank_on() claims that unbalanced calls to these functions
> >> are perfectly fine.
> >>
> >> -mario
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 14:53                                     ` Ville Syrjälä
  (?)
@ 2016-01-25 16:38                                     ` Mario Kleiner
  2016-01-25 18:51                                         ` Daniel Vetter
  -1 siblings, 1 reply; 59+ messages in thread
From: Mario Kleiner @ 2016-01-25 16:38 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König, Daniel Vetter

Re-adding Daniel, who somehow got dropped from the Cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>
>>
>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>
>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>
>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>
>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>
>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>>>>>> See
>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>>>> just a few days.
>>>>>>>>
>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>
>>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>>>> to happen when turning off the CRTC:
>>>>>>>
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>
>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>> clearly jumps backwards.
>>>>>>
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>>>>>
>>>>>> Same here.
>>>>>
>>>>> At least one of the jumps is expected, because this is around turning
>>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
>>>>> though.
>>>>>
>>>>>
>>>>>> These things just don't happen on i915 because drm_vblank_off() and
>>>>>> drm_vblank_on() are always called around the times when the hw counter
>>>>>> might get reset. Or at least that's how it should be.
>>>>>
>>>>> Which is of course the idea of Daniel's patch (which is what I'm getting
>>>>> the above with) or Mario's patch as well, but clearly something's still
>>>>> wrong. It's certainly possible that it's something in the driver, but
>>>>> since calling drm_vblank_pre/post_modeset from the same places seems to
>>>>> work fine (ignoring the regression discussed in this thread)... Do
>>>>> drm_vblank_on/off require something else to handle this correctly?
>>>>>
>>>>>
>>>>
>>>> I suspect it is because vblank_disable_and_save calls
>>>> drm_update_vblank_count() unconditionally, even if vblank irqs are
>>>> already off.
>>>>
>>>> So on a manual display disable -> reenable you get something like
>>>>
>>>> At disable:
>>>>
>>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
>>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
>>>> final count.
>>>>
>>>> Then the crtc is shut down and its hw counter resets to zero.
>>>>
>>>> At reenable:
>>>>
>>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
>>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
>>>> drm_vblank_off -> vblank_disable_and_save -> A pointless
>>>> drm_update_vblank_count() while the hw counter is already reset to zero
>>>> --> Unwanted counter jump.
>>>>
>>>>
>>>> The problem doesn't happen on a pure modeset to a different video
>>>> resolution/refresh rate, as then we only have one call into
>>>> atombios_crtc_dpms(DPMS_OFF).
>>>>
>>>> I think the fix is to fix vblank_disable_and_save() to only call
>>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on
>>>> no-op calls. I will try that now.
>>>
>>> It does that on purpose. Otherwise the vblank counter would appear to
>>> have stalled while the interrupt was off.
>>>
>>
>> Ok, that's what the comments there say, although i don't see atm. why
>> that perceived stall would be a big problem. I checked all callers of
>> vblank_disable_and_save(). They are all careful to not call that
>> function if vblanks are already disabled. The only exception is
>> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
>> drivers which have resetting hw counters or other problematic behaviour
>> during modesets etc. then this will break. E.g., calling the vblank
>> timestamping stuff is also not safe/well-defined during modesets when
>> the timestamping constants are not (yet) updated to reflect the new mode
>> timing of the modeset in progress.
>
> The idea is to maintain the appearance that the counter ticks all the
> time as long as the crtc is active. While that may not be really
> required in case if no one is currently interested in the vblank
> counter, I think it's a nice thing to have just to make the behaviour
> of the counter consistent.
>
> As far as calling drm_vblank_off() after the hw counter got reset, well,
> that not correct. It should be called before the reset.

What radeon does is call drm_vblank_off() at the beginning of DPMS_OFF.
The first call to DPMS_OFF will call drm_vblank_off() and really disable
vblank irqs if they were running, updating the counts/ts a last time.
But then the dpms off will reset the hw counter to zero. When one
re-enables the display, a second call to DPMS_OFF will now call
drm_vblank_off() again when it apparently shouldn't.
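
To illustrate that sequence, here is a toy model. It is only a sketch
and not the drm_irq.c code: struct toy_vblank, toy_update_count() and
toy_vblank_off() are hypothetical names, the starting count of 1000 is
made up, and the 24-bit mask is an assumption inferred from the diff
values in the quoted log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HW_MASK 0xffffffu	/* assumption: 24-bit hw counter */

struct toy_vblank {
	bool enabled;		/* vblank irq currently on? */
	uint32_t count;		/* software vblank count */
	uint32_t last_hw;	/* hw counter value at the last update */
};

/* Fold the hw counter delta since the last update into the sw count. */
static void toy_update_count(struct toy_vblank *v, uint32_t hw)
{
	v->count += (hw - v->last_hw) & TOY_HW_MASK;
	v->last_hw = hw;
}

/* Models the unconditional update: it runs even if irqs are already off. */
static void toy_vblank_off(struct toy_vblank *v, uint32_t hw)
{
	toy_update_count(v, hw);
	v->enabled = false;
}

/* Two DPMS_OFF calls with a hw counter reset in between. */
static uint32_t toy_double_dpms_off(void)
{
	struct toy_vblank v = { .enabled = true, .count = 1000, .last_hw = 916 };

	toy_vblank_off(&v, 916);	/* first DPMS_OFF: diff is 0 */
	assert(v.count == 1000);

	/* crtc is shut down here, hw counter resets to zero */

	toy_vblank_off(&v, 0);		/* second DPMS_OFF: bogus diff added */
	return v.count;
}
```

In this model toy_double_dpms_off() returns 16777300, i.e. the count
jumps by 16776300 even though no vblank happened in between.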

I just tested this patch, which fixes the counter jumps on radeon-kms 
with my or Daniel's drm_vblank_off patches to radeon:

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 607f493..d739d93 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
         spin_lock_irqsave(&dev->event_lock, irqflags);

         spin_lock(&dev->vbl_lock);
-       vblank_disable_and_save(dev, pipe);
+       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled);
+
+       if (vblank->enabled)
+               vblank_disable_and_save(dev, pipe);
         wake_up(&vblank->queue);

         /*
@@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
                 return;

         spin_lock_irqsave(&dev->vbl_lock, irqflags);
+       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled);
+
         /* Drop our private "prevent drm_vblank_get" refcount */
         if (vblank->inmodeset) {
                 atomic_dec(&vblank->refcount);



Another, maybe better, approach might be to no-op redundant calls to 
drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to 
drm_vblank_on() iff !vblank->inmodeset.
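
As a rough sketch of that second idea, again using a toy model rather
than the real drm_irq.c code (all toy_* names are hypothetical, the
starting count is made up, the 24-bit mask is an assumption, and the
resync of last_hw in toy_vblank_on() is a simplification of whatever
the real function does):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HW_MASK 0xffffffu	/* assumption: 24-bit hw counter */

struct toy_vblank {
	bool inmodeset;		/* true between an off and the matching on */
	uint32_t count;		/* software vblank count */
	uint32_t last_hw;	/* hw counter value at the last update */
};

static void toy_update_count(struct toy_vblank *v, uint32_t hw)
{
	v->count += (hw - v->last_hw) & TOY_HW_MASK;
	v->last_hw = hw;
}

static void toy_vblank_off(struct toy_vblank *v, uint32_t hw)
{
	if (v->inmodeset)	/* redundant off: already handled, no-op */
		return;
	toy_update_count(v, hw);
	v->inmodeset = true;
}

static void toy_vblank_on(struct toy_vblank *v, uint32_t hw)
{
	if (!v->inmodeset)	/* redundant on: no-op */
		return;
	v->last_hw = hw;	/* resync to the hw counter without counting */
	v->inmodeset = false;
}

/* off, hw reset, redundant off, on, redundant on, then 5 real vblanks. */
static uint32_t toy_off_off_on(void)
{
	struct toy_vblank v = { .inmodeset = false, .count = 1000, .last_hw = 916 };

	toy_vblank_off(&v, 916);	/* real off: diff is 0 */
	toy_vblank_off(&v, 0);		/* after hw reset: skipped */
	assert(v.count == 1000);
	toy_vblank_on(&v, 0);		/* resync last_hw to 0 */
	toy_vblank_on(&v, 3);		/* redundant on: skipped */
	toy_update_count(&v, 5);	/* 5 vblanks since the resync */
	return v.count;
}
```

Here toy_off_off_on() returns 1005: the redundant off after the reset
no longer adds a bogus diff, and only the 5 vblanks after re-enable are
counted.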

-mario


>
>>
>> -mario
>>
>>
>>>>
>>>> Otherwise kms drivers would have to be careful to never call
>>>> drm_vblank_off multiple times before calling drm_vblank_on, but the help
>>>> text to drm_vblank_on() claims that unbalanced calls to these functions
>>>> are perfectly fine.
>>>>
>>>> -mario
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 16:38                                     ` Mario Kleiner
@ 2016-01-25 18:51                                         ` Daniel Vetter
  0 siblings, 0 replies; 59+ messages in thread
From: Daniel Vetter @ 2016-01-25 18:51 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Ville Syrjälä,
	Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König, Daniel Vetter

On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
> Readding Daniel, which somehow got dropped from the cc.
> 
> On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
> >On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
> >>
> >>
> >>On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
> >>>On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
> >>>>
> >>>>
> >>>>On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> >>>>>On 23.01.2016 00:18, Ville Syrjälä wrote:
> >>>>>>On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> >>>>>>>
> >>>>>>>[ Trimming KDE folks from Cc ]
> >>>>>>>
> >>>>>>>On 21.01.2016 19:09, Daniel Vetter wrote:
> >>>>>>>>On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >>>>>>>>>On 21.01.2016 16:58, Daniel Vetter wrote:
> >>>>>>>>>>
> >>>>>>>>>>Can you please point me at the vblank on/off jump bug please?
> >>>>>>>>>
> >>>>>>>>>AFAIR I originally reported it in response to
> >>>>>>>>>http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >>>>>>>>>, but I can't find that in the archives, so maybe that was just on IRC.
> >>>>>>>>>See
> >>>>>>>>>http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >>>>>>>>>. Basically, I ran into the bug fixed by your patch because the counter
> >>>>>>>>>jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >>>>>>>>>just a few days.
> >>>>>>>>
> >>>>>>>>Ok, so just uncovered the overflow bug.
> >>>>>>>
> >>>>>>>Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> >>>>>>>counter jumping bug (similar to the bug this thread is about), which
> >>>>>>>exposed the overflow bug, is still alive and kicking in 4.5. It seems
> >>>>>>>to happen when turning off the CRTC:
> >>>>>>>
> >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> >>>>>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>>>>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> >>>>>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> >>>>>>
> >>>>>>Not sure what bug we're talking about here, but here the hw counter
> >>>>>>clearly jumps backwards.
> >>>>>>
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>>>>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> >>>>>>
> >>>>>>Same here.
> >>>>>
> >>>>>At least one of the jumps is expected, because this is around turning
> >>>>>off the CRTC for DPMS off. Don't know yet why there are two jumps back
> >>>>>though.
> >>>>>
> >>>>>
> >>>>>>These things just don't happen on i915 because drm_vblank_off() and
> >>>>>>drm_vblank_on() are always called around the times when the hw counter
> >>>>>>might get reset. Or at least that's how it should be.
> >>>>>
> >>>>>Which is of course the idea of Daniel's patch (which is what I'm getting
> >>>>>the above with) or Mario's patch as well, but clearly something's still
> >>>>>wrong. It's certainly possible that it's something in the driver, but
> >>>>>since calling drm_vblank_pre/post_modeset from the same places seems to
> >>>>>work fine (ignoring the regression discussed in this thread)... Do
> >>>>>drm_vblank_on/off require something else to handle this correctly?
> >>>>>
> >>>>>
> >>>>
> >>>>I suspect it is because vblank_disable_and_save calls
> >>>>drm_update_vblank_count() unconditionally, even if vblank irqs are
> >>>>already off.
> >>>>
> >>>>So on a manual display disable -> reenable you get something like
> >>>>
> >>>>At disable:
> >>>>
> >>>>Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
> >>>>vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
> >>>>final count.
> >>>>
> >>>>Then the crtc is shut down and its hw counter resets to zero.
> >>>>
> >>>>At reenable:
> >>>>
> >>>>Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
> >>>>atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
> >>>>drm_vblank_off -> vblank_disable_and_save -> A pointless
> >>>>drm_update_vblank_count() while the hw counter is already reset to zero
> >>>>--> Unwanted counter jump.
> >>>>
> >>>>
> >>>>The problem doesn't happen on a pure modeset to a different video
> >>>>resolution/refresh rate, as then we only have one call into
> >>>>atombios_crtc_dpms(DPMS_OFF).
> >>>>
> >>>>I think the fix is to fix vblank_disable_and_save() to only call
> >>>>drm_update_vblank_count() if vblank irqs get actually disabled, not on
> >>>>no-op calls. I will try that now.
> >>>
> >>>It does that on purpose. Otherwise the vblank counter would appear to
> >>>have stalled while the interrupt was off.
> >>>
> >>
> >>Ok, that's what the comments there say, although i don't see atm. why
> >>that perceived stall would be a big problem. I checked all callers of
> >>vblank_disable_and_save(). They are all careful to not call that
> >>function if vblanks are already disabled. The only exception is
> >>drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
> >>drivers which have resetting hw counters or other problematic behaviour
> >>during modesets etc. then this will break. E.g., calling the vblank
> >>timestamping stuff is also not safe/well-defined during modesets when
> >>the timestamping constants are not (yet) updated to reflect the new mode
> >>timing of the modeset in progress.
> >
> >The idea is to maintain the appearance that the counter ticks all the
> >time as long as the crtc is active. While that may not be really
> >required in case if no one is currently interested in the vblank
> >counter, I think it's a nice thing to have just to make the behaviour
> >of the counter consistent.
> >
> >As far as calling drm_vblank_off() after the hw counter got reset, well,
> >that's not correct. It should be called before the reset.
> 
> What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The
> first call to DPMS_OFF will call drm_vblank_off() and really disable
> vblank-irqs if they were running, updating the counts/ts a last time. But
> then the dpms off will reset the hw counter to zero. When one reenables the
> display, a second call to DPMS_OFF will now call drm_vblank_off again when
> it apparently shouldn't.
> 
> I just tested this patch, which fixes the counter jumps on radeon-kms with
> my or Daniel's drm_vblank_off patches to radeon:

This might be due to the legacy helpers, which just love to redundantly
disable stuff that's already off. The problem I see with no-oping these
calls is that for atomic drivers (which really should get this right) it
might paper over bugs: e.g. if you forget to call _off() when disabling
the crtc, you end up calling _on() twice in a row, which is indeed a
serious bug. The same applies when you forget to call _on() and end up
with multiple _off() calls in a row.

So not sure what to do here.
-Daniel
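
For reference, the jump sizes in the quoted debug output can be reproduced
with a few lines of arithmetic. This is only a sketch: the 24-bit counter
width (HW_MAX_VBLANK_COUNT) is inferred from the diff values in the log,
and the helper names are made up for illustration, they are not functions
from drm_irq.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the hw frame counter is 24 bits wide. This is inferred from
 * the diff values in the quoted log, not read from the driver. */
#define HW_MAX_VBLANK_COUNT 0x00ffffffu

/* Masked difference between the current and the last hw counter read.
 * If the hw counter was reset (cur < last), the unsigned subtraction
 * wraps and the mask turns it into a huge forward step. */
static uint32_t hw_counter_diff(uint32_t cur, uint32_t last)
{
	return (cur - last) & HW_MAX_VBLANK_COUNT;
}

/* Software vblank count after adding the masked diff. */
static uint32_t apply_diff(uint32_t sw_count, uint32_t cur, uint32_t last)
{
	return sw_count + hw_counter_diff(cur, last);
}

/* Hypothetical self-check against the numbers in the quoted log. */
static void check_against_log(void)
{
	assert(hw_counter_diff(1, 916) == 16776301u);   /* hw=1 hw_last=916 */
	assert(hw_counter_diff(0, 1) == 16777215u);     /* hw=0 hw_last=1 */
	assert(apply_diff(218104694u, 1, 916) == 234880995u);
}
```

Under that assumption, a hw counter that falls back from 916 to 1 is
indistinguishable from one that ran forward by 16776301 frames, which is
the jump between the two "current=" values in the log.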

> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 607f493..d739d93 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned
> int pipe)
>         spin_lock_irqsave(&dev->event_lock, irqflags);
> 
>         spin_lock(&dev->vbl_lock);
> -       vblank_disable_and_save(dev, pipe);
> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
> vblank->enabled);
> +
> +       if (vblank->enabled)
> +               vblank_disable_and_save(dev, pipe);
>         wake_up(&vblank->queue);
> 
>         /*
> @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned
> int pipe)
>                 return;
> 
>         spin_lock_irqsave(&dev->vbl_lock, irqflags);
> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
> vblank->enabled);
> +
>         /* Drop our private "prevent drm_vblank_get" refcount */
>         if (vblank->inmodeset) {
>                 atomic_dec(&vblank->refcount);
> 
> 
> 
> Another, maybe better, approach might be to no-op redundant calls to
> drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to
> drm_vblank_on() iff !vblank->inmodeset.
> 
> -mario
> 
> 
> >
> >>
> >>-mario
> >>
> >>
> >>>>
> >>>>Otherwise kms drivers would have to be careful to never call
> >>>>drm_vblank_off multiple times before calling drm_vblank_on, but the help
> >>>>text to drm_vblank_on() claims that unbalanced calls to these functions
> >>>>are perfectly fine.
> >>>>
> >>>>-mario
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 18:51                                         ` Daniel Vetter
@ 2016-01-25 19:30                                           ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-25 19:30 UTC (permalink / raw)
  To: Ville Syrjälä,
	Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König



On 01/25/2016 07:51 PM, Daniel Vetter wrote:
> On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
>> Readding Daniel, which somehow got dropped from the cc.
>>
>> On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
>>> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>>>
>>>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>>>
>>>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>>>
>>>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>>>>>>>> See
>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>>>>>> just a few days.
>>>>>>>>>>
>>>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>>>
>>>>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>>>>>> to happen when turning off the CRTC:
>>>>>>>>>
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>>>
>>>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>>>> clearly jumps backwards.
>>>>>>>>
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>>>>>>>
>>>>>>>> Same here.
>>>>>>>
>>>>>>> At least one of the jumps is expected, because this is around turning
>>>>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
>>>>>>> though.
>>>>>>>
>>>>>>>
>>>>>>>> These things just don't happen on i915 because drm_vblank_off() and
>>>>>>>> drm_vblank_on() are always called around the times when the hw counter
>>>>>>>> might get reset. Or at least that's how it should be.
>>>>>>>
>>>>>>> Which is of course the idea of Daniel's patch (which is what I'm getting
>>>>>>> the above with) or Mario's patch as well, but clearly something's still
>>>>>>> wrong. It's certainly possible that it's something in the driver, but
>>>>>>> since calling drm_vblank_pre/post_modeset from the same places seems to
>>>>>>> work fine (ignoring the regression discussed in this thread)... Do
>>>>>>> drm_vblank_on/off require something else to handle this correctly?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I suspect it is because vblank_disable_and_save calls
>>>>>> drm_update_vblank_count() unconditionally, even if vblank irqs are
>>>>>> already off.
>>>>>>
>>>>>> So on a manual display disable -> reenable you get something like
>>>>>>
>>>>>> At disable:
>>>>>>
>>>>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
>>>>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
>>>>>> final count.
>>>>>>
>>>>>> Then the crtc is shut down and its hw counter resets to zero.
>>>>>>
>>>>>> At reenable:
>>>>>>
>>>>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
>>>>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
>>>>>> drm_vblank_off -> vblank_disable_and_save -> A pointless
>>>>>> drm_update_vblank_count() while the hw counter is already reset to zero
>>>>>> --> Unwanted counter jump.
>>>>>>
>>>>>>
>>>>>> The problem doesn't happen on a pure modeset to a different video
>>>>>> resolution/refresh rate, as then we only have one call into
>>>>>> atombios_crtc_dpms(DPMS_OFF).
>>>>>>
>>>>>> I think the fix is to fix vblank_disable_and_save() to only call
>>>>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on
>>>>>> no-op calls. I will try that now.
>>>>>
>>>>> It does that on purpose. Otherwise the vblank counter would appear to
>>>>> have stalled while the interrupt was off.
>>>>>
>>>>
>>>> Ok, that's what the comments there say, although i don't see atm. why
>>>> that perceived stall would be a big problem. I checked all callers of
>>>> vblank_disable_and_save(). They are all careful to not call that
>>>> function if vblanks are already disabled. The only exception is
>>>> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
>>>> drivers which have resetting hw counters or other problematic behaviour
>>>> during modesets etc. then this will break. E.g., calling the vblank
>>>> timestamping stuff is also not safe/well-defined during modesets when
>>>> the timestamping constants are not (yet) updated to reflect the new mode
>>>> timing of the modeset in progress.
>>>
>>> The idea is to maintain the appearance that the counter ticks all the
>>> time as long as the crtc is active. While that may not be really
>>> required in case if no one is currently interested in the vblank
>>> counter, I think it's a nice thing to have just to make the behaviour
>>> of the counter consistent.
>>>
>>> As far as calling drm_vblank_off() after the hw counter got reset, well,
>>> that's not correct. It should be called before the reset.
>>
>> What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The
>> first call to DPMS_OFF will call drm_vblank_off() and really disable
>> vblank-irqs if they were running, updating the counts/ts a last time. But
>> then the dpms off will reset the hw counter to zero. When one reenables the
>> display, a second call to DPMS_OFF will now call drm_vblank_off again when
>> it apparently shouldn't.
>>
>> I just tested this patch, which fixes the counter jumps on radeon-kms with
>> my or Daniel's drm_vblank_off patches to radeon:
>
> This might be due to the legacy helpers, which just love to redundantly
> disable stuff that's off already. The problem I see with no-oping these
> out is that for atomic drivers (which really should get this right) this
> might paper over bugs: E.g. when you forget to call _off() when disabling
> the crtc, then calling _on() twice in a row is indeed a serious bug.
> Similar when you forget to call _on() and have multiple _off() calls in a
> row.
>
> So not sure what to do here.
> -Daniel
>

Yes, the legacy helpers cause two calls to dpms off if one disables a
display: first during display disable, as intended, and then again when
one reenables the display, during modesetting as part of
crtc_funcs->prepare() - at least on radeon.

Maybe the minimal thing that would help is to just check for
vblank->inmodeset in drm_vblank_off(). If it is already set, we'd know
it is a redundant call and could no-op it and do a
WARN_ON(vblank->inmodeset)?
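
A rough user-space model of that idea, just to show the intended control
flow. struct toy_vblank and the toy_* functions are invented for this
sketch and only mimic the inmodeset flag; the real drm_vblank_off() and
drm_vblank_on() also deal with locking, refcounts and pending events,
which are left out here.

```c
#include <stdbool.h>

/* Toy stand-in for the per-crtc vblank state (hypothetical). */
struct toy_vblank {
	bool enabled;          /* vblank irq currently on */
	bool inmodeset;        /* set by toy_vblank_off(), cleared by toy_vblank_on() */
	unsigned int updates;  /* how often the final count update ran */
	unsigned int warnings; /* how often a redundant off was detected */
};

static void toy_vblank_off(struct toy_vblank *vb)
{
	if (vb->inmodeset) {
		/* Redundant call: the hw counter may already be reset, so
		 * skip the count update. The kernel would WARN_ON() here. */
		vb->warnings++;
		return;
	}

	vb->updates++;         /* stands in for vblank_disable_and_save() */
	vb->enabled = false;
	vb->inmodeset = true;
}

static void toy_vblank_on(struct toy_vblank *vb)
{
	/* Only the flag is modelled; re-enabling the irq is left out. */
	vb->inmodeset = false;
}
```

With this, an off/off/on sequence as produced by the legacy helpers runs
the count update once and flags the second off, instead of feeding a reset
hw counter into the update.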

I don't know how to treat drm_vblank_on(), but that one calls
drm_reset_vblank_timestamp(), which should be less problematic if called
redundantly.

Now the patch i want to try next to fix the drm_vblank_pre/post_modeset 
regression in Linux 4.4/4.5 is to add a ...

if ((diff > 1) && vblank->inmodeset) diff = 1;

... to the bottom of drm_update_vblank_count(). That should hopefully 
restore the pre/post_modeset behavior as close to the original behavior 
as possible. As a side effect it would also prevent the counter jump 
caused by redundant calls to drm_vblank_off().
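
As a standalone illustration of what that clamp would do to the numbers
from the debug output earlier in the thread: the sketch below assumes a
24-bit hw counter (inferred from the logged diff values), and
clamped_diff() is a made-up helper, not the actual body of
drm_update_vblank_count().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 24-bit hw frame counter, inferred from the logged diffs. */
#define HW_MAX_VBLANK_COUNT 0x00ffffffu

/* Masked hw counter difference with the proposed clamp applied:
 * while inmodeset is set, never advance by more than one frame. */
static uint32_t clamped_diff(uint32_t cur, uint32_t last, bool inmodeset)
{
	uint32_t diff = (cur - last) & HW_MAX_VBLANK_COUNT;

	if (diff > 1 && inmodeset)
		diff = 1;

	return diff;
}
```

For the logged hw=1, hw_last=916 case this gives 1 while inmodeset is set
and 16776301 otherwise, so the clamp only changes behaviour inside the
modeset window.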

-mario

>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 607f493..d739d93 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned
>> int pipe)
>>          spin_lock_irqsave(&dev->event_lock, irqflags);
>>
>>          spin_lock(&dev->vbl_lock);
>> -       vblank_disable_and_save(dev, pipe);
>> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
>> vblank->enabled);
>> +
>> +       if (vblank->enabled)
>> +               vblank_disable_and_save(dev, pipe);
>>          wake_up(&vblank->queue);
>>
>>          /*
>> @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned
>> int pipe)
>>                  return;
>>
>>          spin_lock_irqsave(&dev->vbl_lock, irqflags);
>> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,
>> vblank->enabled);
>> +
>>          /* Drop our private "prevent drm_vblank_get" refcount */
>>          if (vblank->inmodeset) {
>>                  atomic_dec(&vblank->refcount);
>>
>>
>>
>> Another, maybe better, approach might be to no-op redundant calls to
>> drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to
>> drm_vblank_on() iff !vblank->inmodeset.
>>
>> -mario
>>
>>
>>>
>>>>
>>>> -mario
>>>>
>>>>
>>>>>>
>>>>>> Otherwise kms drivers would have to be careful to never call
>>>>>> drm_vblank_off multiple times before calling drm_vblank_on, but the help
>>>>>> text to drm_vblank_on() claims that unbalanced calls to these functions
>>>>>> are perfectly fine.
>>>>>>
>>>>>> -mario
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
@ 2016-01-25 19:30                                           ` Mario Kleiner
  0 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-25 19:30 UTC (permalink / raw)
  To: Ville Syrjälä,
	Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König



On 01/25/2016 07:51 PM, Daniel Vetter wrote:
> On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
>> Readding Daniel, which somehow got dropped from the cc.
>>
>> On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
>>> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>>>
>>>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>>>
>>>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>>>
>>>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>>>>>>>> See
>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>>>>>> just a few days.
>>>>>>>>>>
>>>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>>>
>>>>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>>>>>> to happen when turning off the CRTC:
>>>>>>>>>
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>>>
>>>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>>>> clearly jumps backwards.
>>>>>>>>
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>>>>>>>
>>>>>>>> Same here.
>>>>>>>
>>>>>>> At least one of the jumps is expected, because this is around turning
>>>>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
>>>>>>> though.
>>>>>>>
>>>>>>>
>>>>>>>> These things just don't happen on i915 because drm_vblank_off() and
>>>>>>>> drm_vblank_on() are always called around the times when the hw counter
>>>>>>>> might get reset. Or at least that's how it should be.
>>>>>>>
>>>>>>> Which is of course the idea of Daniel's patch (which is what I'm getting
>>>>>>> the above with) or Mario's patch as well, but clearly something's still
>>>>>>> wrong. It's certainly possible that it's something in the driver, but
>>>>>>> since calling drm_vblank_pre/post_modeset from the same places seems to
>>>>>>> work fine (ignoring the regression discussed in this thread)... Do
>>>>>>> drm_vblank_on/off require something else to handle this correctly?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I suspect it is because vblank_disable_and_save calls
>>>>>> drm_update_vblank_count() unconditionally, even if vblank irqs are
>>>>>> already off.
>>>>>>
>>>>>> So on a manual display disable -> reenable you get something like
>>>>>>
>>>>>> At disable:
>>>>>>
>>>>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
>>>>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
>>>>>> final count.
>>>>>>
>>>>>> Then the crtc is shut down and its hw counter resets to zero.
>>>>>>
>>>>>> At reenable:
>>>>>>
>>>>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
>>>>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
>>>>>> drm_vblank_off -> vblank_disable_and_save -> A pointless
>>>>>> drm_update_vblank_count() while the hw counter is already reset to zero
>>>>>> --> Unwanted counter jump.
>>>>>>
>>>>>>
>>>>>> The problem doesn't happen on a pure modeset to a different video
>>>>>> resolution/refresh rate, as then we only have one call into
>>>>>> atombios_crtc_dpms(DPMS_OFF).
>>>>>>
>>>>>> I think the fix is to fix vblank_disable_and_save() to only call
>>>>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on
>>>>>> no-op calls. I will try that now.
>>>>>
>>>>> It does that on purpose. Otherwise the vblank counter would appear to
>>>>> have stalled while the interrupt was off.
>>>>>
>>>>
>>>> Ok, that's what the comments there say, although i don't see atm. why
>>>> that perceived stall would be a big problem. I checked all callers of
>>>> vblank_disable_and_save(). They are all careful to not call that
>>>> function if vblanks are already disabled. The only exception is
>>>> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
>>>> drivers which have resetting hw counters or other problematic behaviour
>>>> during modesets etc. then this will break. E.g., calling the vblank
>>>> timestamping stuff is also not safe/well-defined during modesets when
>>>> the timestamping constants are not (yet) updated to reflect the new mode
>>>> timing of the modeset in progress.
>>>
>>> The idea is to maintain the appearance that the counter ticks all the
>>> time as long as the crtc is active. While that may not be really
>>> required in case if no one is currently interested in the vblank
>>> counter, I think it's a nice thing to have just to make the behaviour
>>> of the counter consistent.
>>>
>>> As far as calling drm_vblank_off() after the hw counter got reset, well,
>>> that's not correct. It should be called before the reset.
>>
>> What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The
>> first call to DPMS_OFF will call drm_vblank_off() and really disable
>> vblank-irqs if they were running, updating the counts/ts a last time. But
>> then the dpms off will reset the hw counter to zero. When one reenables the
>> display, a second call to DPMS_OFF will now call drm_vblank_off again when
>> it apparently shouldn't.
>>
>> I just tested this patch, which fixes the counter jumps on radeon-kms with
>> my or Daniel's drm_vblank_off patches to radeon:
>
> This might be due to the legacy helpers, which just love to redundantly
> disable stuff that's off already. The problem I see with no-oping these
> out is that for atomic drivers (which really should get this right) this
> might paper over bugs: E.g. when you forget to call _off() when disabling
> the crtc, then calling _on() twice in a row is indeed a serious bug.
> Similar when you forget to call _on() and have multiple _off() calls in a
> row.
>
> So not sure what to do here.
> -Daniel
>

Yes, the legacy helpers cause two calls to DPMS off if one disables a
display: first during display disable, as intended, and then again when
one reenables the display, during modesetting as part of
crtc_funcs->prepare(). At least that is the case on radeon.

Maybe the minimum thing that would help is to just check
vblank->inmodeset in drm_vblank_off(). If it is already set, we'd know
the call is redundant and could no-op it, perhaps with a
WARN_ON(vblank->inmodeset)?

I don't know how to treat drm_vblank_on(), but that one calls
drm_reset_vblank_timestamp(), which should be less problematic if called
redundantly.

Now the patch I want to try next to fix the drm_vblank_pre/post_modeset
regression in Linux 4.4/4.5 is to add a ...

if ((diff > 1) && vblank->inmodeset) diff = 1;

... to the bottom of drm_update_vblank_count(). That should hopefully 
restore the pre/post_modeset behavior as close to the original behavior 
as possible. As a side effect it would also prevent the counter jump 
caused by redundant calls to drm_vblank_off().

-mario

>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 607f493..d739d93 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
>>          spin_lock_irqsave(&dev->event_lock, irqflags);
>>
>>          spin_lock(&dev->vbl_lock);
>> -       vblank_disable_and_save(dev, pipe);
>> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled);
>> +
>> +       if (vblank->enabled)
>> +               vblank_disable_and_save(dev, pipe);
>>          wake_up(&vblank->queue);
>>
>>          /*
>> @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>>                  return;
>>
>>          spin_lock_irqsave(&dev->vbl_lock, irqflags);
>> +       DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled);
>> +
>>          /* Drop our private "prevent drm_vblank_get" refcount */
>>          if (vblank->inmodeset) {
>>                  atomic_dec(&vblank->refcount);
>>
>>
>>
>> Another, maybe better, approach might be to no-op redundant calls to
>> drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to
>> drm_vblank_on() iff !vblank->inmodeset.
>>
>> -mario
>>
>>
>>>
>>>>
>>>> -mario
>>>>
>>>>
>>>>>>
>>>>>> Otherwise kms drivers would have to be careful to never call
>>>>>> drm_vblank_off multiple times before calling drm_vblank_on, but the help
>>>>>> text to drm_vblank_on() claims that unbalanced calls to these functions
>>>>>> are perfectly fine.
>>>>>>
>>>>>> -mario
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 19:30                                           ` Mario Kleiner
@ 2016-01-25 20:32                                             ` Daniel Vetter
  -1 siblings, 0 replies; 59+ messages in thread
From: Daniel Vetter @ 2016-01-25 20:32 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Ville Syrjälä,
	Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König

On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:
> 
> 
> On 01/25/2016 07:51 PM, Daniel Vetter wrote:
> >On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
> >>Readding Daniel, which somehow got dropped from the cc.
> >>
> >>On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
> >>>On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
> >>>>
> >>>>
> >>>>On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
> >>>>>On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
> >>>>>>
> >>>>>>
> >>>>>>On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> >>>>>>>On 23.01.2016 00:18, Ville Syrjälä wrote:
> >>>>>>>>On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> >>>>>>>>>
> >>>>>>>>>[ Trimming KDE folks from Cc ]
> >>>>>>>>>
> >>>>>>>>>On 21.01.2016 19:09, Daniel Vetter wrote:
> >>>>>>>>>>On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >>>>>>>>>>>On 21.01.2016 16:58, Daniel Vetter wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>Can you please point me at the vblank on/off jump bug please?
> >>>>>>>>>>>
> >>>>>>>>>>>AFAIR I originally reported it in response to
> >>>>>>>>>>>http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >>>>>>>>>>>, but I can't find that in the archives, so maybe that was just on IRC.
> >>>>>>>>>>>See
> >>>>>>>>>>>http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >>>>>>>>>>>. Basically, I ran into the bug fixed by your patch because the counter
> >>>>>>>>>>>jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >>>>>>>>>>>just a few days.
> >>>>>>>>>>
> >>>>>>>>>>Ok, so just uncovered the overflow bug.
> >>>>>>>>>
> >>>>>>>>>Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> >>>>>>>>>counter jumping bug (similar to the bug this thread is about), which
> >>>>>>>>>exposed the overflow bug, is still alive and kicking in 4.5. It seems
> >>>>>>>>>to happen when turning off the CRTC:
> >>>>>>>>>
> >>>>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>>>>>>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>>>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> >>>>>>>>
> >>>>>>>>Not sure what bug we're talking about here, but here the hw counter
> >>>>>>>>clearly jumps backwards.
> >>>>>>>>
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>>>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>>>>>>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> >>>>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>>>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> >>>>>>>>
> >>>>>>>>Same here.
> >>>>>>>
> >>>>>>>At least one of the jumps is expected, because this is around turning
> >>>>>>>off the CRTC for DPMS off. Don't know yet why there are two jumps back
> >>>>>>>though.
> >>>>>>>
> >>>>>>>
> >>>>>>>>These things just don't happen on i915 because drm_vblank_off() and
> >>>>>>>>drm_vblank_on() are always called around the times when the hw counter
> >>>>>>>>might get reset. Or at least that's how it should be.
> >>>>>>>
> >>>>>>>Which is of course the idea of Daniel's patch (which is what I'm getting
> >>>>>>>the above with) or Mario's patch as well, but clearly something's still
> >>>>>>>wrong. It's certainly possible that it's something in the driver, but
> >>>>>>>since calling drm_vblank_pre/post_modeset from the same places seems to
> >>>>>>>work fine (ignoring the regression discussed in this thread)... Do
> >>>>>>>drm_vblank_on/off require something else to handle this correctly?
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>I suspect it is because vblank_disable_and_save calls
> >>>>>>drm_update_vblank_count() unconditionally, even if vblank irqs are
> >>>>>>already off.
> >>>>>>
> >>>>>>So on a manual display disable -> reenable you get something like
> >>>>>>
> >>>>>>At disable:
> >>>>>>
> >>>>>>Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
> >>>>>>vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
> >>>>>>final count.
> >>>>>>
> >>>>>>Then the crtc is shut down and its hw counter resets to zero.
> >>>>>>
> >>>>>>At reenable:
> >>>>>>
> >>>>>>Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
> >>>>>>atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
> >>>>>>drm_vblank_off -> vblank_disable_and_save -> A pointless
> >>>>>>drm_update_vblank_count() while the hw counter is already reset to zero
> >>>>>>--> Unwanted counter jump.
> >>>>>>
> >>>>>>
> >>>>>>The problem doesn't happen on a pure modeset to a different video
> >>>>>>resolution/refresh rate, as then we only have one call into
> >>>>>>atombios_crtc_dpms(DPMS_OFF).
> >>>>>>
> >>>>>>I think the fix is to fix vblank_disable_and_save() to only call
> >>>>>>drm_update_vblank_count() if vblank irqs get actually disabled, not on
> >>>>>>no-op calls. I will try that now.
> >>>>>
> >>>>>It does that on purpose. Otherwise the vblank counter would appear to
> >>>>>have stalled while the interrupt was off.
> >>>>>
> >>>>
> >>>>Ok, that's what the comments there say, although i don't see atm. why
> >>>>that perceived stall would be a big problem. I checked all callers of
> >>>>vblank_disable_and_save(). They are all careful to not call that
> >>>>function if vblanks are already disabled. The only exception is
> >>>>drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
> >>>>drivers which have resetting hw counters or other problematic behaviour
> >>>>during modesets etc. then this will break. E.g., calling the vblank
> >>>>timestamping stuff is also not safe/well-defined during modesets when
> >>>>the timestamping constants are not (yet) updated to reflect the new mode
> >>>>timing of the modeset in progress.
> >>>
> >>>The idea is to maintain the appearance that the counter ticks all the
> >>>time as long as the crtc is active. While that may not be really
> >>>required in case if no one is currently interested in the vblank
> >>>counter, I think it's a nice thing to have just to make the behaviour
> >>>of the counter consistent.
> >>>
> >>>As far as calling drm_vblank_off() after the hw counter got reset, well,
> >>>that's not correct. It should be called before the reset.
> >>
> >>What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The
> >>first call to DPMS_OFF will call drm_vblank_off() and really disable
> >>vblank-irqs if they were running, updating the counts/ts a last time. But
> >>then the dpms off will reset the hw counter to zero. When one reenables the
> >>display, a second call to DPMS_OFF will now call drm_vblank_off again when
> >>it apparently shouldn't.
> >>
> >>I just tested this patch, which fixes the counter jumps on radeon-kms with
> >>my or Daniel's drm_vblank_off patches to radeon:
> >
> >This might be due to the legacy helpers, which just love to redundantly
> >disable stuff that's off already. The problem I see with no-oping these
> >out is that for atomic drivers (which really should get this right) this
> >might paper over bugs: E.g. when you forget to call _off() when disabling
> >the crtc, then calling _on() twice in a row is indeed a serious bug.
> >Similar when you forget to call _on() and have multiple _off() calls in a
> >row.
> >
> >So not sure what to do here.
> >-Daniel
> >
> 
> Yes, the legacy helpers cause two calls to dpms off if one disables a
> display. First during display disable as intended. Then when one reenables
> the display during modesetting as part of crtc_funcs->prepare() - at least
> on radeon.
> 
> Maybe the minimum thing that would help is to just check for
> vblank->inmodeset in drm_vblank_off(). If that would be the case we'd know
> it is a redundant call and could no-op it and do a
> WARN_ON(vblank->inmodeset)?

I have that here locally, and it blows up all over the place on radeon. It
would also blow up everywhere else.

I was thinking of adding the vblank->inmodeset check to radeon/amdgpu,
with a note why it's needed (legacy crtc helpers just suck).

> drm_vblank_on() i don't know how to treat, but that one calls
> drm_reset_vblank_timestamp() which should be less problematic if called
> redundantly.

I think even legacy crtc helpers don't enable stuff again if it's not been
disabled before. So on drm_vblank_on() we might be able to put a WARN_ON
in place ...

> Now the patch i want to try next to fix the drm_vblank_pre/post_modeset
> regression in Linux 4.4/4.5 is to add a ...
> 
> if ((diff > 1) && vblank->inmodeset) diff = 1;
> 
> ... to the bottom of drm_update_vblank_count(). That should hopefully
> restore the pre/post_modeset behavior as close to the original behavior as
> possible. As a side effect it would also prevent the counter jump caused by
> redundant calls to drm_vblank_off().

Hm, can we just frob pre/post_modeset only with some checks? I'd like to
not put that kind of "I have no idea about my hw state" hacks into the new
helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to
enforce correct usage, which would be a real bummer imo.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 20:32                                             ` Daniel Vetter
@ 2016-01-25 21:42                                               ` Mario Kleiner
  -1 siblings, 0 replies; 59+ messages in thread
From: Mario Kleiner @ 2016-01-25 21:42 UTC (permalink / raw)
  To: Ville Syrjälä,
	Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König



On 01/25/2016 09:32 PM, Daniel Vetter wrote:
> On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:
>>
>>
>> On 01/25/2016 07:51 PM, Daniel Vetter wrote:
>>> On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
>>>> Readding Daniel, which somehow got dropped from the cc.
>>>>
>>>> On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
>>>>> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>>>>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>>>>>
>>>>>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>>>>>
>>>>>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>>>>>>>>>> See
>>>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>>>>>>>> just a few days.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>>>>>
>>>>>>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>>>>>>>> to happen when turning off the CRTC:
>>>>>>>>>>>
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>>>>>
>>>>>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>>>>>> clearly jumps backwards.
>>>>>>>>>>
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
>>>>>>>>>>
>>>>>>>>>> Same here.
>>>>>>>>>
>>>>>>>>> At least one of the jumps is expected, because this is around turning
>>>>>>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back
>>>>>>>>> though.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> These things just don't happen on i915 because drm_vblank_off() and
>>>>>>>>>> drm_vblank_on() are always called around the times when the hw counter
>>>>>>>>>> might get reset. Or at least that's how it should be.
>>>>>>>>>
>>>>>>>>> Which is of course the idea of Daniel's patch (which is what I'm getting
>>>>>>>>> the above with) or Mario's patch as well, but clearly something's still
>>>>>>>>> wrong. It's certainly possible that it's something in the driver, but
>>>>>>>>> since calling drm_vblank_pre/post_modeset from the same places seems to
>>>>>>>>> work fine (ignoring the regression discussed in this thread)... Do
>>>>>>>>> drm_vblank_on/off require something else to handle this correctly?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I suspect it is because vblank_disable_and_save calls
>>>>>>>> drm_update_vblank_count() unconditionally, even if vblank irqs are
>>>>>>>> already off.
>>>>>>>>
>>>>>>>> So on a manual display disable -> reenable you get something like
>>>>>>>>
>>>>>>>> At disable:
>>>>>>>>
>>>>>>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
>>>>>>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
>>>>>>>> final count.
>>>>>>>>
>>>>>>>> Then the crtc is shut down and its hw counter resets to zero.
>>>>>>>>
>>>>>>>> At reenable:
>>>>>>>>
>>>>>>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
>>>>>>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
>>>>>>>> drm_vblank_off -> vblank_disable_and_save -> A pointless
>>>>>>>> drm_update_vblank_count() while the hw counter is already reset to zero
>>>>>>>> --> Unwanted counter jump.
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem doesn't happen on a pure modeset to a different video
>>>>>>>> resolution/refresh rate, as then we only have one call into
>>>>>>>> atombios_crtc_dpms(DPMS_OFF).
>>>>>>>>
>>>>>>>> I think the fix is to fix vblank_disable_and_save() to only call
>>>>>>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on
>>>>>>>> no-op calls. I will try that now.
>>>>>>>
>>>>>>> It does that on purpose. Otherwise the vblank counter would appear to
>>>>>>> have stalled while the interrupt was off.
>>>>>>>
>>>>>>
>>>>>> Ok, that's what the comments there say, although i don't see atm. why
>>>>>> that perceived stall would be a big problem. I checked all callers of
>>>>>> vblank_disable_and_save(). They are all careful to not call that
>>>>>> function if vblanks are already disabled. The only exception is
>>>>>> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms
>>>>>> drivers which have resetting hw counters or other problematic behaviour
>>>>>> during modesets etc. then this will break. E.g., calling the vblank
>>>>>> timestamping stuff is also not safe/well-defined during modesets when
>>>>>> the timestamping constants are not (yet) updated to reflect the new mode
>>>>>> timing of the modeset in progress.
>>>>>
>>>>> The idea is to maintain the appearance that the counter ticks all the
>>>>> time as long as the crtc is active. While that may not be really
>>>>> required in case if no one is currently interested in the vblank
>>>>> counter, I think it's a nice thing to have just to make the behaviour
>>>>> of the counter consistent.
>>>>>
>>>>> As far as calling drm_vblank_off() after the hw counter got reset, well,
>>>>> that's not correct. It should be called before the reset.
>>>>
>>>> What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The
>>>> first call to DPMS_OFF will call drm_vblank_off() and really disable
>>>> vblank-irqs if they were running, updating the counts/ts a last time. But
>>>> then the dpms off will reset the hw counter to zero. When one reenables the
>>>> display, a second call to DPMS_OFF will now call drm_vblank_off again when
>>>> it apparently shouldn't.
>>>>
>>>> I just tested this patch, which fixes the counter jumps on radeon-kms with
>>>> my or Daniel's drm_vblank_off patches to radeon:
>>>
>>> This might be due to the legacy helpers, which just love to redundantly
>>> disable stuff that's off already. The problem I see with no-oping these
>>> out is that for atomic drivers (which really should get this right) this
>>> might paper over bugs: E.g. when you forget to call _off() when disabling
>>> the crtc, then calling _on() twice in a row is indeed a serious bug.
>>> Similar when you forget to call _on() and have multiple _off() calls in a
>>> row.
>>>
>>> So not sure what to do here.
>>> -Daniel
>>>
>>
>> Yes, the legacy helpers cause two calls to dpms off if one disables a
>> display. First during display disable as intended. Then when one reenables
>> the display during modesetting as part of crtc_funcs->prepare() - at least
>> on radeon.
>>
>> Maybe the minimum thing that would help is to just check for
>> vblank->inmodeset in drm_vblank_off(). If that would be the case we'd know
>> it is a redundant call and could no-op it and do a
>> WARN_ON(vblank->inmodeset)?
>
> I have that here locally, blows up all over the place on radeon. And also
> would blow up everywhere else.
>

You mean the WARN_ON causes ugliness? The no-op on redundant calls to 
drm_vblank_off would hopefully not blow up anything but prevent blow ups?

The problem is that there should not ever be a call to the 
drm_update_vblank_count() function once a crtc is in modeset/dpms off 
etc., not only because of the hw vblank counters being reset to zero, 
but also because vblank timestamps computed may be wrong, going 
backwards etc. That could again confuse clients.
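
To make the counter jump concrete: the diff values in the debug log quoted
above are what a masked subtraction gives when the hw counter goes backwards
after a reset. A minimal sketch, assuming a 24-bit hw counter mask (an
assumption, chosen because it reproduces the logged numbers); the sketch_*
names are hypothetical and not kernel API:

```c
#include <stdint.h>

/* Assumed width of the hw frame counter: 24 bits (not taken from the thread). */
#define SKETCH_MAX_VBLANK_COUNT 0x00ffffffu

/*
 * Masked difference between the current and the last seen hw counter value.
 * If the hw counter was reset (hw < hw_last), the unsigned subtraction wraps
 * and the mask turns it into a huge forward jump.
 */
static uint32_t sketch_hw_diff(uint32_t hw, uint32_t hw_last)
{
	return (hw - hw_last) & SKETCH_MAX_VBLANK_COUNT;
}
```

With hw=1, hw_last=916 this yields 16776301, and with hw=0, hw_last=1 it
yields 16777215, matching the two "diff=" values in the log. Adding the first
one to current=218104694 gives 234880995, the value logged afterwards.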

> I was thinking of adding the vblank->inmodeset check to radeon/amdgpu,
> with a note why it's needed (legacy crtc helpers just suck).

Maybe you could do that check in radeon/amdgpu, but still also leave it 
in drm_vblank_off()? If all kms drivers properly avoid redundant calls 
as part of legacy paths then the WARN_ON and no-op in drm_vblank_off 
should not ever trigger, unless there is a real bug, right? In which 
case it should hopefully prevent worse things like a hanging composited 
desktop, or login, and instead just make noise in the kernel log?
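
A user-space model of that idea, only as a sketch: the struct and the
sketch_* functions are hypothetical stand-ins, the fprintf() takes the place
of WARN_ON(), and the counter increment takes the place of the
vblank_disable_and_save() -> drm_update_vblank_count() path. It is not the
actual drm_vblank_off() code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-crtc vblank state, not the kernel struct. */
struct sketch_vblank {
	unsigned int inmodeset;     /* non-zero once the off path has run */
	unsigned int count_updates; /* how often the count was recomputed */
};

/* Returns true if the call did real work, false if it was a redundant no-op. */
static bool sketch_vblank_off(struct sketch_vblank *vblank)
{
	if (vblank->inmodeset) {
		/* Stand-in for WARN_ON(): make noise, but leave the count alone. */
		fprintf(stderr, "sketch: redundant vblank_off ignored\n");
		return false;
	}

	vblank->count_updates++; /* stand-in for the final count update */
	vblank->inmodeset = 1;
	return true;
}

/* Re-arms the model so that the next off call does real work again. */
static void sketch_vblank_on(struct sketch_vblank *vblank)
{
	vblank->inmodeset = 0;
}
```

In this model the second of two back-to-back off calls (the legacy helper
case from crtc_funcs->prepare()) no longer recomputes the count, so a hw
counter that was reset in between cannot cause a jump.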

>
>> drm_vblank_on() i don't know how to treat, but that one calls
>> drm_reset_vblank_timestamp() which should be less problematic if called
>> redundantly.
>
> I think even legacy crtc helpers don't enable stuff again if it's not been
> disabled before. So on drm_vblank_on() we might be able to put a WARN_ON
> in place ...

Hm, logging this stuff here during modesets and display dis/enable i 
see lots of drm_vblank_on that come in "pairs" only about two dozen 
msecs apart.

>
>> Now the patch i want to try next to fix the drm_vblank_pre/post_modeset
>> regression in Linux 4.4/4.5 is to add a ...
>>
>> if ((diff > 1) && vblank->inmodeset) diff = 1;
>>
>> ... to the bottom of drm_update_vblank_count(). That should hopefully
>> restore the pre/post_modeset behavior as close to the original behavior as
>> possible. As a side effect it would also prevent the counter jump caused by
>> redundant calls to drm_vblank_off().
>
> Hm, can we just frob pre/post_modeset only with some checks? I'd like to
> not put that kind of "I have no idea about my hw state" hacks into the new
> helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to
> enforce correct usage, which would be a real bummer imo.
> -Daniel
>

We could check for only (vblank->inmodeset & 0x2) to only apply it to 
the legacy pre/post path, trusting that the drm_vblank_off/on path will 
be made robust in a different way, e.g., by the stuff discussed above 
and careful implementation in each kms driver that uses those. Atm. 
radeon doesn't use off/on, so your enablement patch set can make sure it 
does the right thing from the beginning.

rockchip-kms may need similar treatment to radeon to avoid redundant calls.

Btw. how the patch to drm_update_vblank_count() close to the bottom 
would actually look is more like:

if ((diff > 1) &&
     ((vblank->inmodeset & 0x2) || (flags & DRM_CALLED_FROM_VBLIRQ)))
          diff = 1;
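
The same condition as a standalone function that can be compiled and poked at
outside the kernel. This is only a sketch: the two SKETCH_* constants are
stand-ins defined here for illustration (the 0x2 bit is the one mentioned
above for the legacy pre/post path, the flag value is an assumption), and
sketch_clamp_diff() is a hypothetical name.

```c
#include <stdint.h>

#define SKETCH_INMODESET_LEGACY   0x2u /* stand-in for the magic 0x2 bit */
#define SKETCH_CALLED_FROM_VBLIRQ 0x1u /* stand-in for DRM_CALLED_FROM_VBLIRQ */

/*
 * Limit the increment to 1 when we are inside a legacy pre/post modeset
 * or when called from the vblank irq; pass it through otherwise.
 */
static uint32_t sketch_clamp_diff(uint32_t diff, unsigned int inmodeset,
				  unsigned int flags)
{
	if ((diff > 1) &&
	    ((inmodeset & SKETCH_INMODESET_LEGACY) ||
	     (flags & SKETCH_CALLED_FROM_VBLIRQ)))
		diff = 1;

	return diff;
}
```

A bogus jump such as 16776301 during a legacy modeset is reduced to 1, while
a large diff outside both cases (e.g. the drm_vblank_off/on path) is left
untouched.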

Another problem we have is that the implementation of the vblank 
timestamp doublebuffer with our custom sequence locking is only safe 
against concurrent readers as long as the increment for store_vblank() 
is +1, ie. diff = 1 in that code.

Other diff values like 2 would cause us to write to the other timestamp 
slot that is potentially read by some concurrent readers at the same 
time and they would get corrupted timestamps.
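
The slot arithmetic can be sketched as follows, assuming a two-entry
timestamp buffer indexed by the vblank count modulo 2, with readers looking
at the slot for the current count and the writer filling the slot for
count + diff before publishing the new count. The buffer size and the
sketch_* helpers are assumptions for illustration, not the kernel code.

```c
#include <stdint.h>

#define SKETCH_RBSIZE 2u /* assumed number of timestamp slots */

/* Slot a concurrent reader looks at for the currently published count. */
static uint32_t sketch_read_slot(uint32_t count)
{
	return count % SKETCH_RBSIZE;
}

/* Slot the writer fills before it publishes count + diff. */
static uint32_t sketch_write_slot(uint32_t count, uint32_t diff)
{
	return (count + diff) % SKETCH_RBSIZE;
}

/* Non-zero if the writer would scribble over the slot readers are using. */
static int sketch_write_hits_reader(uint32_t count, uint32_t diff)
{
	return sketch_write_slot(count, diff) == sketch_read_slot(count);
}
```

In this model diff = 1 always lands in the other slot, while diff = 2 lands
in the slot that corresponds to the still published count, which is the
corruption case described above.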

Now outside vblank irq drm_update_vblank_count() is only supposed to be 
called while there aren't any concurrent readers ie., vblank->refcount 
== 0, and the way drm_vblank_on/off are implemented this seems to be the 
case, so anything goes.

If called from drm_handle_vblank() aka (flags & DRM_CALLED_FROM_VBLIRQ), 
we have to assume that there are concurrent readers, because vblank irqs 
are usually only kept active if vblank->refcount is > 0. This means only 
+1 increments are allowed. In the past this was always the case, but 
with the new implementation since Linux 4.4, it could happen that we get 
diff > 1 if the vblank irq gets deferred by more than 1 video refresh 
cycle, e.g., due to long irq off periods, maybe preemption on a 
RT_PREEMPT kernel, long held locks, firmware triggered SMI's etc. That 
would cause bad corruption of timestamps.

So unless or until we also rewrite the timestamp caching, we need that 
extra protection against diff > 1 in vblank irq.

-mario

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
  2016-01-25 21:42                                               ` Mario Kleiner
@ 2016-01-25 22:05                                                 ` Daniel Vetter
  -1 siblings, 0 replies; 59+ messages in thread
From: Daniel Vetter @ 2016-01-25 22:05 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Ville Syrjälä,
	Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML,
	dri-devel, Christian König

On Mon, Jan 25, 2016 at 10:42 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
>>
>>> Now the patch i want to try next to fix the drm_vblank_pre/post_modeset
>>> regression in Linux 4.4/4.5 is to add a ...
>>>
>>> if ((diff > 1) && vblank->inmodeset) diff = 1;
>>>
>>> ... to the bottom of drm_update_vblank_count(). That should hopefully
>>> restore the pre/post_modeset behavior as close to the original behavior
>>> as
>>> possible. As a side effect it would also prevent the counter jump caused
>>> by
>>> redundant calls to drm_vblank_off().
>>
>>
>> Hm, can we just frob pre/post_modeset only with some checks? I'd like to
>> not put that kind of "I have no idea about my hw state" hacks into the new
>> helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to
>> enforce correct usage, which would be a real bummer imo.
>> -Daniel
>>
>
> We could check for only (vblank->inmodeset & 0x2) to only apply it to the
> legacy pre/post path, trusting that the drm_vblank_off/on path will be made
> robust in a different way, e.g., by the stuff discussed above and careful
> implementation in each kms driver that uses those. Atm. radeon doesn't use
> off/on, so your enablement patch set can make sure it does the right thing
> from the beginning.
>
> rockchip-kms may need similar treatment to radeon to avoid redundant calls.
>
> Btw. how the patch to drm_update_vblank_count() close to the bottom would
> actually look is more like:
>
> if ((diff > 1) &&
>     ((vblank->inmodeset & 0x2) || (flags & DRM_CALLED_FROM_VBLIRQ)))
>          diff = 1;

Yeah I think that should work as a short-term fix for radeon. When you
do that, can you pls do a second patch to give the magic 0x2 and 0x1
some meaning? Otherwise this is super-hard to understand code ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
@ 2016-01-25 22:05                                                 ` Daniel Vetter
  0 siblings, 0 replies; 59+ messages in thread
From: Daniel Vetter @ 2016-01-25 22:05 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Michel Dänzer, LKML, dri-devel, Alex Deucher,
	Christian König, Vlastimil Babka

On Mon, Jan 25, 2016 at 10:42 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
>>
>>> Now the patch i want to try next to fix the drm_vblank_pre/post_modeset
>>> regression in Linux 4.4/4.5 is to add a ...
>>>
>>> if ((diff > 1) && vblank->inmodeset) diff = 1;
>>>
>>> ... to the bottom of drm_update_vblank_count(). That should hopefully
>>> restore the pre/post_modeset behavior as close to the original behavior
>>> as
>>> possible. As a side effect it would also prevent the counter jump caused
>>> by
>>> redundant calls to drm_vblank_off().
>>
>>
>> Hm, can we just frob pre/post_modeset only with some checks? I'd like to
>> not put that kind of "I have no idea about my hw state" hacks into the new
>> helpers. Otherwise not even atomic drivers can start to gain WARN_ONs to
>> enforce correct usage, which would be a real bummer imo.
>> -Daniel
>>
>
> We could check for only (vblank->inmodeset & 0x2) to only apply it to the
> legacy pre/post path, trusting that the drm_vblank_off/on path will be made
> robust in a different way, e.g., by the stuff discussed above and careful
> implementation in each kms driver that uses those. Atm. radeon doesn't use
> off/on, so your enablement patch set can make sure it does the right thing
> from the beginning.
>
> rockchip-kms may need similar treatment to radeon to avoid redundant calls.
>
> Btw. how the patch to drm_update_vblank_count() close to the bottom would
> actually look is more like:
>
> if ((diff > 1) &&
>     ((vblank->inmodeset & 0x2) || (flags & DRM_CALLED_FROM_VBLIRQ)))
>          diff = 1;

Yeah, I think that should work as a short-term fix for radeon. When you
do that, can you please do a second patch to give the magic 0x2 and 0x1
some meaning? Otherwise this is super-hard to understand code ...
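
For illustration only, here is a minimal userspace sketch of what naming
those bits might look like. The macro names VBL_INMODESET and
VBL_INMODESET_LEGACY and the helper clamp_vblank_diff() are made up for
this example and are not taken from the tree. The meaning given to 0x1 is
an assumption, and DRM_CALLED_FROM_VBLIRQ is redefined locally with a
placeholder value so the snippet compiles on its own.

```c
#include <stdint.h>

/* Hypothetical names for the magic inmodeset bits (not from the tree). */
#define VBL_INMODESET         0x1 /* assumed: a modeset is in progress */
#define VBL_INMODESET_LEGACY  0x2 /* legacy pre/post_modeset path */

/* Local stand-in for the kernel flag; the value is a placeholder. */
#define DRM_CALLED_FROM_VBLIRQ 0x1

/*
 * Clamp proposed for the bottom of drm_update_vblank_count():
 * never advance the counter by more than one vblank while on the
 * legacy pre/post_modeset path or when called from the vblank irq.
 */
static uint32_t clamp_vblank_diff(uint32_t diff, unsigned int inmodeset,
                                  unsigned long flags)
{
	if (diff > 1 &&
	    ((inmodeset & VBL_INMODESET_LEGACY) ||
	     (flags & DRM_CALLED_FROM_VBLIRQ)))
		return 1;

	return diff;
}
```

With the bits named, the condition reads as "legacy modeset path or irq
context" rather than a bare 0x2 test. A real patch would of course apply
the check in place on vblank->inmodeset instead of going through a helper
like this one.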
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
