From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757745AbcAYQik (ORCPT ); Mon, 25 Jan 2016 11:38:40 -0500 Received: from mail-wm0-f45.google.com ([74.125.82.45]:35477 "EHLO mail-wm0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757716AbcAYQif (ORCPT ); Mon, 25 Jan 2016 11:38:35 -0500 Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon To: =?UTF-8?B?VmlsbGUgU3lyasOkbMOk?= References: <56A07D97.6030606@daenzer.net> <20160121075849.GH19130@phenom.ffwll.local> <56A0989E.30006@daenzer.net> <20160121100905.GL19130@phenom.ffwll.local> <56A19C98.8020208@daenzer.net> <20160122151835.GM23290@intel.com> <56A5A171.7000205@daenzer.net> <56A6203D.3030803@gmail.com> <20160125132310.GS23290@intel.com> <56A626D5.2040808@gmail.com> <20160125145309.GT23290@intel.com> Cc: =?UTF-8?Q?Michel_D=c3=a4nzer?= , Alex Deucher , Vlastimil Babka , LKML , dri-devel@lists.freedesktop.org, =?UTF-8?Q?Christian_K=c3=b6nig?= , Daniel Vetter From: Mario Kleiner Message-ID: <56A64F86.9080605@gmail.com> Date: Mon, 25 Jan 2016 17:38:30 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <20160125145309.GT23290@intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: > On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: >> >> >> On 01/25/2016 02:23 PM, Ville Syrjälä wrote: >>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: >>>> >>>> >>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote: >>>>> On 23.01.2016 00:18, Ville Syrjälä wrote: >>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: >>>>>>> >>>>>>> [ Trimming KDE folks from Cc ] >>>>>>> >>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote: >>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: >>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote: >>>>>>>>>> >>>>>>>>>> Can you please point me at the vblank on/off jump bug please? >>>>>>>>> >>>>>>>>> AFAIR I originally reported it in response to >>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html >>>>>>>>> , but I can't find that in the archives, so maybe that was just on IRC. >>>>>>>>> See >>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html >>>>>>>>> . Basically, I ran into the bug fixed by your patch because the counter >>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after >>>>>>>>> just a few days. >>>>>>>> >>>>>>>> Ok, so just uncovered the overflow bug. >>>>>>> >>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off >>>>>>> counter jumping bug (similar to the bug this thread is about), which >>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems >>>>>>> to happen when turning off the CRTC: >>>>>>> >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 >>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 >>>>>> >>>>>> Not sure what bug we're talking about here, but here the hw counter >>>>>> clearly jumps backwards. >>>>>> >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 >>>>>> >>>>>> Same here. >>>>> >>>>> At least one of the jumps is expected, because this is around turning >>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps back >>>>> though. >>>>> >>>>> >>>>>> These things just don't happen on i915 because drm_vblank_off() and >>>>>> drm_vblank_on() are always called around the times when the hw counter >>>>>> might get reset. Or at least that's how it should be. >>>>> >>>>> Which is of course the idea of Daniel's patch (which is what I'm getting >>>>> the above with) or Mario's patch as well, but clearly something's still >>>>> wrong. It's certainly possible that it's something in the driver, but >>>>> since calling drm_vblank_pre/post_modeset from the same places seems to >>>>> work fine (ignoring the regression discussed in this thread)... Do >>>>> drm_vblank_on/off require something else to handle this correctly? >>>>> >>>>> >>>> >>>> I suspect it is because vblank_disable_and_save calls >>>> drm_update_vblank_count() unconditionally, even if vblank irqs are >>>> already off. >>>> >>>> So on a manual display disable -> reenable you get something like >>>> >>>> At disable: >>>> >>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> >>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes >>>> final count. >>>> >>>> Then the crtc is shut down and its hw counter resets to zero. >>>> >>>> At reenable: >>>> >>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> >>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> >>>> drm_vblank_off -> vblank_disable_and_save -> A pointless >>>> drm_update_vblank_count() while the hw counter is already reset to zero >>>> --> Unwanted counter jump. >>>> >>>> >>>> The problem doesn't happen on a pure modeset to a different video >>>> resolution/refresh rate, as then we only have one call into >>>> atombios_crtc_dpms(DPMS_OFF). >>>> >>>> I think the fix is to fix vblank_disable_and_save() to only call >>>> drm_update_vblank_count() if vblank irqs get actually disabled, not on >>>> no-op calls. I will try that now. >>> >>> It does that on purpose. Otherwise the vblank counter would appear to >>> have stalled while the interrupt was off. >>> >> >> Ok, that's what the comments there say, although i don't see atm. why >> that perceived stall would be a big problem. I checked all callers of >> vblank_disable_and_save(). They are all careful to not call that >> function if vblanks are already disabled. The only exception is >> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms >> drivers which have resetting hw counters or other problematic behaviour >> during modesets etc. then this will break. E.g., calling the vblank >> timestamping stuff is also not safe/well-defined during modesets when >> the timestamping constants are not (yet) updated to reflect the new mode >> timing of the modeset in progress. > > The idea is to maintain the appearance that the counter ticks all the > time as long as the crtc is active. While that may not be really > required in case if no one is currently interested in the vblank > counter, I think it's a nice thing to have just to make the behaviour > of the counter consistent. > > As far as calling drm_vblank_off() after the hw counter got reset, well, > that not correct. It should be called before the reset. What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The first call to DMPS_OFF will call drm_vblank_off() and really disable vblank-irqs if they were running, updating the counts/ts a last time. But then the dpms off will reset the hw counter to zero. When one reenables the display, a second call to DPMS_OFF will now call drm_vblank_off again when it apparently shouldn't. I just tested this patch, which fixes the counter jumps on radeon-kms with my or Daniel's drm_vblank_off patches to radeon: diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 607f493..d739d93 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe) spin_lock_irqsave(&dev->event_lock, irqflags); spin_lock(&dev->vbl_lock); - vblank_disable_and_save(dev, pipe); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled); + + if (vblank->enabled) + vblank_disable_and_save(dev, pipe); wake_up(&vblank->queue); /* @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe) return; spin_lock_irqsave(&dev->vbl_lock, irqflags); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, vblank->enabled); + /* Drop our private "prevent drm_vblank_get" refcount */ if (vblank->inmodeset) { atomic_dec(&vblank->refcount); Another, maybe better, approach might be to no-op redundant calls to drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to drm_vblank_on() iff !vblank->inmodeset. -mario > >> >> -mario >> >> >>>> >>>> Otherwise kms drivers would have to be careful to never call >>>> drm_vblank_off multiple times before calling drm_vblank_on, but the help >>>> text to drm_vblank_on() claims that unbalanced calls to these functions >>>> are perfectly fine. >>>> >>>> -mario >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >