On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote: > On 2020-03-12 10:32 a.m., Alex Deucher wrote: >> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner >> wrote: >>> >>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user >>> events at vsartup for DCN")' introduces a new way of pageflip >>> completion handling for DCN, and some trouble. >>> >>> The current implementation introduces a race condition, which >>> can cause pageflip completion events to be sent out one vblank >>> too early, thereby confusing userspace and causing flicker: >>> >>> prepare_flip_isr(): >>> >>> 1. Pageflip programming takes the ddev->event_lock. >>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED >>> 3. Releases ddev->event_lock. >>> >>> --> Deadline for surface address regs double-buffering passes on >>>      target pipe. >>> >>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip >>>     into hw, but too late for current vblank. >>> >>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete >>>     in current vblank due to missing the double-buffering deadline >>>     by a tiny bit. >>> >>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires, >>>     dm_dcn_crtc_high_irq() gets called. >>> >>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the >>>     pageflip has been completed/will complete in this vblank and >>>     sends out pageflip completion event to userspace and resets >>>     pflip_status = AMDGPU_FLIP_NONE. >>> >>> => Flip completion event sent out one vblank too early. >>> >>> This behaviour has been observed during my testing with measurement >>> hardware a couple of time. >>> >>> The commit message says that the extra flip event code was added to >>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events >>> in case the pflip irq doesn't fire, because the "DCH HUBP" component >>> is clock gated and doesn't fire pflip irqs in that state. Also that >>> this clock gating may happen if no planes are active. According to >>> Nicholas, the clock gating can also happen if psr is active, and the >>> gating is controlled independently by the hardware, so difficult to >>> detect if and when the completion code in above commit is needed. >>> >>> This patch tries the following solution: It only executes the extra >>> pflip >>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports >>> that there aren't any surface updated pending in the double-buffered >>> surface scanout address registers. Otherwise it leaves pflip completion >>> to the pflip irq handler, for a more race-free experience. >>> >>> This would only guard against the order of events mentioned above. >>> If Step 5 (VSTARTUP trigger) happens before step 4 then this won't help >>> at all, because 1-3 + 5 might happen even without the hw being >>> programmed >>> at all, ie. no surface update pending because none yet programmed >>> into hw. >>> >>> Therefore this patch also changes locking in amdgpu_dm_commit_planes(), >>> so that prepare_flip_isr() and dc_commit_updates_for_stream() are done >>> under event_lock protection within the same critical section. >>> >>> v2: Take Nicholas comments into account, try a different solution. >>> >>> Lightly tested on Polaris (locking) and Raven (the whole DCN stuff). >>> Seems to work without causing obvious new trouble. >> >> Nick, any comments on this?  Can we get this committed or do you think >> it needs additional rework? >> >> Thanks, >> >> Alex > > Hi Alex, Mario, > > This might be a little strange, but if we want to get this in as a fix > for regressions caused by the original vblank and user events at > vstartup patch then I'm actually going to give my reviewed by on the > *v1* of this patch (but not this v2): > > Reviewed-by: Nicholas Kazlauskas > > You can feel free to apply that one. > > Reason 1: After having thought about it some more I don't think we > enable anything today that has hubp powered down at the same time we > expect to be waiting for a flip - eg. DMCU powering down HUBP during PSR > entry. Static screen interrupt should happen after that flip finishes I > think. > > The CRTC can still be powered on with zero planes, and I don't think any > userspace explicitly asks for vblank events in this case but it doesn't > hurt to have the check. > > Reason 2: This new patch will need much more thorough testing from side > to fully understand the consequences of locking the entire DC commit > sequence. For just a page flip that sounds fine, but for anything more > than (eg. full updates, modesets, etc) I don't think we want to be > disabling interrupts for potentially many milliseconds. Ah! I was wondering where the attached splat comes from, but I think this explains it: With this patch amdgpu_dm_commit_planes keeps the pcrtc->dev->event_lock spinlock locked while calling dc_commit_updates_for_stream, which ends up calling smu_set_display_count, which tries to lock a mutex. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer