On Sun, Jun 14, 2020 at 06:16:56PM +0200, Stefan Wahren wrote:
> On 11.06.20 at 15:34, Maxime Ripard wrote:
> > Hi Stefan,
> >
> > On Sat, Jun 06, 2020 at 10:06:12AM +0200, Stefan Wahren wrote:
> >> Hi Maxime,
> >>
> >> On 05.06.20 at 16:35, Maxime Ripard wrote:
> >>> Hi Stefan,
> >>>
> >>> On Wed, Jun 03, 2020 at 07:32:30PM +0200, Stefan Wahren wrote:
> >>>> On 02.06.20 at 17:54, Maxime Ripard wrote:
> >>>> FWIW this is the first patch which breaks X on my Raspberry Pi 3 B.
> >>>>
> >>>> Here are the bisect results:
> >>>>
> >>>> 587d6e4a529a8d807a5c0bae583dd432d77064d6 bad (black screen, no heartbeat)
> >>>> b0523c7b1c9d0edcd6c0fe6d2cb558a9ad5c60a8 good
> >>>> 2c6a651cac6359cb0244a40d3b7a14e72918f169 good
> >>>> 1705c3cb40906863ec0d24ee5ea5092f5ee2e994 bad (black screen, but heartbeat)
> >>>> 601527fea6bb226abd088a864e74b25368218e87 good
> >>>> 2165607ede34d229d0cbce916c70c7fb6c0337be good
> >>>> f094f388fc2df848227e2ae648df2c97872df42b good
> >>>> 020de18840a1075b2671736c6cc2e451030fad74 bad (black screen, but heartbeat)
> >>>> 4c4da3823e4d1a8189e96a59a79451fff372f70b good
> >>>>
> >>>> 020de18840a1075b2671736c6cc2e451030fad74 is the first bad commit
> >>>> commit 020de18840a1075b2671736c6cc2e451030fad74
> >>>> Author: Maxime Ripard
> >>>> Date:   Mon Jan 6 17:17:29 2020 +0100
> >>>>
> >>>>     drm/vc4: hdmi: rework connectors and encoders
> >>>>
> >>>>     the vc4_hdmi driver has some custom structures to hold the data it needs to
> >>>>     associate with the drm_encoder and drm_connector structures.
> >>>>
> >>>>     However, it allocates them separately from the vc4_hdmi structure which
> >>>>     makes it more complicated than it needs to be.
> >>>>
> >>>>     Move those structures to be contained by vc4_hdmi and update the code
> >>>>     accordingly.
> >>>>
> >>>>     Signed-off-by: Maxime Ripard
> >>> So it looks like there were two issues on the Pi3.
> >>> The first one was causing the timeouts (and therefore likely the black
> >>> screen but heartbeat case you had) and I've fixed it.
> >>>
> >>> However, I can indeed reproduce the case with the black screen / no
> >>> heartbeat you mentioned. My bisection however returns that it's the
> >>> patch "drm/vc4: hdmi: Implement finer-grained hooks" that is at fault.
> >>> I've pushed my updated branch, if you have some spare time, it would be
> >>> great if you could confirm it on your Pi.
> >> Yesterday I checked out your latest rpi4-kms branch, but I was still
> >> facing similar issues with my Raspberry Pi 3 and multi_v7_defconfig
> >> (heartbeat stops, splashscreen freeze, heartbeat is abnormally fast). So I
> >> tried to bisect but the offending commit didn't cause an issue the
> >> second time.
> >>
> >> By accident I noticed that a simple reboot seems to hang for at least 8
> >> minutes (using b0523c7b1c9d0edcd the base of your branch). This usually
> >> takes a few seconds. So I consider this base on linux-next as too
> >> unstable for reliable testing.
> >>
> >> Is it possible to rebase this on something more stable like linux-5.7 or
> >> at least drm-misc-next? This should avoid chasing unrelated issues.
> > I've rebased it on 5.7 here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux.git/log/?h=rpi4-kms-5.7
> >
> > And it looks to be indeed an issue coming from next. That branch can
> > start the desktop just fine on an RPi3 here. It would be great if you
> > could confirm on your end.
> >
> > Thanks!
> > Maxime
>
> Thank you very much. The good news is that the "black screen, but
> heartbeat" issue and reboot hang are gone. Unfortunately the "no
> heartbeat" issue is still there.
>
> Here are more details about the issue. It doesn't occur every time. I
> would guess the probability is about 40 percent, which made bisecting
> much harder.

Are you sure about that 40% reliability?
I found out that the culprit was that the commit we mentioned was actually
running atomic_disable before our own custom callbacks, meaning that we
would run the custom callbacks with the clocks and the power domain shut
down, resulting in a stall. I was seeing it all the time when X was
shutting down the display, but maybe you were changing the resolution
between the framebuffer console or something, and since the power domain is
shut down asynchronously, it wasn't running fast enough for the next enable
to come up and re-enable it again?

> It is reproducible on my 2 Raspberry Pi 3 B Rev 1.2. It also seems
> independent from the display because the problem occurred on my Computer
> display and my TV.

But only on HDMI, right?

I've pushed a new branch with that fix.

Maxime
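
For readers following the thread, the ordering problem described above (the
disable step that drops clocks and power running before the driver's custom
callbacks) can be illustrated with a small toy model. This is only a sketch
in plain Python, not vc4 or DRM code: PoweredBlock, generic_disable,
custom_disable_hook and the VIDEO_CTL register name are all hypothetical, and
the asynchronous power-domain shutdown mentioned above is not modelled.

```python
# Toy model only: none of these names exist in the vc4 driver or in DRM.


class PoweredBlock:
    """Stands in for a hardware block behind clocks and a power domain."""

    def __init__(self):
        self.powered = True
        self.log = []

    def write_register(self, name):
        # On real hardware, touching an unpowered block can stall the bus.
        # The model raises instead, so the problem is visible.
        if not self.powered:
            raise RuntimeError(f"register write to {name} with power off")
        self.log.append(f"write {name}")

    def power_off(self):
        self.powered = False
        self.log.append("power off")


def generic_disable(block):
    """Plays the role of the disable step that drops clocks and power."""
    block.power_off()


def custom_disable_hook(block):
    """Plays the role of a driver-specific hook that still touches registers."""
    block.write_register("VIDEO_CTL")  # hypothetical register name


def run(steps):
    """Run the steps in order on a fresh block and report what happened."""
    block = PoweredBlock()
    try:
        for step in steps:
            step(block)
    except RuntimeError as err:
        return f"stall: {err}"
    return "ok"


# Power is dropped first, then the hook touches the hardware: modelled stall.
broken = run([generic_disable, custom_disable_hook])
# The hook runs while the block is still powered, then power is dropped.
fixed = run([custom_disable_hook, generic_disable])
print(broken)
print(fixed)
```

In this model the only difference between the two runs is the order of the
two steps, which is the point being made about the hook ordering.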