On Mon, 2022-05-09 at 15:13 +0200, Mark Menzynski wrote: > I think there are no direct issues with initialization in the state how it > is now. I suspect it's because "drm_kms_helper_poll_enable()" starts the > first worker thread with a delay, which gives enough time to initialize > required resources. I changed the initialization part to keep it consistent > with the finish part, which is the one causing troubles. I think I may have misspoke on what the issue was here since I was about to leave work. The MST bit might not actually be an issue, however I think nouveau_fbcon_init() being called before nouveau_display_init() would be an issue since nouveau_fbcon_init() can trigger a modeset - and we can't perform modesets before nouveau_display_init() has set things up. Looking at the docs for drm_kms_helper_poll_disable() - I think the actual fix here (to ensure that we still call drm_kms_helper_poll_disable() at the right time during suspend) would be to actually add another call to drm_kms_helper_poll_disable() into nouveau_fbcon_fini() before we call nouveau_fbcon_accel_fini() and everything after. This should make sure that we stop the output polling work early on driver unload, and since the docs mention that it's OK to disable polling more then once with drm_kms_helper_poll_disable() I don't see any issues with that. > > I am not sure where I could move "drm_kms_helper_poll_enable/disable()", > since it is defined in "drm/drm_probe_helper.c", which is only included in > "nouveau_display.c" and "nouveau_connector.c". Both creating a new function > in "nouveau_display.c", and including "probe_helper.h" and using poll_enable > in a different file like "nouveau_fbcon.c" seem like too big changes for > such small fix. I don't know. > > Can this new proposed order break something in the finish part as well? > Maybe it would be just better to change the order of "nouveau_drm_finish" > and keep the current order of "noueau_drm_init"? > > On Thu, May 5, 2022 at 9:57 PM Lyude Paul wrote: > > Hmm, I think we might just need to move the drm_kms_helper_poll_enable() > > call > > to the end here instead of all of nouveau_display_init(). I realized this > > because in nouveau_display_init() it seems that we actually rely on > > nouveau_display_init() to setup hotplug interrupts - which we do actually > > need > > this early on in the driver probe process. > > > > That being said though, drm_kms_helper_poll_enable() shouldn't be required > > for > > MST short HPD IRQs from working so moving that instead should work. > > > > On Wed, 2022-05-04 at 19:18 +0200, Mark Menzynski wrote: > > > Resources needed for output poll workers are destroyed in > > > nouveau_fbcon_fini() before output poll workers are cleared in > > > nouveau_display_fini(). This means there is a time between fbcon_fini > > > and display_fini, where if output poll happens, it crashes. > > > > > > BUG: KASAN: use-after-free in > > > __drm_fb_helper_initial_config_and_unlock.cold+0x1f3/0x291 > > > [drm_kms_helper] > > > > > > Cc: Ben Skeggs > > > Cc: Karol Herbst > > > Cc: Lyude Paul > > > Cc: David Airlie > > > Cc: Daniel Vetter > > > Cc: Sumit Semwal > > > Cc: "Christian König" > > > Cc: dri-devel@lists.freedesktop.org > > > Cc: nouveau@lists.freedesktop.org > > > Cc: linux-kernel@vger.kernel.org > > > Cc: linux-media@vger.kernel.org > > > Cc: linaro-mm-sig@lists.linaro.org > > > Signed-off-by: Mark Menzynski > > > --- > > >  drivers/gpu/drm/nouveau/nouveau_drm.c | 17 ++++++++--------- > > >  1 file changed, 8 insertions(+), 9 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c > > > b/drivers/gpu/drm/nouveau/nouveau_drm.c > > > index 561309d447e0..773efdd20d2f 100644 > > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > > > @@ -588,12 +588,6 @@ nouveau_drm_device_init(struct drm_device *dev) > > >         if (ret) > > >                 goto fail_dispctor; > > >   > > > -       if (dev->mode_config.num_crtc) { > > > -               ret = nouveau_display_init(dev, false, false); > > > -               if (ret) > > > -                       goto fail_dispinit; > > > -       } > > > - > > >         nouveau_debugfs_init(drm); > > >         nouveau_hwmon_init(dev); > > >         nouveau_svm_init(drm); > > > @@ -601,6 +595,12 @@ nouveau_drm_device_init(struct drm_device *dev) > > >         nouveau_fbcon_init(dev); > > >         nouveau_led_init(dev); > > >   > > > +       if (dev->mode_config.num_crtc) { > > > +               ret = nouveau_display_init(dev, false, false); > > > +               if (ret) > > > +                       goto fail_dispinit; > > > +       } > > > + > > >         if (nouveau_pmops_runtime()) { > > >                 pm_runtime_use_autosuspend(dev->dev); > > >                 pm_runtime_set_autosuspend_delay(dev->dev, 5000); > > > @@ -641,15 +641,14 @@ nouveau_drm_device_fini(struct drm_device *dev) > > >                 pm_runtime_forbid(dev->dev); > > >         } > > >   > > > +       if (dev->mode_config.num_crtc) > > > +               nouveau_display_fini(dev, false, false); > > >         nouveau_led_fini(dev); > > >         nouveau_fbcon_fini(dev); > > >         nouveau_dmem_fini(drm); > > >         nouveau_svm_fini(drm); > > >         nouveau_hwmon_fini(dev); > > >         nouveau_debugfs_fini(drm); > > > - > > > -       if (dev->mode_config.num_crtc) > > > -               nouveau_display_fini(dev, false, false); > > >         nouveau_display_destroy(dev); > > >   > > >         nouveau_accel_fini(drm); > > -- Cheers, Lyude Paul (she/her) Software Engineer at Red Hat