All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed
@ 2019-09-08 10:00 bugzilla-daemon
  2019-09-08 12:37 ` bugzilla-daemon
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-09-08 10:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2928 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=111588

            Bug ID: 111588
           Summary: Framebuffer corruption when a fb which is not being
                    scanned out gets removed
           Product: DRI
           Version: DRI git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: not set
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: jwrdegoede@fedoraproject.org

This is a weird issue, which I noticed when working on this plymouth fix:
https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59

On boot to ensure a smooth handover of the framebuffer from plymouth to gdm the
following happens:

1) plymouth starts, does an addfb, becomes master, set the fb as the fb the
crtc should scanout
2) gdm starts tells plymouth to "deactivate", plymouth drops master (but does
not exit)
3) gdm becomes master, installs its own fb to scanout, the fb being scanned out
is now owned by gdm
4) gdm tells plymouth it may quit now
5) plymouth exits, without calling rmfb or closing the /dev/dri/card0 fd,
relying on the kernel to cleanup
6) all is well

The bug fix from the above merge requests make plymouth actually cleanup behind
itself, this is necessary to avoid issues with hotunplug (see the plymouth MR
for details). My first attempt at this simply made plymouth always do the
cleanup, both on hotunplug and exit as that was the most straight forward to
do.
This changes the sequence to:

1-4) Idem as above
5) Plymouth internally calls src/plugins/renderers/drm/plugin.c:
   ply_renderer_buffer_free() this does:
    drmModeRmFB(...);
    munmap (buffer->map_address, buffer->map_size);
    destroy_dumb_buffer_request.handle = buffer->handle;
    drmIoctl (fd, DRM_IOCTL_MODE_DESTROY_DUMB, &destroy_dumb_buffer_request);
   Followed by calling close() on the fd.
6) Plymouth exits
7) 5 and/or 6 cause the gdm framebuffer being all messed up, it looks like a
   wrong pitch or tiling setting

Note that when 5 is executed plymouth no longer is master and the fb being
removed is no longer being scanned out, so this really should not be able to
influence the current kms state, yet it does.

This is 100% reproducable for me with Fedora 30 + master plymouth + 5.3.0-rc7
on a R7 250E (SAPPHIRE Ultimate Radeon R7 250) using the amdgpu driver. I know
that the default is to use the radeon driver with the R7 250E, but I was using
the amdgpu driver deliberately to reproduce:
https://bugzilla.redhat.com/show_bug.cgi?id=1490490

For now I've modified the plymouth fix to only call ply_renderer_buffer_free()
+ close() on hot-unplug and to still leave cleanup to the kernel on exit, but
it would be nice to get to the bottom of this.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 4401 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed
  2019-09-08 10:00 [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed bugzilla-daemon
@ 2019-09-08 12:37 ` bugzilla-daemon
  2019-09-09 16:14 ` bugzilla-daemon
  2019-11-19  9:51 ` bugzilla-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-09-08 12:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3864 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=111588

--- Comment #1 from Hans de Goede <jwrdegoede@fedoraproject.org> ---
I just realized I left out one bit of info which might be useful, to debug this
I added the following change to the kernel:

diff --git a/drivers/gpu/drm/drm_framebuffer.c
b/drivers/gpu/drm/drm_framebuffer.c
index 57564318ceea..4712bfb9ae05 100644
--- a/drivers/gpu/drm/drm_framebuffer.c
+++ b/drivers/gpu/drm/drm_framebuffer.c
@@ -464,6 +464,7 @@ int drm_mode_rmfb(struct drm_device *dev, u32 fb_id,
        if (drm_framebuffer_read_refcount(fb) > 1) {
                struct drm_mode_rmfb_work arg;

+               pr_err("drm_modr_rmfb calling drm_framebuffer_remove\n");
                INIT_WORK_ONSTACK(&arg.work, drm_mode_rmfb_work_fn);
                INIT_LIST_HEAD(&arg.fbs);
                list_add_tail(&fb->filp_head, &arg.fbs);
@@ -471,8 +472,10 @@ int drm_mode_rmfb(struct drm_device *dev, u32 fb_id,
                schedule_work(&arg.work);
                flush_work(&arg.work);
                destroy_work_on_stack(&arg.work);
-       } else
+       } else {
+               pr_err("drm_modr_rmfb calling drm_framebuffer_put\n");
                drm_framebuffer_put(fb);
+       }

        return 0;

@@ -669,11 +672,13 @@ void drm_fb_release(struct drm_file *priv)
         */
        list_for_each_entry_safe(fb, tfb, &priv->fbs, filp_head) {
                if (drm_framebuffer_read_refcount(fb) > 1) {
+                       pr_err("drm_fb_release calling
drm_framebuffer_remove\n");
                        list_move_tail(&fb->filp_head, &arg.fbs);
                } else {
                        list_del_init(&fb->filp_head);

                        /* This drops the fpriv->fbs reference. */
+                       pr_err("drm_fb_release calling drm_framebuffer_put\n");
                        drm_framebuffer_put(fb);
                }
        }
@@ -863,6 +868,8 @@ static int atomic_remove_fb(struct drm_framebuffer *fb)
                if (plane->state->fb != fb)
                        continue;

+               pr_err("atomic_remove_fb found plane still using fb\n");
+
                plane_state = drm_atomic_get_plane_state(state, plane);
                if (IS_ERR(plane_state)) {
                        ret = PTR_ERR(plane_state);


In the working case, so where we let the kernel do the fb cleanup itself, I
see:

Plymouth removes fb it creates to test for 32bpp support:
kernel: drm_modr_rmfb calling drm_framebuffer_put

gdm starts, does page-flipping, resulting in a number of:

kernel: drm_modr_rmfb calling drm_framebuffer_put
kernel: drm_modr_rmfb calling drm_framebuffer_put
...

lines

And then plymouth exits without any cleanup, so we get:
kernel: drm_fb_release calling drm_framebuffer_put

Followed by more:

kernel: drm_modr_rmfb calling drm_framebuffer_put
kernel: drm_modr_rmfb calling drm_framebuffer_put
...

>From gdm.

In the broken case, where ply_renderer_buffer_free() gets called on
plymouth-quit, I only see:

kernel: drm_modr_rmfb calling drm_framebuffer_put
kernel: drm_modr_rmfb calling drm_framebuffer_put
...

lines, wihch is expected as the fb is rmfb-ed before the fd is closed.

Note that we never hit:

@@ -863,6 +868,8 @@ static int atomic_remove_fb(struct drm_framebuffer *fb)
                if (plane->state->fb != fb)
                        continue;

+               pr_err("atomic_remove_fb found plane still using fb\n");
+
                plane_state = drm_atomic_get_plane_state(state, plane);
                if (IS_ERR(plane_state)) {
                        ret = PTR_ERR(plane_state);


So AFAICT userspace is doing everything correctly even in the broken case.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 4928 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed
  2019-09-08 10:00 [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed bugzilla-daemon
  2019-09-08 12:37 ` bugzilla-daemon
@ 2019-09-09 16:14 ` bugzilla-daemon
  2019-11-19  9:51 ` bugzilla-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-09-09 16:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 877 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=111588

--- Comment #2 from Michel Dänzer <michel@daenzer.net> ---
(In reply to Hans de Goede from comment #0)
> 5) Plymouth internally calls src/plugins/renderers/drm/plugin.c:
>    ply_renderer_buffer_free() this does:
>     drmModeRmFB(...);
>     munmap (buffer->map_address, buffer->map_size);
>     destroy_dumb_buffer_request.handle = buffer->handle;
>     drmIoctl (fd, DRM_IOCTL_MODE_DESTROY_DUMB, &destroy_dumb_buffer_request);
>    Followed by calling close() on the fd.
> 6) Plymouth exits
> 7) 5 and/or 6 cause the gdm framebuffer being all messed up, it looks like a
>    wrong pitch or tiling setting

Would be interesting if you could further narrow down which step (or even
sub-step of 5) exactly triggers the problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1826 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed
  2019-09-08 10:00 [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed bugzilla-daemon
  2019-09-08 12:37 ` bugzilla-daemon
  2019-09-09 16:14 ` bugzilla-daemon
@ 2019-11-19  9:51 ` bugzilla-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-11-19  9:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 805 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=111588

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #3 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/902.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2452 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-11-19  9:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-08 10:00 [Bug 111588] Framebuffer corruption when a fb which is not being scanned out gets removed bugzilla-daemon
2019-09-08 12:37 ` bugzilla-daemon
2019-09-09 16:14 ` bugzilla-daemon
2019-11-19  9:51 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.