* Re: regression with mainline kernel
@ 2021-11-13 17:06 ` Linus Torvalds
0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2021-11-13 17:06 UTC (permalink / raw)
To: Sudip Mukherjee
Cc: linux-kernel, Nicholas Verne, dri-devel,
open list:VIRTIO GPU DRIVER, Gerd Hoffmann, Gurchetan Singh
[ Hmm. This email was marked as spam for me. I see no obvious reason
for it being marked as spam, but it happens.. ]
On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
<sudipm.mukherjee@gmail.com> wrote:
>
> # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> drm/virtio: implement context init: add virtio_gpu_fence_event
Hmm. Judging from your automated screenshots, the login never appears.
> And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed
> the problem I was seeing on my qemu test of x86_64. The qemu image is
> based on Ubuntu.
Presumably either that commit is somehow buggy in itself - or it does
exactly what it means to do, and the new poll() semantics just
confuses the heck out of the X server (or wayland or whatever).
And honestly, if I read that thing correctly, the patch is entirely
broken. The new poll function (virtio_gpu_poll()) will unconditionally
remove the first event from the event list, and then report "Yeah, I
had events".
This is completely bogus for a few reasons:
- poll() really should be idempotent, because the poll function gets
called multiple times
- it doesn't even seem to check that the event that it removes is the
new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will
unconditionally just remove random events.
- it does seem to check the "vfpriv->ring_idx_mask" and do the old
thing if that is zero, but I see absolutely no reason for that (and
that check itself has caused problems, see below)
Honestly, my reaction to this all is that that commit is fundamentally
broken and probably should be reverted regardless as "this commit does
bad things".
HOWEVER - it has had a fix for a NULL pointer dereference in the
meantime - can you check whether the current top of tree happens to
work for you? Maybe your problem isn't due to "that commit does
unnatural things", but simply due to the bug fixed in d89c0c8322ec
("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
And if it's still broken with that commit, I'll happily revert it and
people need to go back to the drawing board.
In fact, I would really suggest that people look at that
virtio_gpu_poll() function regardless. That odd "let's unconditionally
just drop events in the poll function is really REALLY broken
behavior.
Linus
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
@ 2021-11-13 17:06 ` Linus Torvalds
0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2021-11-13 17:06 UTC (permalink / raw)
To: Sudip Mukherjee
Cc: linux-kernel, Nicholas Verne, dri-devel,
open list:VIRTIO GPU DRIVER, Daniel Vetter, Gurchetan Singh
[ Hmm. This email was marked as spam for me. I see no obvious reason
for it being marked as spam, but it happens.. ]
On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
<sudipm.mukherjee@gmail.com> wrote:
>
> # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> drm/virtio: implement context init: add virtio_gpu_fence_event
Hmm. Judging from your automated screenshots, the login never appears.
> And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed
> the problem I was seeing on my qemu test of x86_64. The qemu image is
> based on Ubuntu.
Presumably either that commit is somehow buggy in itself - or it does
exactly what it means to do, and the new poll() semantics just
confuses the heck out of the X server (or wayland or whatever).
And honestly, if I read that thing correctly, the patch is entirely
broken. The new poll function (virtio_gpu_poll()) will unconditionally
remove the first event from the event list, and then report "Yeah, I
had events".
This is completely bogus for a few reasons:
- poll() really should be idempotent, because the poll function gets
called multiple times
- it doesn't even seem to check that the event that it removes is the
new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will
unconditionally just remove random events.
- it does seem to check the "vfpriv->ring_idx_mask" and do the old
thing if that is zero, but I see absolutely no reason for that (and
that check itself has caused problems, see below)
Honestly, my reaction to this all is that that commit is fundamentally
broken and probably should be reverted regardless as "this commit does
bad things".
HOWEVER - it has had a fix for a NULL pointer dereference in the
meantime - can you check whether the current top of tree happens to
work for you? Maybe your problem isn't due to "that commit does
unnatural things", but simply due to the bug fixed in d89c0c8322ec
("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
And if it's still broken with that commit, I'll happily revert it and
people need to go back to the drawing board.
In fact, I would really suggest that people look at that
virtio_gpu_poll() function regardless. That odd "let's unconditionally
just drop events in the poll function is really REALLY broken
behavior.
Linus
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
2021-11-13 17:06 ` Linus Torvalds
(?)
@ 2021-11-13 21:34 ` Sudip Mukherjee
-1 siblings, 0 replies; 16+ messages in thread
From: Sudip Mukherjee @ 2021-11-13 21:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Gurchetan Singh, Nicholas Verne, Gerd Hoffmann,
dri-devel, open list:VIRTIO GPU DRIVER, Daniel Vetter
Hi Linus,
On Sat, Nov 13, 2021 at 5:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
>
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
>
> Hmm. Judging from your automated screenshots, the login never appears.
>
<snip>
>
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
>
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
I sent another mail yesterday which is now at
https://lore.kernel.org/lkml/CADVatmOOzCxAgLhCu1tTz=44sgRDXds5-oMZ3V0w4f5kLCLKrw@mail.gmail.com/
I will just pase that here for you.
Last night's test on 66f4beaa6c1d worked fine. So I guess this has now
been fixed.
I have not done a bisect to see what has fixed it, but looking at the
log I think it will be that NULL pointer fix.
--
Regards
Sudip
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
@ 2021-11-13 21:34 ` Sudip Mukherjee
0 siblings, 0 replies; 16+ messages in thread
From: Sudip Mukherjee @ 2021-11-13 21:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Nicholas Verne, dri-devel,
open list:VIRTIO GPU DRIVER, Gerd Hoffmann, Gurchetan Singh
Hi Linus,
On Sat, Nov 13, 2021 at 5:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
>
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
>
> Hmm. Judging from your automated screenshots, the login never appears.
>
<snip>
>
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
>
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
I sent another mail yesterday which is now at
https://lore.kernel.org/lkml/CADVatmOOzCxAgLhCu1tTz=44sgRDXds5-oMZ3V0w4f5kLCLKrw@mail.gmail.com/
I will just pase that here for you.
Last night's test on 66f4beaa6c1d worked fine. So I guess this has now
been fixed.
I have not done a bisect to see what has fixed it, but looking at the
log I think it will be that NULL pointer fix.
--
Regards
Sudip
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
@ 2021-11-13 21:34 ` Sudip Mukherjee
0 siblings, 0 replies; 16+ messages in thread
From: Sudip Mukherjee @ 2021-11-13 21:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Nicholas Verne, dri-devel,
open list:VIRTIO GPU DRIVER, Daniel Vetter, Gurchetan Singh
Hi Linus,
On Sat, Nov 13, 2021 at 5:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
>
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
>
> Hmm. Judging from your automated screenshots, the login never appears.
>
<snip>
>
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
>
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
I sent another mail yesterday which is now at
https://lore.kernel.org/lkml/CADVatmOOzCxAgLhCu1tTz=44sgRDXds5-oMZ3V0w4f5kLCLKrw@mail.gmail.com/
I will just pase that here for you.
Last night's test on 66f4beaa6c1d worked fine. So I guess this has now
been fixed.
I have not done a bisect to see what has fixed it, but looking at the
log I think it will be that NULL pointer fix.
--
Regards
Sudip
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
2021-11-13 17:06 ` Linus Torvalds
(?)
@ 2021-11-15 14:24 ` Daniel Vetter
-1 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2021-11-15 14:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Sudip Mukherjee, linux-kernel, Gurchetan Singh, Nicholas Verne,
Gerd Hoffmann, dri-devel, open list:VIRTIO GPU DRIVER,
Daniel Vetter
On Sat, Nov 13, 2021 at 09:06:57AM -0800, Linus Torvalds wrote:
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
>
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
>
> Hmm. Judging from your automated screenshots, the login never appears.
Greg also reported a regression, plus I looked at the commit and had
questions too.
> > And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed
> > the problem I was seeing on my qemu test of x86_64. The qemu image is
> > based on Ubuntu.
>
> Presumably either that commit is somehow buggy in itself - or it does
> exactly what it means to do, and the new poll() semantics just
> confuses the heck out of the X server (or wayland or whatever).
>
> And honestly, if I read that thing correctly, the patch is entirely
> broken. The new poll function (virtio_gpu_poll()) will unconditionally
> remove the first event from the event list, and then report "Yeah, I
> had events".
>
> This is completely bogus for a few reasons:
>
> - poll() really should be idempotent, because the poll function gets
> called multiple times
>
> - it doesn't even seem to check that the event that it removes is the
> new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will
> unconditionally just remove random events.
>
> - it does seem to check the "vfpriv->ring_idx_mask" and do the old
> thing if that is zero, but I see absolutely no reason for that (and
> that check itself has caused problems, see below)
>
> Honestly, my reaction to this all is that that commit is fundamentally
> broken and probably should be reverted regardless as "this commit does
> bad things".
>
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
>
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
>
> In fact, I would really suggest that people look at that
> virtio_gpu_poll() function regardless. That odd "let's unconditionally
> just drop events in the poll function is really REALLY broken
> behavior.
Tbh I haven't looked at the poll implementation, but it's fishy on
principles as in drm drivers shouldn't reinvent them. The commit message
cites vmwgfx as example for a private driver drm_event, but that works
perfectly fine with standard drm_poll (and is meant to work perfectly fine
with standard drm_poll).
So if it's buggy on top of questionable I think revert might be the right
choice irrespective of whether there's some fixes in-flight.
So if you end up pushing that revert:
References: https://lore.kernel.org/dri-devel/YZJrutLaiwozLfSw@phenom.ffwll.local/
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Plus credit Greg too and all that ofc.
But lets wait a bit for virtio folks to chime in, it's only Monday :-)
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
@ 2021-11-15 14:24 ` Daniel Vetter
0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2021-11-15 14:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Nicholas Verne, dri-devel, Gurchetan Singh,
Gerd Hoffmann, open list:VIRTIO GPU DRIVER, Sudip Mukherjee
On Sat, Nov 13, 2021 at 09:06:57AM -0800, Linus Torvalds wrote:
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
>
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
>
> Hmm. Judging from your automated screenshots, the login never appears.
Greg also reported a regression, plus I looked at the commit and had
questions too.
> > And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed
> > the problem I was seeing on my qemu test of x86_64. The qemu image is
> > based on Ubuntu.
>
> Presumably either that commit is somehow buggy in itself - or it does
> exactly what it means to do, and the new poll() semantics just
> confuses the heck out of the X server (or wayland or whatever).
>
> And honestly, if I read that thing correctly, the patch is entirely
> broken. The new poll function (virtio_gpu_poll()) will unconditionally
> remove the first event from the event list, and then report "Yeah, I
> had events".
>
> This is completely bogus for a few reasons:
>
> - poll() really should be idempotent, because the poll function gets
> called multiple times
>
> - it doesn't even seem to check that the event that it removes is the
> new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will
> unconditionally just remove random events.
>
> - it does seem to check the "vfpriv->ring_idx_mask" and do the old
> thing if that is zero, but I see absolutely no reason for that (and
> that check itself has caused problems, see below)
>
> Honestly, my reaction to this all is that that commit is fundamentally
> broken and probably should be reverted regardless as "this commit does
> bad things".
>
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
>
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
>
> In fact, I would really suggest that people look at that
> virtio_gpu_poll() function regardless. That odd "let's unconditionally
> just drop events in the poll function is really REALLY broken
> behavior.
Tbh I haven't looked at the poll implementation, but it's fishy on
principles as in drm drivers shouldn't reinvent them. The commit message
cites vmwgfx as example for a private driver drm_event, but that works
perfectly fine with standard drm_poll (and is meant to work perfectly fine
with standard drm_poll).
So if it's buggy on top of questionable I think revert might be the right
choice irrespective of whether there's some fixes in-flight.
So if you end up pushing that revert:
References: https://lore.kernel.org/dri-devel/YZJrutLaiwozLfSw@phenom.ffwll.local/
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Plus credit Greg too and all that ofc.
But lets wait a bit for virtio folks to chime in, it's only Monday :-)
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: regression with mainline kernel
@ 2021-11-15 14:24 ` Daniel Vetter
0 siblings, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2021-11-15 14:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Nicholas Verne, dri-devel, Gurchetan Singh,
Daniel Vetter, open list:VIRTIO GPU DRIVER, Sudip Mukherjee
On Sat, Nov 13, 2021 at 09:06:57AM -0800, Linus Torvalds wrote:
> [ Hmm. This email was marked as spam for me. I see no obvious reason
> for it being marked as spam, but it happens.. ]
>
> On Thu, Nov 11, 2021 at 12:52 PM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > # first bad commit: [cd7f5ca33585918febe5e2f6dc090a21cfa775b0]
> > drm/virtio: implement context init: add virtio_gpu_fence_event
>
> Hmm. Judging from your automated screenshots, the login never appears.
Greg also reported a regression, plus I looked at the commit and had
questions too.
> > And, indeed reverting cd7f5ca33585 on top of debe436e77c7 has fixed
> > the problem I was seeing on my qemu test of x86_64. The qemu image is
> > based on Ubuntu.
>
> Presumably either that commit is somehow buggy in itself - or it does
> exactly what it means to do, and the new poll() semantics just
> confuses the heck out of the X server (or wayland or whatever).
>
> And honestly, if I read that thing correctly, the patch is entirely
> broken. The new poll function (virtio_gpu_poll()) will unconditionally
> remove the first event from the event list, and then report "Yeah, I
> had events".
>
> This is completely bogus for a few reasons:
>
> - poll() really should be idempotent, because the poll function gets
> called multiple times
>
> - it doesn't even seem to check that the event that it removes is the
> new VIRTGPU_EVENT_FENCE_SIGNALED_INTERNAL kind of event, so it will
> unconditionally just remove random events.
>
> - it does seem to check the "vfpriv->ring_idx_mask" and do the old
> thing if that is zero, but I see absolutely no reason for that (and
> that check itself has caused problems, see below)
>
> Honestly, my reaction to this all is that that commit is fundamentally
> broken and probably should be reverted regardless as "this commit does
> bad things".
>
> HOWEVER - it has had a fix for a NULL pointer dereference in the
> meantime - can you check whether the current top of tree happens to
> work for you? Maybe your problem isn't due to "that commit does
> unnatural things", but simply due to the bug fixed in d89c0c8322ec
> ("drm/virtio: Fix NULL dereference error in virtio_gpu_poll").
>
> And if it's still broken with that commit, I'll happily revert it and
> people need to go back to the drawing board.
>
> In fact, I would really suggest that people look at that
> virtio_gpu_poll() function regardless. That odd "let's unconditionally
> just drop events in the poll function is really REALLY broken
> behavior.
Tbh I haven't looked at the poll implementation, but it's fishy on
principles as in drm drivers shouldn't reinvent them. The commit message
cites vmwgfx as example for a private driver drm_event, but that works
perfectly fine with standard drm_poll (and is meant to work perfectly fine
with standard drm_poll).
So if it's buggy on top of questionable I think revert might be the right
choice irrespective of whether there's some fixes in-flight.
So if you end up pushing that revert:
References: https://lore.kernel.org/dri-devel/YZJrutLaiwozLfSw@phenom.ffwll.local/
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Plus credit Greg too and all that ofc.
But lets wait a bit for virtio folks to chime in, it's only Monday :-)
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 16+ messages in thread