All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno
@ 2024-01-09 16:06 Xaver Hugl
  2024-01-10  0:28 ` Zack Rusin
  0 siblings, 1 reply; 6+ messages in thread
From: Xaver Hugl @ 2024-01-09 16:06 UTC (permalink / raw)
  To: zack.rusin; +Cc: linux-graphics-maintainer, stefan.hoffmeister, dri-devel

Hi,

KWin does use DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT. Tying the check to
DRM_CLIENT_CAP_ATOMIC instead would IMO make more sense though (even if it's
still weird) and would work with older versions of KWin and other compositors.

Greetings,
Xaver Hugl

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno
  2024-01-09 16:06 BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno Xaver Hugl
@ 2024-01-10  0:28 ` Zack Rusin
  2024-01-11 16:38   ` Xaver Hugl
  0 siblings, 1 reply; 6+ messages in thread
From: Zack Rusin @ 2024-01-10  0:28 UTC (permalink / raw)
  To: Xaver Hugl; +Cc: linux-graphics-maintainer, stefan.hoffmeister, dri-devel

On Tue, Jan 9, 2024 at 11:06 AM Xaver Hugl <xaver.hugl@kde.org> wrote:
>
> Hi,
>
> KWin does use DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT.

Can you point me to the code that implements it? Just wanted to take a
quick look, because I didn't see the cursor on KDE 6 after fixing the
kernel oops.

> Tying the check to
> DRM_CLIENT_CAP_ATOMIC instead would IMO make more sense though (even if it's
> still weird) and would work with older versions of KWin and other compositors.

Unfortunately xf86-video-modesetting advertises CLIENT_CAP_ATOMIC and
uses GL where our GL driver assumes the prime object is not GEM. Not
impossible, as mentioned before, we can always add code to the kernel
that handles both but I don't think there's any particularly clean
solutions here. We'll probably play with a few solutions and see which
one is the cleanest.

z

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno
  2024-01-10  0:28 ` Zack Rusin
@ 2024-01-11 16:38   ` Xaver Hugl
  0 siblings, 0 replies; 6+ messages in thread
From: Xaver Hugl @ 2024-01-11 16:38 UTC (permalink / raw)
  To: Zack Rusin; +Cc: linux-graphics-maintainer, stefan.hoffmeister, dri-devel

[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]

Am Mi., 10. Jan. 2024 um 01:28 Uhr schrieb Zack Rusin <
zack.rusin@broadcom.com>:

> On Tue, Jan 9, 2024 at 11:06 AM Xaver Hugl <xaver.hugl@kde.org> wrote:
> >
> > Hi,
> >
> > KWin does use DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT.
>
> Can you point me to the code that implements it? Just wanted to take a
> quick look, because I didn't see the cursor on KDE 6 after fixing the
> kernel oops.
>
It's here:
https://invent.kde.org/plasma/kwin/-/blob/a879c59a082707e0e7dfa8ebeb7b239551ec9c71/src/backends/drm/drm_gpu.cpp#L153

> > Tying the check to
> > DRM_CLIENT_CAP_ATOMIC instead would IMO make more sense though (even if
> it's
> > still weird) and would work with older versions of KWin and other
> compositors.
>
> Unfortunately xf86-video-modesetting advertises CLIENT_CAP_ATOMIC and
> uses GL where our GL driver assumes the prime object is not GEM. Not
> impossible, as mentioned before, we can always add code to the kernel
> that handles both but I don't think there's any particularly clean
> solutions here. We'll probably play with a few solutions and see which
> one is the cleanest.
>
> z
>

Ah, that's unfortunate.

[-- Attachment #2: Type: text/html, Size: 1899 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno
  2023-12-23  1:27 ` Zack Rusin
@ 2024-01-12  9:30   ` Daniel Vetter
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2024-01-12  9:30 UTC (permalink / raw)
  To: Zack Rusin; +Cc: VMware Graphics Reviewers, Stefan Hoffmeister, dri-devel

On Fri, Dec 22, 2023 at 08:27:14PM -0500, Zack Rusin wrote:
> On Tue, Dec 19, 2023 at 11:15 AM Stefan Hoffmeister
> <stefan.hoffmeister@econos.de> wrote:
> >
> >
> > Hi,
> >
> > vmwgfx implements drmPrimeFDToHandle in terms of the TTM resource manager.
> >
> > At the same time, the driver advertises
> >
> >         .driver_features =
> >         DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM,
> >
> > Given that, a userland application will call drmCloseBufferHandle to
> > correctly release any previously acquired handle.
> >
> > In kernel language, this translates to ioctl
> > DRM_IOCTL_PRIME_FD_TO_HANDLE and DRM_IOCTL_GEM_CLOSE.
> >
> > Problems:
> >
> > a) because the application uses drmCloseBufferHandle, and that goes
> > through GEM (the driver claims "I am GEM"!), vmwgfx will always return
> > EINVAL, because the driver is _not_ GEM, but TTM on that aspect.
> >
> > This will always clobber errno for userspace, which is not at all
> > helpful for meaningful error handling.
> >
> > Additionally, the driver offers no formal API contract which would
> > allow applications to detect that the driver does not like be talked
> > to like a GEM driver, although it is _claims_ to be GEM driver.

Yeah this is really badly breaking expectations of generic userspace :-/

> > b) there is no (documented) means to release the handle acquired from
> > a call to drmPrimeFDToHandle; this implies that in principle any call
> > to drmPrimeFDToHandle will leak the handle.
> >
> > Questions:
> >
> > a) how can errno clobbering by vmwgfx with EINVAL be avoided?
> >
> > b) what is the correct way to release the TTM resources allocated
> > through drmPrimeFDToHandle?
> >
> >
> > For context: the KDE Plasma Desktop kwin DRM integration layer makes
> > extensive use of drmPrimeFDToHandle. With the way the VMware vmwgfx
> > driver behaves, I see these options:
> >
> > a) correctly check the return code of drmCloseBufferHandle and blindly
> > log all the EINVAL errors triggered by the VMware driver, putting
> > blame on the application for mis-managing handles
> > b) drop any return code from drmCloseBufferHandle onto the floor,
> > completely disregarding application integrity checking
> > c) hard-code a check for "vmwgfx" as the driver in use and spam "this
> > driver owned by VMware / Broadcom is broken, ignore the following
> > EINVAL and secondary effects" for integrity checks
> >
> > I do not like any of these options, to be honest.
> >
> > Many thanks for any input
> 
> Thanks for the report! This is mainly an artifact of vmwgfx predating
> GEM and really basically all of drm, so the ioctl interface on top of
> which vmwgfx operates is a bit bonkers. I know there are a number of
> examples where drm graphics drivers were breaking the ioctl interface,
> but we've tried avoiding that in vmwgfx and so supporting our old
> userspace and new userspace, which are incompatible, is incredibly
> complex and fragile. Our approach was to, in general, deal with those
> as they come. Our userspace, also in general, deals with surfaces,
> which aren't GEM's, as you've noticed, but instead are backed by a GEM
> object, so expects the handle to prime to not be GEM's. As you can
> imagine it's a bit hard to make sure ioctl's for which userspace has
> the opposite expectations are working fine (we've done it for some
> already though, the "dumb" ones being an example).
> 
> If you'd have an IGT test that tests for this without requiring CRC's
> then we'd get it fixed quickly. Otherwise making it work is difficult,
> not because this particular bug is difficult to fix, but because it's
> difficult to fix it while keeping the xorg drivers from 2012 working.
> We'll likely just make sure the prime ioctl's return proper GEM
> objects for clients that advertise certain cap's. Is KWin advertising
> DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT?  That'd be a trivial, albeit
> unexpected, way of making sure the old userspace gets the semantics it
> expects, while making sure the new clients get proper behavior.

Instead of trying to detect generic userspace, which I think is futile
because there's really not anything beyoned "generic userspace expects
gem handles, so give it to them", would it be possible to detect the
legacy driver somehow? Because generic kms clients are really not required
to use any specific feature at all that kms provides, so I don't think
there's a way to reliably detect those that works across everything.

Like some legacy ioctl/parameter set that only the legacy xorg-modesetting
driver uses should fit the bill here, or maybe a combination of them. Some
ideas:

- if you get an ADDFB call with a legacy buffer (not a gem buffer), mark
  the client as legacy and then return legacy buffer handles, not gem ones
  from fd2handle.

- maybe similar for "you get a legacy buffer handle for SETCURSOR ioctl"

- maybe there's a unique setup sequence that only the xorg driver uses?
  Like a sequence of ioctl/ioctl parameters that's unique to the old xorg
  driver.

It's still brittle, but since no one's working on the legacy driver
anymore you only need to chase this once and then be done. And it /should/
allow you to implement standard gem behaviour by default for all these
standard ioctls that generic userspace expects to work, and only use the
old behaviour when you detect the legacy userspace and have set the
fpriv->uses_legacy_buffer_handles flag.

A really bonkers fix would be to merge the buffer handle xarrays into one
and use one of the tag bits to denote whether it's a legacy or gem one,
and then allow GEM_CLOSE to close them all. But that's really ugly and
probably way too much surgery.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno
  2023-12-19 16:15 Stefan Hoffmeister
@ 2023-12-23  1:27 ` Zack Rusin
  2024-01-12  9:30   ` Daniel Vetter
  0 siblings, 1 reply; 6+ messages in thread
From: Zack Rusin @ 2023-12-23  1:27 UTC (permalink / raw)
  To: Stefan Hoffmeister; +Cc: VMware Graphics Reviewers, dri-devel

On Tue, Dec 19, 2023 at 11:15 AM Stefan Hoffmeister
<stefan.hoffmeister@econos.de> wrote:
>
>
> Hi,
>
> vmwgfx implements drmPrimeFDToHandle in terms of the TTM resource manager.
>
> At the same time, the driver advertises
>
>         .driver_features =
>         DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM,
>
> Given that, a userland application will call drmCloseBufferHandle to
> correctly release any previously acquired handle.
>
> In kernel language, this translates to ioctl
> DRM_IOCTL_PRIME_FD_TO_HANDLE and DRM_IOCTL_GEM_CLOSE.
>
> Problems:
>
> a) because the application uses drmCloseBufferHandle, and that goes
> through GEM (the driver claims "I am GEM"!), vmwgfx will always return
> EINVAL, because the driver is _not_ GEM, but TTM on that aspect.
>
> This will always clobber errno for userspace, which is not at all
> helpful for meaningful error handling.
>
> Additionally, the driver offers no formal API contract which would
> allow applications to detect that the driver does not like be talked
> to like a GEM driver, although it is _claims_ to be GEM driver.
>
> b) there is no (documented) means to release the handle acquired from
> a call to drmPrimeFDToHandle; this implies that in principle any call
> to drmPrimeFDToHandle will leak the handle.
>
> Questions:
>
> a) how can errno clobbering by vmwgfx with EINVAL be avoided?
>
> b) what is the correct way to release the TTM resources allocated
> through drmPrimeFDToHandle?
>
>
> For context: the KDE Plasma Desktop kwin DRM integration layer makes
> extensive use of drmPrimeFDToHandle. With the way the VMware vmwgfx
> driver behaves, I see these options:
>
> a) correctly check the return code of drmCloseBufferHandle and blindly
> log all the EINVAL errors triggered by the VMware driver, putting
> blame on the application for mis-managing handles
> b) drop any return code from drmCloseBufferHandle onto the floor,
> completely disregarding application integrity checking
> c) hard-code a check for "vmwgfx" as the driver in use and spam "this
> driver owned by VMware / Broadcom is broken, ignore the following
> EINVAL and secondary effects" for integrity checks
>
> I do not like any of these options, to be honest.
>
> Many thanks for any input

Thanks for the report! This is mainly an artifact of vmwgfx predating
GEM and really basically all of drm, so the ioctl interface on top of
which vmwgfx operates is a bit bonkers. I know there are a number of
examples where drm graphics drivers were breaking the ioctl interface,
but we've tried avoiding that in vmwgfx and so supporting our old
userspace and new userspace, which are incompatible, is incredibly
complex and fragile. Our approach was to, in general, deal with those
as they come. Our userspace, also in general, deals with surfaces,
which aren't GEM's, as you've noticed, but instead are backed by a GEM
object, so expects the handle to prime to not be GEM's. As you can
imagine it's a bit hard to make sure ioctl's for which userspace has
the opposite expectations are working fine (we've done it for some
already though, the "dumb" ones being an example).

If you'd have an IGT test that tests for this without requiring CRC's
then we'd get it fixed quickly. Otherwise making it work is difficult,
not because this particular bug is difficult to fix, but because it's
difficult to fix it while keeping the xorg drivers from 2012 working.
We'll likely just make sure the prime ioctl's return proper GEM
objects for clients that advertise certain cap's. Is KWin advertising
DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT?  That'd be a trivial, albeit
unexpected, way of making sure the old userspace gets the semantics it
expects, while making sure the new clients get proper behavior.

z

^ permalink raw reply	[flat|nested] 6+ messages in thread

* BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno
@ 2023-12-19 16:15 Stefan Hoffmeister
  2023-12-23  1:27 ` Zack Rusin
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hoffmeister @ 2023-12-19 16:15 UTC (permalink / raw)
  To: dri-devel, Zack Rusin, VMware Graphics Reviewers


Hi,

vmwgfx implements drmPrimeFDToHandle in terms of the TTM resource manager.

At the same time, the driver advertises

	.driver_features =
	DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM,

Given that, a userland application will call drmCloseBufferHandle to  
correctly release any previously acquired handle.

In kernel language, this translates to ioctl  
DRM_IOCTL_PRIME_FD_TO_HANDLE and DRM_IOCTL_GEM_CLOSE.

Problems:

a) because the application uses drmCloseBufferHandle, and that goes  
through GEM (the driver claims "I am GEM"!), vmwgfx will always return  
EINVAL, because the driver is _not_ GEM, but TTM on that aspect.

This will always clobber errno for userspace, which is not at all  
helpful for meaningful error handling.

Additionally, the driver offers no formal API contract which would  
allow applications to detect that the driver does not like be talked  
to like a GEM driver, although it is _claims_ to be GEM driver.

b) there is no (documented) means to release the handle acquired from  
a call to drmPrimeFDToHandle; this implies that in principle any call  
to drmPrimeFDToHandle will leak the handle.

Questions:

a) how can errno clobbering by vmwgfx with EINVAL be avoided?

b) what is the correct way to release the TTM resources allocated  
through drmPrimeFDToHandle?


For context: the KDE Plasma Desktop kwin DRM integration layer makes  
extensive use of drmPrimeFDToHandle. With the way the VMware vmwgfx  
driver behaves, I see these options:

a) correctly check the return code of drmCloseBufferHandle and blindly  
log all the EINVAL errors triggered by the VMware driver, putting  
blame on the application for mis-managing handles
b) drop any return code from drmCloseBufferHandle onto the floor,  
completely disregarding application integrity checking
c) hard-code a check for "vmwgfx" as the driver in use and spam "this  
driver owned by VMware / Broadcom is broken, ignore the following  
EINVAL and secondary effects" for integrity checks

I do not like any of these options, to be honest.

Many thanks for any input
Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-12  9:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-09 16:06 BUG / design challenge: vmwgfx + PRIME handle free == clobbering errno Xaver Hugl
2024-01-10  0:28 ` Zack Rusin
2024-01-11 16:38   ` Xaver Hugl
  -- strict thread matches above, loose matches on Subject: below --
2023-12-19 16:15 Stefan Hoffmeister
2023-12-23  1:27 ` Zack Rusin
2024-01-12  9:30   ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.