* Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints
@ 2021-04-19 12:05 Simon Ser
  2021-04-20 10:17 ` Daniel Stone
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Ser @ 2021-04-19 12:05 UTC (permalink / raw)
  To: DRI Development, wayland; +Cc: Daniel Vetter, Emil Velikov, Laurent Pinchart

Hi all,

I'm working on a Wayland extension [1] that, among other things, allows
compositors to advertise the preferred device to be used by Wayland
clients.

In general, compositors will send a render node. However, in the case
of split render/display SoCs, things get a little bit complicated.

The Wayland compositor will find the display-only DRM device (usually
at /dev/dri/card0). This DRM device will have a DRM primary node, but
no DRM render node.

The Wayland compositor will create a GBM device from this display-only
device, then create an EGL display. Under the hood, Mesa's kmsro will
kick in and magically open a render node from a different device.

However the compositor has no knowledge of this, and has no way to
discover the render node opened by kmsro.

This is an issue because the compositor has no render node to send to
Wayland clients. The compositor is forced to send a primary node to
clients. Clients will need to open the primary node and rely on Mesa's
renderonly to once again magically open the render node under the hood.

In general clients cannot be expected to be able to open primary nodes.
This happens to work on systemd distributions because udev sets a
special uaccess tag on the primary node that makes logind grant
permissions to users physically logged in on a VT.

This will fall apart on non-logind systems and on systems where no user
is physically logged in. Additionally, it may be desirable to deny
access to primary nodes in sandboxes.

So I believe the best way forward would be for the compositor to send
the render node to clients. This could prevent clients from allocating
buffers suitable for scan-out, but that can be fixed with some kind of
buffer constraints negotiation, like the one we presented at XDC 2020 [2].

There are a few solutions:

1. Require compositors to discover the render device by trying to import
   a buffer. For each available render device, the compositor would
   allocate a buffer, export it as a DMA-BUF, import it to the
   display-only device, then try to drmModeAddFB.
2. Allow compositors to query the render device magically opened by
   kmsro. This could be done either via EGL_EXT_device_drm, or via a
   new EGL extension.
3. Allow compositors to query the kernel drivers to know which devices
   are compatible with each other. Some uAPI to query a compatible
   display device from a render-only device, or vice-versa, has been
   suggested in the past.

(1) has a number of limitations and gotchas. It requires allocating
real buffers, which has a rather big cost for something done at
compositor initialization time. It requires selecting a buffer format
and modifier compatible with both devices, so it can't be isolated in
a simple function (and e.g. shared between all compositors in libdrm).
Some drivers will allow drmModeAddFB on buffers that can't be scanned
out, and will only reject the buffer at atomic commit time.

(2) wouldn't work with non-EGL APIs such as Vulkan. Eric Anholt seemed
pretty opposed to this idea, but I didn't fully understand why.

I don't know how feasible (3) is. The kernel drivers must be able to
decide whether buffers coming from another driver can be scanned out,
but how early can they give an answer? Can they give an answer solely
based on a DRM node, and not a DMA-BUF?

Feedback is welcome. Do you agree with the premise that compositors
need access to the render node? Do you have any other potential solution
in mind? Which solution would you prefer?

Thanks,

Simon

[1]: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/8
[2]: https://xdc2020.x.org/event/9/contributions/634/
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints
  2021-04-19 12:05 Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints Simon Ser
@ 2021-04-20 10:17 ` Daniel Stone
  2021-04-20 11:14   ` Daniel Vetter
  2021-04-20 23:11   ` Eric Anholt
  0 siblings, 2 replies; 5+ messages in thread
From: Daniel Stone @ 2021-04-20 10:17 UTC (permalink / raw)
  To: Simon Ser
  Cc: Daniel Vetter, Emil Velikov, DRI Development, wayland, Laurent Pinchart



Hi,

On Mon, 19 Apr 2021 at 13:06, Simon Ser <contact@emersion.fr> wrote:

> I'm working on a Wayland extension [1] that, among other things, allows
> compositors to advertise the preferred device to be used by Wayland
> clients.
>
> In general, compositors will send a render node. However, in the case
> of split render/display SoCs, things get a little bit complicated.
>
> [...]
>

Thanks for the write-up Simon!


> There are a few solutions:
>
> 1. Require compositors to discover the render device by trying to import
>    a buffer. For each available render device, the compositor would
>    allocate a buffer, export it as a DMA-BUF, import it to the
>    display-only device, then try to drmModeAddFB.
>

I don't think this is actually tractable? Assuming that 'allocate a buffer'
means 'obtain a gbm_device for the render node directly and allocate a
gbm_bo from it', even with compatible formats and modifiers this will fail
for more restrictive display hardware. imx-drm and pl111 (combined with vc4
on some Raspberry Pis) will fail this, since they'll take different
allocation paths when they're bound through kmsro vs. directly, accounting
for things like contiguous allocation. So we'd get false negatives on at
least some platforms.


> 2. Allow compositors to query the render device magically opened by
>    kmsro. This could be done either via EGL_EXT_device_drm, or via a
>    new EGL extension.
>

This would be my strong preference, and I don't entirely understand
anholt's pushback here. The way I see it, GBM is about allocation for
scanout, and EGL is about rendering. If, on a split GPU/display system, we
create a gbm_device from a KMS display-only device node, then creating an
EGLDisplay from that magically binds us to a completely different DRM GPU
node, and anything using that EGLDisplay will use that GPU device to render.

Being able to discover the GPU device node through the device query is
really useful, because it tells us exactly what implicit magic EGL did
under the hood, and about the device that EGL will use. Being able to
discover the display node is much less useful; it does tell us how GBM will
allocate buffers, but the user already knows which device is in use because
they supplied it to GBM. I see the display node as a property of GBM, and
the GPU node as a property of EGL, even if EGL does do (*waves hands*)
stuff under the hood to ensure the two are compatible.

If we had EGL_EXT_explicit_device, things get even more weird, I think;
would the device query on an EGLDisplay created with a combination of a
gbm_device native display handle and an explicit EGLDevice handle return
the scanout device from GBM or the GPU device from EGL? On my reading, I'd
expect it to be the latter; if the queries returned very different things
based on whether GPU device selection was implicit (returning the KMS node)
or explicit (GPU node), that would definitely violate the principle of
least surprise.


> 3. Allow compositors to query the kernel drivers to know which devices
>    are compatible with each other. Some uAPI to query a compatible
>    display device from a render-only device, or vice-versa, has been
>    suggested in the past.
>

What does 'compatible' mean? Would an Intel iGPU and an AMD dGPU be
compatible with each other? Would a Mali GPU bound to system memory through
AMBA be as compatible with the display controller as it would with an AMD
GPU on PCIE? I think a query which only exposed whether or not devices
could share dmabufs with each other is far too generic to be helpful for
the actual usecase we have, as well as not being useful enough for other
usecases ('well you _can_ use dmabufs from your AMD GPU on your Mali GPU,
but only if they were allocated in the right domain').


> (1) has a number of limitations and gotchas. It requires allocating
> real buffers, which has a rather big cost for something done at
> compositor initialization time. It requires selecting a buffer format
> and modifier compatible with both devices, so it can't be isolated in
> a simple function (and e.g. shared between all compositors in libdrm).
>

We're already going to have to do throwaway allocations to make Intel's
tiled modes work; I'd rather not extend this out to doing throwaway
allocations across device combinations as well as modifier lists.


> Some drivers will allow drmModeAddFB on buffers that can't be scanned
> out, and will only reject the buffer at atomic commit time.
>

This is 100% a KMS driver bug and should be fixed there. It's not
catastrophic, since commits can fail for any reason or none at all and
compositors are expected to handle this, but they should absolutely be
rejecting buffers which can never be scanned out at all at AddFB time.


> (2) wouldn't work with non-EGL APIs such as Vulkan. Eric Anholt seemed
> pretty opposed to this idea, but I didn't fully understand why.
>

Well, Vulkan doesn't have GBM in the same way, right? In the Vulkan case,
we already know exactly what the GPU is, because it's the VkPhysicalDevice
you had to explicitly select to create the VkDevice etc; if you're using
GBM it's because you've _also_ created a gbm_device for the KMS node and
are allocating gbm_bos to import to VkDeviceMemory/VkImage, so you already
have both pieces of information. (If you're creating VkDeviceMemory/VkImage
in Vulkan then exporting dmabuf from there, since there's no way to specify
a target device, it's a blind guess as to whether it'll actually work for
KMS. Maybe it will! But maybe not.)


> I don't know how feasible (3) is. The kernel drivers must be able to
> decide whether buffers coming from another driver can be scanned out,
> but how early can they give an answer? Can they give an answer solely
> based on a DRM node, and not a DMA-BUF?
>

Maybe! But maybe not.


> Feedback is welcome. Do you agree with the premise that compositors
> need access to the render node?


Yes, strongly. Compositors may optimise for direct paths (e.g. direct
scanout of client buffers through KMS, directly providing client buffers to
media codecs for streaming) where possible. But they must always have a
'device of last resort': if these optimal paths are not possible (your
codec doesn't like your client buffers, you can't do direct scanout because
a notification occluded your client content and you've run out of overlay
planes, you're on Intel and your display FIFO size is measured in bits),
the compositor needs to know that it can access the client buffers somehow.

This is done by always importing into a GPU device - for most current
compositors as an EGLImage, for some others as a VkImage - and falling back
to GL composition paths, or GL blits, or even ReadPixels if strictly
necessary, so your client content continues to be accessible.

There is no way to do this without telling the client what that GPU device
node is, so it can allocate accordingly. Thanks to the implicit device
selection performed when creating an EGLDisplay from a gbm_device, we
cannot currently discover what that device node is.


> Do you have any other potential solution in mind?


I can't think of any right now, but am open to hearing them.


> Which solution would you prefer?


For all the reasons above, strongly #2, i.e. that querying the DRM device
node from the EGLDevice returned by querying an EGLDisplay created from a
gbm_device, returns the GPU device's render node and not the KMS device's
primary node.

Cheers,
Daniel



* Re: Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints
  2021-04-20 10:17 ` Daniel Stone
@ 2021-04-20 11:14   ` Daniel Vetter
  2021-04-26  9:47     ` Simon Ser
  2021-04-20 23:11   ` Eric Anholt
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel Vetter @ 2021-04-20 11:14 UTC (permalink / raw)
  To: Daniel Stone; +Cc: Emil Velikov, DRI Development, wayland, Laurent Pinchart

Just 2 comments on the kernel aspects here.

On Tue, Apr 20, 2021 at 12:18 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Mon, 19 Apr 2021 at 13:06, Simon Ser <contact@emersion.fr> wrote:
>>
>> I'm working on a Wayland extension [1] that, among other things, allows
>> compositors to advertise the preferred device to be used by Wayland
>> clients.
>>
>> In general, compositors will send a render node. However, in the case
>> of split render/display SoCs, things get a little bit complicated.
>>
>> [...]
>
>
> Thanks for the write-up Simon!
>
>>
>> There are a few solutions:
>>
>> 1. Require compositors to discover the render device by trying to import
>>    a buffer. For each available render device, the compositor would
>>    allocate a buffer, export it as a DMA-BUF, import it to the
>>    display-only device, then try to drmModeAddFB.
>
>
> I don't think this is actually tractable? Assuming that 'allocate a buffer' means 'obtain a gbm_device for the render node directly and allocate a gbm_bo from it', even with compatible formats and modifiers this will fail for more restrictive display hardware. imx-drm and pl111 (combined with vc4 on some Raspberry Pis) will fail this, since they'll take different allocation paths when they're bound through kmsro vs. directly, accounting for things like contiguous allocation. So we'd get false negatives on at least some platforms.
>
>>
>> 2. Allow compositors to query the render device magically opened by
>>    kmsro. This could be done either via EGL_EXT_device_drm, or via a
>>    new EGL extension.
>
>
> This would be my strong preference, and I don't entirely understand anholt's pushback here. The way I see it, GBM is about allocation for scanout, and EGL is about rendering. If, on a split GPU/display system, we create a gbm_device from a KMS display-only device node, then creating an EGLDisplay from that magically binds us to a completely different DRM GPU node, and anything using that EGLDisplay will use that GPU device to render.
>
> Being able to discover the GPU device node through the device query is really useful, because it tells us exactly what implicit magic EGL did under the hood, and about the device that EGL will use. Being able to discover the display node is much less useful; it does tell us how GBM will allocate buffers, but the user already knows which device is in use because they supplied it to GBM. I see the display node as a property of GBM, and the GPU node as a property of EGL, even if EGL does do (*waves hands*) stuff under the hood to ensure the two are compatible.
>
> If we had EGL_EXT_explicit_device, things get even more weird, I think; would the device query on an EGLDisplay created with a combination of a gbm_device native display handle and an explicit EGLDevice handle return the scanout device from GBM or the GPU device from EGL? On my reading, I'd expect it to be the latter; if the queries returned very different things based on whether GPU device selection was implicit (returning the KMS node) or explicit (GPU node), that would definitely violate the principle of least surprise.
>
>>
>> 3. Allow compositors to query the kernel drivers to know which devices
>>    are compatible with each other. Some uAPI to query a compatible
>>    display device from a render-only device, or vice-versa, has been
>>    suggested in the past.
>
>
> What does 'compatible' mean? Would an Intel iGPU and an AMD dGPU be compatible with each other? Would a Mali GPU bound to system memory through AMBA be as compatible with the display controller as it would with an AMD GPU on PCIE? I think a query which only exposed whether or not devices could share dmabufs with each other is far too generic to be helpful for the actual usecase we have, as well as not being useful enough for other usecases ('well you _can_ use dmabufs from your AMD GPU on your Mali GPU, but only if they were allocated in the right domain').
>
>>
>> (1) has a number of limitations and gotchas. It requires allocating
>> real buffers, which has a rather big cost for something done at
>> compositor initialization time. It requires selecting a buffer format
>> and modifier compatible with both devices, so it can't be isolated in
>> a simple function (and e.g. shared between all compositors in libdrm).
>
>
> We're already going to have to do throwaway allocations to make Intel's tiled modes work; I'd rather not extend this out to doing throwaway allocations across device combinations as well as modifier lists.
>
>>
>> Some drivers will allow drmModeAddFB on buffers that can't be scanned
>> out, and will only reject the buffer at atomic commit time.
>
>
> This is 100% a KMS driver bug and should be fixed there. It's not catastrophic, since commits can fail for any reason or none at all and compositors are expected to handle this, but they should absolutely be rejecting buffers which can never be scanned out at all at AddFB time.

Yup. The kernel is supposed to reject as early as possible. The main
points for scanning out something for display are:
- FD2HANDLE aka dma-buf import. If the buffer isn't contiguous, but the
device requires contiguous memory, it should fail here. This takes the
IOMMU into account, but hilariously there's some display IP where only
half the CRTCs are connected to an IOMMU, and the other half needs
physically contiguous memory ...
- AddFB2, if you got any of the metadata combos wrong (like
modifiers/fourcc, alignment and all that)
- atomic TEST_ONLY for anything more specific to a given combo
(running out of bw / special hw converters are the big ones)

I think with more helper rollout we've gotten a lot better at this,
but probably still lots of bugs around.

>> (2) wouldn't work with non-EGL APIs such as Vulkan. Eric Anholt seemed
>> pretty opposed to this idea, but I didn't fully understand why.
>
>
> Well, Vulkan doesn't have GBM in the same way, right? In the Vulkan case, we already know exactly what the GPU is, because it's the VkPhysicalDevice you had to explicitly select to create the VkDevice etc; if you're using GBM it's because you've _also_ created a gbm_device for the KMS node and are allocating gbm_bos to import to VkDeviceMemory/VkImage, so you already have both pieces of information. (If you're creating VkDeviceMemory/VkImage in Vulkan then exporting dmabuf from there, since there's no way to specify a target device, it's a blind guess as to whether it'll actually work for KMS. Maybe it will! But maybe not.)
>
>>
>> I don't know how feasible (3) is. The kernel drivers must be able to
>> decide whether buffers coming from another driver can be scanned out,
>> but how early can they give an answer? Can they give an answer solely
>> based on a DRM node, and not a DMA-BUF?
>
>
> Maybe! But maybe not.

Just replying on this one: this feels a lot like the kernel would need
to know which Mesa you have installed, which really isn't the
kernel's job.

E.g. if you have a mesa without panfrost, then panfrost + some display
kms thing is definitely not compatible. But if you have it installed,
then they are. Feel free to make this arbitrarily more nasty with
stuff like "mesa is there, supports the combo except not yet the
specific afbc modifier combo you actually required".

Unless I'm completely off this doesn't sound like something the kernel
should be involved in at all.

I think the one thing the kernel should provide here is which kind of
backing storage types each device can work with, to cover stuff like
cma vs scatter-gather shmem. The long-standing idea was to expose
these as dma-buf heaps and then sprinkle some links in sysfs, but that
idea is as far away from working code as ever.
-Daniel

>> Feedback is welcome. Do you agree with the premise that compositors
>> need access to the render node?
>
>
> Yes, strongly. Compositors may optimise for direct paths (e.g. direct scanout of client buffers through KMS, directly providing client buffers to media codecs for streaming) where possible. But they must always have a 'device of last resort': if these optimal paths are not possible (your codec doesn't like your client buffers, you can't do direct scanout because a notification occluded your client content and you've run out of overlay planes, you're on Intel and your display FIFO size is measured in bits), the compositor needs to know that it can access the client buffers somehow.
>
> This is done by always importing into a GPU device - for most current compositors as an EGLImage, for some others as a VkImage - and falling back to GL composition paths, or GL blits, or even ReadPixels if strictly necessary, so your client content continues to be accessible.
>
> There is no way to do this without telling the client what that GPU device node is, so it can allocate accordingly. Thanks to the implicit device selection performed when creating an EGLDisplay from a gbm_device, we cannot currently discover what that device node is.
>
>>
>> Do you have any other potential solution in mind?
>
>
> I can't think of any right now, but am open to hearing them.
>
>>
>> Which solution would you prefer?
>
>
> For all the reasons above, strongly #2, i.e. that querying the DRM device node from the EGLDevice returned by querying an EGLDisplay created from a gbm_device, returns the GPU device's render node and not the KMS device's primary node.
>
> Cheers,
> Daniel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints
  2021-04-20 10:17 ` Daniel Stone
  2021-04-20 11:14   ` Daniel Vetter
@ 2021-04-20 23:11   ` Eric Anholt
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Anholt @ 2021-04-20 23:11 UTC (permalink / raw)
  To: Daniel Stone
  Cc: Daniel Vetter, Emil Velikov, DRI Development, wayland, Laurent Pinchart

On Tue, Apr 20, 2021 at 3:18 AM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Mon, 19 Apr 2021 at 13:06, Simon Ser <contact@emersion.fr> wrote:
>>
>> I'm working on a Wayland extension [1] that, among other things, allows
>> compositors to advertise the preferred device to be used by Wayland
>> clients.
>>
>> In general, compositors will send a render node. However, in the case
>> of split render/display SoCs, things get a little bit complicated.
>>
>> [...]
>
>
> Thanks for the write-up Simon!
>
>>
>> There are a few solutions:
>>
>> 1. Require compositors to discover the render device by trying to import
>>    a buffer. For each available render device, the compositor would
>>    allocate a buffer, export it as a DMA-BUF, import it to the
>>    display-only device, then try to drmModeAddFB.
>
>
> I don't think this is actually tractable? Assuming that 'allocate a buffer' means 'obtain a gbm_device for the render node directly and allocate a gbm_bo from it', even with compatible formats and modifiers this will fail for more restrictive display hardware. imx-drm and pl111 (combined with vc4 on some Raspberry Pis) will fail this, since they'll take different allocation paths when they're bound through kmsro vs. directly, accounting for things like contiguous allocation. So we'd get false negatives on at least some platforms.
>
>>
>> 2. Allow compositors to query the render device magically opened by
>>    kmsro. This could be done either via EGL_EXT_device_drm, or via a
>>    new EGL extension.
>
>
> This would be my strong preference, and I don't entirely understand anholt's pushback here. The way I see it, GBM is about allocation for scanout, and EGL is about rendering. If, on a split GPU/display system, we create a gbm_device from a KMS display-only device node, then creating an EGLDisplay from that magically binds us to a completely different DRM GPU node, and anything using that EGLDisplay will use that GPU device to render.
>
> Being able to discover the GPU device node through the device query is really useful, because it tells us exactly what implicit magic EGL did under the hood, and about the device that EGL will use. Being able to discover the display node is much less useful; it does tell us how GBM will allocate buffers, but the user already knows which device is in use because they supplied it to GBM. I see the display node as a property of GBM, and the GPU node as a property of EGL, even if EGL does do (*waves hands*) stuff under the hood to ensure the two are compatible.

I guess if we're assuming that the caller definitely knows about the
display device and is asking EGL for the render node in order to do
smarter buffer sharing between display and render, I can see it.  My
objection was that getting the render node in that discussion was
apparently some workaround for other brokenness, and was going to
result in software that didn't work on pl111 and vc4 displays because
it was trying to dodge kmsro.


* Re: Split render/display SoCs, Mesa's renderonly, and Wayland dmabuf hints
  2021-04-20 11:14   ` Daniel Vetter
@ 2021-04-26  9:47     ` Simon Ser
  0 siblings, 0 replies; 5+ messages in thread
From: Simon Ser @ 2021-04-26  9:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Emil Velikov, DRI Development, wayland, Laurent Pinchart

Thanks for the feedback! Replying to both Daniels below.

On Tuesday, April 20th, 2021 at 1:14 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> On Tue, Apr 20, 2021 at 12:18 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> >> There are a few solutions:
> >>
> >> 1. Require compositors to discover the render device by trying to import
> >>    a buffer. For each available render device, the compositor would
> >>    allocate a buffer, export it as a DMA-BUF, import it to the
> >>    display-only device, then try to drmModeAddFB.
> >
> >
> > I don't think this is actually tractable? Assuming that 'allocate a buffer'
> > means 'obtain a gbm_device for the render node directly and allocate a
> > gbm_bo from it', even with compatible formats and modifiers this will fail
> > for more restrictive display hardware. imx-drm and pl111 (combined with vc4
> > on some Raspberry Pis) will fail this, since they'll take different
> > allocation paths when they're bound through kmsro vs. directly, accounting
> > for things like contiguous allocation. So we'd get false negatives on at
> > least some platforms.

Right. It would work for drivers using renderonly_create_gpu_import_for_resource,
but wouldn't work for drivers using renderonly_create_kms_dumb_buffer_for_resource.

Doing the reverse (creating a buffer on the display-only device, then importing
it on the render-only device) wouldn't work either, since some drivers will perform
DMA transfers under-the-hood without telling user-space. So the check would
succeed even if the pipeline is inefficient.

> >> 2. Allow compositors to query the render device magically opened by
> >>    kmsro. This could be done either via EGL_EXT_device_drm, or via a
> >>    new EGL extension.
> >
> > This would be my strong preference, and I don't entirely understand
> > anholt's pushback here. The way I see it, GBM is about allocation for
> > scanout, and EGL is about rendering. If, on a split GPU/display system, we
> > create a gbm_device from a KMS display-only device node, then creating an
> > EGLDisplay from that magically binds us to a completely different DRM GPU
> > node, and anything using that EGLDisplay will use that GPU device to
> > render.
> >
> > Being able to discover the GPU device node through the device query is
> > really useful, because it tells us exactly what implicit magic EGL did
> > under the hood, and about the device that EGL will use. Being able to
> > discover the display node is much less useful; it does tell us how GBM will
> > allocate buffers, but the user already knows which device is in use because
> > they supplied it to GBM. I see the display node as a property of GBM, and
> > the GPU node as a property of EGL, even if EGL does do (*waves hands*)
> > stuff under the hood to ensure the two are compatible.

GBM can be used for non-scanout allocation too. This is useful for clients
that want more control over the allocation parameters and the swapchain, and
want to manage DMA-BUFs directly.

> > If we had EGL_EXT_explicit_device, things get even more weird, I think;
> > would the device query on an EGLDisplay created with a combination of a
> > gbm_device native display handle and an explicit EGLDevice handle return
> > the scanout device from GBM or the GPU device from EGL? On my reading, I'd
> > expect it to be the latter; if the queries returned very different things
> > based on whether GPU device selection was implicit (returning the KMS node)
> > or explicit (GPU node), that would definitely violate the principle of
> > least surprise.

Right now as implemented by Mesa, things are pretty weird indeed.
EGL_EXT_device_enumeration will only expose a list of one EGLDevice per DRM
render node, plus one EGLDevice for software rendering. If you create an
EGLDisplay from a gbm_device from a display-only device, then a new EGLDevice
for this display-only device gets created and is returned by
EGL_EXT_device_query.

It would make more sense to me to return the EGLDevice for the render-only
device in EGL_EXT_device_query. However that means we need to pass back this
information through Mesa's DRI interface and through the Gallium interface.
That sounds pretty annoying, but it's what I tried to suggest earlier here:

https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178#note_815040

> >> 3. Allow compositors to query the kernel drivers to know which devices
> >>    are compatible with each other. Some uAPI to query a compatible
> >>    display device from a render-only device, or vice-versa, has been
> >>    suggested in the past.
> >
> >
> > What does 'compatible' mean? Would an Intel iGPU and an AMD dGPU be
> > compatible with each other? Would a Mali GPU bound to system memory through
> > AMBA be as compatible with the display controller as it would with an AMD
> > GPU on PCIE? I think a query which only exposed whether or not devices
> > could share dmabufs with each other is far too generic to be helpful for
> > the actual usecase we have, as well as not being useful enough for other
> > usecases ('well you _can_ use dmabufs from your AMD GPU on your Mali GPU,
> > but only if they were allocated in the right domain').

"Compatible" would mean that both devices can directly access the buffers
without doing any copy nor transfer. In other words, drivers using
kmsro/renderonly today would be compatible with each other.

> >> (1) has a number of limitations and gotchas. It requires allocating
> >> real buffers, which has a rather big cost for something done at
> >> compositor initialization time. It requires selecting a buffer format
> >> and modifier compatible with both devices, so it can't be isolated in
> >> a simple function (and e.g. shared between all compositors in libdrm).
> >
> > We're already going to have to do throwaway allocations to make Intel's
> > tiled modes work; I'd rather not extend this out to doing throwaway
> > allocations across device combinations as well as modifier lists.
> >
> >> Some drivers will allow to drmModeAddFB buffers that can't be scanned
> >> out, and will only reject the buffer at atomic commit time.
> >
> > This is 100% a KMS driver bug and should be fixed there. It's not
> > catastrophic, since commits can fail for any reason or none at all and
> > compositors are expected to handle this, but they should absolutely be
> > rejecting buffers which can never be scanned out at all at AddFB time.

This is sometimes trickier than expected. I was thinking of AMD here,
which has a "scanout" flag in the implicit buffer metadata, indicating
whether a buffer can be scanned out. However the flag was introduced only
pretty recently and various code-paths still don't set it (old user-space,
dumb buffers, IIRC). This should be fixed, but in the meantime the kernel
needs to accept buffers that potentially can't be scanned out.

> Yup. Kernel is supposed to reject as early as possible, main points
> for scanning out something for display are
> - FD2HANDLE aka  dma-buf import. If it's not contig, but the device
> requires contig, it should fail here. This takes into account IOMMU,
> but hilariously there's some display IP where only half the CRTC are
> connected to an IOMMU, the other half needs physically contig memory
> ...
> - AddFB2, if you got any of the metadata combos wrong (like
> modifiers/fourcc, alignment and all that)
> - atomic TEST_ONLY for anything more specific for a given combo
> (running out of bw/special hw converters are the big ones)
>
> I think with more helper rollout we've gotten a lot better at this,
> but probably still lots of bugs around.
>
> >> (2) wouldn't work with non-EGL APIs such as Vulkan. Eric Anholt seemed
> >> pretty opposed to this idea, but I didn't fully understand why.
> >
> >
> > Well, Vulkan doesn't have GBM in the same way, right? In the Vulkan case,
> > we already know exactly what the GPU is, because it's the VkPhysicalDevice
> > you had to explicitly select to create the VkDevice etc; if you're using
> > GBM it's because you've _also_ created a gbm_device for the KMS node and
> > are allocating gbm_bos to import to VkDeviceMemory/VkImage, so you already
> > have both pieces of information. (If you're creating VkDeviceMemory/VkImage
> > in Vulkan then exporting dmabuf from there, since there's no way to specify
> > a target device, it's a blind guess as to whether it'll actually work for
> > KMS. Maybe it will! But maybe not.)

So far my plan for Vulkan was to create a GBM device for the DRM node used
by KMS, then select the VkPhysicalDevice that matches the DRM node (via
VK_EXT_physical_device_drm). Obviously that doesn't work for split
render/display SoCs, because there's no VkPhysicalDevice that matches the
display-only DRM node used for KMS. I also wouldn't want to pick _any_
VkPhysicalDevice, because that would be a guessing game and could result
in a very sub-optimal rendering pipeline.
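The matching step of that plan can be sketched as follows. Instead of the
real Vulkan types, this uses a simplified stand-in for
VkPhysicalDeviceDrmPropertiesEXT; in real code the fields come from
vkGetPhysicalDeviceProperties2() and the KMS node's major/minor from
stat(2). The -1 result is exactly the case hit on split render/display
SoCs: no VkPhysicalDevice corresponds to the display-only node.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for VkPhysicalDeviceDrmPropertiesEXT. */
struct drm_props {
    bool has_primary, has_render;
    int64_t primary_major, primary_minor;
    int64_t render_major, render_minor;
};

/* Return the index of the physical device whose primary or render node
 * matches (node_major, node_minor), or -1 if none does. */
int find_physical_device(const struct drm_props *devs, int count,
                         int64_t node_major, int64_t node_minor)
{
    for (int i = 0; i < count; i++) {
        if (devs[i].has_primary &&
            devs[i].primary_major == node_major &&
            devs[i].primary_minor == node_minor)
            return i;
        if (devs[i].has_render &&
            devs[i].render_major == node_major &&
            devs[i].render_minor == node_minor)
            return i;
    }
    return -1; /* display-only node: no matching VkPhysicalDevice */
}
```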

The EGL query wouldn't be very helpful for Vulkan. A GBM query or a kernel
query would be much more useful.

> >> I don't know how feasible (3) is. The kernel drivers must be able to
> >> decide whether buffers coming from another driver can be scanned out,
> >> but how early can they give an answer? Can they give an answer solely
> >> based on a DRM node, and not a DMA-BUF?
> >
> >
> > Maybe! But maybe not.
>
> Just replying on this one: This feels a lot like the kernel should
> know about which mesa you have installed. Which really isn't the
> kernel's job.
>
> E.g. if you have a mesa without panfrost, then panfrost + some display
> kms thing is definitely not compatible. But if you have it installed,
> then they are. Feel free to make this arbitrarily more nasty with
> stuff like "mesa is there, supports the combo except not yet the
> specific afbc modifier combo you actually required".
>
> Unless I'm completely off this doesn't sound like something the kernel
> should be involved in at all.
>
> I think the one thing the kernel should provide here is which kind of
> backing storage types each device can work with, to cover stuff like
> cma vs scatter-gather shmem. The long-standing idea was to expose
> these as dma-buf heaps and then sprinkle some links in sysfs, but that
> idea is as far away from working code as ever.

I'm really not looking for a "please give me a guarantee that any buffer
allocated on device A will work on device B too". What I'm looking for is
much closer to the backing storage idea you're suggesting. Basically: if I
allocate a buffer on device A, and import it on device B, will this perform
any copy/transfer under the hood or not?

If any transfer is involved, I think the compositor should prefer allocating
an intermediate shadow buffer, and explicitly perform a blit.
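As a sketch, that policy reduces to one predicate over the backing-storage
kinds mentioned above. The storage enum and capability struct are made-up
simplifications of the dma-buf heap idea (CMA vs. scatter-gather shmem);
nothing like this query exists in the kernel today.

```c
#include <stdbool.h>

/* Hypothetical backing-storage kinds, per the dma-buf heap idea. */
enum storage { STORAGE_CMA, STORAGE_SHMEM_SG };

/* Hypothetical per-device capabilities. */
struct dev_caps { bool can_use_cma, can_use_shmem_sg; };

static bool device_accepts(const struct dev_caps *d, enum storage s)
{
    return s == STORAGE_CMA ? d->can_use_cma : d->can_use_shmem_sg;
}

/* true: import the client buffer directly, zero-copy;
 * false: allocate a shadow buffer on the display device and blit. */
bool import_directly(const struct dev_caps *display, enum storage buf_storage)
{
    return device_accepts(display, buf_storage);
}
```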

> >> Feedback is welcome. Do you agree with the premise that compositors
> >> need access to the render node?
> >
> > Yes, strongly. Compositors may optimise for direct paths (e.g. direct
> > scanout of client buffers through KMS, directly providing client buffers to
> > media codecs for streaming) where possible. But they must always have a
> > 'device of last resort': if these optimal paths are not possible (your
> > codec doesn't like your client buffers, you can't do direct scanout because
> > a notification occluded your client content and you've run out of overlay
> > planes, you're on Intel and your display FIFO size is measured in bits),
> > the compositor needs to know that it can access the client buffers somehow.
> >
> > This is done by always importing into a GPU device - for most current
> > compositors as an EGLImage, for some others as a VkImage - and falling back
> > to GL composition paths, or GL blits, or even ReadPixels if strictly
> > necessary, so your client content continues to be accessible.
> >
> > There is no way to do this without telling the client what that GPU device
> > node is, so it can allocate accordingly. Thanks to the implicit device
> > selection performed when creating an EGLDisplay from a gbm_device, we
> > cannot currently discover what that device node is.
> >
> >>
> >> Do you have any other potential solution in mind?
> >
> >
> > I can't think of any right now, but am open to hearing them.
> >
> >>
> >> Which solution would you prefer?
> >
> >
> > For all the reasons above, strongly #2, i.e. that querying the DRM device
> > node from the EGLDevice returned by querying an EGLDisplay created from a
> > gbm_device, returns the GPU device's render node and not the KMS device's
> > primary node.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
