* Unix Device Memory Allocation project
@ 2016-10-04 23:47 James Jones
  2016-10-05  8:42 ` Benjamin Gaignard
  2016-10-18 23:40 ` Marek Olšák
  0 siblings, 2 replies; 30+ messages in thread
From: James Jones @ 2016-10-04 23:47 UTC (permalink / raw)
  To: dri-devel

Hello everyone,

As many are aware, we took up the issue of surface/memory allocation at 
XDC this year.  The outcome of that discussion was the beginnings of a 
design proposal for a library that would serve as a cross-device, 
cross-process surface allocator.  In the past week I've started to 
condense some of my notes from that discussion down to code & a design 
document.  I've posted the first pieces to a github repository here:

   https://github.com/cubanismo/allocator

This isn't anything close to usable code yet.  Just headers and docs, 
and incomplete ones at that.  However, feel free to check it out if 
you're interested in discussing the design.

Thanks,
-James

* Re: Unix Device Memory Allocation project
  2016-10-04 23:47 Unix Device Memory Allocation project James Jones
@ 2016-10-05  8:42 ` Benjamin Gaignard
  2016-10-05 12:19   ` Rob Clark
  2016-10-18 23:40 ` Marek Olšák
  1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Gaignard @ 2016-10-05  8:42 UTC (permalink / raw)
  To: James Jones; +Cc: dri-devel

Hi,

Do you already have in mind a list of targeted drivers/backends/plugins?
How do you expect to enumerate device capabilities? By adding a new
generic ioctl, or via a configuration file in userland?

Maybe it is too early for those questions, but anyway I'm interested in
this memory allocation thread.

Regards,
Benjamin

2016-10-05 1:47 GMT+02:00 James Jones <jajones@nvidia.com>:
> Hello everyone,
>
> As many are aware, we took up the issue of surface/memory allocation at XDC
> this year.  The outcome of that discussion was the beginnings of a design
> proposal for a library that would server as a cross-device, cross-process
> surface allocator.  In the past week I've started to condense some of my
> notes from that discussion down to code & a design document.  I've posted
> the first pieces to a github repository here:
>
>   https://github.com/cubanismo/allocator
>
> This isn't anything close to usable code yet.  Just headers and docs, and
> incomplete ones at that.  However, feel free to check it out if you're
> interested in discussing the design.
>
> Thanks,
> -James



-- 
Benjamin Gaignard

Graphic Study Group

Linaro.org │ Open source software for ARM SoCs

Follow Linaro: Facebook | Twitter | Blog

* Re: Unix Device Memory Allocation project
  2016-10-05  8:42 ` Benjamin Gaignard
@ 2016-10-05 12:19   ` Rob Clark
  0 siblings, 0 replies; 30+ messages in thread
From: Rob Clark @ 2016-10-05 12:19 UTC (permalink / raw)
  To: Benjamin Gaignard; +Cc: James Jones, dri-devel

This would be a purely userspace interface (in that the user just
interacts with a userspace shared lib; the driver-specific backend may
do its own ioctls to query whatever is needed from the hw)..

So it might be something like: for each device in the sharing use case, call:

  allocator_dev_t allocator_load(int fd);

where the loader figures out which backend to dlopen() based on the fd.
There could probably be, for example, a generic liballocator-v4l.so
that works for all v4l devices.  And for gl drivers, this would be a
gbm sorta thing.  For some APIs which don't expose a device fd, we
might need some sort of extension within that API.  (For example
OpenMAX..  but maybe if we wait long enough that problem just goes
away.)

a bit handwavey at the moment..
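
To make that a bit more concrete, here's a rough sketch of what the
loader side could look like.  Everything below is hypothetical (the
allocator_dev_t type, the backend naming scheme, and the
allocator_backend_init entry point are made up for illustration); the
only real API used is libdrm's drmGetVersion():

  #include <dlfcn.h>
  #include <stdio.h>
  #include <xf86drm.h>    /* drmGetVersion(), for DRM fds */

  typedef struct allocator_dev *allocator_dev_t;  /* opaque per-device handle */

  allocator_dev_t allocator_load(int fd)
  {
      char libname[128];
      void *backend;
      allocator_dev_t (*init)(int);

      /* Ask the kernel which driver owns this fd; a v4l backend would
       * key off the device node type instead. */
      drmVersionPtr ver = drmGetVersion(fd);
      if (!ver)
          return NULL;

      /* e.g. "liballocator-amdgpu.so", "liballocator-i915.so", ... */
      snprintf(libname, sizeof(libname), "liballocator-%s.so", ver->name);
      drmFreeVersion(ver);

      backend = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
      if (!backend)
          return NULL;

      /* Hypothetical backend entry point that wraps the fd. */
      init = (allocator_dev_t (*)(int))dlsym(backend, "allocator_backend_init");
      return init ? init(fd) : NULL;
  }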

BR,
-R

On Wed, Oct 5, 2016 at 4:42 AM, Benjamin Gaignard
<benjamin.gaignard@linaro.org> wrote:
> Hi,
>
> Do you already have in mind a list of targeted driver/backend/plugin ?
> How do you expect to enumerate devices capabilities ? by adding a new
> generic ioctl or a configuration file in userland ?
>
> Maybe it is to early for those questions but anyway I'm interested by
> this memory allocation thread.
>
> Regards,
> Benjamin
>
> 2016-10-05 1:47 GMT+02:00 James Jones <jajones@nvidia.com>:
>> Hello everyone,
>>
>> As many are aware, we took up the issue of surface/memory allocation at XDC
>> this year.  The outcome of that discussion was the beginnings of a design
>> proposal for a library that would server as a cross-device, cross-process
>> surface allocator.  In the past week I've started to condense some of my
>> notes from that discussion down to code & a design document.  I've posted
>> the first pieces to a github repository here:
>>
>>   https://github.com/cubanismo/allocator
>>
>> This isn't anything close to usable code yet.  Just headers and docs, and
>> incomplete ones at that.  However, feel free to check it out if you're
>> interested in discussing the design.
>>
>> Thanks,
>> -James
>
>
>
> --
> Benjamin Gaignard
>
> Graphic Study Group
>
> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: Facebook | Twitter | Blog

* Re: Unix Device Memory Allocation project
  2016-10-04 23:47 Unix Device Memory Allocation project James Jones
  2016-10-05  8:42 ` Benjamin Gaignard
@ 2016-10-18 23:40 ` Marek Olšák
  2016-10-19  0:08   ` James Jones
                     ` (3 more replies)
  1 sibling, 4 replies; 30+ messages in thread
From: Marek Olšák @ 2016-10-18 23:40 UTC (permalink / raw)
  To: James Jones; +Cc: dri-devel

Hi,

The text below describes how open source AMDGPU buffer sharing works.
I hope you'll find some useful bits in it.


Producer = allocates a buffer (or texture), and exports its handle
(DMABUF, etc.), and can use the buffer in various ways

Consumer = imports the handle, and can use the buffer in various ways


*** Producer-consumer interaction. ***

1) On handle export, the producer receives these flags:

- READ, WRITE, READ+WRITE: Describe the expected usage in the consumer.
  * The producer decides if it needs to disable compression based on
those flags.

- EXPLICIT_FLUSH flag: Meaning that the producer will explicitly
receive a "flush_resource" call before the consumer starts using the
buffer. This is a hint that the producer doesn't have to keep track of
"when to do decompression" when sharing the buffer with the consumer.

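A minimal sketch of what the producer does with those flags at export
time (the flag names follow Mesa/Gallium conventions but are redefined
here, and the buffer/compression helpers are made-up placeholders; the
exact policy per flag is an inference from the description above):

  /* Usage flags the consumer-facing API hands to the producer on export. */
  #define USAGE_READ            (1 << 0)
  #define USAGE_WRITE           (1 << 1)
  #define USAGE_EXPLICIT_FLUSH  (1 << 2)

  struct buffer { int compressed; };
  static void disable_compression(struct buffer *buf) { buf->compressed = 0; }
  static void decompress_now(struct buffer *buf)      { buf->compressed = 0; }

  static void on_handle_export(struct buffer *buf, unsigned usage)
  {
      if (usage & USAGE_WRITE) {
          /* Consumer may write behind our back: compression metadata
           * would go stale, so give it up for good. */
          disable_compression(buf);
      } else if (!(usage & USAGE_EXPLICIT_FLUSH)) {
          /* Read-only consumer but no flush_resource() hook: the
           * producer has to track decompression itself; the simplest
           * policy is to decompress eagerly before sharing. */
          decompress_now(buf);
      }
      /* READ + EXPLICIT_FLUSH: stay compressed and decompress in
       * flush_resource(), e.g. at SwapBuffers time. */
  }
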

2) Passing metadata (tiling, pixel ordering, format, layout) info
between the producer and consumer:

- All AMDGPU buffer/texture allocations have 256 bytes (64 dwords) of
internal per-allocation metadata storage that lives in the kernel
space. There are amdgpu-specific ioctls that can "set" and "get" the
metadata. Any process that has a buffer handle can do that.
  * The producer writes the metadata, the consumer reads it.

- The producer-consumer interop API doesn't know about the metadata.
All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
  * There was a note during the talk that DMABUF doesn't have any
metadata. Well, I just told you that it has, but it's private to
amdgpu and possibly accessible to other kernel drivers too.
  * We can build upon this idea. I think the worst thing to do would
be to add metadata handling to driver-agnostic userspace APIs. Really,
driver-agnostic APIs shouldn't know about that, because they can't
understand all the hw-specific information encoded in the metadata.
Also, when you want to change the metadata format, you only have to
update the affected drivers, not userspace APIs.
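
For reference, libdrm's amdgpu wrapper exposes that storage roughly as
below.  This is a sketch from memory of amdgpu_bo_set_metadata() /
amdgpu_bo_query_info(); treat the exact field names as assumptions and
check amdgpu.h in libdrm:

  #include <stdint.h>
  #include <string.h>
  #include <amdgpu.h>    /* libdrm_amdgpu */

  /* Producer: stash the userspace-driver blob (up to 64 dwords) on the BO. */
  static int stash_umd_metadata(amdgpu_bo_handle bo,
                                const uint32_t *blob, uint32_t size_bytes)
  {
      struct amdgpu_bo_metadata md = {0};

      md.size_metadata = size_bytes;
      memcpy(md.umd_metadata, blob, size_bytes);
      return amdgpu_bo_set_metadata(bo, &md);
  }

  /* Consumer: read it back after importing the handle. */
  static int read_umd_metadata(amdgpu_bo_handle bo,
                               uint32_t *blob, uint32_t max_bytes)
  {
      struct amdgpu_bo_info info = {0};
      int r = amdgpu_bo_query_info(bo, &info);

      if (r == 0) {
          uint32_t n = info.metadata.size_metadata;
          memcpy(blob, info.metadata.umd_metadata, n < max_bytes ? n : max_bytes);
      }
      return r;
  }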


3) Internal AMDGPU metadata storage format
- The header contains: Vendor ID, PCI ID, and version number.
- The header is followed by PCI-ID-specific data. The PCI ID and the
version number define the format.
- If the consumer runs on a different device, it must read the header
and parse the metadata based on that. It implies that the
driver-specific consumer code needs to know about all potential
producer devices.
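
Purely as an illustration of that layout (the real format is private to
the driver and not spelled out here):

  struct umd_metadata_header {
      uint32_t vendor_id;   /* PCI vendor of the producer */
      uint32_t pci_id;      /* PCI device ID; selects the payload layout */
      uint32_t version;     /* payload format revision for that device */
      /* ...followed by PCI-ID/version-specific tiling/layout data... */
  };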


Bottom line: DMABUF handles alone are fully sufficient for sharing
buffers/textures between devices and processes from the AMDGPU point
of view.

HW driver implementation: The driver doesn't know anything about the
users of exported or imported buffers. It only acts based on the few
flags described in section 1. So far that's all we've needed.


*** Use cases ***

1) DRI (producer: application; consumer: X server)
- The producer receives these flags: READ, EXPLICIT_FLUSH. The X
server will treat the shared "texture" as read-only. EXPLICIT_FLUSH
ensures the texture can be compressed, and "flush_resource" will be
called as part of SwapBuffers and "glFlush: GL_FRONT".
- The X server can run on a different device. In that case, the window
system API passes the "LINEAR" flag to the driver during allocation.
That's suboptimal and fixable.


2) OpenGL-OpenCL interop (OpenGL always exports handles, OpenCL always
imports handles)
- Possible flags: READ, WRITE, READ+WRITE
- OpenCL doesn't give us any other flags, so we are stuck with those.
- Inter-device sharing is possible if the consumer understands the
producer's metadata and tiling layouts.

(amdgpu actually stores 2 different metadata blocks per allocation,
but the simpler one is too limited and has only 8 bytes)

Marek


On Wed, Oct 5, 2016 at 1:47 AM, James Jones <jajones@nvidia.com> wrote:
> Hello everyone,
>
> As many are aware, we took up the issue of surface/memory allocation at XDC
> this year.  The outcome of that discussion was the beginnings of a design
> proposal for a library that would server as a cross-device, cross-process
> surface allocator.  In the past week I've started to condense some of my
> notes from that discussion down to code & a design document.  I've posted
> the first pieces to a github repository here:
>
>   https://github.com/cubanismo/allocator
>
> This isn't anything close to usable code yet.  Just headers and docs, and
> incomplete ones at that.  However, feel free to check it out if you're
> interested in discussing the design.
>
> Thanks,
> -James

* Re: Unix Device Memory Allocation project
  2016-10-18 23:40 ` Marek Olšák
@ 2016-10-19  0:08   ` James Jones
  2016-10-19  6:31     ` Daniel Vetter
  2016-10-19  6:23   ` Daniel Vetter
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: James Jones @ 2016-10-19  0:08 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

Thanks for the detailed writeup, and it was good to meet you at XDC.  Below:

On 10/18/2016 04:40 PM, Marek Olšák wrote:
> Hi,
>
> The text below describes how open source AMDGPU buffer sharing works.
> I hope you'll find some useful bits in it.
>
>
> Producer = allocates a buffer (or texture), and exports its handle
> (DMABUF, etc.), and can use the buffer in various ways
>
> Consumer = imports the handle, and can use the buffer in various ways
>
>
> *** Producer-consumer interaction. ***
>
> 1) On handle export, the producer receives these flags:
>
> - READ, WRITE, READ+WRITE: Describe the expected usage in the consumer.
>   * The producer decides if it needs to disable compression based on
> those flags.
>
> - EXPLICIT_FLUSH flag: Meaning that the producer will explicitly
> receive a "flush_resource" call before the consumer starts using the
> buffer. This is a hint that the producer doesn't have to keep track of
> "when to do decompression" when sharing the buffer with the consumer.
>
>
> 2) Passing metadata (tiling, pixel ordering, format, layout) info
> between the producer and consumer:
>
> - All AMDGPU buffer/texture allocations have 256 bytes (64 dwords) of
> internal per-allocation metadata storage that lives in the kernel
> space. There are amdgpu-specific ioctls that can "set" and "get" the
> metadata. Any process that has a buffer handle can do that.
>   * The produces writes the metadata, the consumer reads it.
>
> - The producer-consumer interop API doesn't know about the metadata.
> All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
>   * There was a note during the talk that DMABUF doesn't have any
> metadata. Well, I just told you that it has, but it's private to
> amdgpu and possibly accessible to other kernel drivers too.

OK.  I believe someone pointed this out during my talk or afterwards as 
well.  Some drivers are using this method, but there seems to be some 
debate over whether this is the preferred general design.  Others have 
told me this isn't the right mechanism to store this sort of metadata, 
but I'm not familiar with the specific counter arguments.

>   * We can build upon this idea. I think the worst thing to do would
> be to add metadata handling to driver-agnostic userspace APIs. Really,
> driver-agnostic APIs shouldn't know about that, because they can't
> understand all the hw-specific information encoded in the metadata.
> Also, when you want to change the metadata format, you only have to
> update the affected drivers, not userspace APIs.

How does this kernel-side metadata interact with userspace driver 
suballocation, or application-managed suballocation in APIs such as Vulkan?

Thanks,
-James

> 3) Internal AMDGPU metadata storage format
> - The header contains: Vendor ID, PCI ID, and version number.
> - The header is followed by PCI-ID-specific data. The PCI ID and the
> version number define the format.
> - If the consumer runs on a different device, it must read the header
> and parse the metadata based on that. It implies that the
> driver-specific consumer code needs to know about all potential
> producer devices.
>
>
> Bottom line: DMABUF handles alone are fully sufficient for sharing
> buffers/textures between devices and processes from the AMDGPU point
> of view.
>
> HW driver implementation: The driver doesn't know anything about the
> users of exported or imported buffers. It only acts based on the few
> flags described in section 1. So far that's all we've needed.
>
>
> *** Use cases ***
>
> 1) DRI (producer: application; consumer: X server)
> - The producer receives these flags: READ, EXPLICIT_FLUSH. The X
> server will treat the shared "texture" as read-only. EXPLICIT_FLUSH
> ensures the texture can be compressed, and "flush_resource" will be
> called as part of SwapBuffers and "glFlush: GL_FRONT".
> - The X server can run on a different device. In that case, the window
> system API passes the "LINEAR" flag to the driver during allocation.
> That's suboptimal and fixable.
>
>
> 2) OpenGL-OpenCL interop (OpenGL always exports handles, OpenCL always
> imports handles)
> - Possible flags: READ, WRITE, READ+WRITE
> - OpenCL doesn't give us any other flags, so we are stuck with those.
> - Inter-device sharing is possible if the consumer understands the
> producer's metadata and tiling layouts.
>
> (amdgpu actually stores 2 different metadata blocks per allocation,
> but the simpler one is too limited and has only 8 bytes)
>
> Marek
>
>
> On Wed, Oct 5, 2016 at 1:47 AM, James Jones <jajones@nvidia.com> wrote:
>> Hello everyone,
>>
>> As many are aware, we took up the issue of surface/memory allocation at XDC
>> this year.  The outcome of that discussion was the beginnings of a design
>> proposal for a library that would server as a cross-device, cross-process
>> surface allocator.  In the past week I've started to condense some of my
>> notes from that discussion down to code & a design document.  I've posted
>> the first pieces to a github repository here:
>>
>>   https://github.com/cubanismo/allocator
>>
>> This isn't anything close to usable code yet.  Just headers and docs, and
>> incomplete ones at that.  However, feel free to check it out if you're
>> interested in discussing the design.
>>
>> Thanks,
>> -James

* Re: Unix Device Memory Allocation project
  2016-10-18 23:40 ` Marek Olšák
  2016-10-19  0:08   ` James Jones
@ 2016-10-19  6:23   ` Daniel Vetter
  2016-10-19 12:15     ` Christian König
  2016-10-19 13:15     ` Marek Olšák
  2016-10-19  6:49   ` Michel Dänzer
  2016-10-19 12:33   ` Nicolai Hähnle
  3 siblings, 2 replies; 30+ messages in thread
From: Daniel Vetter @ 2016-10-19  6:23 UTC (permalink / raw)
  To: Marek Olšák; +Cc: James Jones, dri-devel

On Wed, Oct 19, 2016 at 1:40 AM, Marek Olšák <maraeo@gmail.com> wrote:
> - The producer-consumer interop API doesn't know about the metadata.
> All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
>   * There was a note during the talk that DMABUF doesn't have any
> metadata. Well, I just told you that it has, but it's private to
> amdgpu and possibly accessible to other kernel drivers too.
>   * We can build upon this idea. I think the worst thing to do would
> be to add metadata handling to driver-agnostic userspace APIs. Really,
> driver-agnostic APIs shouldn't know about that, because they can't
> understand all the hw-specific information encoded in the metadata.
> Also, when you want to change the metadata format, you only have to
> update the affected drivers, not userspace APIs.

That's a bit of a surprise to hear, since "can't we just add a bit of
opaque metadata to dma-buf" came up all the time over the past years,
and died all the time again. dma-buf shouldn't imo be just yet another
linux IPC mechanism and protocol, which is pretty much what you end up
doing when you add this stuff. DRM runs all kinds of compositors with
all kinds of existing userspace proto, and with reasonable ones like
Wayland vendors can add whatever extensions they want. Plus there's
all the interop with v4l and every other kernel subsystem. Trying to
standardize that into some blob that works for everyone is imo nigh
impossible.

On top of that dma-buf is the wrong thing - you don't want this on
buffers, but on surfaces. At least when it's time to reallocate. And
oh dear I have seen what happens when soc vendors extend this design
to cover that use-case, plus dynamic reallocation and all that. Imo
there should be no way at all this ever comes close to dma-buf itself.

And tbh I think it's a bit silly that amd snuck this in through
amdgpu. But as long as you don't expect this to spread I guess it'll
be fine.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: Unix Device Memory Allocation project
  2016-10-19  0:08   ` James Jones
@ 2016-10-19  6:31     ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2016-10-19  6:31 UTC (permalink / raw)
  To: James Jones; +Cc: dri-devel

On Wed, Oct 19, 2016 at 2:08 AM, James Jones <jajones@nvidia.com> wrote:
>>   * We can build upon this idea. I think the worst thing to do would
>> be to add metadata handling to driver-agnostic userspace APIs. Really,
>> driver-agnostic APIs shouldn't know about that, because they can't
>> understand all the hw-specific information encoded in the metadata.
>> Also, when you want to change the metadata format, you only have to
>> update the affected drivers, not userspace APIs.
>
> How does this kernel-side metadata interact with userspace driver
> suballocation, or application-managed suballocation in APIs such as Vulkan?

Perfect point for why the kernel (well, dma-buf) imo really shouldn't
be in the business of handling the metadata. With explicit fencing (on
android) suballocation makes perfect sense also for shared buffers, and
then boom.

So yes, we either need protocol-extension support for vendors like on
Wayland, or we shoehorn it into existing stuff like dri3, but that's
where it needs to be. And given that part of the reason for this is
to allow cross-vendor interop, the benefit of having something
vendor-specific like these amdgpu metadata blobs is out of the window
anyway.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: Unix Device Memory Allocation project
  2016-10-18 23:40 ` Marek Olšák
  2016-10-19  0:08   ` James Jones
  2016-10-19  6:23   ` Daniel Vetter
@ 2016-10-19  6:49   ` Michel Dänzer
  2016-10-19 12:33   ` Nicolai Hähnle
  3 siblings, 0 replies; 30+ messages in thread
From: Michel Dänzer @ 2016-10-19  6:49 UTC (permalink / raw)
  To: Marek Olšák, James Jones; +Cc: dri-devel

On 19/10/16 08:40 AM, Marek Olšák wrote:
> 
> 1) DRI (producer: application; consumer: X server)
> - The producer receives these flags: READ, EXPLICIT_FLUSH. The X
> server will treat the shared "texture" as read-only.

FWIW, no, the X server doesn't treat buffers shared with clients via DRI
as read-only.

In particular, pixmaps created from client-side buffers via DRI3 are
normal pixmaps which can be used for all X11 functionality where any
other pixmap can be used. At least the Plasma (KDE) desktop is already
making use of that, as I discovered when looking into
https://bugs.freedesktop.org/show_bug.cgi?id=95475 .

Similarly, with DRI2, shared buffers can be written to by the X server
as well as the client, in particular the (fake) front buffers used for
backing GLX window front buffers and GLX pixmaps.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: Unix Device Memory Allocation project
  2016-10-19  6:23   ` Daniel Vetter
@ 2016-10-19 12:15     ` Christian König
  2016-10-19 13:15     ` Marek Olšák
  1 sibling, 0 replies; 30+ messages in thread
From: Christian König @ 2016-10-19 12:15 UTC (permalink / raw)
  To: Daniel Vetter, Marek Olšák; +Cc: James Jones, dri-devel

Am 19.10.2016 um 08:23 schrieb Daniel Vetter:
> On Wed, Oct 19, 2016 at 1:40 AM, Marek Olšák <maraeo@gmail.com> wrote:
>> - The producer-consumer interop API doesn't know about the metadata.
>> All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
>>    * There was a note during the talk that DMABUF doesn't have any
>> metadata. Well, I just told you that it has, but it's private to
>> amdgpu and possibly accessible to other kernel drivers too.
>>    * We can build upon this idea. I think the worst thing to do would
>> be to add metadata handling to driver-agnostic userspace APIs. Really,
>> driver-agnostic APIs shouldn't know about that, because they can't
>> understand all the hw-specific information encoded in the metadata.
>> Also, when you want to change the metadata format, you only have to
>> update the affected drivers, not userspace APIs.
> That's a bit a surprise to hear, since "can't we just add a bit of
> opaque metadata to dma-buf" came up all the time over the past years,
> and died all the time again. dma-buf shouldn't imo be just yet another
> linux IPC mechanism and protocol, which is pretty much what you end up
> doing when you add this stuff. DRM runs all kinds of compositors with
> all kinds of existing userspace proto, and with reasonable ones like
> Wayland vendors can add whatever extensions they want. Plus there's
> all the interop with v4l and every other kernel subsytem. Trying to
> standardize that into some blob that works for everyone is imo nigh
> impossible.
>
> On top of that dma-buf is the wrong thing - you don't want this on
> buffers, but on surfaces. At least when it's time to reallocate. And
> oh dear I have seen what happens when soc vendors extend this design
> to cover that use-case, plus dynamic reallocation and all that. Imo
> there should be no way at all this ever comes close to dma-buf itself.
>
> And tbh I think it's a bit silly that amd snuck this in through
> amdgpu. But as long as you don't expect this to spread I guess it'll
> be fine.

Actually we didn't start doing it like this with amdgpu. Radeon works
exactly the same way.

In addition to that, there is a really good reason to do it like this: the
kernel needs this information to program the CRTC for scanning out
tiled surfaces as well.

So you either end up with additions to the mode setting call to
transport all the device-specific information about the data layout in
the buffer, or you attach this information directly to your buffer handle.

We chose the latter, and I think that this is a rather nice solution
compared to all the headache you run into pushing this information
through all the different protocols and APIs (DRI2, DRI3, Wayland, modeset).
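
For comparison, the "additions to the mode setting call" route is
roughly what passing an explicit layout token at framebuffer creation
looks like; a sketch using libdrm (the modifier value is just a
placeholder, and whether a given kernel/driver accepts
DRM_MODE_FB_MODIFIERS is an assumption here):

  #include <xf86drm.h>
  #include <xf86drmMode.h>
  #include <drm_fourcc.h>

  /* Describe the data layout in the AddFB2 call itself instead of
   * attaching it to the buffer handle. */
  static int add_fb_with_layout(int fd, uint32_t w, uint32_t h,
                                uint32_t gem_handle, uint32_t pitch,
                                uint64_t layout_modifier, uint32_t *fb_id)
  {
      uint32_t handles[4]   = { gem_handle };
      uint32_t pitches[4]   = { pitch };
      uint32_t offsets[4]   = { 0 };
      uint64_t modifiers[4] = { layout_modifier };

      return drmModeAddFB2WithModifiers(fd, w, h, DRM_FORMAT_XRGB8888,
                                        handles, pitches, offsets,
                                        modifiers, fb_id,
                                        DRM_MODE_FB_MODIFIERS);
  }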

Regards,
Christian.

> -Daniel



* Re: Unix Device Memory Allocation project
  2016-10-18 23:40 ` Marek Olšák
                     ` (2 preceding siblings ...)
  2016-10-19  6:49   ` Michel Dänzer
@ 2016-10-19 12:33   ` Nicolai Hähnle
       [not found]     ` <CAAxE2A7ih_84H7w361msVYzRb8jb4ye8Psc1e5CO6gjJ2frO6g@mail.gmail.com>
  3 siblings, 1 reply; 30+ messages in thread
From: Nicolai Hähnle @ 2016-10-19 12:33 UTC (permalink / raw)
  To: Marek Olšák, James Jones; +Cc: dri-devel

On 19.10.2016 01:40, Marek Olšák wrote:
>   * We can build upon this idea. I think the worst thing to do would
> be to add metadata handling to driver-agnostic userspace APIs. Really,
> driver-agnostic APIs shouldn't know about that, because they can't
> understand all the hw-specific information encoded in the metadata.
> Also, when you want to change the metadata format, you only have to
> update the affected drivers, not userspace APIs.

I don't fully agree with that. In a PRIME setting, where you have a 
compositor running on an integrated GPU and an application on a dGPU, 
there may well be a benefit to finding a tiling format that the dGPU can 
produce and the iGPU can consume. Admittedly I don't know whether that's 
actually possible today when they're from different vendors, but 
resigning ourselves to linear only for all time seems a bit pessimistic.

Nicolai

* Re: Unix Device Memory Allocation project
  2016-10-19  6:23   ` Daniel Vetter
  2016-10-19 12:15     ` Christian König
@ 2016-10-19 13:15     ` Marek Olšák
  2016-10-19 14:10       ` Daniel Vetter
  1 sibling, 1 reply; 30+ messages in thread
From: Marek Olšák @ 2016-10-19 13:15 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, dri-devel


On Oct 19, 2016 8:24 AM, "Daniel Vetter" <daniel@ffwll.ch> wrote:
>
> On Wed, Oct 19, 2016 at 1:40 AM, Marek Olšák <maraeo@gmail.com> wrote:
> > - The producer-consumer interop API doesn't know about the metadata.
> > All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
> >   * There was a note during the talk that DMABUF doesn't have any
> > metadata. Well, I just told you that it has, but it's private to
> > amdgpu and possibly accessible to other kernel drivers too.
> >   * We can build upon this idea. I think the worst thing to do would
> > be to add metadata handling to driver-agnostic userspace APIs. Really,
> > driver-agnostic APIs shouldn't know about that, because they can't
> > understand all the hw-specific information encoded in the metadata.
> > Also, when you want to change the metadata format, you only have to
> > update the affected drivers, not userspace APIs.
>
> That's a bit a surprise to hear, since "can't we just add a bit of
> opaque metadata to dma-buf" came up all the time over the past years,
> and died all the time again. dma-buf shouldn't imo be just yet another
> linux IPC mechanism and protocol, which is pretty much what you end up
> doing when you add this stuff. DRM runs all kinds of compositors with
> all kinds of existing userspace proto, and with reasonable ones like
> Wayland vendors can add whatever extensions they want. Plus there's
> all the interop with v4l and every other kernel subsytem. Trying to
> standardize that into some blob that works for everyone is imo nigh
> impossible.
>
> On top of that dma-buf is the wrong thing - you don't want this on
> buffers, but on surfaces. At least when it's time to reallocate. And
> oh dear I have seen what happens when soc vendors extend this design
> to cover that use-case, plus dynamic reallocation and all that. Imo
> there should be no way at all this ever comes close to dma-buf itself.
>
> And tbh I think it's a bit silly that amd snuck this in through
> amdgpu. But as long as you don't expect this to spread I guess it'll
> be fine.

LOL. It's not per DMABUF, it's per buffer, so you need a KMS handle to
access it from userspace.

We've had per buffer metadata in Radeon since KMS, which I believe first
appeared in 2009. It's 4 bytes large and is used to communicate tiling
flags between Mesa, DDX, and the kernel display code. It was a widely
accepted solution back then and Red Hat was the main developer. So yeah,
pretty much all people except Intel were collaborating on "sneaking" this
in in 2009. I think radeon driver developers deserve an apology for that
language.

Amdgpu extended that metadata to 8 bytes and it's used in the same way as
radeon. Additionally, amdgpu added 256 bytes of opaque metadata for use
by userspace drivers only. The kernel driver isn't supposed to read it or
parse it. The format is negotiated between userspace driver developers for
sharing of more complex allocations than 2D displayable surfaces.

Marek


> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



* Re: Unix Device Memory Allocation project
       [not found]       ` <CAAxE2A53E_r9uA=FG_A63aBVgqaTWBuAzDZfDvRe9K+0EWmFeQ@mail.gmail.com>
@ 2016-10-19 13:40         ` Marek Olšák
  0 siblings, 0 replies; 30+ messages in thread
From: Marek Olšák @ 2016-10-19 13:40 UTC (permalink / raw)
  To: Nicolai Haehnle; +Cc: James Jones, dri-devel



On Oct 19, 2016 2:33 PM, "Nicolai Hähnle" <nhaehnle@gmail.com> wrote:
>
> On 19.10.2016 01:40, Marek Olšák wrote:
>>
>>   * We can build upon this idea. I think the worst thing to do would
>> be to add metadata handling to driver-agnostic userspace APIs. Really,
>> driver-agnostic APIs shouldn't know about that, because they can't
>> understand all the hw-specific information encoded in the metadata.
>> Also, when you want to change the metadata format, you only have to
>> update the affected drivers, not userspace APIs.
>
>
> I don't fully agree with that. In a PRIME setting, where you have a
compositor running on an integrated GPU and an application on a dGPU, there
may well be a benefit to finding a tiling format that the dGPU can produce
and the iGPU can consume. Admittedly I don't know whether that's actually
possible today when they're from different vendors, but resigning ourselves
to linear only for all time seems a bit pessimistic.

Yeah, that's a good point and I even mentioned it near the end of my post.
It could be solved by passing the PCI ID of the other device to the driver
instead of the linear flag. Then the driver can decide whether a linear
layout is necessary or not.

Marek



* Re: Unix Device Memory Allocation project
  2016-10-19 13:15     ` Marek Olšák
@ 2016-10-19 14:10       ` Daniel Vetter
  2016-10-19 16:46         ` Marek Olšák
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2016-10-19 14:10 UTC (permalink / raw)
  To: Marek Olšák; +Cc: James Jones, dri-devel

On Wed, Oct 19, 2016 at 03:15:08PM +0200, Marek Olšák wrote:
> On Oct 19, 2016 8:24 AM, "Daniel Vetter" <daniel@ffwll.ch> wrote:
> > On Wed, Oct 19, 2016 at 1:40 AM, Marek Olšák <maraeo@gmail.com> wrote:
> > > - The producer-consumer interop API doesn't know about the metadata.
> > > All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
> > >   * There was a note during the talk that DMABUF doesn't have any
> > > metadata. Well, I just told you that it has, but it's private to
> > > amdgpu and possibly accessible to other kernel drivers too.
> > >   * We can build upon this idea. I think the worst thing to do would
> > > be to add metadata handling to driver-agnostic userspace APIs. Really,
> > > driver-agnostic APIs shouldn't know about that, because they can't
> > > understand all the hw-specific information encoded in the metadata.
> > > Also, when you want to change the metadata format, you only have to
> > > update the affected drivers, not userspace APIs.
> >
> > That's a bit a surprise to hear, since "can't we just add a bit of
> > opaque metadata to dma-buf" came up all the time over the past years,
> > and died all the time again. dma-buf shouldn't imo be just yet another
> > linux IPC mechanism and protocol, which is pretty much what you end up
> > doing when you add this stuff. DRM runs all kinds of compositors with
> > all kinds of existing userspace proto, and with reasonable ones like
> > Wayland vendors can add whatever extensions they want. Plus there's
> > all the interop with v4l and every other kernel subsytem. Trying to
> > standardize that into some blob that works for everyone is imo nigh
> > impossible.
> >
> > On top of that dma-buf is the wrong thing - you don't want this on
> > buffers, but on surfaces. At least when it's time to reallocate. And
> > oh dear I have seen what happens when soc vendors extend this design
> > to cover that use-case, plus dynamic reallocation and all that. Imo
> > there should be no way at all this ever comes close to dma-buf itself.
> >
> > And tbh I think it's a bit silly that amd snuck this in through
> > amdgpu. But as long as you don't expect this to spread I guess it'll
> > be fine.
> 
> LOL. It's not per DMABUF, it's per buffer, so you need a KMS handle to
> access it from userspace.

Seems to be on gem buffers, not any KMS object (like framebuffers). For
cross driver that corresponds to dma-buf, and the discussion here is about
cross-driver/vendor buffer sharing.

> We've had per buffer metadata in Radeon since KMS, which I believe first
> appeared in 2009. It's 4 bytes large and is used to communicate tiling
> flags between Mesa, DDX, and the kernel display code. It was a widely
> accepted solution back then and Red Hat was the main developer. So yeah,
> pretty much all people except Intel were collaborating on "sneaking" this
> in in 2009. I think radeon driver developers deserve an apology for that
> language.
> 
> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
> by userspace drivers only. The kernel driver isn't supposed to read it or
> parse it. The format is negotiated between userspace driver developers for
> sharing of more complex allocations than 2D displayable surfaces.

Metadata needed for kms (what Christian also pointed out) is what everyone
did (intel included) and I think that's perfectly reasonable. And I was
aware that radeon has been doing that since the dawn of ages.

What I think is not really ok is opaque metadata blobs that the kernel
never ever inspects, but just carries around. That essentially means you're
reimplementing some bad form of IPC, and I don't think that's something the
drm subsystem (or dma-buf) really should be doing. Because you still have
that real protocol in userspace (dri2/3, wayland, whatever), but now with
a side channel with no documented ordering and synchronization. It gets
the job done for single-vendor buffer metadata transport, but as soon as
there's more than one vendor, or as soon as you need to reallocate buffers
dynamically because the usage changes, it gets bad imo (and I've seen what
that looks like on android in various forms). And that consensus (at least
among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
meeting we had over 5 years ago. Not sure we're gaining anything with a
"who's older" competition.

Anyways it's there and it's uabi so will never disappear. Just wanted to
make sure it's clear that for dma-buf we've discussed this years ago, and
decided it wasn't a great idea. And I think that's still correct.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: Unix Device Memory Allocation project
  2016-10-19 14:10       ` Daniel Vetter
@ 2016-10-19 16:46         ` Marek Olšák
  2016-10-20  6:31           ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Marek Olšák @ 2016-10-19 16:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, dri-devel

On Wed, Oct 19, 2016 at 4:10 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Oct 19, 2016 at 03:15:08PM +0200, Marek Olšák wrote:
>> On Oct 19, 2016 8:24 AM, "Daniel Vetter" <daniel@ffwll.ch> wrote:
>> > On Wed, Oct 19, 2016 at 1:40 AM, Marek Olšák <maraeo@gmail.com> wrote:
>> > > - The producer-consumer interop API doesn't know about the metadata.
>> > > All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
>> > >   * There was a note during the talk that DMABUF doesn't have any
>> > > metadata. Well, I just told you that it has, but it's private to
>> > > amdgpu and possibly accessible to other kernel drivers too.
>> > >   * We can build upon this idea. I think the worst thing to do would
>> > > be to add metadata handling to driver-agnostic userspace APIs. Really,
>> > > driver-agnostic APIs shouldn't know about that, because they can't
>> > > understand all the hw-specific information encoded in the metadata.
>> > > Also, when you want to change the metadata format, you only have to
>> > > update the affected drivers, not userspace APIs.
>> >
>> > That's a bit a surprise to hear, since "can't we just add a bit of
>> > opaque metadata to dma-buf" came up all the time over the past years,
>> > and died all the time again. dma-buf shouldn't imo be just yet another
>> > linux IPC mechanism and protocol, which is pretty much what you end up
>> > doing when you add this stuff. DRM runs all kinds of compositors with
>> > all kinds of existing userspace proto, and with reasonable ones like
>> > Wayland vendors can add whatever extensions they want. Plus there's
>> > all the interop with v4l and every other kernel subsytem. Trying to
>> > standardize that into some blob that works for everyone is imo nigh
>> > impossible.
>> >
>> > On top of that dma-buf is the wrong thing - you don't want this on
>> > buffers, but on surfaces. At least when it's time to reallocate. And
>> > oh dear I have seen what happens when soc vendors extend this design
>> > to cover that use-case, plus dynamic reallocation and all that. Imo
>> > there should be no way at all this ever comes close to dma-buf itself.
>> >
>> > And tbh I think it's a bit silly that amd snuck this in through
>> > amdgpu. But as long as you don't expect this to spread I guess it'll
>> > be fine.
>>
>> LOL. It's not per DMABUF, it's per buffer, so you need a KMS handle to
>> access it from userspace.
>
> Seems to be on gem buffers, not any KMS object (like framebuffers). For
> cross driver that corresponds to dma-buf, and the discussion here is about
> cross-driver/vendor buffer sharing.
>
>> We've had per buffer metadata in Radeon since KMS, which I believe first
>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>> flags between Mesa, DDX, and the kernel display code. It was a widely
>> accepted solution back then and Red Hat was the main developer. So yeah,
>> pretty much all people except Intel were collaborating on "sneaking" this
>> in in 2009. I think radeon driver developers deserve an apology for that
>> language.
>>
>> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
>> by userspace drivers only. The kernel driver isn't supposed to read it or
>> parse it. The format is negotiated between userspace driver developers for
>> sharing of more complex allocations than 2D displayable surfaces.
>
> Metadata needed for kms (what Christian also pointed out) is what everyone
> did (intel included) and I think that's perfectly reasonable. And I was
> aware of that radeon is doing that since the dawn of ages since forever.
>
> What I think is not really ok is opaque metadata blobs that the kernel
> never ever inspect, but just carries around. That essentially means you're
> reimplementing some bad form of IPC, and I dont think that's something the
> drm subsystem (or dma-buf) really should be doing. Because you still have
> that real protocol in userspace (dri2/3, wayland, whatever), but now with
> a side channel with no documented ordering and synchronization. It gets
> the job done for single-vendor buffer metadata transport, but as soon as
> there's more than one vendor, or as soon as you need to reallocate buffers
> dynamically because the usage changes it gets bad imo (and I've seen what

The metadata is immutable after allocation, so it's not a
communication channel. There is no synchronization or ordering needed
for immutable metadata. That implies that a shared buffer can't be
reused for an entirely different purpose. It can only be used as-is or
freed.

For suballocated memory, the idea is to reallocate it as a separate
buffer on the first "handle" export, so that shared suballocated
buffers don't exist.
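
Roughly, that export path looks like the sketch below (all the names
here are hypothetical; it only illustrates the "promote to a standalone
BO on first export" idea, not any real driver code):

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical backing-store helpers. */
  struct bo;
  struct bo *bo_alloc(uint64_t size);
  void bo_copy(struct bo *dst, uint64_t dst_off,
               struct bo *src, uint64_t src_off, uint64_t size);
  void bo_unref(struct bo *bo);
  uint32_t bo_export(struct bo *bo);

  struct resource {
      struct bo *bo;        /* backing allocation (possibly a shared slab) */
      uint64_t   offset;    /* offset within bo when suballocated */
      uint64_t   size;
      bool       suballocated;
  };

  static int resource_get_handle(struct resource *res, uint32_t *out_handle)
  {
      if (res->suballocated) {
          /* First export: move the contents into a dedicated BO so the
           * shared buffer is never a slice of a larger allocation. */
          struct bo *standalone = bo_alloc(res->size);

          if (!standalone)
              return -1;
          bo_copy(standalone, 0, res->bo, res->offset, res->size);
          bo_unref(res->bo);
          res->bo = standalone;
          res->offset = 0;
          res->suballocated = false;  /* metadata is set once, then immutable */
      }
      *out_handle = bo_export(res->bo);
      return 0;
  }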

> that looks like on android in various forms). And that consensus (at least
> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
> meeting we've had over 5 years ago. Not sure we're gaining anything with a
> "who's older" competition.
>
> Anyways it's there and it's uabi so will never disappear. Just wanted to
> make sure it's clear that for dma-buf we've discussed this years ago, and
> decided it wasn't a great idea. And I think that's still correct.

The arguments against blob metadata sound reasonable to me. I'm pretty
sceptical that window system protocols will make driver-specific
metadata blobs redundant anytime soon though. It seems the protocols
don't get much attention nowadays and there is no incentive to do
things differently in that area. At least that's how it appears to me,
but I'm not involved in that.

Marek

* Re: Unix Device Memory Allocation project
  2016-10-19 16:46         ` Marek Olšák
@ 2016-10-20  6:31           ` Daniel Vetter
  2017-01-03 23:38             ` Marek Olšák
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2016-10-20  6:31 UTC (permalink / raw)
  To: Marek Olšák; +Cc: James Jones, dri-devel

On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>> We've had per buffer metadata in Radeon since KMS, which I believe first
>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>> accepted solution back then and Red Hat was the main developer. So yeah,
>>> pretty much all people except Intel were collaborating on "sneaking" this
>>> in in 2009. I think radeon driver developers deserve an apology for that
>>> language.
>>>
>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
>>> by userspace drivers only. The kernel driver isn't supposed to read it or
>>> parse it. The format is negotiated between userspace driver developers for
>>> sharing of more complex allocations than 2D displayable surfaces.
>>
>> Metadata needed for kms (what Christian also pointed out) is what everyone
>> did (intel included) and I think that's perfectly reasonable. And I was
>> aware of that radeon is doing that since the dawn of ages since forever.
>>
>> What I think is not really ok is opaque metadata blobs that the kernel
>> never ever inspect, but just carries around. That essentially means you're
>> reimplementing some bad form of IPC, and I dont think that's something the
>> drm subsystem (or dma-buf) really should be doing. Because you still have
>> that real protocol in userspace (dri2/3, wayland, whatever), but now with
>> a side channel with no documented ordering and synchronization. It gets
>> the job done for single-vendor buffer metadata transport, but as soon as
>> there's more than one vendor, or as soon as you need to reallocate buffers
>> dynamically because the usage changes it gets bad imo (and I've seen what
>
> The metadata is immutable after allocation, so it's not a
> communication channel. There is no synchronization or ordering needed
> for immutable metadata. That implies that a shared buffer can't be
> reused for an entirely different purpose. It can only be used as-is or
> freed.
>
> For suballocated memory, the idea is to reallocate it as a separate
> buffer on the first "handle" export, so that shared suballocated
> buffers don't exist.

Yeah, once it becomes mutable the fun starts imo. I didn't realize
that you're treating it as strictly immutable, since at least the kernel
ioctl has both set and get (and that's the thing I looked at).
Immutable stuff shouldn't be any problem (except that of course it
won't work cross-driver in any fashion).

>> that looks like on android in various forms). And that consensus (at least
>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>> meeting we've had over 5 years ago. Not sure we're gaining anything with a
>> "who's older" competition.
>>
>> Anyways it's there and it's uabi so will never disappear. Just wanted to
>> make sure it's clear that for dma-buf we've discussed this years ago, and
>> decided it wasn't a great idea. And I think that's still correct.
>
> The arguments against blob metadata sound reasonable to me. I'm pretty
> sceptic that window system protocols will make driver-specific
> metadata blobs redundant anytime soon though. It seems the protocols
> don't get much attention nowadays and there is no incentive to do
> things differently in that area. At least that's how it appears to me,
> but I'm not involved in that.

Folks are working on protocols again, at least I think the plan is to
make all that shared buffer allocation dance also work in the
compositor/client situation (it would be a bit pointless without that).
And agreed there'll always be driver-specific stuff which is opaque to
everyone else, but I hope at least in the future that all gets
shuffled around through protocol extensions. And not in the way every
Android gfx stack seems to work, where everyone has their own
vendor-private ipc-over-dma-buf thing. Wayland definitely got this
right, both protocol versioning and being able to add any kind of
new/vendor-private protocol endpoints to any wayland protocol. X is a
lot more pain, but since it finally looks like the world is switching
away from it we might get away with a simpler protocol there. At
least all the tricky reallocation dances seem to matter a lot more on
mobile/tablets/phones, and there Wayland starts to rule.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: Unix Device Memory Allocation project
  2016-10-20  6:31           ` Daniel Vetter
@ 2017-01-03 23:38             ` Marek Olšák
  2017-01-03 23:43               ` James Jones
                                 ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Marek Olšák @ 2017-01-03 23:38 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, dri-devel

On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>> We've had per buffer metadata in Radeon since KMS, which I believe first
>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>> accepted solution back then and Red Hat was the main developer. So yeah,
>>>> pretty much all people except Intel were collaborating on "sneaking" this
>>>> in in 2009. I think radeon driver developers deserve an apology for that
>>>> language.
>>>>
>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
>>>> by userspace drivers only. The kernel driver isn't supposed to read it or
>>>> parse it. The format is negotiated between userspace driver developers for
>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>
>>> Metadata needed for kms (what Christian also pointed out) is what everyone
>>> did (intel included) and I think that's perfectly reasonable. And I was
>>> aware of that radeon is doing that since the dawn of ages since forever.
>>>
>>> What I think is not really ok is opaque metadata blobs that the kernel
>>> never ever inspect, but just carries around. That essentially means you're
>>> reimplementing some bad form of IPC, and I dont think that's something the
>>> drm subsystem (or dma-buf) really should be doing. Because you still have
>>> that real protocol in userspace (dri2/3, wayland, whatever), but now with
>>> a side channel with no documented ordering and synchronization. It gets
>>> the job done for single-vendor buffer metadata transport, but as soon as
>>> there's more than one vendor, or as soon as you need to reallocate buffers
>>> dynamically because the usage changes it gets bad imo (and I've seen what
>>
>> The metadata is immutable after allocation, so it's not a
>> communication channel. There is no synchronization or ordering needed
>> for immutable metadata. That implies that a shared buffer can't be
>> reused for an entirely different purpose. It can only be used as-is or
>> freed.
>>
>> For suballocated memory, the idea is to reallocate it as a separate
>> buffer on the first "handle" export, so that shared suballocated
>> buffers don't exist.
>
> Yeah, once it becomes mutable the fun starts imo. I didn't realize
> that you're treating it strictly immutable since at least the kernel
> ioctl has both set and get (and that's the thing I looked at).
> Immutable stuff shouldn't be any problem (except that of course it
> won't work cross-driver in any fashion)
>
>>> that looks like on android in various forms). And that consensus (at least
>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>> meeting we've had over 5 years ago. Not sure we're gaining anything with a
>>> "who's older" competition.
>>>
>>> Anyways it's there and it's uabi so will never disappear. Just wanted to
>>> make sure it's clear that for dma-buf we've discussed this years ago, and
>>> decided it wasn't a great idea. And I think that's still correct.
>>
>> The arguments against blob metadata sound reasonable to me. I'm pretty
>> sceptic that window system protocols will make driver-specific
>> metadata blobs redundant anytime soon though. It seems the protocols
>> don't get much attention nowadays and there is no incentive to do
>> things differently in that area. At least that's how it appears to me,
>> but I'm not involved in that.
>
> Folks are working on protocols again, at least I think the plan is to
> make all that shared buffer allocation dance also work over
> compositor/client situation (would be a bit pointless without that).
> And agreed there'll always be driver-specific stuff which is opaque to
> everyone else, but I hope at least in the future that all gets
> shuffled around through protocol extensions. And not in the way every
> Android gfx stack seems to work, where everyone has their own
> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
> right, both protocol versioning and being able to add any kind of
> new/vendor-private protocol endpoints to any wayland protocol. X is a
> lot more pain, but since it finally looks like the world is switching
> away from it we might get away with  a simpler protocol there. At
> least all the tricky reallocation dances seem to matter a lot more on
> mobile/tablets/phones, and there Wayland starts to rule.

I've been thinking about it, and it looks like we're gonna continue
using immutable per-BO metadata (buffer layout, tiling description,
compression flags). The reasons are that everything else is less
economical, and the current "modifier" work done in EGL/GBM is
insufficient for our hardware - we need approx. 96 bytes of metadata
for proper buffer sharing (not just for display, but also 3D interop -
MSAA, mipmapping, compression), while EGL modifiers only support 8
bytes of metadata. However, that doesn't matter, because:

These are the components that need to work with the BO metadata:
- Mesa driver backend
- AMDGPU kernel driver

These are the components that should never know about the BO metadata:
- Any Mesa shared code
- EGL
- GBM
- Window system protocols
- Display servers
- DDXs

The more components you need to change when the requirements change,
the less economical the whole thing is, and the more painful the
deployment is.

Interop with other vendors would be trivial - the kernel drivers can
exchange buffer layouts, and DRM can have an interface for it.
Userspace doesn't have to know about any of that. (It also seems kinda
dangerous to use userspace as a middle man for passing the
metadata/modifiers around)

Speaking of compression for display, especially the separate
compression buffer: That should be fully contained in the main DMABUF
and described by the per-BO metadata. Some other drivers want to use a
separate DMABUF for the compression buffer - while that may sound good
in theory, it's not economical for the reason described above.
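
To give a sense of scale, here is a purely hypothetical sketch of the
kind of descriptor being described - the struct and field names are
made up for illustration and are not the actual amdgpu layout:

#include <stdint.h>

/* Hypothetical illustration only -- NOT the real amdgpu metadata
 * layout, just the sort of information an immutable ~96-byte per-BO
 * descriptor would have to carry. */
struct hypothetical_bo_layout {
        uint64_t tiling_mode;        /* macro/micro tile mode, bank/pipe config */
        uint64_t compression_flags;  /* e.g. DCC/CMASK/FMASK enables */
        uint32_t compression_offset; /* compression data inside the same BO */
        uint32_t compression_pitch;
        uint32_t num_samples;        /* MSAA layout */
        uint32_t num_levels;         /* for mipmapped sharing */
        uint32_t level_offset[12];   /* per-level offsets */
        uint32_t swizzle;
};
/* sizeof() lands in the same ballpark as the ~96 bytes above, vs. the
 * single 64-bit value a format modifier can carry. */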

Marek
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-03 23:38             ` Marek Olšák
@ 2017-01-03 23:43               ` James Jones
  2017-01-04  0:06                 ` Marek Olšák
  2017-01-04  8:46               ` Stéphane Marchesin
  2017-01-04 12:03               ` Daniel Stone
  2 siblings, 1 reply; 30+ messages in thread
From: James Jones @ 2017-01-03 23:43 UTC (permalink / raw)
  To: Marek Olšák, Daniel Vetter; +Cc: dri-devel

On 01/03/2017 03:38 PM, Marek Olšák wrote:
> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>>> We've had per buffer metadata in Radeon since KMS, which I believe first
>>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>>> accepted solution back then and Red Hat was the main developer. So yeah,
>>>>> pretty much all people except Intel were collaborating on "sneaking" this
>>>>> in in 2009. I think radeon driver developers deserve an apology for that
>>>>> language.
>>>>>
>>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
>>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
>>>>> by userspace drivers only. The kernel driver isn't supposed to read it or
>>>>> parse it. The format is negotiated between userspace driver developers for
>>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>>
>>>> Metadata needed for kms (what Christian also pointed out) is what everyone
>>>> did (intel included) and I think that's perfectly reasonable. And I was
>>>> aware of that radeon is doing that since the dawn of ages since forever.
>>>>
>>>> What I think is not really ok is opaque metadata blobs that the kernel
>>>> never ever inspect, but just carries around. That essentially means you're
>>>> reimplementing some bad form of IPC, and I dont think that's something the
>>>> drm subsystem (or dma-buf) really should be doing. Because you still have
>>>> that real protocol in userspace (dri2/3, wayland, whatever), but now with
>>>> a side channel with no documented ordering and synchronization. It gets
>>>> the job done for single-vendor buffer metadata transport, but as soon as
>>>> there's more than one vendor, or as soon as you need to reallocate buffers
>>>> dynamically because the usage changes it gets bad imo (and I've seen what
>>>
>>> The metadata is immutable after allocation, so it's not a
>>> communication channel. There is no synchronization or ordering needed
>>> for immutable metadata. That implies that a shared buffer can't be
>>> reused for an entirely different purpose. It can only be used as-is or
>>> freed.
>>>
>>> For suballocated memory, the idea is to reallocate it as a separate
>>> buffer on the first "handle" export, so that shared suballocated
>>> buffers don't exist.
>>
>> Yeah, once it becomes mutable the fun starts imo. I didn't realize
>> that you're treating it strictly immutable since at least the kernel
>> ioctl has both set and get (and that's the thing I looked at).
>> Immutable stuff shouldn't be any problem (except that of course it
>> won't work cross-driver in any fashion)
>>
>>>> that looks like on android in various forms). And that consensus (at least
>>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>>> meeting we've had over 5 years ago. Not sure we're gaining anything with a
>>>> "who's older" competition.
>>>>
>>>> Anyways it's there and it's uabi so will never disappear. Just wanted to
>>>> make sure it's clear that for dma-buf we've discussed this years ago, and
>>>> decided it wasn't a great idea. And I think that's still correct.
>>>
>>> The arguments against blob metadata sound reasonable to me. I'm pretty
>>> sceptic that window system protocols will make driver-specific
>>> metadata blobs redundant anytime soon though. It seems the protocols
>>> don't get much attention nowadays and there is no incentive to do
>>> things differently in that area. At least that's how it appears to me,
>>> but I'm not involved in that.
>>
>> Folks are working on protocols again, at least I think the plan is to
>> make all that shared buffer allocation dance also work over
>> compositor/client situation (would be a bit pointless without that).
>> And agreed there'll always be driver-specific stuff which is opaque to
>> everyone else, but I hope at least in the future that all gets
>> shuffled around through protocol extensions. And not in the way every
>> Android gfx stack seems to work, where everyone has their own
>> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
>> right, both protocol versioning and being able to add any kind of
>> new/vendor-private protocol endpoints to any wayland protocol. X is a
>> lot more pain, but since it finally looks like the world is switching
>> away from it we might get away with  a simpler protocol there. At
>> least all the tricky reallocation dances seem to matter a lot more on
>> mobile/tablets/phones, and there Wayland starts to rule.
>
> I've been thinking about it, and it looks like we're gonna continue
> using immutable per-BO metadata (buffer layout, tiling description,
> compression flags). The reasons are that everything else is less
> economical, and the current "modifier" work done in EGL/GBM is
> insufficient for our hardware - we need approx. 96 bytes of metadata
> for proper buffer sharing (not just for display, but also 3D interop -
> MSAA, mipmapping, compression), while EGL modifiers only support 8
> bytes of metadata. However, that doesn't matter, because:
>
> These are the components that need to work with the BO metadata:
> - Mesa driver backend
> - AMDGPU kernel driver
>
> These are the components that should never know about the BO metadata:
> - Any Mesa shared code
> - EGL
> - GBM
> - Window system protocols
> - Display servers
> - DDXs
>
> The more components you need to change when the requirements change,
> the less economical the whole thing is, and the more painful the
> deployment is.
>
> Interop with other vendors would be trivial - the kernel drivers can
> exchange buffer layouts, and DRM can have an interface for it.
> Userspace doesn't have to know about any of that. (It also seems kinda
> dangerous to use userspace as a middle man for passing the
> metadata/modifiers around)

Could you elaborate on what seems dangerous about it?

Thanks,
-James

> Speaking of compression for display, especially the separate
> compression buffer: That should be fully contained in the main DMABUF
> and described by the per-BO metadata. Some other drivers want to use a
> separate DMABUF for the compression buffer - while that may sound good
> in theory, it's not economical for the reason described above.
>
> Marek
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-03 23:43               ` James Jones
@ 2017-01-04  0:06                 ` Marek Olšák
  2017-01-04  0:19                   ` James Jones
  0 siblings, 1 reply; 30+ messages in thread
From: Marek Olšák @ 2017-01-04  0:06 UTC (permalink / raw)
  To: James Jones; +Cc: dri-devel

On Wed, Jan 4, 2017 at 12:43 AM, James Jones <jajones@nvidia.com> wrote:
> On 01/03/2017 03:38 PM, Marek Olšák wrote:
>>
>> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>>>>
>>>>>> We've had per buffer metadata in Radeon since KMS, which I believe
>>>>>> first
>>>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>>>> accepted solution back then and Red Hat was the main developer. So
>>>>>> yeah,
>>>>>> pretty much all people except Intel were collaborating on "sneaking"
>>>>>> this
>>>>>> in in 2009. I think radeon driver developers deserve an apology for
>>>>>> that
>>>>>> language.
>>>>>>
>>>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way
>>>>>> as
>>>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes
>>>>>> for use
>>>>>> by userspace drivers only. The kernel driver isn't supposed to read it
>>>>>> or
>>>>>> parse it. The format is negotiated between userspace driver developers
>>>>>> for
>>>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>>>
>>>>>
>>>>> Metadata needed for kms (what Christian also pointed out) is what
>>>>> everyone
>>>>> did (intel included) and I think that's perfectly reasonable. And I was
>>>>> aware of that radeon is doing that since the dawn of ages since
>>>>> forever.
>>>>>
>>>>> What I think is not really ok is opaque metadata blobs that the kernel
>>>>> never ever inspect, but just carries around. That essentially means
>>>>> you're
>>>>> reimplementing some bad form of IPC, and I dont think that's something
>>>>> the
>>>>> drm subsystem (or dma-buf) really should be doing. Because you still
>>>>> have
>>>>> that real protocol in userspace (dri2/3, wayland, whatever), but now
>>>>> with
>>>>> a side channel with no documented ordering and synchronization. It gets
>>>>> the job done for single-vendor buffer metadata transport, but as soon
>>>>> as
>>>>> there's more than one vendor, or as soon as you need to reallocate
>>>>> buffers
>>>>> dynamically because the usage changes it gets bad imo (and I've seen
>>>>> what
>>>>
>>>>
>>>> The metadata is immutable after allocation, so it's not a
>>>> communication channel. There is no synchronization or ordering needed
>>>> for immutable metadata. That implies that a shared buffer can't be
>>>> reused for an entirely different purpose. It can only be used as-is or
>>>> freed.
>>>>
>>>> For suballocated memory, the idea is to reallocate it as a separate
>>>> buffer on the first "handle" export, so that shared suballocated
>>>> buffers don't exist.
>>>
>>>
>>> Yeah, once it becomes mutable the fun starts imo. I didn't realize
>>> that you're treating it strictly immutable since at least the kernel
>>> ioctl has both set and get (and that's the thing I looked at).
>>> Immutable stuff shouldn't be any problem (except that of course it
>>> won't work cross-driver in any fashion)
>>>
>>>>> that looks like on android in various forms). And that consensus (at
>>>>> least
>>>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>>>> meeting we've had over 5 years ago. Not sure we're gaining anything
>>>>> with a
>>>>> "who's older" competition.
>>>>>
>>>>> Anyways it's there and it's uabi so will never disappear. Just wanted
>>>>> to
>>>>> make sure it's clear that for dma-buf we've discussed this years ago,
>>>>> and
>>>>> decided it wasn't a great idea. And I think that's still correct.
>>>>
>>>>
>>>> The arguments against blob metadata sound reasonable to me. I'm pretty
>>>> sceptic that window system protocols will make driver-specific
>>>> metadata blobs redundant anytime soon though. It seems the protocols
>>>> don't get much attention nowadays and there is no incentive to do
>>>> things differently in that area. At least that's how it appears to me,
>>>> but I'm not involved in that.
>>>
>>>
>>> Folks are working on protocols again, at least I think the plan is to
>>> make all that shared buffer allocation dance also work over
>>> compositor/client situation (would be a bit pointless without that).
>>> And agreed there'll always be driver-specific stuff which is opaque to
>>> everyone else, but I hope at least in the future that all gets
>>> shuffled around through protocol extensions. And not in the way every
>>> Android gfx stack seems to work, where everyone has their own
>>> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
>>> right, both protocol versioning and being able to add any kind of
>>> new/vendor-private protocol endpoints to any wayland protocol. X is a
>>> lot more pain, but since it finally looks like the world is switching
>>> away from it we might get away with  a simpler protocol there. At
>>> least all the tricky reallocation dances seem to matter a lot more on
>>> mobile/tablets/phones, and there Wayland starts to rule.
>>
>>
>> I've been thinking about it, and it looks like we're gonna continue
>> using immutable per-BO metadata (buffer layout, tiling description,
>> compression flags). The reasons are that everything else is less
>> economical, and the current "modifier" work done in EGL/GBM is
>> insufficient for our hardware - we need approx. 96 bytes of metadata
>> for proper buffer sharing (not just for display, but also 3D interop -
>> MSAA, mipmapping, compression), while EGL modifiers only support 8
>> bytes of metadata. However, that doesn't matter, because:
>>
>> These are the components that need to work with the BO metadata:
>> - Mesa driver backend
>> - AMDGPU kernel driver
>>
>> These are the components that should never know about the BO metadata:
>> - Any Mesa shared code
>> - EGL
>> - GBM
>> - Window system protocols
>> - Display servers
>> - DDXs
>>
>> The more components you need to change when the requirements change,
>> the less economical the whole thing is, and the more painful the
>> deployment is.
>>
>> Interop with other vendors would be trivial - the kernel drivers can
>> exchange buffer layouts, and DRM can have an interface for it.
>> Userspace doesn't have to know about any of that. (It also seems kinda
>> dangerous to use userspace as a middle man for passing the
>> metadata/modifiers around)
>
>
> Could you elaborate on what seems dangerous about it?

While that wasn't the main argument, a malicious app could modify the
modifiers before they reach the consumer.

Marek
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04  0:06                 ` Marek Olšák
@ 2017-01-04  0:19                   ` James Jones
  0 siblings, 0 replies; 30+ messages in thread
From: James Jones @ 2017-01-04  0:19 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On 01/03/2017 04:06 PM, Marek Olšák wrote:
> On Wed, Jan 4, 2017 at 12:43 AM, James Jones <jajones@nvidia.com> wrote:
>> On 01/03/2017 03:38 PM, Marek Olšák wrote:
>>>
>>> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>
>>>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>>>>>
>>>>>>> We've had per buffer metadata in Radeon since KMS, which I believe
>>>>>>> first
>>>>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>>>>> accepted solution back then and Red Hat was the main developer. So
>>>>>>> yeah,
>>>>>>> pretty much all people except Intel were collaborating on "sneaking"
>>>>>>> this
>>>>>>> in in 2009. I think radeon driver developers deserve an apology for
>>>>>>> that
>>>>>>> language.
>>>>>>>
>>>>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way
>>>>>>> as
>>>>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes
>>>>>>> for use
>>>>>>> by userspace drivers only. The kernel driver isn't supposed to read it
>>>>>>> or
>>>>>>> parse it. The format is negotiated between userspace driver developers
>>>>>>> for
>>>>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>>>>
>>>>>>
>>>>>> Metadata needed for kms (what Christian also pointed out) is what
>>>>>> everyone
>>>>>> did (intel included) and I think that's perfectly reasonable. And I was
>>>>>> aware of that radeon is doing that since the dawn of ages since
>>>>>> forever.
>>>>>>
>>>>>> What I think is not really ok is opaque metadata blobs that the kernel
>>>>>> never ever inspect, but just carries around. That essentially means
>>>>>> you're
>>>>>> reimplementing some bad form of IPC, and I dont think that's something
>>>>>> the
>>>>>> drm subsystem (or dma-buf) really should be doing. Because you still
>>>>>> have
>>>>>> that real protocol in userspace (dri2/3, wayland, whatever), but now
>>>>>> with
>>>>>> a side channel with no documented ordering and synchronization. It gets
>>>>>> the job done for single-vendor buffer metadata transport, but as soon
>>>>>> as
>>>>>> there's more than one vendor, or as soon as you need to reallocate
>>>>>> buffers
>>>>>> dynamically because the usage changes it gets bad imo (and I've seen
>>>>>> what
>>>>>
>>>>>
>>>>> The metadata is immutable after allocation, so it's not a
>>>>> communication channel. There is no synchronization or ordering needed
>>>>> for immutable metadata. That implies that a shared buffer can't be
>>>>> reused for an entirely different purpose. It can only be used as-is or
>>>>> freed.
>>>>>
>>>>> For suballocated memory, the idea is to reallocate it as a separate
>>>>> buffer on the first "handle" export, so that shared suballocated
>>>>> buffers don't exist.
>>>>
>>>>
>>>> Yeah, once it becomes mutable the fun starts imo. I didn't realize
>>>> that you're treating it strictly immutable since at least the kernel
>>>> ioctl has both set and get (and that's the thing I looked at).
>>>> Immutable stuff shouldn't be any problem (except that of course it
>>>> won't work cross-driver in any fashion)
>>>>
>>>>>> that looks like on android in various forms). And that consensus (at
>>>>>> least
>>>>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>>>>> meeting we've had over 5 years ago. Not sure we're gaining anything
>>>>>> with a
>>>>>> "who's older" competition.
>>>>>>
>>>>>> Anyways it's there and it's uabi so will never disappear. Just wanted
>>>>>> to
>>>>>> make sure it's clear that for dma-buf we've discussed this years ago,
>>>>>> and
>>>>>> decided it wasn't a great idea. And I think that's still correct.
>>>>>
>>>>>
>>>>> The arguments against blob metadata sound reasonable to me. I'm pretty
>>>>> sceptic that window system protocols will make driver-specific
>>>>> metadata blobs redundant anytime soon though. It seems the protocols
>>>>> don't get much attention nowadays and there is no incentive to do
>>>>> things differently in that area. At least that's how it appears to me,
>>>>> but I'm not involved in that.
>>>>
>>>>
>>>> Folks are working on protocols again, at least I think the plan is to
>>>> make all that shared buffer allocation dance also work over
>>>> compositor/client situation (would be a bit pointless without that).
>>>> And agreed there'll always be driver-specific stuff which is opaque to
>>>> everyone else, but I hope at least in the future that all gets
>>>> shuffled around through protocol extensions. And not in the way every
>>>> Android gfx stack seems to work, where everyone has their own
>>>> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
>>>> right, both protocol versioning and being able to add any kind of
>>>> new/vendor-private protocol endpoints to any wayland protocol. X is a
>>>> lot more pain, but since it finally looks like the world is switching
>>>> away from it we might get away with  a simpler protocol there. At
>>>> least all the tricky reallocation dances seem to matter a lot more on
>>>> mobile/tablets/phones, and there Wayland starts to rule.
>>>
>>>
>>> I've been thinking about it, and it looks like we're gonna continue
>>> using immutable per-BO metadata (buffer layout, tiling description,
>>> compression flags). The reasons are that everything else is less
>>> economical, and the current "modifier" work done in EGL/GBM is
>>> insufficient for our hardware - we need approx. 96 bytes of metadata
>>> for proper buffer sharing (not just for display, but also 3D interop -
>>> MSAA, mipmapping, compression), while EGL modifiers only support 8
>>> bytes of metadata. However, that doesn't matter, because:
>>>
>>> These are the components that need to work with the BO metadata:
>>> - Mesa driver backend
>>> - AMDGPU kernel driver
>>>
>>> These are the components that should never know about the BO metadata:
>>> - Any Mesa shared code
>>> - EGL
>>> - GBM
>>> - Window system protocols
>>> - Display servers
>>> - DDXs
>>>
>>> The more components you need to change when the requirements change,
>>> the less economical the whole thing is, and the more painful the
>>> deployment is.
>>>
>>> Interop with other vendors would be trivial - the kernel drivers can
>>> exchange buffer layouts, and DRM can have an interface for it.
>>> Userspace doesn't have to know about any of that. (It also seems kinda
>>> dangerous to use userspace as a middle man for passing the
>>> metadata/modifiers around)
>>
>>
>> Could you elaborate on what seems dangerous about it?
>
> While that wasn't the main argument, a malicious app could modify the
> modifiers before they reach the consumer.

I understand this wasn't your key point, but I've had trouble following
similar assertions in the past, so I wanted to understand your view.  I
haven't yet seen a good reason why this is worse when userspace rather
than kernelspace maintains the modifiers.  My take is that the worst
you can get is corrupted content, which is no worse than an untrusted
application providing an invalid pitch/width/height/etc. attribute,
which has been possible for years.

I'll respond to your other points after thinking about them some more, 
if others don't beat me to it.

Thanks,
-James

> Marek
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-03 23:38             ` Marek Olšák
  2017-01-03 23:43               ` James Jones
@ 2017-01-04  8:46               ` Stéphane Marchesin
  2017-01-04 12:03               ` Daniel Stone
  2 siblings, 0 replies; 30+ messages in thread
From: Stéphane Marchesin @ 2017-01-04  8:46 UTC (permalink / raw)
  To: Marek Olšák; +Cc: James Jones, dri-devel

On Tue, Jan 3, 2017 at 3:38 PM, Marek Olšák <maraeo@gmail.com> wrote:
> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo@gmail.com> wrote:
>>>>> We've had per buffer metadata in Radeon since KMS, which I believe first
>>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>>> accepted solution back then and Red Hat was the main developer. So yeah,
>>>>> pretty much all people except Intel were collaborating on "sneaking" this
>>>>> in in 2009. I think radeon driver developers deserve an apology for that
>>>>> language.
>>>>>
>>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
>>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
>>>>> by userspace drivers only. The kernel driver isn't supposed to read it or
>>>>> parse it. The format is negotiated between userspace driver developers for
>>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>>
>>>> Metadata needed for kms (what Christian also pointed out) is what everyone
>>>> did (intel included) and I think that's perfectly reasonable. And I was
>>>> aware of that radeon is doing that since the dawn of ages since forever.
>>>>
>>>> What I think is not really ok is opaque metadata blobs that the kernel
>>>> never ever inspect, but just carries around. That essentially means you're
>>>> reimplementing some bad form of IPC, and I dont think that's something the
>>>> drm subsystem (or dma-buf) really should be doing. Because you still have
>>>> that real protocol in userspace (dri2/3, wayland, whatever), but now with
>>>> a side channel with no documented ordering and synchronization. It gets
>>>> the job done for single-vendor buffer metadata transport, but as soon as
>>>> there's more than one vendor, or as soon as you need to reallocate buffers
>>>> dynamically because the usage changes it gets bad imo (and I've seen what
>>>
>>> The metadata is immutable after allocation, so it's not a
>>> communication channel. There is no synchronization or ordering needed
>>> for immutable metadata. That implies that a shared buffer can't be
>>> reused for an entirely different purpose. It can only be used as-is or
>>> freed.
>>>
>>> For suballocated memory, the idea is to reallocate it as a separate
>>> buffer on the first "handle" export, so that shared suballocated
>>> buffers don't exist.
>>
>> Yeah, once it becomes mutable the fun starts imo. I didn't realize
>> that you're treating it strictly immutable since at least the kernel
>> ioctl has both set and get (and that's the thing I looked at).
>> Immutable stuff shouldn't be any problem (except that of course it
>> won't work cross-driver in any fashion)
>>
>>>> that looks like on android in various forms). And that consensus (at least
>>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>>> meeting we've had over 5 years ago. Not sure we're gaining anything with a
>>>> "who's older" competition.
>>>>
>>>> Anyways it's there and it's uabi so will never disappear. Just wanted to
>>>> make sure it's clear that for dma-buf we've discussed this years ago, and
>>>> decided it wasn't a great idea. And I think that's still correct.
>>>
>>> The arguments against blob metadata sound reasonable to me. I'm pretty
>>> sceptic that window system protocols will make driver-specific
>>> metadata blobs redundant anytime soon though. It seems the protocols
>>> don't get much attention nowadays and there is no incentive to do
>>> things differently in that area. At least that's how it appears to me,
>>> but I'm not involved in that.
>>
>> Folks are working on protocols again, at least I think the plan is to
>> make all that shared buffer allocation dance also work over
>> compositor/client situation (would be a bit pointless without that).
>> And agreed there'll always be driver-specific stuff which is opaque to
>> everyone else, but I hope at least in the future that all gets
>> shuffled around through protocol extensions. And not in the way every
>> Android gfx stack seems to work, where everyone has their own
>> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
>> right, both protocol versioning and being able to add any kind of
>> new/vendor-private protocol endpoints to any wayland protocol. X is a
>> lot more pain, but since it finally looks like the world is switching
>> away from it we might get away with  a simpler protocol there. At
>> least all the tricky reallocation dances seem to matter a lot more on
>> mobile/tablets/phones, and there Wayland starts to rule.
>
> I've been thinking about it, and it looks like we're gonna continue
> using immutable per-BO metadata (buffer layout, tiling description,
> compression flags). The reasons are that everything else is less
> economical, and the current "modifier" work done in EGL/GBM is
> insufficient for our hardware - we need approx. 96 bytes of metadata
> for proper buffer sharing (not just for display, but also 3D interop -
> MSAA, mipmapping, compression), while EGL modifiers only support 8
> bytes of metadata. However, that doesn't matter, because:
>
> These are the components that need to work with the BO metadata:
> - Mesa driver backend
> - AMDGPU kernel driver
>
> These are the components that should never know about the BO metadata:
> - Any Mesa shared code
> - EGL
> - GBM
> - Window system protocols
> - Display servers
> - DDXs
>
> The more components you need to change when the requirements change,
> the less economical the whole thing is, and the more painful the
> deployment is.

While you are right in a world where only AMDGPU exists, once you
start doing interop, things fall apart. A common example is exporting
a buffer to an external consumer that requires a certain format (for
example if you're displaying with UDL).


>
> Interop with other vendors would be trivial - the kernel drivers can
> exchange buffer layouts, and DRM can have an interface for it.

It's not just DRM; there are other consumers of dma-buf, like v4l2. I
agree that you could move the negotiation into the kernel, but this
seems more complicated and less flexible. In short: if it doesn't need
to be in the kernel, it probably shouldn't be.

> Userspace doesn't have to know about any of that. (It also seems kinda
> dangerous to use userspace as a middle man for passing the
> metadata/modifiers around)

I don't think that's an issue in practice... If the app decides to
corrupt its own rendering, meh.

Stéphane

>
> Speaking of compression for display, especially the separate
> compression buffer: That should be fully contained in the main DMABUF
> and described by the per-BO metadata. Some other drivers want to use a
> separate DMABUF for the compression buffer - while that may sound good
> in theory, it's not economical for the reason described above.
>
> Marek
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-03 23:38             ` Marek Olšák
  2017-01-03 23:43               ` James Jones
  2017-01-04  8:46               ` Stéphane Marchesin
@ 2017-01-04 12:03               ` Daniel Stone
  2017-01-04 13:06                 ` Rob Clark
  2 siblings, 1 reply; 30+ messages in thread
From: Daniel Stone @ 2017-01-04 12:03 UTC (permalink / raw)
  To: Marek Olšák; +Cc: James Jones, dri-devel

Hi Marek,

On 3 January 2017 at 23:38, Marek Olšák <maraeo@gmail.com> wrote:
> I've been thinking about it, and it looks like we're gonna continue
> using immutable per-BO metadata (buffer layout, tiling description,
> compression flags). The reasons are that everything else is less
> economical, and the current "modifier" work done in EGL/GBM is
> insufficient for our hardware - we need approx. 96 bytes of metadata
> for proper buffer sharing (not just for display, but also 3D interop -
> MSAA, mipmapping, compression), while EGL modifiers only support 8
> bytes of metadata. However, that doesn't matter, because:

You're right that no-one attempts to describe MSAA/full-miptree layout
within modifiers, and that's good because that's not what they're
supposed to do. The various consumers (DRM framebuffers, EGLImages,
wl_buffers, V4L2 buffers) all just work with flat, fully-resolved, 2D
image blocks. If you were adding miptree detail to KMS modifiers, then
I'd expect to see matching patches to, e.g., select LOD for each of
the above.

So I don't see how the above is relevant to the problems that the
allocator solves, unless we really can scan out from (and exchange
between different-vendor GPUs) far more exotic buffer formats these
days.

> These are the components that need to work with the BO metadata:
> - Mesa driver backend
> - AMDGPU kernel driver

You've pretty correctly identified this though, and I'm happy to run
you through how Wayland works wrt EGL and buffer interchange on IRC,
if you'd like. But as DanV says, the client<->compositor protocol is
entirely contained within Mesa, so you can change it entirely
arbitrarily without worrying about version desync.

> These are the components that should never know about the BO metadata:
> - Any Mesa shared code
> - EGL
> - GBM
> - Window system protocols
> - Display servers
> - DDXs

Again, most of these don't seem overly relevant, since the types of
allocations you're talking about are not going to transit these
components in the first place.

> The more components you need to change when the requirements change,
> the less economical the whole thing is, and the more painful the
> deployment is.

I don't think anyone disagrees; the point was to write this such that
no changes would be required to any of those components. As a trivial
example, between the GETPLANE2 ioctl and being able to pass modifiers
into GBM, Weston can now instruct Mesa to render buffers with
compression or exotic tiling formats, without ever having to have
specific knowledge of what those formats mean. Adding more formats
doesn't mean changing Weston, because it doesn't know or care about
the details.
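
A minimal sketch of that flow, assuming the GBM/libdrm entry points this
work adds (gbm_surface_create_with_modifiers, gbm_bo_get_modifier,
drmModeAddFB2WithModifiers); single-plane case only, error handling
omitted:

#include <stdint.h>
#include <gbm.h>
#include <drm_fourcc.h>
#include <xf86drmMode.h>

/* Sketch: the compositor feeds an opaque list of modifiers (obtained
 * from the kernel) into GBM, renders, then hands whatever GBM picked
 * straight back to the kernel without interpreting the modifier. */
static uint32_t scanout_sketch(int drm_fd, struct gbm_device *gbm,
                               const uint64_t *mods, unsigned n_mods)
{
        struct gbm_surface *surf =
                gbm_surface_create_with_modifiers(gbm, 1920, 1080,
                                                  GBM_FORMAT_XRGB8888,
                                                  mods, n_mods);
        /* ... EGL/GL rendering into surf goes here ... */
        struct gbm_bo *bo = gbm_surface_lock_front_buffer(surf);

        uint32_t handles[4]   = { gbm_bo_get_handle(bo).u32 };
        uint32_t pitches[4]   = { gbm_bo_get_stride(bo) };
        uint32_t offsets[4]   = { 0 };
        uint64_t modifiers[4] = { gbm_bo_get_modifier(bo) };
        uint32_t fb_id = 0;

        drmModeAddFB2WithModifiers(drm_fd, 1920, 1080, DRM_FORMAT_XRGB8888,
                                   handles, pitches, offsets, modifiers,
                                   &fb_id, DRM_MODE_FB_MODIFIERS);
        return fb_id;
}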

> Interop with other vendors would be trivial - the kernel drivers can
> exchange buffer layouts, and DRM can have an interface for it.

Describing them might not be the most difficult thing in the world,
though the regret starts to pile up as, thanks to the wonders of PCI-E
ARM systems, almost all of AMD / Intel / NVIDIA / ARM / Qualcomm have
to be mutually aware of each other's buffer-descriptor layout, and
every version thereof (sure you won't ever need more than 96 bytes,
ever?). But how does that solve allocation? How does my amdgpu kernel
driver 'know' whether its buffers are going to be scanned out or run
through intermediate GPU composition, and furthermore whether that
will happen on an AMD, Intel, or NVIDIA GPU? How does my Intel GPU
know that its output will be consumed by a media encode engine as well
as scanned out, so it can't use exotic tiling modes?

Putting this kind of negotiation in the kernel was roundly rejected a
long time ago, not least as the display pipeline arrangement is a
policy decision made by userspace, frame by frame.

> Userspace doesn't have to know about any of that. (It also seems kinda
> dangerous to use userspace as a middle man for passing the
> metadata/modifiers around)

Why dangerous? If it can be dangerous, i.e. a malicious userspace
driver can compromise your system, then I'd be looking at the
validation in your kernel driver really ...

> Speaking of compression for display, especially the separate
> compression buffer: That should be fully contained in the main DMABUF
> and described by the per-BO metadata. Some other drivers want to use a
> separate DMABUF for the compression buffer - while that may sound good
> in theory, it's not economical for the reason described above.

'Some other drivers want to use a separate DMABUF', or 'some other
hardware demands the data be separate'. Same with luma/chroma plane
separation. Anyway, it doesn't really matter unless you're sharing
render-compression formats across vendors, and AFBC is the only case
of that I know of currently.

Cheers,
Daniel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 12:03               ` Daniel Stone
@ 2017-01-04 13:06                 ` Rob Clark
  2017-01-04 14:54                   ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Rob Clark @ 2017-01-04 13:06 UTC (permalink / raw)
  To: Daniel Stone; +Cc: James Jones, dri-devel

On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel@fooishbar.org> wrote:
>> Speaking of compression for display, especially the separate
>> compression buffer: That should be fully contained in the main DMABUF
>> and described by the per-BO metadata. Some other drivers want to use a
>> separate DMABUF for the compression buffer - while that may sound good
>> in theory, it's not economical for the reason described above.
>
> 'Some other drivers want to use a separate DMABUF', or 'some other
> hardware demands the data be separate'. Same with luma/chroma plane
> separation. Anyway, it doesn't really matter unless you're sharing
> render-compression formats across vendors, and AFBC is the only case
> of that I know of currently.


jfwiw, UBWC on newer snapdragons too.. seems like we can share these
not just between gpu (render to and sample from) and display, but also
v4l2 decoder/encoder (and maybe camera?)

I *think* we probably can treat the metadata buffers as a separate
plane.. at least we can for render target and blit src/dst, but not
100% sure about sampling from a UBWC buffer.. that might force us to
have them in a single buffer.

(Either way, the fourcc modifiers, and related EGL extension to query
modifiers, should be sufficient)

BR,
-R
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 13:06                 ` Rob Clark
@ 2017-01-04 14:54                   ` Daniel Vetter
  2017-01-04 15:47                     ` Rob Clark
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2017-01-04 14:54 UTC (permalink / raw)
  To: Rob Clark; +Cc: James Jones, dri-devel

On Wed, Jan 04, 2017 at 08:06:24AM -0500, Rob Clark wrote:
> On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel@fooishbar.org> wrote:
> >> Speaking of compression for display, especially the separate
> >> compression buffer: That should be fully contained in the main DMABUF
> >> and described by the per-BO metadata. Some other drivers want to use a
> >> separate DMABUF for the compression buffer - while that may sound good
> >> in theory, it's not economical for the reason described above.
> >
> > 'Some other drivers want to use a separate DMABUF', or 'some other
> > hardware demands the data be separate'. Same with luma/chroma plane
> > separation. Anyway, it doesn't really matter unless you're sharing
> > render-compression formats across vendors, and AFBC is the only case
> > of that I know of currently.
> 
> 
> jfwiw, UBWC on newer snapdragons too.. seems like we can share these
> not just between gpu (render to and sample from) and display, but also
> v4l2 decoder/encoder (and maybe camera?)
> 
> I *think* we probably can treat the metadata buffers as a separate
> plane.. at least we can for render target and blit src/dst, but not
> 100% sure about sampling from a UBWC buffer.. that might force us to
> have them in a single buffer.

Conceptually treating them as two planes and requiring everywhere that
they're allocated from the same BO are orthogonal things. At least
that's our plan with intel render compression, as of the last time I
checked the current state ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 14:54                   ` Daniel Vetter
@ 2017-01-04 15:47                     ` Rob Clark
  2017-01-04 16:02                       ` Christian König
  0 siblings, 1 reply; 30+ messages in thread
From: Rob Clark @ 2017-01-04 15:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, dri-devel

On Wed, Jan 4, 2017 at 9:54 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jan 04, 2017 at 08:06:24AM -0500, Rob Clark wrote:
>> On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel@fooishbar.org> wrote:
>> >> Speaking of compression for display, especially the separate
>> >> compression buffer: That should be fully contained in the main DMABUF
>> >> and described by the per-BO metadata. Some other drivers want to use a
>> >> separate DMABUF for the compression buffer - while that may sound good
>> >> in theory, it's not economical for the reason described above.
>> >
>> > 'Some other drivers want to use a separate DMABUF', or 'some other
>> > hardware demands the data be separate'. Same with luma/chroma plane
>> > separation. Anyway, it doesn't really matter unless you're sharing
>> > render-compression formats across vendors, and AFBC is the only case
>> > of that I know of currently.
>>
>>
>> jfwiw, UBWC on newer snapdragons too.. seems like we can share these
>> not just between gpu (render to and sample from) and display, but also
>> v4l2 decoder/encoder (and maybe camera?)
>>
>> I *think* we probably can treat the metadata buffers as a separate
>> plane.. at least we can for render target and blit src/dst, but not
>> 100% sure about sampling from a UBWC buffer.. that might force us to
>> have them in a single buffer.
>
> Conceptually treating them as two planes, and everywhere requiring that
> they're allocated from the same BO are orthogonal things. At least that's
> our plan with intel render compression last time I understood the current
> state ;-)

If the position of the different parts of the buffer is anywhere
required to be a function of w/h/bpp/etc, then I'm not sure there is
a strong advantage to treating them as separate BOs.. although I
suppose it doesn't preclude it either.  As far as plumbing it through
mesa/st, it seems convenient to have a single buffer.  (We have kind
of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
that.. I've not thought through those details so much yet.)
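
As a purely illustrative sketch (made-up offsets, pitches and modifier
value), a single BO holding colour plus metadata can still be handed to
addfb2 as two "planes" that simply share the same handle:

#include <stdint.h>
#include <drm_fourcc.h>
#include <xf86drmMode.h>

/* Illustration only: one GEM BO carrying both the colour data and the
 * vendor compression metadata, presented to addfb2 as two planes with
 * the same handle and different offsets.  All inputs are made up. */
static int add_compressed_fb(int drm_fd, uint32_t bo_handle,
                             uint32_t w, uint32_t h, uint32_t pitch,
                             uint32_t meta_offset, uint32_t meta_pitch,
                             uint64_t vendor_modifier, uint32_t *fb_id)
{
        uint32_t handles[4]   = { bo_handle, bo_handle };  /* same BO twice */
        uint32_t pitches[4]   = { pitch, meta_pitch };
        uint32_t offsets[4]   = { 0, meta_offset };        /* metadata after colour */
        uint64_t modifiers[4] = { vendor_modifier, vendor_modifier };

        return drmModeAddFB2WithModifiers(drm_fd, w, h, DRM_FORMAT_XRGB8888,
                                          handles, pitches, offsets, modifiers,
                                          fb_id, DRM_MODE_FB_MODIFIERS);
}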

BR,
-R

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 15:47                     ` Rob Clark
@ 2017-01-04 16:02                       ` Christian König
  2017-01-04 16:16                         ` Rob Clark
  2017-01-04 16:59                         ` Daniel Stone
  0 siblings, 2 replies; 30+ messages in thread
From: Christian König @ 2017-01-04 16:02 UTC (permalink / raw)
  To: Rob Clark, Daniel Vetter; +Cc: James Jones, dri-devel

Am 04.01.2017 um 16:47 schrieb Rob Clark:
> On Wed, Jan 4, 2017 at 9:54 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Wed, Jan 04, 2017 at 08:06:24AM -0500, Rob Clark wrote:
>>> On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel@fooishbar.org> wrote:
>>>>> Speaking of compression for display, especially the separate
>>>>> compression buffer: That should be fully contained in the main DMABUF
>>>>> and described by the per-BO metadata. Some other drivers want to use a
>>>>> separate DMABUF for the compression buffer - while that may sound good
>>>>> in theory, it's not economical for the reason described above.
>>>> 'Some other drivers want to use a separate DMABUF', or 'some other
>>>> hardware demands the data be separate'. Same with luma/chroma plane
>>>> separation. Anyway, it doesn't really matter unless you're sharing
>>>> render-compression formats across vendors, and AFBC is the only case
>>>> of that I know of currently.
>>>
>>> jfwiw, UBWC on newer snapdragons too.. seems like we can share these
>>> not just between gpu (render to and sample from) and display, but also
>>> v4l2 decoder/encoder (and maybe camera?)
>>>
>>> I *think* we probably can treat the metadata buffers as a separate
>>> plane.. at least we can for render target and blit src/dst, but not
>>> 100% sure about sampling from a UBWC buffer.. that might force us to
>>> have them in a single buffer.
>> Conceptually treating them as two planes, and everywhere requiring that
>> they're allocated from the same BO are orthogonal things. At least that's
>> our plan with intel render compression last time I understood the current
>> state ;-)
> If the position of the different parts of the buffer are somewhere
> required to be a function of w/h/bpp/etc then I'm not sure if there is
> a strong advantage to treating them as separate BOs.. although I
> suppose it doesn't preclude it either.  As far as plumbing it through
> mesa/st, it seems convenient to have a single buffer.  (We have kind
> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
> that.. but I've not thought through those details so much yet.)

Well I don't want to ruin your day, but there are different requirements 
from different hardware.

For example the UVD engine found in all AMD graphics cards since r600 
must have both planes in a single BO because the memory controller can 
only handle a rather small offset between the planes.

On the other hand I know of embedded MPEG2/H264 decoders where the 
different planes must be on different memory channels. In this case I 
can imagine that you want one BO for each plane, because otherwise the 
device must stitch together one buffer object from two different memory 
regions (of course possible, but rather ugly).

So if we want to cover everything we essentially need to support all 
variants of one plane per BO as well as all planes in one BO with 
DMA-Buf. A bit tricky isn't it?
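
For what it's worth, the EGL_EXT_image_dma_buf_import attribute list
already expresses both variants through the same code path - the
UVD-style single BO with two offsets and the one-BO-per-plane case
differ only in which fd each plane points at.  A rough sketch (made-up
helper name, no error handling):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

/* Illustration only: import NV12 either from one dma-buf with two
 * offsets or from two separate dma-bufs. */
static EGLImageKHR import_nv12(EGLDisplay dpy, int fd_y, int fd_uv,
                               EGLint width, EGLint height, EGLint pitch,
                               EGLint uv_offset)
{
        const EGLint attrs[] = {
                EGL_WIDTH,  width,
                EGL_HEIGHT, height,
                EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_NV12,
                EGL_DMA_BUF_PLANE0_FD_EXT,     fd_y,
                EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
                EGL_DMA_BUF_PLANE0_PITCH_EXT,  pitch,
                EGL_DMA_BUF_PLANE1_FD_EXT,     fd_uv,     /* may equal fd_y */
                EGL_DMA_BUF_PLANE1_OFFSET_EXT, uv_offset, /* 0 if fd_uv is separate */
                EGL_DMA_BUF_PLANE1_PITCH_EXT,  pitch,
                EGL_NONE
        };

        return eglCreateImageKHR(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
                                 NULL, attrs);
}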

Regards,
Christian.

>
> BR,
> -R
>
>> -Daniel
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 16:02                       ` Christian König
@ 2017-01-04 16:16                         ` Rob Clark
  2017-01-04 16:26                           ` Christian König
  2017-01-04 16:59                         ` Daniel Stone
  1 sibling, 1 reply; 30+ messages in thread
From: Rob Clark @ 2017-01-04 16:16 UTC (permalink / raw)
  To: Christian König; +Cc: James Jones, dri-devel

On Wed, Jan 4, 2017 at 11:02 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 04.01.2017 um 16:47 schrieb Rob Clark:
>>
>> On Wed, Jan 4, 2017 at 9:54 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>> On Wed, Jan 04, 2017 at 08:06:24AM -0500, Rob Clark wrote:
>>>>
>>>> On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel@fooishbar.org>
>>>> wrote:
>>>>>>
>>>>>> Speaking of compression for display, especially the separate
>>>>>> compression buffer: That should be fully contained in the main DMABUF
>>>>>> and described by the per-BO metadata. Some other drivers want to use a
>>>>>> separate DMABUF for the compression buffer - while that may sound good
>>>>>> in theory, it's not economical for the reason described above.
>>>>>
>>>>> 'Some other drivers want to use a separate DMABUF', or 'some other
>>>>> hardware demands the data be separate'. Same with luma/chroma plane
>>>>> separation. Anyway, it doesn't really matter unless you're sharing
>>>>> render-compression formats across vendors, and AFBC is the only case
>>>>> of that I know of currently.
>>>>
>>>>
>>>> jfwiw, UBWC on newer snapdragons too.. seems like we can share these
>>>> not just between gpu (render to and sample from) and display, but also
>>>> v4l2 decoder/encoder (and maybe camera?)
>>>>
>>>> I *think* we probably can treat the metadata buffers as a separate
>>>> plane.. at least we can for render target and blit src/dst, but not
>>>> 100% sure about sampling from a UBWC buffer.. that might force us to
>>>> have them in a single buffer.
>>>
>>> Conceptually treating them as two planes, and everywhere requiring that
>>> they're allocated from the same BO are orthogonal things. At least that's
>>> our plan with intel render compression last time I understood the current
>>> state ;-)
>>
>> If the position of the different parts of the buffer are somewhere
>> required to be a function of w/h/bpp/etc then I'm not sure if there is
>> a strong advantage to treating them as separate BOs.. although I
>> suppose it doesn't preclude it either.  As far as plumbing it through
>> mesa/st, it seems convenient to have a single buffer.  (We have kind
>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
>> that.. but I've not thought through those details so much yet.)
>
>
> Well I don't want to ruin your day, but there are different requirements
> from different hardware.
>
> For example the UVD engine found in all AMD graphics cards since r600 must
> have both planes in a single BO because the memory controller can only
> handle a rather small offset between the planes.
>
> On the other hand I know of embedded MPEG2/H264 decoders where the different
> planes must be on different memory channels. In this case I can imagine that
> you want one BO for each plane, because otherwise the device must stitch
> together one buffer object from two different memory regions (of course
> possible, but rather ugly).

true, but for a vendor specific compression/metadata plane, I think I
can ignore oddball settop box SoC constraints and care more about just
other devices that support the same compression.

> So if we want to cover everything we essentially need to support all
> variants of one plane per BO as well as all planes in one BO with DMA-Buf. A
> bit tricky isn't it?

Just to make sure we are on the same page, I was only really talking
about whether to have color+meta in the same BO or to treat it
similarly to two-plane YUV (ie. a pair of fd+offset tuples).  Not
generic/vanilla (untiled, uncompressed, etc) multiplanar YUV.

It probably isn't even important that the various vendors' compression
schemes are handled the same way.  Maybe on intel it is easier to treat
it as two planes everywhere, but on qcom it is easier to treat it as
one.  The application just sees one or more fd+offset tuples (when it
queries the EGL img) and passes those blindly through to addfb2.

Oh, and for some extra fun, I think the video decoder can hand me
compressed NV12 where both Y and UV have their own meta buffer.  So if
we treat those as separate planes, that becomes four planes.
(Hopefully no compressed I420, or that becomes 6 planes! :-P)
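
(For reference, that query/pass-through flow is roughly what
EGL_MESA_image_dma_buf_export exposes today - a sketch with a made-up
helper name and no error handling; in the compressed-NV12 case
num_planes would simply come back as 4:)

#define EGL_EGLEXT_PROTOTYPES
#include <stdint.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Sketch of the export side: the app gets back N (fd, stride, offset)
 * tuples plus a fourcc and modifier, and forwards them unmodified to
 * drmModeAddFB2WithModifiers(). */
static int export_planes(EGLDisplay dpy, EGLImageKHR img,
                         int fds[4], EGLint strides[4], EGLint offsets[4],
                         int *fourcc, uint64_t *modifier)
{
        int num_planes = 0;
        EGLuint64KHR mods[4] = { 0 };

        if (!eglExportDMABUFImageQueryMESA(dpy, img, fourcc, &num_planes, mods))
                return -1;
        if (!eglExportDMABUFImageMESA(dpy, img, fds, strides, offsets))
                return -1;

        *modifier = mods[0];
        return num_planes;   /* 2 for plain NV12, 4 for the compressed case */
}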

BR,
-R

> Regards,
> Christian.
>
>>
>> BR,
>> -R
>>
>>> -Daniel
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 16:16                         ` Rob Clark
@ 2017-01-04 16:26                           ` Christian König
  0 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2017-01-04 16:26 UTC (permalink / raw)
  To: Rob Clark; +Cc: James Jones, dri-devel

Am 04.01.2017 um 17:16 schrieb Rob Clark:
> On Wed, Jan 4, 2017 at 11:02 AM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 04.01.2017 um 16:47 schrieb Rob Clark:
>>> On Wed, Jan 4, 2017 at 9:54 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>> On Wed, Jan 04, 2017 at 08:06:24AM -0500, Rob Clark wrote:
>>>>> On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel@fooishbar.org>
>>>>> wrote:
>>>>>>> Speaking of compression for display, especially the separate
>>>>>>> compression buffer: That should be fully contained in the main DMABUF
>>>>>>> and described by the per-BO metadata. Some other drivers want to use a
>>>>>>> separate DMABUF for the compression buffer - while that may sound good
>>>>>>> in theory, it's not economical for the reason described above.
>>>>>> 'Some other drivers want to use a separate DMABUF', or 'some other
>>>>>> hardware demands the data be separate'. Same with luma/chroma plane
>>>>>> separation. Anyway, it doesn't really matter unless you're sharing
>>>>>> render-compression formats across vendors, and AFBC is the only case
>>>>>> of that I know of currently.
>>>>>
>>>>> jfwiw, UBWC on newer snapdragons too.. seems like we can share these
>>>>> not just between gpu (render to and sample from) and display, but also
>>>>> v4l2 decoder/encoder (and maybe camera?)
>>>>>
>>>>> I *think* we probably can treat the metadata buffers as a separate
>>>>> plane.. at least we can for render target and blit src/dst, but not
>>>>> 100% sure about sampling from a UBWC buffer.. that might force us to
>>>>> have them in a single buffer.
>>>> Conceptually treating them as two planes, and everywhere requiring that
>>>> they're allocated from the same BO are orthogonal things. At least that's
>>>> our plan with intel render compression last time I understood the current
>>>> state ;-)
>>> If the position of the different parts of the buffer are somewhere
>>> required to be a function of w/h/bpp/etc then I'm not sure if there is
>>> a strong advantage to treating them as separate BOs.. although I
>>> suppose it doesn't preclude it either.  As far as plumbing it through
>>> mesa/st, it seems convenient to have a single buffer.  (We have kind
>>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
>>> that.. but I've not thought through those details so much yet.)
>>
>> Well I don't want to ruin your day, but there are different requirements
>> from different hardware.
>>
>> For example the UVD engine found in all AMD graphics cards since r600 must
>> have both planes in a single BO because the memory controller can only
>> handle a rather small offset between the planes.
>>
>> On the other hand I know of embedded MPEG2/H264 decoders where the different
>> planes must be on different memory channels. In this case I can imagine that
>> you want one BO for each plane, because otherwise the device must stitch
>> together one buffer object from two different memory regions (of course
>> possible, but rather ugly).
> true, but for a vendor specific compression/metadata plane, I think I
> can ignore oddball settop box SoC constraints and care more about just
> other devices that support the same compression.
>
>> So if we want to cover everything we essentially need to support all
>> variants of one plane per BO as well as all planes in one BO with DMA-Buf. A
>> bit tricky isn't it?
> Just to make sure we are on same page, I was only really talking about
> whether to have color+meta in same bo or treat it similar to two plane
> yuv (ie. pair of fd+offset tuples).  Not generic/vanilla (untiled,
> uncompressed, etc) multiplanar YUV.

Oops, sorry. I didn't realize that.

Nah, putting the metadata into the BO is probably only a good idea if
the metadata can be evaluated by the device and not by the CPU as well.

>
> It probably isn't even important that various different vendor's
> compression schemes are handled the same way.  Maybe on intel it is
> easier to treat it as two planes everywhere, but qcom easier to treat
> as one.  Application just sees it as one or more fd+offset tuples
> (when it queries EGL img) and passes those blindly through to addfb2.

Yeah, I mean that's the real core of the problem.

On the one hand we want devices from different vendors to understand
each other, and there are certain cases where even completely different
devices can work with the same data.

On the other hand each vendor has extremely specialized data formats for 
certain use cases and it is unlikely that somebody else can handle those.

> Oh, and for some extra fun, I think video decoder can hand me
> compressed NV12 where both Y and UV have their own meta buffer.  So if
> we treat as separate planes, that becomes four planes.  (Hopefully no
> compressed I420, or that becomes 6 planes! :-P)

Well talking about extra fun. We additionally have this neat interlaced 
NV12 format that both NVidia and AMD uses for their video decoding.

E.g. one Y plane top field, one UV plane top field, one Y plane bottom 
field and one UV plane bottom field.

That makes 4 planes, where planes 1 & 3 and planes 2 & 4 must have the 
same stride but are otherwise unrelated to each other and can have 
separate metadata.
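
Expressed as per-plane tuples the constraint would look roughly like 
this (the struct and the helper are made up, just to illustrate):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-plane descriptor, illustration only. */
struct plane_desc {
    int      dmabuf_fd;   /* may or may not be the same fd for all planes */
    uint64_t offset;
    uint32_t pitch;
};

/* Plane order: 0 = Y top field, 1 = UV top field,
 *              2 = Y bottom field, 3 = UV bottom field. */
static bool valid_interlaced_nv12(const struct plane_desc p[4])
{
    /* The two Y fields and the two UV fields must share a stride;
     * offsets (and any metadata) are otherwise independent. */
    return p[0].pitch == p[2].pitch && p[1].pitch == p[3].pitch;
}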

Regards,
Christian.



>
> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> -Daniel
>>>> --
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> http://blog.ffwll.ch
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 16:02                       ` Christian König
  2017-01-04 16:16                         ` Rob Clark
@ 2017-01-04 16:59                         ` Daniel Stone
  2017-01-16 22:54                           ` Marek Olšák
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Stone @ 2017-01-04 16:59 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, James Jones

Hi Christian,

On 4 January 2017 at 16:02, Christian König <deathsimple@vodafone.de> wrote:
> Am 04.01.2017 um 16:47 schrieb Rob Clark:
>> If the position of the different parts of the buffer are somewhere
>> required to be a function of w/h/bpp/etc then I'm not sure if there is
>> a strong advantage to treating them as separate BOs.. although I
>> suppose it doesn't preclude it either.  As far as plumbing it through
>> mesa/st, it seems convenient to have a single buffer.  (We have kind
>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
>> that.. but I've not thought through those details so much yet.)
>
> Well I don't want to ruin your day, but there are different requirements
> from different hardware.
>
> For example the UVD engine found in all AMD graphics cards since r600 must
> have both planes in a single BO because the memory controller can only
> handle a rather small offset between the planes.

This is, to a large extent, also true of Intel.

> On the other hand I know of embedded MPEG2/H264 decoders where the different
> planes must be on different memory channels. In this case I can imagine
> you want one BO for each plane, because otherwise the device must stitch
> together one buffer object from two different memory regions (of course
> possible, but rather ugly).

Not just embedded, but quite a few platforms where the ratio of
required to available memory bandwidth is ... somewhat different to
larger discrete systems. Striping allocations such that luma and
chroma live on different memory channels isn't uncommon.

But I think this is all orthogonal. If you keep auxiliary planes in
separate BOs to metadata, you can still handle both cases. How to
place buffers is purely an _allocation_ concern, where single vs.
multiple BO is purely about addressing them. So your allocator API may
become a little more complex - something which only device-specific
userspace will ever address - whilst keeping a unified
addressing/handle system for the generic parts of userspace which
shouldn't have to care about whether the underlying hardware demands a
small offset or a completely separate allocation.
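
A rough sketch of that split, with invented names - placement is an
allocation-time detail that only device-specific code looks at, while
generic userspace only ever sees fd+offset tuples:

#include <stdint.h>

/* Allocator-side, device-specific: where/how a plane must be placed. */
struct plane_placement {
    uint64_t size;
    uint64_t alignment;
    int      memory_channel;     /* e.g. keep luma/chroma apart; -1 = any */
    int      must_share_bo_with; /* plane that must live in the same BO, or -1 */
};

/* Importer-side, generic: how a plane is addressed once allocated. */
struct plane_address {
    int      dmabuf_fd;          /* possibly the same fd for several planes */
    uint64_t offset;
    uint32_t pitch;
};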

Having API pegged to the single-underlying-BO concept has been a giant
pain for those who can't use single BOs. I don't see anything good
coming of the idea for cross-device/cross-vendor sharing either, since
it encodes yet more magic implicit detail into buffer sharing. Since
that detail ultimately has to be resolved _somewhere_, it's a problem
avoided rather than a problem solved.

Cheers,
Daniel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-04 16:59                         ` Daniel Stone
@ 2017-01-16 22:54                           ` Marek Olšák
  2017-01-23  7:38                             ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Marek Olšák @ 2017-01-16 22:54 UTC (permalink / raw)
  To: Daniel Stone; +Cc: James Jones, dri-devel

Thanks for all the feedback. Things are much clearer now.

Yeah, we can use the BO modifiers for simple 2D images / planes if
that's the general direction. I think we can even stuff the
compression data buffer offset into those 64 bits, considering it's
not very large (e.g. below 4GB and low bits are unused due to
alignment).
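
Something along these lines - the bit layout below is completely made
up, just to show there's room:

#include <stdint.h>

/* Invented encoding: keep ~20 low bits for tiling/compression flags and
 * stash a 4 KiB-aligned metadata offset (< 4 GB) in the bits above.
 * Not a real DRM format modifier layout. */
#define MOD_FLAG_BITS    20
#define MOD_OFFSET_SHIFT MOD_FLAG_BITS

static uint64_t pack_modifier(uint32_t flags, uint64_t meta_offset)
{
    return (uint64_t)(flags & ((1u << MOD_FLAG_BITS) - 1)) |
           ((meta_offset >> 12) << MOD_OFFSET_SHIFT);
}

static uint64_t unpack_meta_offset(uint64_t modifier)
{
    return (modifier >> MOD_OFFSET_SHIFT) << 12;
}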

For OpenCL at least, we have to keep using the 256-bytes-large per-BO
metadata to describe more complex allocations.

Marek


On Wed, Jan 4, 2017 at 5:59 PM, Daniel Stone <daniel@fooishbar.org> wrote:
> Hi Christian,
>
> On 4 January 2017 at 16:02, Christian König <deathsimple@vodafone.de> wrote:
>> Am 04.01.2017 um 16:47 schrieb Rob Clark:
>>> If the position of the different parts of the buffer are somewhere
>>> required to be a function of w/h/bpp/etc then I'm not sure if there is
>>> a strong advantage to treating them as separate BOs.. although I
>>> suppose it doesn't preclude it either.  As far as plumbing it through
>>> mesa/st, it seems convenient to have a single buffer.  (We have kind
>>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
>>> that.. but I've not thought through those details so much yet.)
>>
>> Well I don't want to ruin your day, but there are different requirements
>> from different hardware.
>>
>> For example the UVD engine found in all AMD graphics cards since r600 must
>> have both planes in a single BO because the memory controller can only
>> handle a rather small offset between the planes.
>
> This is, to a large extent, also true of Intel.
>
>> On the other hand I know of embedded MPEG2/H264 decoders where the different
>> planes must be on different memory channels. In this case I can imagine
>> you want one BO for each plane, because otherwise the device must stitch
>> together one buffer object from two different memory regions (of course
>> possible, but rather ugly).
>
> Not just embedded, but quite a few platforms where the ratio of
> required to available memory bandwidth is ... somewhat different to
> larger discrete systems. Striping allocations such that luma and
> chroma live on different memory channels isn't uncommon.
>
> But I think this is all orthogonal. If you keep auxiliary planes in
> separate BOs to metadata, you can still handle both cases. How to
> place buffers is purely an _allocation_ concern, where single vs.
> multiple BO is purely about addressing them. So your allocator API may
> become a little more complex - something which only device-specific
> userspace will ever address - whilst keeping a unified
> addressing/handle system for the generic parts of userspace which
> shouldn't have to care about whether the underlying hardware demands a
> small offset or a completely separate allocation.
>
> Having API pegged to the single-underlying-BO concept has been a giant
> pain for those who can't use single BOs. I don't see anything good
> coming of the idea for cross-device/cross-vendor sharing either, since
> it encodes yet more magic implicit detail into buffer sharing. Since
> that detail ultimately has to be resolved _somewhere_, it's a problem
> avoided rather than a problem solved.
>
> Cheers,
> Daniel
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Unix Device Memory Allocation project
  2017-01-16 22:54                           ` Marek Olšák
@ 2017-01-23  7:38                             ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2017-01-23  7:38 UTC (permalink / raw)
  To: Marek Olšák; +Cc: James Jones, dri-devel

On Mon, Jan 16, 2017 at 11:54:14PM +0100, Marek Olšák wrote:
> Thanks for all the feedback. Things are much clearer now.
> 
> Yeah, we can use the BO modifiers for simple 2D images / planes if
> that's the general direction. I think we can even stuff the
> compression data buffer offset into those 64 bits, considering it's
> not very large (e.g. below 4GB and low bits are unused due to
> alignment).

For compression data the idea is to have an aux plane, with offset/stride.
At least that's what the plan for i915 is.
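
Roughly like this on the addfb2 side then - sketch only, the CCS-style
modifier name and the aux pitch/offset rules are assumptions here:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Main colour plane plus compression aux plane in the same BO,
 * described as two planes of one framebuffer. */
static int add_compressed_fb(int drm_fd, uint32_t bo_handle,
                             uint32_t width, uint32_t height,
                             uint32_t main_pitch, uint32_t aux_offset,
                             uint32_t aux_pitch, uint32_t *fb_id)
{
    uint32_t handles[4]   = { bo_handle, bo_handle };  /* same BO twice */
    uint32_t pitches[4]   = { main_pitch, aux_pitch };
    uint32_t offsets[4]   = { 0, aux_offset };
    uint64_t modifiers[4] = { I915_FORMAT_MOD_Y_TILED_CCS,
                              I915_FORMAT_MOD_Y_TILED_CCS };

    return drmModeAddFB2WithModifiers(drm_fd, width, height,
                                      DRM_FORMAT_XRGB8888,
                                      handles, pitches, offsets,
                                      modifiers, fb_id,
                                      DRM_MODE_FB_MODIFIERS);
}
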
-Daniel

> 
> For OpenCL at least, we have to keep using the 256-bytes-large per-BO
> metadata to describe more complex allocations.
> 
> Marek
> 
> 
> On Wed, Jan 4, 2017 at 5:59 PM, Daniel Stone <daniel@fooishbar.org> wrote:
> > Hi Christian,
> >
> > On 4 January 2017 at 16:02, Christian König <deathsimple@vodafone.de> wrote:
> >> Am 04.01.2017 um 16:47 schrieb Rob Clark:
> >>> If the position of the different parts of the buffer are somewhere
> >>> required to be a function of w/h/bpp/etc then I'm not sure if there is
> >>> a strong advantage to treating them as separate BOs.. although I
> >>> suppose it doesn't preclude it either.  As far as plumbing it through
> >>> mesa/st, it seems convenient to have a single buffer.  (We have kind
> >>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
> >>> that.. but I've not thought through those details so much yet.)
> >>
> >> Well I don't want to ruin your day, but there are different requirements
> >> from different hardware.
> >>
> >> For example the UVD engine found in all AMD graphics cards since r600 must
> >> have both planes in a single BO because the memory controller can only
> >> handle a rather small offset between the planes.
> >
> > This is, to a large extent, also true of Intel.
> >
> >> On the other hand I know of embedded MPEG2/H264 decoders where the different
> >> planes must be on different memory channels. In this case I can imagine
> >> you want one BO for each plane, because otherwise the device must stitch
> >> together one buffer object from two different memory regions (of course
> >> possible, but rather ugly).
> >
> > Not just embedded, but quite a few platforms where the ratio of
> > required to available memory bandwidth is ... somewhat different to
> > larger discrete systems. Striping allocations such that luma and
> > chroma live on different memory channels isn't uncommon.
> >
> > But I think this is all orthogonal. If you keep auxiliary planes in
> > separate BOs to metadata, you can still handle both cases. How to
> > place buffers is purely an _allocation_ concern, where single vs.
> > multiple BO is purely about addressing them. So your allocator API may
> > become a little more complex - something which only device-specific
> > userspace will ever address - whilst keeping a unified
> > addressing/handle system for the generic parts of userspace which
> > shouldn't have to care about whether the underlying hardware demands a
> > small offset or a completely separate allocation.
> >
> > Having API pegged to the single-underlying-BO concept has been a giant
> > pain for those who can't use single BOs. I don't see anything good
> > coming of the idea for cross-device/cross-vendor sharing either, since
> > it encodes yet more magic implicit detail into buffer sharing. Since
> > that detail ultimately has to be resolved _somewhere_, it's a problem
> > avoided rather than a problem solved.
> >
> > Cheers,
> > Daniel
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-01-23  7:38 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-04 23:47 Unix Device Memory Allocation project James Jones
2016-10-05  8:42 ` Benjamin Gaignard
2016-10-05 12:19   ` Rob Clark
2016-10-18 23:40 ` Marek Olšák
2016-10-19  0:08   ` James Jones
2016-10-19  6:31     ` Daniel Vetter
2016-10-19  6:23   ` Daniel Vetter
2016-10-19 12:15     ` Christian König
2016-10-19 13:15     ` Marek Olšák
2016-10-19 14:10       ` Daniel Vetter
2016-10-19 16:46         ` Marek Olšák
2016-10-20  6:31           ` Daniel Vetter
2017-01-03 23:38             ` Marek Olšák
2017-01-03 23:43               ` James Jones
2017-01-04  0:06                 ` Marek Olšák
2017-01-04  0:19                   ` James Jones
2017-01-04  8:46               ` Stéphane Marchesin
2017-01-04 12:03               ` Daniel Stone
2017-01-04 13:06                 ` Rob Clark
2017-01-04 14:54                   ` Daniel Vetter
2017-01-04 15:47                     ` Rob Clark
2017-01-04 16:02                       ` Christian König
2017-01-04 16:16                         ` Rob Clark
2017-01-04 16:26                           ` Christian König
2017-01-04 16:59                         ` Daniel Stone
2017-01-16 22:54                           ` Marek Olšák
2017-01-23  7:38                             ` Daniel Vetter
2016-10-19  6:49   ` Michel Dänzer
2016-10-19 12:33   ` Nicolai Hähnle
     [not found]     ` <CAAxE2A7ih_84H7w361msVYzRb8jb4ye8Psc1e5CO6gjJ2frO6g@mail.gmail.com>
     [not found]       ` <CAAxE2A53E_r9uA=FG_A63aBVgqaTWBuAzDZfDvRe9K+0EWmFeQ@mail.gmail.com>
2016-10-19 13:40         ` Marek Olšák
