All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] doc: gpu: Add document describing buffer exchange
@ 2021-09-05 12:27 Daniel Stone
  2021-09-06 12:28 ` Simon Ser
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Daniel Stone @ 2021-09-05 12:27 UTC (permalink / raw)
  To: dri-devel

Since there's a lot of confusion around this, document both the rules
and the best practice around negotiating, allocating, importing, and
using buffers when crossing context/process/device/subsystem boundaries.

This ties up all of dmabuf, formats and modifiers, and their usage.

Signed-off-by: Daniel Stone <daniels@collabora.com>
---

This is just a quick first draft, inspired by:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637

It's not complete or perfect, but I'm off to eat a roast then have a
nice walk in the sun, so figured it'd be better to dash it off rather
than let it rot on my hard drive.


 .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++
 Documentation/gpu/index.rst                   |   1 +
 2 files changed, 286 insertions(+)
 create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst

diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
new file mode 100644
index 000000000000..75c4de13d5c8
--- /dev/null
+++ b/Documentation/gpu/exchanging-pixel-buffers.rst
@@ -0,0 +1,285 @@
+.. Copyright 2021 Collabora Ltd.
+
+========================
+Exchanging pixel buffers
+========================
+
+As originally designed, the Linux graphics subsystem had extremely limited
+support for sharing pixel-buffer allocations between processes, devices, and
+subsystems. Modern systems require extensive integration between all three
+classes; this document details how applications and kernel subsystems should
+approach this sharing for two-dimensional image data.
+
+It is written with reference to the DRM subsystem for GPU and display devices,
+V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
+support, however any other subsystems should also follow this design and advice.
+
+
+Formats and modifiers
+=====================
+
+Each buffer must have an underlying format. This format describes the data which
+can be stored and loaded for each pixel. Although each subsystem has its own
+format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
+reused wherever possible, as they are the standard descriptions used for
+interchange.
+
+Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
+the translation between one or more pixels in memory, and the color data
+contained within that memory. The number and type of color channels are
+described: whether they are RGB or YUV, integer or floating-point, the size
+of each channel and their locations within the pixel memory, and the
+relationship between color planes.
+
+For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
+single 32-bit value in memory. Alpha, red, green, and blue, color channels are
+available at 8-byte precision per channel, ordered respectively from most to
+least significant bits in little-endian storage. As a more complex example,
+`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
+stored in separate memory planes, where the chroma plane is stored at half the
+resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
+pixel grouping).
+
+Format modifiers describe a translation mechanism between these per-pixel memory
+samples, and the actual memory storage for the buffer. The most straightforward
+modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
+contiguous storage beginning at (0,0); each pixel's location in memory will be
+`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
+format, and most convenient for CPU access.
+
+Modern hardware employs much more sophisticated access mechanisms, typically
+making use of tiled access and possibly also compression. For example, the
+`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
+are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
+memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
+stores pixels (4,0) to (7,3) inclusive.
+
+Some modifiers may modify the number of memory buffers required to store the
+data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
+memory buffer to RGB formats in which it stores data about the status of every
+tile, notably including whether the tile is fully populated with pixel data, or
+can be expanded from a single solid color.
+
+These extended layouts are highly vendor-specific, and even specific to
+particular generations or configurations of devices per-vendor. For this reason,
+support of modifiers must be explicitly enumerated and negotiated by all users
+in order to ensure a compatible and optimal pipeline, as discussed below.
+
+
+Dimensions and size
+===================
+
+Each pixel buffer must be accompanied by logical pixel dimensions. This refers
+to the number of unique samples which can be extracted from, or stored to, the
+underlying memory storage. For example, even though a 1920x1080
+`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
+component, and 960x540 samples for the U and V components, the overall buffer is
+still described as having dimensions of 1920x1080.
+
+The in-memory storage of a buffer is not guaranteed to begin immediately at the
+base address of the underlying memory, nor is it guaranteed that the memory
+storage is tightly clipped to either dimension.
+
+Each plane must therefore be described with an `offset` in bytes, which will be
+added to the base address of the memory storage before performing any per-pixel
+calculations. This may be used to combine multiple planes into a single pixel
+buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
+where the luma plane's storage begins immediately at the start of the buffer
+with an offset of 0, and the chroma plane's storage begins after the offset of
+the luma plane as expressed through its offset.
+
+Each plane must also have a `stride` in bytes, expressing the offset in memory
+between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer
+with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
+order to allow for aligned access patterns. In this case, the buffer will still
+be described with a width of 1000, however the stride will be `1024 * bpp`,
+indicating that there are 24 pixels at the positive extreme of the x axis whose
+values are not significant.
+
+Buffers may also be padded further in the y dimension, simply by allocating a
+larger area than would ordinarily be required. For example, many media decoders
+are not able to natively output buffers of height 1080, but instead require an
+effective height of 1088 pixels. In this case, the buffer continues to be
+described as having a height of 1080, with the memory allocation for each buffer
+being increased to account for the extra padding.
+
+
+Enumeration
+===========
+
+Every user of pixel buffers must be able to enumerate a set of supported formats
+and modifiers, described together. Within KMS, this is achieved with the
+`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
+the modifiers supported for each format. In userspace, this is supported through
+the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
+`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
+`zwp_linux_dmabuf_v1` extension for Wayland.
+
+Each of these interfaces allows users to query a set of supported
+format+modifier combinations.
+
+
+Negotiation
+===========
+
+It is the responsibility of userspace to negotiate an acceptable format+modifier
+combination for its usage. This is performed through a simple intersection of
+lists. For example, if a user wants to use Vulkan to render an image to be
+displayed on a KMS plane, it must:
+  - query KMS for the `IN_FORMATS` property for the given plane
+  - query Vulkan for the supported formats for its physical device
+  - intersect these formats to determine the most appropriate one
+  - for this format, intersect the lists of supported modifiers for both KMS and
+    Vulkan, to obtain a final list of acceptable modifiers for that format
+
+This intersection must be performed for all usages. For example, if the user
+also wishes to encode the image to a video stream, it must query the media API
+it intends to use for encoding for the set of modifiers it supports, and
+additionally intersect against this list.
+
+If the intersection of all lists is an empty list, it is not possible to share
+buffers in this way, and an alternate strategy must be considered (e.g. using
+CPU access routines to copy data between the different uses, with the
+corresponding performance cost).
+
+The resulting modifier list is unsorted; the order is not significant.
+
+
+Allocation
+==========
+
+Once userspace has determined an appropriate format, and corresponding list of
+acceptable modifiers, it must allocate the buffer. As there is no universal
+buffer-allocation interface available at either kernel or userspace level, the
+client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
+a media API.
+
+Each allocation request must take, at a minimum: the pixel format, a list of
+acceptable modifiers, and the buffer's width and height. Each API may extend
+this set of properties in different ways, such as allowing allocation in more
+than two dimensions, intended usage patterns, etc.
+
+The component which allocates the buffer will make an arbitrary choice of what
+it considers the 'best' modifier within the acceptable list for the requested
+allocation, any padding required, and further properties of the underlying
+memory buffers such as whether they are stored in system or device-specific
+memory, whether or not they are physically contiguous, and their cache mode.
+These properties of the memory buffer are not visible to userspace, however the
+`dma-heaps` API is an effort to address this.
+
+After allocation, the client must query the allocator to determine the actual
+modifier selected for the buffer, as well as the per-plane offset and stride.
+Allocators are not permitted to vary the format in use, to select a modifier not
+provided within the acceptable list, nor to vary the pixel dimensions other than
+the padding expressed through offset, stride, and size.
+
+
+Import
+======
+
+To use a buffer within a different context, device, or subsystem, the user
+passes these parameters (format, modifier, width, height, and per-plane offset
+and stride) to an importing API.
+
+Each memory plane is referred to by a buffer handle, which may be unique or
+duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
+luma and chroma buffers combined into a single memory buffer by use of the
+per-plane offset parameters, or they may be completely separate allocations in
+memory. For this reason, each import and allocation API must provide a separate
+handle for each plane.
+
+Each kernel subsystem has its own types and interfaces for buffer management.
+DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
+are not portable between contexts, processes, devices, or subsystems.
+
+To address this, `dma-buf` handles are used as the universal interchange for
+buffers. Subsystem-specific operations are used to export native buffer handles
+to a `dma-buf` file descriptor, and to import those file descriptors into a
+native buffer handle. dma-buf file descriptors can be transferred between
+contexts, processes, devices, and subsystems.
+
+For example, a Wayland media player may use V4L2 to decode a video frame into
+a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
+chroma) being dequeued by the user from V4L2. These planes are then exported to
+one dma-buf file descriptor per plane, these descriptors are then sent along
+with the metadata (format, modifier, width, height, per-plane offset and stride)
+to the Wayland server. The Wayland server will then import these file
+descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
+through Vulkan, or a `drm_fb` for use through KMS; each of these import
+operations will take the same metadata and convert the dma-buf file descriptors
+into their native buffer handles.
+
+
+Implicit modifiers
+==================
+
+The concept of modifiers post-dates all of the subsystems mentioned above. As
+such, it has been retrofitted into all of these APIs, and in order to ensure
+backwards compatibility, support is needed for drivers and userspace which do
+not (yet) support modifiers.
+
+As an example, GBM is used to allocate buffers to be shared between EGL for
+rendering and KMS for display. It has two entrypoints for allocating buffers:
+`gbm_bo_create` which only takes the format, width, height, and a usage token,
+and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
+
+In the latter case, the allocation is as discussed above, being provided with a
+list of acceptable modifiers that the implementation can choose from (or fail if
+it is not possible to allocate within those constraints). In the former case
+where modifiers are not provided, the GBM implementation must make its own
+choice as to what is likely to be the 'best' layout. Such a choice is entirely
+implementation-specific: some will internally use tiled layouts which are not
+CPU-accessible if the implementation decides that is a good idea through
+whatever heuristic. It is the implementation's responsibility to ensure that
+this choice is appropriate.
+
+To support this case where the layout is not known because there is no awareness
+of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
+pseudo-modifier declares that the layout is not known, and that the driver
+should use its own logic to determine what the underlying layout may be.
+
+There are four cases where this token may be used:
+  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
+    as the sole member of a modifier list to declare that explicit modifiers are
+    not supported, or as part of a larger list to declare that implicit modifiers
+    may be used
+  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
+    sole member of a modifier list (equivalent to not supplying a modifier list
+    at all) to declare that explicit modifiers are not supported and must not be
+    used, or as part of a larger list to declare that an allocation using implicit
+    modifiers is acceptable
+  - in a post-allocation query, an implementation may return
+    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
+    that the underlying layout is implementation-defined and that an explicit
+    modifier description is not available; per the above rules, this may only be
+    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
+    list of acceptable modifiers, or not provided a list
+  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
+    buffer modifier (or not supply a modifier) to indicate that the modifier is
+    unknown for whatever reason; this is only acceptable when the buffer has
+    not been allocated with an explicit modifier
+
+It follows from this that a buffer chain must be either fully implicit or fully
+explicit. For example, if a user wishes to allocate a buffer for use between
+GPU, display, and media, but the media API does not support modifiers, then the
+user **must not** allocate the buffer with explicit modifiers and attempt to
+import the buffer into the media API with no modifier, but either perform the
+allocation using implicit modifiers, or allocate the buffer for media use
+separately and copy between the two buffers.
+
+As one exception to the above, allocations may be 'upgraded' from implicit
+to explicit modifiers. For example, if the buffer is allocated with
+`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
+`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
+if a valid modifier is returned.
+
+When allocating buffers for exchange between different users and modifiers are
+not available, implementations are strongly encouraged to use
+`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
+for exchange.
+
+Any new users - userspace programs and protocols, kernel subsystems, etc -
+wishing to exchange buffers must offer interoperability through dma-buf file
+descriptors for memory planes, DRM format tokens to describe the format, DRM
+format modifiers to describe the layout in memory, at least width and height for
+dimensions, and at least offset and stride for each memory plane.
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index b9c1214d8f23..cb12f2654ed7 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
    drm-kms
    drm-kms-helpers
    drm-uapi
+   exchanging-pixel-buffers
    driver-uapi
    drm-client
    drivers
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
@ 2021-09-06 12:28 ` Simon Ser
  2021-11-09  0:18   ` James Jones
  2021-09-06 17:13 ` Robert Beckett
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Simon Ser @ 2021-09-06 12:28 UTC (permalink / raw)
  To: Daniel Stone; +Cc: dri-devel

> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
>
> This ties up all of dmabuf, formats and modifiers, and their usage.
>
> Signed-off-by: Daniel Stone <daniels@collabora.com>

Thanks a lot for this write-up! This looks very good to me, a few comments
below.

> ---
>
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
>
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.
>
>
>  .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++
>  Documentation/gpu/index.rst                   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
>
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index 000000000000..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the data which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be

RST uses double backticks for inline code blocks (applies to the whole document).

> +reused wherever possible, as they are the standard descriptions used for
> +interchange.

Maybe mention that the canonical source of formats and modifiers can be found
in include/uapi/drm/drm_fourcc.h.

> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are

Pekka uses the term "color value", which I find a bit better than repeating
"data".

> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-byte precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. As a more complex example,
> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
> +contiguous storage beginning at (0,0); each pixel's location in memory will be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
> +format, and most convenient for CPU access.

Hm, maybe in more simple terms we could explain that the pixels are stored
sequentially row-by-row from the top-left corner to the bottom-right one?

Maybe we can drop the "base" from the formula and say that each pixel's
location in memory will be at offset `y * stride + x * bpp`? Or maybe this is
confusing with offset being mentioned below as an additional parameter?

> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of memory buffers required to store the

Hm. I think that mentioning a "memory buffer" here is a bit confusing. It seems
like this document is about exchanging "pixel buffers", each being composed of
one or more "memory buffers". Maybe we can use "image" instead of "buffer" for
the higher-concept of "bunch of pixel values which can be displayed on screen"?
That would align with user-space APIs like Vulkan and EGL.

> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> +memory buffer to RGB formats in which it stores data about the status of every
> +tile, notably including whether the tile is fully populated with pixel data, or
> +can be expanded from a single solid color.

Is it a requirement that these two memory planes must be separate memory buffers
for I915_FORMAT_MOD_Y_TILED_CCS?

> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +
> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an `offset` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single pixel
> +buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage begins after the offset of
> +the luma plane as expressed through its offset.

"and the chroma plane's storage follows, with its offset set to the size of the
preceding luma plane"

is maybe a bit clearer?

> +Each plane must also have a `stride` in bytes, expressing the offset in memory
> +between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer

Is "scanline" a better word than "row"? I personally find "row" a bit more
descriptive, but maybe "scanline" is technically more accurate.

> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be `1024 * bpp`,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.
> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1` extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +  - query KMS for the `IN_FORMATS` property for the given plane
> +  - query Vulkan for the supported formats for its physical device

… with the right VkImageUsageFlagBits and VkImageCreateFlagBits set? (Just to
make it clear the lists really depend on usage.)

> +  - intersect these formats to determine the most appropriate one
> +  - for this format, intersect the lists of supported modifiers for both KMS and
> +    Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +`dma-heaps` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory plane is referred to by a buffer handle, which may be unique or
> +duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
> +luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.

Vulkan doesn't quite do this, by default it only allows one memory buffer per
pixel buffer, and requires the driver to implement an additional extension when
the image is "disjoint". Later on, should we mention the inode as a way to
figure out whether all DMA-BUFs refer to the same memory buffer? Or maybe it's
better to mention that in the Vulkan docs…

> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, `dma-buf` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a `dma-buf` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into
> +a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a `drm_fb` for use through KMS; each of these import
> +operations will take the same metadata and convert the dma-buf file descriptors
> +into their native buffer handles.

It would be nice to mention that even if the intersected modifier list wasn't
empty, the import can fail if the buffer doesn't have the right constraints for
the intended usage (e.g. bad alignment).

> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +`gbm_bo_create` which only takes the format, width, height, and a usage token,
> +and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.

Just to drive the point home, maybe mention explicitly that INVALID != LINEAR?

> +There are four cases where this token may be used:
> +  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
> +    as the sole member of a modifier list to declare that explicit modifiers are
> +    not supported, or as part of a larger list to declare that implicit modifiers
> +    may be used
> +  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
> +    sole member of a modifier list (equivalent to not supplying a modifier list
> +    at all) to declare that explicit modifiers are not supported and must not be
> +    used, or as part of a larger list to declare that an allocation using implicit
> +    modifiers is acceptable
> +  - in a post-allocation query, an implementation may return
> +    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
> +    that the underlying layout is implementation-defined and that an explicit
> +    modifier description is not available; per the above rules, this may only be
> +    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
> +    list of acceptable modifiers, or not provided a list
> +  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> +    unknown for whatever reason; this is only acceptable when the buffer has
> +    not been allocated with an explicit modifier

These are good rules, but only Wayland uses them. For instance GBM will ignore
INVALID in modifier lists, and iirc KMS will error out if INVALID is supplied
at import time?

> +It follows from this that a buffer chain must be either fully implicit or fully
> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
> +`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.

Hm, I wonder if there's a good use-case for this upgrade? I feel like things
would be simpler without the exception.

> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
> +for exchange.

Maybe spell out that "users" may mean different APIs or different devices.
Sharing a pixel buffer between two separate devices via GBM will only work
if USE_LINEAR is provided.

> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.
> diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
> index b9c1214d8f23..cb12f2654ed7 100644
> --- a/Documentation/gpu/index.rst
> +++ b/Documentation/gpu/index.rst
> @@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
>     drm-kms
>     drm-kms-helpers
>     drm-uapi
> +   exchanging-pixel-buffers
>     driver-uapi
>     drm-client
>     drivers
> --
> 2.31.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
  2021-09-06 12:28 ` Simon Ser
@ 2021-09-06 17:13 ` Robert Beckett
  2021-09-08  9:34 ` Pekka Paalanen
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Robert Beckett @ 2021-09-06 17:13 UTC (permalink / raw)
  To: dri-devel



On 05/09/2021 13:27, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
> 
> This ties up all of dmabuf, formats and modifiers, and their usage.
> 
> Signed-off-by: Daniel Stone <daniels@collabora.com>
> ---
> 
> This is just a quick first draft, inspired by:
>    https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
> 
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.
> 
> 
>   .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++
>   Documentation/gpu/index.rst                   |   1 +
>   2 files changed, 286 insertions(+)
>   create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
> 
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index 000000000000..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the data which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
> +reused wherever possible, as they are the standard descriptions used for
> +interchange.
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-byte precision per channel, ordered respectively from most to

think you meant 8-bit there

> +least significant bits in little-endian storage. As a more complex example,
> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
> +contiguous storage beginning at (0,0); each pixel's location in memory will be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
> +format, and most convenient for CPU access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of memory buffers required to store the
> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> +memory buffer to RGB formats in which it stores data about the status of every
> +tile, notably including whether the tile is fully populated with pixel data, or
> +can be expanded from a single solid color.
> +
> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +
> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an `offset` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single pixel
> +buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage begins after the offset of
> +the luma plane as expressed through its offset.
> +
> +Each plane must also have a `stride` in bytes, expressing the offset in memory
> +between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer
> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be `1024 * bpp`,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.
> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1` extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +  - query KMS for the `IN_FORMATS` property for the given plane
> +  - query Vulkan for the supported formats for its physical device
> +  - intersect these formats to determine the most appropriate one
> +  - for this format, intersect the lists of supported modifiers for both KMS and
> +    Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +`dma-heaps` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory plane is referred to by a buffer handle, which may be unique or
> +duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
> +luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.
> +
> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, `dma-buf` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a `dma-buf` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into
> +a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a `drm_fb` for use through KMS; each of these import
> +operations will take the same metadata and convert the dma-buf file descriptors
> +into their native buffer handles.
> +
> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +`gbm_bo_create` which only takes the format, width, height, and a usage token,
> +and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.
> +
> +There are four cases where this token may be used:
> +  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
> +    as the sole member of a modifier list to declare that explicit modifiers are
> +    not supported, or as part of a larger list to declare that implicit modifiers
> +    may be used
> +  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
> +    sole member of a modifier list (equivalent to not supplying a modifier list
> +    at all) to declare that explicit modifiers are not supported and must not be
> +    used, or as part of a larger list to declare that an allocation using implicit
> +    modifiers is acceptable
> +  - in a post-allocation query, an implementation may return
> +    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
> +    that the underlying layout is implementation-defined and that an explicit
> +    modifier description is not available; per the above rules, this may only be
> +    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
> +    list of acceptable modifiers, or not provided a list
> +  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> +    unknown for whatever reason; this is only acceptable when the buffer has
> +    not been allocated with an explicit modifier
> +
> +It follows from this that a buffer chain must be either fully implicit or fully
> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
> +`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.
> +
> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
> +for exchange.
> +
> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.
> diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
> index b9c1214d8f23..cb12f2654ed7 100644
> --- a/Documentation/gpu/index.rst
> +++ b/Documentation/gpu/index.rst
> @@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
>      drm-kms
>      drm-kms-helpers
>      drm-uapi
> +   exchanging-pixel-buffers
>      driver-uapi
>      drm-client
>      drivers
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
  2021-09-06 12:28 ` Simon Ser
  2021-09-06 17:13 ` Robert Beckett
@ 2021-09-08  9:34 ` Pekka Paalanen
  2021-09-08  9:44   ` Simon Ser
  2021-09-08 18:16 ` Daniel Vetter
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Pekka Paalanen @ 2021-09-08  9:34 UTC (permalink / raw)
  To: Daniel Stone; +Cc: dri-devel, Simon Ser, Robert Beckett

[-- Attachment #1: Type: text/plain, Size: 21933 bytes --]

On Sun,  5 Sep 2021 13:27:42 +0100
Daniel Stone <daniels@collabora.com> wrote:

> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
> 
> This ties up all of dmabuf, formats and modifiers, and their usage.
> 
> Signed-off-by: Daniel Stone <daniels@collabora.com>

Hi,

I checked the comments from Simon and Bob, and I agree with them. Below
are some more from me.

There is room for adding a glossary for the terms, like what is the
difference between a buffer, pixel buffer and a memory buffer, and
things like pixel data, color value, stride, etc.

For example:

image
	Conceptually a two-dimensional array of pixels. The pixels may
	be stored in one or more memory buffers. Has width and height
	in pixels, pixel format and modifier (implicit or explicit).

memory buffer
	A piece of memory for storing (parts of) pixel data. Has stride
	and size in bytes and at least one handle in some API. May
	contain one or more planes.

plane
	A two-dimensional array of some or all of an image's color and
	alpha channel values.

pixel
	A picture element. Has a single color value which is defined by
	one or more color channels values, e.g. R, G and B, or Y, Cb
	and Cr. May also have an alpha value as an additional
	channel.

pixel data
	Bytes or bits that represent some or all of the color/alpha
	channel values of a pixel or an image. The data for one pixel
	may be spread over several planes or memory buffers depending
	on format and modifier.

color value
	A tuple of numbers, representing a color. Each element in the
	tuple is a color channel value.

color channel
	One of the dimensions in a color model. For example, RGB model
	has channels R, G, and B. Alpha channel is sometimes counted as
	a color channel as well.

pixel format
	A description of how pixel data represents the pixel's color
	and alpha values.

modifier
	A description of how pixel data is laid out in memory buffers.

alpha
	A value that denotes the color coverage in a pixel. Sometimes
	used for translucency instead.

stride
	????


> ---
> 
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
> 
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.

For a quick draft, this is quite excellent.

> 
>  .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++
>  Documentation/gpu/index.rst                   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
> 
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index 000000000000..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the data which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
> +reused wherever possible, as they are the standard descriptions used for
> +interchange.
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-byte precision per channel, ordered respectively from most to

8-bit

> +least significant bits in little-endian storage. As a more complex example,

I'd add something like:

	DRM_FORMAT_* definitions do not depend on CPU or
	device endianness, the byte pattern in memory is always as
	described in the format definition, usually little-endian.

That's the consensus nowadays, right?

> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
> +contiguous storage beginning at (0,0); each pixel's location in memory will be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
> +format, and most convenient for CPU access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of memory buffers required to store the
> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> +memory buffer to RGB formats in which it stores data about the status of every
> +tile, notably including whether the tile is fully populated with pixel data, or
> +can be expanded from a single solid color.
> +
> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +

Btw. there was a fun argument whether the same modifier value could
mean different things on different devices. There were also arguments
that a certain modifier could reference additional implicit memory on
the device - memory that can only be accessed by very specific devices.

I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.


> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an `offset` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single pixel
> +buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage begins after the offset of
> +the luma plane as expressed through its offset.

I think it should be: "the chroma plane's storage begins after the luma
plane as expressed through its offset."

That is, drop "the offset of".

Or what Simon said.

> +
> +Each plane must also have a `stride` in bytes, expressing the offset in memory
> +between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer
> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be `1024 * bpp`,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.

There was another fun discussion just recently about what is stride for
tiled layouts:
https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/687

Does one need to understand the modifier before you can do buffer size
consistency check with stride and height?


> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1` extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +  - query KMS for the `IN_FORMATS` property for the given plane
> +  - query Vulkan for the supported formats for its physical device
> +  - intersect these formats to determine the most appropriate one
> +  - for this format, intersect the lists of supported modifiers for both KMS and
> +    Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +`dma-heaps` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory plane is referred to by a buffer handle, which may be unique or
> +duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
> +luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.
> +
> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, `dma-buf` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a `dma-buf` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into
> +a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a `drm_fb` for use through KMS; each of these import

libdrm uses just an uint32_t for the FB. drm_fb is a Weston thing only?

> +operations will take the same metadata and convert the dma-buf file descriptors
> +into their native buffer handles.
> +
> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +`gbm_bo_create` which only takes the format, width, height, and a usage token,
> +and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.
> +
> +There are four cases where this token may be used:
> +  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
> +    as the sole member of a modifier list to declare that explicit modifiers are
> +    not supported, or as part of a larger list to declare that implicit modifiers
> +    may be used
> +  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
> +    sole member of a modifier list (equivalent to not supplying a modifier list
> +    at all) to declare that explicit modifiers are not supported and must not be
> +    used, or as part of a larger list to declare that an allocation using implicit
> +    modifiers is acceptable
> +  - in a post-allocation query, an implementation may return
> +    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
> +    that the underlying layout is implementation-defined and that an explicit
> +    modifier description is not available; per the above rules, this may only be
> +    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
> +    list of acceptable modifiers, or not provided a list
> +  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> +    unknown for whatever reason; this is only acceptable when the buffer has
> +    not been allocated with an explicit modifier
> +
> +It follows from this that a buffer chain must be either fully implicit or fully

"a buffer operations chain" perhaps?

This is about that one specific buffer, not a chain of buffers. Well,
depending on what you mean by buffer... I think the buffer is always
the same even though it may have many different handles and
representations in different APIs simultaneously.

> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
> +`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.
> +
> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
> +for exchange.

Here I might point out something like:

However, it is possible that importing a buffer with implicit modifier
to another device or subsystem than where it was allocated results in
incorrect interpretation of the buffer contents. Therefore generic
userspace should avoid attempting that.

> +
> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.
> diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
> index b9c1214d8f23..cb12f2654ed7 100644
> --- a/Documentation/gpu/index.rst
> +++ b/Documentation/gpu/index.rst
> @@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
>     drm-kms
>     drm-kms-helpers
>     drm-uapi
> +   exchanging-pixel-buffers
>     driver-uapi
>     drm-client
>     drivers

Really nice write-up!

Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-08  9:34 ` Pekka Paalanen
@ 2021-09-08  9:44   ` Simon Ser
  2021-11-09  0:21     ` James Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Ser @ 2021-09-08  9:44 UTC (permalink / raw)
  To: Pekka Paalanen; +Cc: Daniel Stone, dri-devel, Robert Beckett

> stride
> 	????

I think what's clear is:

- Per-plane property
- In bytes
- Offset between two consecutive rows

How that applies to weird YUV formats is the tricky question…

> Btw. there was a fun argument whether the same modifier value could
> mean different things on different devices. There were also arguments
> that a certain modifier could reference additional implicit memory on
> the device - memory that can only be accessed by very specific devices.
>
> I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.

A recent exmaple of this is [1].

[1]: https://patchwork.freedesktop.org/patch/452461/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
                   ` (2 preceding siblings ...)
  2021-09-08  9:34 ` Pekka Paalanen
@ 2021-09-08 18:16 ` Daniel Vetter
  2023-08-03 15:47 ` [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics Daniel Stone
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2021-09-08 18:16 UTC (permalink / raw)
  To: Daniel Stone; +Cc: dri-devel

On Sun, Sep 05, 2021 at 01:27:42PM +0100, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
> 
> This ties up all of dmabuf, formats and modifiers, and their usage.
> 
> Signed-off-by: Daniel Stone <daniels@collabora.com>
> ---
> 
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
> 
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.
> 
> 
>  .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++

I think we should stuff this into the dma-buf.rst page instead of hiding
it in gpu?

Maybe then link to it from everywhere, so from a the prime stuff in gpu,
and from whatever doc there is for the v4l import/export ioctls.

>  Documentation/gpu/index.rst                   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
> 
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index 000000000000..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the data which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
> +reused wherever possible, as they are the standard descriptions used for
> +interchange.
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-byte precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. As a more complex example,
> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
> +contiguous storage beginning at (0,0); each pixel's location in memory will be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
> +format, and most convenient for CPU access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of memory buffers required to store the
> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> +memory buffer to RGB formats in which it stores data about the status of every
> +tile, notably including whether the tile is fully populated with pixel data, or
> +can be expanded from a single solid color.
> +
> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +
> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an `offset` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single pixel
> +buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage begins after the offset of
> +the luma plane as expressed through its offset.
> +
> +Each plane must also have a `stride` in bytes, expressing the offset in memory
> +between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer
> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be `1024 * bpp`,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.
> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1` extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +  - query KMS for the `IN_FORMATS` property for the given plane
> +  - query Vulkan for the supported formats for its physical device
> +  - intersect these formats to determine the most appropriate one
> +  - for this format, intersect the lists of supported modifiers for both KMS and
> +    Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +`dma-heaps` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory plane is referred to by a buffer handle, which may be unique or
> +duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
> +luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.
> +
> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, `dma-buf` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a `dma-buf` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into
> +a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a `drm_fb` for use through KMS; each of these import
> +operations will take the same metadata and convert the dma-buf file descriptors
> +into their native buffer handles.
> +
> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +`gbm_bo_create` which only takes the format, width, height, and a usage token,
> +and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.
> +
> +There are four cases where this token may be used:
> +  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
> +    as the sole member of a modifier list to declare that explicit modifiers are
> +    not supported, or as part of a larger list to declare that implicit modifiers
> +    may be used
> +  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
> +    sole member of a modifier list (equivalent to not supplying a modifier list
> +    at all) to declare that explicit modifiers are not supported and must not be
> +    used, or as part of a larger list to declare that an allocation using implicit
> +    modifiers is acceptable
> +  - in a post-allocation query, an implementation may return
> +    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
> +    that the underlying layout is implementation-defined and that an explicit
> +    modifier description is not available; per the above rules, this may only be
> +    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
> +    list of acceptable modifiers, or not provided a list
> +  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> +    unknown for whatever reason; this is only acceptable when the buffer has
> +    not been allocated with an explicit modifier
> +
> +It follows from this that a buffer chain must be either fully implicit or fully
> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
> +`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.
> +
> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
> +for exchange.
> +
> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.

I think it would be good to also cover the opens here. Specifically how to
shovel around additional constraints, which are mostly an issue for
LINEAR. Stuff like offset/stride alignment and size limitations. Usually
once you have a modifier those are all implied (except maybe for size
maximums).

I think listening these as opens would be good, for completeness.

Otherwise looks great to me.
-Daniel

> diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
> index b9c1214d8f23..cb12f2654ed7 100644
> --- a/Documentation/gpu/index.rst
> +++ b/Documentation/gpu/index.rst
> @@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
>     drm-kms
>     drm-kms-helpers
>     drm-uapi
> +   exchanging-pixel-buffers
>     driver-uapi
>     drm-client
>     drivers
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-06 12:28 ` Simon Ser
@ 2021-11-09  0:18   ` James Jones
  2021-11-09  9:13     ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: James Jones @ 2021-11-09  0:18 UTC (permalink / raw)
  To: Simon Ser, Daniel Stone; +Cc: dri-devel

On 9/6/21 5:28 AM, Simon Ser wrote:
>> Since there's a lot of confusion around this, document both the rules
>> and the best practice around negotiating, allocating, importing, and
>> using buffers when crossing context/process/device/subsystem boundaries.
>>
>> This ties up all of dmabuf, formats and modifiers, and their usage.
>>
>> Signed-off-by: Daniel Stone <daniels@collabora.com>
> 
> Thanks a lot for this write-up! This looks very good to me, a few comments
> below.

Agreed, it would be awesome if this were merged somewhere. IMHO, a lot 
of the non-trivial/typo suggestions below could be taken care of as 
follow-on patches, as the content here is better in than out, even if it 
could be clarified a bit.

Further feedback inline:

>> ---
>>
>> This is just a quick first draft, inspired by:
>>    https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
>>
>> It's not complete or perfect, but I'm off to eat a roast then have a
>> nice walk in the sun, so figured it'd be better to dash it off rather
>> than let it rot on my hard drive.
>>
>>
>>   .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++
>>   Documentation/gpu/index.rst                   |   1 +
>>   2 files changed, 286 insertions(+)
>>   create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
>>
>> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
>> new file mode 100644
>> index 000000000000..75c4de13d5c8
>> --- /dev/null
>> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
>> @@ -0,0 +1,285 @@
>> +.. Copyright 2021 Collabora Ltd.
>> +
>> +========================
>> +Exchanging pixel buffers
>> +========================
>> +
>> +As originally designed, the Linux graphics subsystem had extremely limited
>> +support for sharing pixel-buffer allocations between processes, devices, and
>> +subsystems. Modern systems require extensive integration between all three
>> +classes; this document details how applications and kernel subsystems should
>> +approach this sharing for two-dimensional image data.
>> +
>> +It is written with reference to the DRM subsystem for GPU and display devices,
>> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
>> +support, however any other subsystems should also follow this design and advice.
>> +
>> +
>> +Formats and modifiers
>> +=====================
>> +
>> +Each buffer must have an underlying format. This format describes the data which
>> +can be stored and loaded for each pixel. Although each subsystem has its own
>> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
> 
> RST uses double backticks for inline code blocks (applies to the whole document).
> 
>> +reused wherever possible, as they are the standard descriptions used for
>> +interchange.
> 
> Maybe mention that the canonical source of formats and modifiers can be found
> in include/uapi/drm/drm_fourcc.h.
> 
>> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
>> +the translation between one or more pixels in memory, and the color data
>> +contained within that memory. The number and type of color channels are
> 
> Pekka uses the term "color value", which I find a bit better than repeating
> "data".
> 
>> +described: whether they are RGB or YUV, integer or floating-point, the size
>> +of each channel and their locations within the pixel memory, and the
>> +relationship between color planes.
>> +
>> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
>> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
>> +available at 8-byte precision per channel, ordered respectively from most to
>> +least significant bits in little-endian storage. As a more complex example,
>> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
>> +stored in separate memory planes, where the chroma plane is stored at half the
>> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
>> +pixel grouping).
>> +
>> +Format modifiers describe a translation mechanism between these per-pixel memory
>> +samples, and the actual memory storage for the buffer. The most straightforward
>> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
>> +contiguous storage beginning at (0,0); each pixel's location in memory will be
>> +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
>> +format, and most convenient for CPU access.
> 
> Hm, maybe in more simple terms we could explain that the pixels are stored
> sequentially row-by-row from the top-left corner to the bottom-right one?

I wouldn't mention top-left. I'm not clear DRM_FORMAT_MOD_LINEAR 
excludes GL-style bottom-left-oriented images.

> Maybe we can drop the "base" from the formula and say that each pixel's
> location in memory will be at offset `y * stride + x * bpp`? Or maybe this is
> confusing with offset being mentioned below as an additional parameter?
> 
>> +Modern hardware employs much more sophisticated access mechanisms, typically
>> +making use of tiled access and possibly also compression. For example, the
>> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
>> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
>> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
>> +stores pixels (4,0) to (7,3) inclusive.
>> +
>> +Some modifiers may modify the number of memory buffers required to store the
> 
> Hm. I think that mentioning a "memory buffer" here is a bit confusing. It seems
> like this document is about exchanging "pixel buffers", each being composed of
> one or more "memory buffers". Maybe we can use "image" instead of "buffer" for
> the higher-concept of "bunch of pixel values which can be displayed on screen"?
> That would align with user-space APIs like Vulkan and EGL.
> 
>> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
>> +memory buffer to RGB formats in which it stores data about the status of every
>> +tile, notably including whether the tile is fully populated with pixel data, or
>> +can be expanded from a single solid color.
> 
> Is it a requirement that these two memory planes must be separate memory buffers
> for I915_FORMAT_MOD_Y_TILED_CCS?

I think a few decisions need to be made here:

- Can the general statement be made that separate memory planes (term 
used above) can always be either separate allocations or offsets within 
one or more allocations?

- Can this auxiliary, modifier-specific data always be used with the 
same semantics as an image plane?

If the answer to both is yes, I think the best way to describe 
modifier-specific planes would just be to generalize the memory plane 
language above and note "some modifiers introduce additional planes," 
rather than trying to describe auxiliary data as a separate concept. 
Then, the whole discussion about plane offsets in the dimension and size 
section below will clearly apply to auxiliary planes as well.

>> +These extended layouts are highly vendor-specific, and even specific to
>> +particular generations or configurations of devices per-vendor. For this reason,
>> +support of modifiers must be explicitly enumerated and negotiated by all users
>> +in order to ensure a compatible and optimal pipeline, as discussed below.
>> +
>> +
>> +Dimensions and size
>> +===================
>> +
>> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
>> +to the number of unique samples which can be extracted from, or stored to, the
>> +underlying memory storage. For example, even though a 1920x1080
>> +`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
>> +component, and 960x540 samples for the U and V components, the overall buffer is
>> +still described as having dimensions of 1920x1080.
>> +
>> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
>> +base address of the underlying memory, nor is it guaranteed that the memory
>> +storage is tightly clipped to either dimension.
>> +
>> +Each plane must therefore be described with an `offset` in bytes, which will be
>> +added to the base address of the memory storage before performing any per-pixel
>> +calculations. This may be used to combine multiple planes into a single pixel
>> +buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
>> +where the luma plane's storage begins immediately at the start of the buffer
>> +with an offset of 0, and the chroma plane's storage begins after the offset of
>> +the luma plane as expressed through its offset.
> 
> "and the chroma plane's storage follows, with its offset set to the size of the
> preceding luma plane"
> 
> is maybe a bit clearer?
> 
>> +Each plane must also have a `stride` in bytes, expressing the offset in memory
>> +between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer
> 
> Is "scanline" a better word than "row"? I personally find "row" a bit more
> descriptive, but maybe "scanline" is technically more accurate.

scanline is a scanout-specific term IMHO.  I agree "row" is more natural 
for a generalized discussion.

>> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
>> +order to allow for aligned access patterns. In this case, the buffer will still
>> +be described with a width of 1000, however the stride will be `1024 * bpp`,
>> +indicating that there are 24 pixels at the positive extreme of the x axis whose
>> +values are not significant.
>> +
>> +Buffers may also be padded further in the y dimension, simply by allocating a
>> +larger area than would ordinarily be required. For example, many media decoders
>> +are not able to natively output buffers of height 1080, but instead require an
>> +effective height of 1088 pixels. In this case, the buffer continues to be
>> +described as having a height of 1080, with the memory allocation for each buffer
>> +being increased to account for the extra padding.
>> +
>> +
>> +Enumeration
>> +===========
>> +
>> +Every user of pixel buffers must be able to enumerate a set of supported formats
>> +and modifiers, described together. Within KMS, this is achieved with the
>> +`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
>> +the modifiers supported for each format. In userspace, this is supported through
>> +the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
>> +`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
>> +`zwp_linux_dmabuf_v1` extension for Wayland.
>> +
>> +Each of these interfaces allows users to query a set of supported
>> +format+modifier combinations.
>> +
>> +Negotiation
>> +===========
>> +
>> +It is the responsibility of userspace to negotiate an acceptable format+modifier
>> +combination for its usage. This is performed through a simple intersection of
>> +lists. For example, if a user wants to use Vulkan to render an image to be
>> +displayed on a KMS plane, it must:
>> +  - query KMS for the `IN_FORMATS` property for the given plane
>> +  - query Vulkan for the supported formats for its physical device
> 
> … with the right VkImageUsageFlagBits and VkImageCreateFlagBits set? (Just to
> make it clear the lists really depend on usage.)

Agreed. Very subtle and very easy to mess this up given the structure of 
the Vulkan API, so worth pointing out explicitly.

>> +  - intersect these formats to determine the most appropriate one
>> +  - for this format, intersect the lists of supported modifiers for both KMS and
>> +    Vulkan, to obtain a final list of acceptable modifiers for that format
>> +
>> +This intersection must be performed for all usages. For example, if the user
>> +also wishes to encode the image to a video stream, it must query the media API
>> +it intends to use for encoding for the set of modifiers it supports, and
>> +additionally intersect against this list.
>> +
>> +If the intersection of all lists is an empty list, it is not possible to share
>> +buffers in this way, and an alternate strategy must be considered (e.g. using
>> +CPU access routines to copy data between the different uses, with the
>> +corresponding performance cost).
>> +
>> +The resulting modifier list is unsorted; the order is not significant.

I think it's also worth pointing out that because the list is unsorted, 
selection of a final modifier from the resulting list is best left to 
drivers, which may have more information available than the modifier 
list represents on its own. E.g., don't pass in (&modifiers[0], 1), pass 
in (modifiers, <count>) and let the allocator pick its favorite for the 
specified local usage. This is especially true of APIs like Vulkan that 
allow you to specify the local usage in great detail.

>> +
>> +Allocation
>> +==========
>> +
>> +Once userspace has determined an appropriate format, and corresponding list of
>> +acceptable modifiers, it must allocate the buffer. As there is no universal
>> +buffer-allocation interface available at either kernel or userspace level, the
>> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
>> +a media API.

Extending the thought above, once some sort of constraints API is worked 
out, the advice should probably be to allocate using the API with the 
most expressive usage whenever possible, but it's premature to recommend 
that right now.

>> +
>> +Each allocation request must take, at a minimum: the pixel format, a list of
>> +acceptable modifiers, and the buffer's width and height. Each API may extend
>> +this set of properties in different ways, such as allowing allocation in more
>> +than two dimensions, intended usage patterns, etc.
>> +
>> +The component which allocates the buffer will make an arbitrary choice of what
>> +it considers the 'best' modifier within the acceptable list for the requested
>> +allocation, any padding required, and further properties of the underlying
>> +memory buffers such as whether they are stored in system or device-specific
>> +memory, whether or not they are physically contiguous, and their cache mode.
>> +These properties of the memory buffer are not visible to userspace, however the
>> +`dma-heaps` API is an effort to address this.
>> +
>> +After allocation, the client must query the allocator to determine the actual
>> +modifier selected for the buffer, as well as the per-plane offset and stride.
>> +Allocators are not permitted to vary the format in use, to select a modifier not
>> +provided within the acceptable list, nor to vary the pixel dimensions other than
>> +the padding expressed through offset, stride, and size.
>> +
>> +
>> +Import
>> +======
>> +
>> +To use a buffer within a different context, device, or subsystem, the user
>> +passes these parameters (format, modifier, width, height, and per-plane offset
>> +and stride) to an importing API.
>> +
>> +Each memory plane is referred to by a buffer handle, which may be unique or
>> +duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
>> +luma and chroma buffers combined into a single memory buffer by use of the
>> +per-plane offset parameters, or they may be completely separate allocations in
>> +memory. For this reason, each import and allocation API must provide a separate
>> +handle for each plane.
> 
> Vulkan doesn't quite do this, by default it only allows one memory buffer per
> pixel buffer, and requires the driver to implement an additional extension when
> the image is "disjoint". Later on, should we mention the inode as a way to
> figure out whether all DMA-BUFs refer to the same memory buffer? Or maybe it's
> better to mention that in the Vulkan docs…

Examining inodes doesn't seem like a Vulkan-specific concept. However, 
it doesn't seem specific to format modifiers either. Should that be 
mentioned in the dmabuf docs?

>> +Each kernel subsystem has its own types and interfaces for buffer management.
>> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
>> +are not portable between contexts, processes, devices, or subsystems.
>> +
>> +To address this, `dma-buf` handles are used as the universal interchange for
>> +buffers. Subsystem-specific operations are used to export native buffer handles
>> +to a `dma-buf` file descriptor, and to import those file descriptors into a
>> +native buffer handle. dma-buf file descriptors can be transferred between
>> +contexts, processes, devices, and subsystems.
>> +
>> +For example, a Wayland media player may use V4L2 to decode a video frame into
>> +a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
>> +chroma) being dequeued by the user from V4L2. These planes are then exported to
>> +one dma-buf file descriptor per plane, these descriptors are then sent along
>> +with the metadata (format, modifier, width, height, per-plane offset and stride)
>> +to the Wayland server. The Wayland server will then import these file
>> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
>> +through Vulkan, or a `drm_fb` for use through KMS; each of these import
>> +operations will take the same metadata and convert the dma-buf file descriptors
>> +into their native buffer handles.
> 
> It would be nice to mention that even if the intersected modifier list wasn't
> empty, the import can fail if the buffer doesn't have the right constraints for
> the intended usage (e.g. bad alignment).

Agreed. In general, is there any guarantee that device A can import an 
arbitrary dma-buf FD? It seems to me it can fail for various reasons, 
and in addition, mapping it to some specific usage on that device during 
the import itself or some subsequent operation can also fail for the 
reasons mentioned above.

>> +
>> +Implicit modifiers
>> +==================
>> +
>> +The concept of modifiers post-dates all of the subsystems mentioned above. As
>> +such, it has been retrofitted into all of these APIs, and in order to ensure
>> +backwards compatibility, support is needed for drivers and userspace which do
>> +not (yet) support modifiers.
>> +
>> +As an example, GBM is used to allocate buffers to be shared between EGL for
>> +rendering and KMS for display. It has two entrypoints for allocating buffers:
>> +`gbm_bo_create` which only takes the format, width, height, and a usage token,
>> +and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
>> +
>> +In the latter case, the allocation is as discussed above, being provided with a
>> +list of acceptable modifiers that the implementation can choose from (or fail if
>> +it is not possible to allocate within those constraints). In the former case
>> +where modifiers are not provided, the GBM implementation must make its own
>> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
>> +implementation-specific: some will internally use tiled layouts which are not
>> +CPU-accessible if the implementation decides that is a good idea through
>> +whatever heuristic. It is the implementation's responsibility to ensure that
>> +this choice is appropriate.
>> +
>> +To support this case where the layout is not known because there is no awareness
>> +of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
>> +pseudo-modifier declares that the layout is not known, and that the driver
>> +should use its own logic to determine what the underlying layout may be.
> 
> Just to drive the point home, maybe mention explicitly that INVALID != LINEAR?

Agreed. Obvious to grey beards, easy error when first approaching these 
concepts.

>> +There are four cases where this token may be used:
>> +  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
>> +    as the sole member of a modifier list to declare that explicit modifiers are
>> +    not supported, or as part of a larger list to declare that implicit modifiers
>> +    may be used
>> +  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
>> +    sole member of a modifier list (equivalent to not supplying a modifier list
>> +    at all) to declare that explicit modifiers are not supported and must not be
>> +    used, or as part of a larger list to declare that an allocation using implicit
>> +    modifiers is acceptable
>> +  - in a post-allocation query, an implementation may return
>> +    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
>> +    that the underlying layout is implementation-defined and that an explicit
>> +    modifier description is not available; per the above rules, this may only be
>> +    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
>> +    list of acceptable modifiers, or not provided a list
>> +  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
>> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
>> +    unknown for whatever reason; this is only acceptable when the buffer has
>> +    not been allocated with an explicit modifier
> 
> These are good rules, but only Wayland uses them. For instance GBM will ignore
> INVALID in modifier lists, and iirc KMS will error out if INVALID is supplied
> at import time?

While I've observed a few other exceptions myself, this is unfortunate, 
as inconsistency here is what prompted this work in the first place. 
Would it be possible to work towards making these rules universal, or is 
the current behavior too ingrained in the ABIs/protocols/etc.?

>> +It follows from this that a buffer chain must be either fully implicit or fully
>> +explicit. For example, if a user wishes to allocate a buffer for use between
>> +GPU, display, and media, but the media API does not support modifiers, then the
>> +user **must not** allocate the buffer with explicit modifiers and attempt to
>> +import the buffer into the media API with no modifier, but either perform the
>> +allocation using implicit modifiers, or allocate the buffer for media use
>> +separately and copy between the two buffers.
>> +
>> +As one exception to the above, allocations may be 'upgraded' from implicit
>> +to explicit modifiers. For example, if the buffer is allocated with
>> +`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
>> +`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
>> +if a valid modifier is returned.
> 
> Hm, I wonder if there's a good use-case for this upgrade? I feel like things
> would be simpler without the exception.

IIRC, the Tegra Mesa driver relied on this "upgrade" to allocate buffers 
from the non-modifier-aware nouveau gallium driver it layers on top of 
and map them into the modifier-aware tegra-drm driver later, as there 
was no way to share implicit layout information between the nouveau and 
tegra-drm drivers in the kernel. However, I think trying to perform the 
same "upgrade" in the modesetting X driver caused issues with my nouveau 
format modifier patches for reasons I don't recall at the moment.

>> +When allocating buffers for exchange between different users and modifiers are
>> +not available, implementations are strongly encouraged to use
>> +`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
>> +for exchange.
> 
> Maybe spell out that "users" may mean different APIs or different devices.
> Sharing a pixel buffer between two separate devices via GBM will only work
> if USE_LINEAR is provided.

Yes, I always try to differentiate between actual users (people) and 
applications/components/libraries/etc. (code).

>> +Any new users - userspace programs and protocols, kernel subsystems, etc -
>> +wishing to exchange buffers must offer interoperability through dma-buf file
>> +descriptors for memory planes, DRM format tokens to describe the format, DRM
>> +format modifiers to describe the layout in memory, at least width and height for
>> +dimensions, and at least offset and stride for each memory plane.
>> diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
>> index b9c1214d8f23..cb12f2654ed7 100644
>> --- a/Documentation/gpu/index.rst
>> +++ b/Documentation/gpu/index.rst
>> @@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
>>      drm-kms
>>      drm-kms-helpers
>>      drm-uapi
>> +   exchanging-pixel-buffers
>>      driver-uapi
>>      drm-client
>>      drivers
>> --
>> 2.31.1

Thanks again for writing this all up.

-James

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-09-08  9:44   ` Simon Ser
@ 2021-11-09  0:21     ` James Jones
  2021-11-09  9:12       ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: James Jones @ 2021-11-09  0:21 UTC (permalink / raw)
  To: Simon Ser, Pekka Paalanen; +Cc: Robert Beckett, Daniel Stone, dri-devel

On 9/8/21 2:44 AM, Simon Ser wrote:
>> stride
>> 	????
> 
> I think what's clear is:
> 
> - Per-plane property
> - In bytes
> - Offset between two consecutive rows
> 
> How that applies to weird YUV formats is the tricky question…
> 
>> Btw. there was a fun argument whether the same modifier value could
>> mean different things on different devices. There were also arguments
>> that a certain modifier could reference additional implicit memory on
>> the device - memory that can only be accessed by very specific devices.
>>
>> I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.
> 
> A recent exmaple of this is [1].
> 
> [1]: https://patchwork.freedesktop.org/patch/452461/

What was the resolution to that argument?  It took some fiddling to get 
the NV format modifiers to be robust enough that they actually do 
differentiate "identical" layouts that actually mismatch between devices 
(E.g., some of our SoC GPUs interpret layouts differently than our 
discrete GPUs, so that's reflected in the format modifier-building macro 
and hence applications can properly deduce that they can *not* share 
images directly between these devices, but can share between two similar 
discrete GPUs), so I hope the modifier definition allows that. 
Cross-device sharing using tiled formats in machines with multiple 
similar NV GPUs was an important use case for modifiers on our side.

Thanks,
-James

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-11-09  0:21     ` James Jones
@ 2021-11-09  9:12       ` Daniel Vetter
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2021-11-09  9:12 UTC (permalink / raw)
  To: James Jones; +Cc: Robert Beckett, Daniel Stone, dri-devel

On Mon, Nov 08, 2021 at 04:21:04PM -0800, James Jones wrote:
> On 9/8/21 2:44 AM, Simon Ser wrote:
> > > stride
> > > 	????
> > 
> > I think what's clear is:
> > 
> > - Per-plane property
> > - In bytes
> > - Offset between two consecutive rows
> > 
> > How that applies to weird YUV formats is the tricky question…
> > 
> > > Btw. there was a fun argument whether the same modifier value could
> > > mean different things on different devices. There were also arguments
> > > that a certain modifier could reference additional implicit memory on
> > > the device - memory that can only be accessed by very specific devices.
> > > 
> > > I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.
> > 
> > A recent exmaple of this is [1].
> > 
> > [1]: https://patchwork.freedesktop.org/patch/452461/
> 
> What was the resolution to that argument?  It took some fiddling to get the
> NV format modifiers to be robust enough that they actually do differentiate
> "identical" layouts that actually mismatch between devices (E.g., some of
> our SoC GPUs interpret layouts differently than our discrete GPUs, so that's
> reflected in the format modifier-building macro and hence applications can
> properly deduce that they can *not* share images directly between these
> devices, but can share between two similar discrete GPUs), so I hope the
> modifier definition allows that. Cross-device sharing using tiled formats in
> machines with multiple similar NV GPUs was an important use case for
> modifiers on our side.

Imo it boils down to "past mistakes don't justify continued screw-ups" or
so :-) As in, we really should make sure we make them unique if they
differ between platforms.

I think the only ok exception is if the compression uses some special
memory/buffer and hence the buffer simply cannot be exported to another
device. Or at least not any device which doesn't have access to that
special memory (and hence by necessity of being part of the same SoC or
interconnect probably knows what's going on anyway).

Another one is r/ed drivers, especially when baked into a given soc, were
it's just a bit too hard to fully figure out the layout everywhere (and
also kinda a waste of time).

But yeah it would be good to document in drm_fourcc.h that a) we screwed
up in the past and b) we shouldn't, at least not for anything that can be
used in discrete gpus.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-11-09  0:18   ` James Jones
@ 2021-11-09  9:13     ` Daniel Vetter
  2021-11-09  9:22       ` Simon Ser
  2023-08-03 15:46       ` Daniel Stone
  0 siblings, 2 replies; 23+ messages in thread
From: Daniel Vetter @ 2021-11-09  9:13 UTC (permalink / raw)
  To: James Jones; +Cc: Daniel Stone, dri-devel

On Mon, Nov 08, 2021 at 04:18:22PM -0800, James Jones wrote:
> On 9/6/21 5:28 AM, Simon Ser wrote:
> > > Since there's a lot of confusion around this, document both the rules
> > > and the best practice around negotiating, allocating, importing, and
> > > using buffers when crossing context/process/device/subsystem boundaries.
> > > 
> > > This ties up all of dmabuf, formats and modifiers, and their usage.
> > > 
> > > Signed-off-by: Daniel Stone <daniels@collabora.com>
> > 
> > Thanks a lot for this write-up! This looks very good to me, a few comments
> > below.
> 
> Agreed, it would be awesome if this were merged somewhere. IMHO, a lot of
> the non-trivial/typo suggestions below could be taken care of as follow-on
> patches, as the content here is better in than out, even if it could be
> clarified a bit.

Seconded on just landing this without trying to perfect it first, because
I was just looking for it and didn't find it anywhere :-/
-Daniel

> 
> Further feedback inline:
> 
> > > ---
> > > 
> > > This is just a quick first draft, inspired by:
> > >    https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
> > > 
> > > It's not complete or perfect, but I'm off to eat a roast then have a
> > > nice walk in the sun, so figured it'd be better to dash it off rather
> > > than let it rot on my hard drive.
> > > 
> > > 
> > >   .../gpu/exchanging-pixel-buffers.rst          | 285 ++++++++++++++++++
> > >   Documentation/gpu/index.rst                   |   1 +
> > >   2 files changed, 286 insertions(+)
> > >   create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
> > > 
> > > diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
> > > new file mode 100644
> > > index 000000000000..75c4de13d5c8
> > > --- /dev/null
> > > +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> > > @@ -0,0 +1,285 @@
> > > +.. Copyright 2021 Collabora Ltd.
> > > +
> > > +========================
> > > +Exchanging pixel buffers
> > > +========================
> > > +
> > > +As originally designed, the Linux graphics subsystem had extremely limited
> > > +support for sharing pixel-buffer allocations between processes, devices, and
> > > +subsystems. Modern systems require extensive integration between all three
> > > +classes; this document details how applications and kernel subsystems should
> > > +approach this sharing for two-dimensional image data.
> > > +
> > > +It is written with reference to the DRM subsystem for GPU and display devices,
> > > +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> > > +support, however any other subsystems should also follow this design and advice.
> > > +
> > > +
> > > +Formats and modifiers
> > > +=====================
> > > +
> > > +Each buffer must have an underlying format. This format describes the data which
> > > +can be stored and loaded for each pixel. Although each subsystem has its own
> > > +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
> > 
> > RST uses double backticks for inline code blocks (applies to the whole document).
> > 
> > > +reused wherever possible, as they are the standard descriptions used for
> > > +interchange.
> > 
> > Maybe mention that the canonical source of formats and modifiers can be found
> > in include/uapi/drm/drm_fourcc.h.
> > 
> > > +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> > > +the translation between one or more pixels in memory, and the color data
> > > +contained within that memory. The number and type of color channels are
> > 
> > Pekka uses the term "color value", which I find a bit better than repeating
> > "data".
> > 
> > > +described: whether they are RGB or YUV, integer or floating-point, the size
> > > +of each channel and their locations within the pixel memory, and the
> > > +relationship between color planes.
> > > +
> > > +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
> > > +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> > > +available at 8-byte precision per channel, ordered respectively from most to
> > > +least significant bits in little-endian storage. As a more complex example,
> > > +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> > > +stored in separate memory planes, where the chroma plane is stored at half the
> > > +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
> > > +pixel grouping).
> > > +
> > > +Format modifiers describe a translation mechanism between these per-pixel memory
> > > +samples, and the actual memory storage for the buffer. The most straightforward
> > > +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
> > > +contiguous storage beginning at (0,0); each pixel's location in memory will be
> > > +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
> > > +format, and most convenient for CPU access.
> > 
> > Hm, maybe in more simple terms we could explain that the pixels are stored
> > sequentially row-by-row from the top-left corner to the bottom-right one?
> 
> I wouldn't mention top-left. I'm not clear DRM_FORMAT_MOD_LINEAR excludes
> GL-style bottom-left-oriented images.
> 
> > Maybe we can drop the "base" from the formula and say that each pixel's
> > location in memory will be at offset `y * stride + x * bpp`? Or maybe this is
> > confusing with offset being mentioned below as an additional parameter?
> > 
> > > +Modern hardware employs much more sophisticated access mechanisms, typically
> > > +making use of tiled access and possibly also compression. For example, the
> > > +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> > > +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> > > +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> > > +stores pixels (4,0) to (7,3) inclusive.
> > > +
> > > +Some modifiers may modify the number of memory buffers required to store the
> > 
> > Hm. I think that mentioning a "memory buffer" here is a bit confusing. It seems
> > like this document is about exchanging "pixel buffers", each being composed of
> > one or more "memory buffers". Maybe we can use "image" instead of "buffer" for
> > the higher-concept of "bunch of pixel values which can be displayed on screen"?
> > That would align with user-space APIs like Vulkan and EGL.
> > 
> > > +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> > > +memory buffer to RGB formats in which it stores data about the status of every
> > > +tile, notably including whether the tile is fully populated with pixel data, or
> > > +can be expanded from a single solid color.
> > 
> > Is it a requirement that these two memory planes must be separate memory buffers
> > for I915_FORMAT_MOD_Y_TILED_CCS?
> 
> I think a few decisions need to be made here:
> 
> - Can the general statement be made that separate memory planes (term used
> above) can always be either separate allocations or offsets within one or
> more allocations?
> 
> - Can this auxiliary, modifier-specific data always be used with the same
> semantics as an image plane?
> 
> If the answer to both is yes, I think the best way to describe
> modifier-specific planes would just be to generalize the memory plane
> language above and note "some modifiers introduce additional planes," rather
> than trying to describe auxiliary data as a separate concept. Then, the
> whole discussion about plane offsets in the dimension and size section below
> will clearly apply to auxiliary planes as well.
> 
> > > +These extended layouts are highly vendor-specific, and even specific to
> > > +particular generations or configurations of devices per-vendor. For this reason,
> > > +support of modifiers must be explicitly enumerated and negotiated by all users
> > > +in order to ensure a compatible and optimal pipeline, as discussed below.
> > > +
> > > +
> > > +Dimensions and size
> > > +===================
> > > +
> > > +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> > > +to the number of unique samples which can be extracted from, or stored to, the
> > > +underlying memory storage. For example, even though a 1920x1080
> > > +`DRM_FORMAT_NV12` buffer has a luma plane containing 1920x1080 samples for the Y
> > > +component, and 960x540 samples for the U and V components, the overall buffer is
> > > +still described as having dimensions of 1920x1080.
> > > +
> > > +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> > > +base address of the underlying memory, nor is it guaranteed that the memory
> > > +storage is tightly clipped to either dimension.
> > > +
> > > +Each plane must therefore be described with an `offset` in bytes, which will be
> > > +added to the base address of the memory storage before performing any per-pixel
> > > +calculations. This may be used to combine multiple planes into a single pixel
> > > +buffer; for example, `DRM_FORMAT_NV12` may be stored in a single memory buffer
> > > +where the luma plane's storage begins immediately at the start of the buffer
> > > +with an offset of 0, and the chroma plane's storage begins after the offset of
> > > +the luma plane as expressed through its offset.
> > 
> > "and the chroma plane's storage follows, with its offset set to the size of the
> > preceding luma plane"
> > 
> > is maybe a bit clearer?
> > 
> > > +Each plane must also have a `stride` in bytes, expressing the offset in memory
> > > +between two contiguous scanlines. For example, a `DRM_FORMAT_MOD_LINEAR` buffer
> > 
> > Is "scanline" a better word than "row"? I personally find "row" a bit more
> > descriptive, but maybe "scanline" is technically more accurate.
> 
> scanline is a scanout-specific term IMHO.  I agree "row" is more natural for
> a generalized discussion.
> 
> > > +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> > > +order to allow for aligned access patterns. In this case, the buffer will still
> > > +be described with a width of 1000, however the stride will be `1024 * bpp`,
> > > +indicating that there are 24 pixels at the positive extreme of the x axis whose
> > > +values are not significant.
> > > +
> > > +Buffers may also be padded further in the y dimension, simply by allocating a
> > > +larger area than would ordinarily be required. For example, many media decoders
> > > +are not able to natively output buffers of height 1080, but instead require an
> > > +effective height of 1088 pixels. In this case, the buffer continues to be
> > > +described as having a height of 1080, with the memory allocation for each buffer
> > > +being increased to account for the extra padding.
> > > +
> > > +
> > > +Enumeration
> > > +===========
> > > +
> > > +Every user of pixel buffers must be able to enumerate a set of supported formats
> > > +and modifiers, described together. Within KMS, this is achieved with the
> > > +`IN_FORMATS` property on each DRM plane, listing the supported DRM formats, and
> > > +the modifiers supported for each format. In userspace, this is supported through
> > > +the `EGL_EXT_image_dma_buf_import_modifiers` extension entrypoints for EGL, the
> > > +`VK_EXT_image_drm_format_modifier` extension for Vulkan, and the
> > > +`zwp_linux_dmabuf_v1` extension for Wayland.
> > > +
> > > +Each of these interfaces allows users to query a set of supported
> > > +format+modifier combinations.
> > > +
> > > +Negotiation
> > > +===========
> > > +
> > > +It is the responsibility of userspace to negotiate an acceptable format+modifier
> > > +combination for its usage. This is performed through a simple intersection of
> > > +lists. For example, if a user wants to use Vulkan to render an image to be
> > > +displayed on a KMS plane, it must:
> > > +  - query KMS for the `IN_FORMATS` property for the given plane
> > > +  - query Vulkan for the supported formats for its physical device
> > 
> > … with the right VkImageUsageFlagBits and VkImageCreateFlagBits set? (Just to
> > make it clear the lists really depend on usage.)
> 
> Agreed. Very subtle and very easy to mess this up given the structure of the
> Vulkan API, so worth pointing out explicitly.
> 
> > > +  - intersect these formats to determine the most appropriate one
> > > +  - for this format, intersect the lists of supported modifiers for both KMS and
> > > +    Vulkan, to obtain a final list of acceptable modifiers for that format
> > > +
> > > +This intersection must be performed for all usages. For example, if the user
> > > +also wishes to encode the image to a video stream, it must query the media API
> > > +it intends to use for encoding for the set of modifiers it supports, and
> > > +additionally intersect against this list.
> > > +
> > > +If the intersection of all lists is an empty list, it is not possible to share
> > > +buffers in this way, and an alternate strategy must be considered (e.g. using
> > > +CPU access routines to copy data between the different uses, with the
> > > +corresponding performance cost).
> > > +
> > > +The resulting modifier list is unsorted; the order is not significant.
> 
> I think it's also worth pointing out that because the list is unsorted,
> selection of a final modifier from the resulting list is best left to
> drivers, which may have more information available than the modifier list
> represents on its own. E.g., don't pass in (&modifiers[0], 1), pass in
> (modifiers, <count>) and let the allocator pick its favorite for the
> specified local usage. This is especially true of APIs like Vulkan that
> allow you to specify the local usage in great detail.
> 
> > > +
> > > +Allocation
> > > +==========
> > > +
> > > +Once userspace has determined an appropriate format, and corresponding list of
> > > +acceptable modifiers, it must allocate the buffer. As there is no universal
> > > +buffer-allocation interface available at either kernel or userspace level, the
> > > +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> > > +a media API.
> 
> Extending the thought above, once some sort of constraints API is worked
> out, the advice should probably be to allocate using the API with the most
> expressive usage whenever possible, but it's premature to recommend that
> right now.
> 
> > > +
> > > +Each allocation request must take, at a minimum: the pixel format, a list of
> > > +acceptable modifiers, and the buffer's width and height. Each API may extend
> > > +this set of properties in different ways, such as allowing allocation in more
> > > +than two dimensions, intended usage patterns, etc.
> > > +
> > > +The component which allocates the buffer will make an arbitrary choice of what
> > > +it considers the 'best' modifier within the acceptable list for the requested
> > > +allocation, any padding required, and further properties of the underlying
> > > +memory buffers such as whether they are stored in system or device-specific
> > > +memory, whether or not they are physically contiguous, and their cache mode.
> > > +These properties of the memory buffer are not visible to userspace, however the
> > > +`dma-heaps` API is an effort to address this.
> > > +
> > > +After allocation, the client must query the allocator to determine the actual
> > > +modifier selected for the buffer, as well as the per-plane offset and stride.
> > > +Allocators are not permitted to vary the format in use, to select a modifier not
> > > +provided within the acceptable list, nor to vary the pixel dimensions other than
> > > +the padding expressed through offset, stride, and size.
> > > +
> > > +
> > > +Import
> > > +======
> > > +
> > > +To use a buffer within a different context, device, or subsystem, the user
> > > +passes these parameters (format, modifier, width, height, and per-plane offset
> > > +and stride) to an importing API.
> > > +
> > > +Each memory plane is referred to by a buffer handle, which may be unique or
> > > +duplicated within a buffer. For example, a `DRM_FORMAT_NV12` buffer may have the
> > > +luma and chroma buffers combined into a single memory buffer by use of the
> > > +per-plane offset parameters, or they may be completely separate allocations in
> > > +memory. For this reason, each import and allocation API must provide a separate
> > > +handle for each plane.
> > 
> > Vulkan doesn't quite do this, by default it only allows one memory buffer per
> > pixel buffer, and requires the driver to implement an additional extension when
> > the image is "disjoint". Later on, should we mention the inode as a way to
> > figure out whether all DMA-BUFs refer to the same memory buffer? Or maybe it's
> > better to mention that in the Vulkan docs…
> 
> Examining inodes doesn't seem like a Vulkan-specific concept. However, it
> doesn't seem specific to format modifiers either. Should that be mentioned
> in the dmabuf docs?
> 
> > > +Each kernel subsystem has its own types and interfaces for buffer management.
> > > +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> > > +are not portable between contexts, processes, devices, or subsystems.
> > > +
> > > +To address this, `dma-buf` handles are used as the universal interchange for
> > > +buffers. Subsystem-specific operations are used to export native buffer handles
> > > +to a `dma-buf` file descriptor, and to import those file descriptors into a
> > > +native buffer handle. dma-buf file descriptors can be transferred between
> > > +contexts, processes, devices, and subsystems.
> > > +
> > > +For example, a Wayland media player may use V4L2 to decode a video frame into
> > > +a `DRM_FORMAT_NV12` buffer. This will result in two memory planes (luma and
> > > +chroma) being dequeued by the user from V4L2. These planes are then exported to
> > > +one dma-buf file descriptor per plane, these descriptors are then sent along
> > > +with the metadata (format, modifier, width, height, per-plane offset and stride)
> > > +to the Wayland server. The Wayland server will then import these file
> > > +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> > > +through Vulkan, or a `drm_fb` for use through KMS; each of these import
> > > +operations will take the same metadata and convert the dma-buf file descriptors
> > > +into their native buffer handles.
> > 
> > It would be nice to mention that even if the intersected modifier list wasn't
> > empty, the import can fail if the buffer doesn't have the right constraints for
> > the intended usage (e.g. bad alignment).
> 
> Agreed. In general, is there any guarantee that device A can import an
> arbitrary dma-buf FD? It seems to me it can fail for various reasons, and in
> addition, mapping it to some specific usage on that device during the import
> itself or some subsequent operation can also fail for the reasons mentioned
> above.
> 
> > > +
> > > +Implicit modifiers
> > > +==================
> > > +
> > > +The concept of modifiers post-dates all of the subsystems mentioned above. As
> > > +such, it has been retrofitted into all of these APIs, and in order to ensure
> > > +backwards compatibility, support is needed for drivers and userspace which do
> > > +not (yet) support modifiers.
> > > +
> > > +As an example, GBM is used to allocate buffers to be shared between EGL for
> > > +rendering and KMS for display. It has two entrypoints for allocating buffers:
> > > +`gbm_bo_create` which only takes the format, width, height, and a usage token,
> > > +and `gbm_bo_create_with_modifiers` which extends this with a list of modifiers.
> > > +
> > > +In the latter case, the allocation is as discussed above, being provided with a
> > > +list of acceptable modifiers that the implementation can choose from (or fail if
> > > +it is not possible to allocate within those constraints). In the former case
> > > +where modifiers are not provided, the GBM implementation must make its own
> > > +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> > > +implementation-specific: some will internally use tiled layouts which are not
> > > +CPU-accessible if the implementation decides that is a good idea through
> > > +whatever heuristic. It is the implementation's responsibility to ensure that
> > > +this choice is appropriate.
> > > +
> > > +To support this case where the layout is not known because there is no awareness
> > > +of modifiers, a special `DRM_FORMAT_MOD_INVALID` token has been defined. This
> > > +pseudo-modifier declares that the layout is not known, and that the driver
> > > +should use its own logic to determine what the underlying layout may be.
> > 
> > Just to drive the point home, maybe mention explicitly that INVALID != LINEAR?
> 
> Agreed. Obvious to grey beards, easy error when first approaching these
> concepts.
> 
> > > +There are four cases where this token may be used:
> > > +  - during enumeration, an interface may return `DRM_FORMAT_MOD_INVALID`, either
> > > +    as the sole member of a modifier list to declare that explicit modifiers are
> > > +    not supported, or as part of a larger list to declare that implicit modifiers
> > > +    may be used
> > > +  - during allocation, a user may supply `DRM_FORMAT_MOD_INVALID`, either as the
> > > +    sole member of a modifier list (equivalent to not supplying a modifier list
> > > +    at all) to declare that explicit modifiers are not supported and must not be
> > > +    used, or as part of a larger list to declare that an allocation using implicit
> > > +    modifiers is acceptable
> > > +  - in a post-allocation query, an implementation may return
> > > +    `DRM_FORMAT_MOD_INVALID` as the modifier of the allocated buffer to declare
> > > +    that the underlying layout is implementation-defined and that an explicit
> > > +    modifier description is not available; per the above rules, this may only be
> > > +    returned when the user has included `DRM_FORMAT_MOD_INVALID` as part of the
> > > +    list of acceptable modifiers, or not provided a list
> > > +  - when importing a buffer, the user may supply `DRM_FORMAT_MOD_INVALID` as the
> > > +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> > > +    unknown for whatever reason; this is only acceptable when the buffer has
> > > +    not been allocated with an explicit modifier
> > 
> > These are good rules, but only Wayland uses them. For instance GBM will ignore
> > INVALID in modifier lists, and iirc KMS will error out if INVALID is supplied
> > at import time?
> 
> While I've observed a few other exceptions myself, this is unfortunate, as
> inconsistency here is what prompted this work in the first place. Would it
> be possible to work towards making these rules universal, or is the current
> behavior too ingrained in the ABIs/protocols/etc.?
> 
> > > +It follows from this that a buffer chain must be either fully implicit or fully
> > > +explicit. For example, if a user wishes to allocate a buffer for use between
> > > +GPU, display, and media, but the media API does not support modifiers, then the
> > > +user **must not** allocate the buffer with explicit modifiers and attempt to
> > > +import the buffer into the media API with no modifier, but either perform the
> > > +allocation using implicit modifiers, or allocate the buffer for media use
> > > +separately and copy between the two buffers.
> > > +
> > > +As one exception to the above, allocations may be 'upgraded' from implicit
> > > +to explicit modifiers. For example, if the buffer is allocated with
> > > +`gbm_bo_create` (taking no modifiers), the user may then query the modifier with
> > > +`gbm_bo_get_modifier` and then use this modifier as an explicit modifier token
> > > +if a valid modifier is returned.
> > 
> > Hm, I wonder if there's a good use-case for this upgrade? I feel like things
> > would be simpler without the exception.
> 
> IIRC, the Tegra Mesa driver relied on this "upgrade" to allocate buffers
> from the non-modifier-aware nouveau gallium driver it layers on top of and
> map them into the modifier-aware tegra-drm driver later, as there was no way
> to share implicit layout information between the nouveau and tegra-drm
> drivers in the kernel. However, I think trying to perform the same "upgrade"
> in the modesetting X driver caused issues with my nouveau format modifier
> patches for reasons I don't recall at the moment.
> 
> > > +When allocating buffers for exchange between different users and modifiers are
> > > +not available, implementations are strongly encouraged to use
> > > +`DRM_FORMAT_MOD_LINEAR` for their allocation, as this is the universal baseline
> > > +for exchange.
> > 
> > Maybe spell out that "users" may mean different APIs or different devices.
> > Sharing a pixel buffer between two separate devices via GBM will only work
> > if USE_LINEAR is provided.
> 
> Yes, I always try to differentiate between actual users (people) and
> applications/components/libraries/etc. (code).
> 
> > > +Any new users - userspace programs and protocols, kernel subsystems, etc -
> > > +wishing to exchange buffers must offer interoperability through dma-buf file
> > > +descriptors for memory planes, DRM format tokens to describe the format, DRM
> > > +format modifiers to describe the layout in memory, at least width and height for
> > > +dimensions, and at least offset and stride for each memory plane.
> > > diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
> > > index b9c1214d8f23..cb12f2654ed7 100644
> > > --- a/Documentation/gpu/index.rst
> > > +++ b/Documentation/gpu/index.rst
> > > @@ -10,6 +10,7 @@ Linux GPU Driver Developer's Guide
> > >      drm-kms
> > >      drm-kms-helpers
> > >      drm-uapi
> > > +   exchanging-pixel-buffers
> > >      driver-uapi
> > >      drm-client
> > >      drivers
> > > --
> > > 2.31.1
> 
> Thanks again for writing this all up.
> 
> -James

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-11-09  9:13     ` Daniel Vetter
@ 2021-11-09  9:22       ` Simon Ser
  2023-08-03 15:46       ` Daniel Stone
  1 sibling, 0 replies; 23+ messages in thread
From: Simon Ser @ 2021-11-09  9:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, Daniel Stone, dri-devel

On Tuesday, November 9th, 2021 at 10:13, Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, Nov 08, 2021 at 04:18:22PM -0800, James Jones wrote:
> > On 9/6/21 5:28 AM, Simon Ser wrote:
> > > > Since there's a lot of confusion around this, document both the rules
> > > > and the best practice around negotiating, allocating, importing, and
> > > > using buffers when crossing context/process/device/subsystem boundaries.
> > > >
> > > > This ties up all of dmabuf, formats and modifiers, and their usage.
> > > >
> > > > Signed-off-by: Daniel Stone <daniels@collabora.com>
> > >
> > > Thanks a lot for this write-up! This looks very good to me, a few comments
> > > below.
> >
> > Agreed, it would be awesome if this were merged somewhere. IMHO, a lot of
> > the non-trivial/typo suggestions below could be taken care of as follow-on
> > patches, as the content here is better in than out, even if it could be
> > clarified a bit.
>
> Seconded on just landing this without trying to perfect it first, because
> I was just looking for it and didn't find it anywhere :-/

Let me know if you lack time for this daniels, I can work on a new version.
I don't want this to be lost in review limbo!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2021-11-09  9:13     ` Daniel Vetter
  2021-11-09  9:22       ` Simon Ser
@ 2023-08-03 15:46       ` Daniel Stone
  2023-08-03 15:46         ` Daniel Stone
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Stone @ 2023-08-03 15:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, Daniel Stone, dri-devel

On Tue, 9 Nov 2021 at 09:13, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Nov 08, 2021 at 04:18:22PM -0800, James Jones wrote:
> > On 9/6/21 5:28 AM, Simon Ser wrote:
> > > > Since there's a lot of confusion around this, document both the rules
> > > > and the best practice around negotiating, allocating, importing, and
> > > > using buffers when crossing context/process/device/subsystem boundaries.
> > > >
> > > > This ties up all of dmabuf, formats and modifiers, and their usage.
> > > >
> > > > Signed-off-by: Daniel Stone <daniels@collabora.com>
> > >
> > > Thanks a lot for this write-up! This looks very good to me, a few comments
> > > below.
> >
> > Agreed, it would be awesome if this were merged somewhere. IMHO, a lot of
> > the non-trivial/typo suggestions below could be taken care of as follow-on
> > patches, as the content here is better in than out, even if it could be
> > clarified a bit.
>
> Seconded on just landing this without trying to perfect it first, because
> I was just looking for it and didn't find it anywhere :-/

Swing and a miss ...

I've just sent out v2 with - AFAICT - all the changes from all

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] doc: gpu: Add document describing buffer exchange
  2023-08-03 15:46       ` Daniel Stone
@ 2023-08-03 15:46         ` Daniel Stone
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Stone @ 2023-08-03 15:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: James Jones, Daniel Stone, dri-devel

On Thu, 3 Aug 2023 at 16:46, Daniel Stone <daniel@fooishbar.org> wrote:
> On Tue, 9 Nov 2021 at 09:13, Daniel Vetter <daniel@ffwll.ch> wrote:
> > Seconded on just landing this without trying to perfect it first, because
> > I was just looking for it and didn't find it anywhere :-/
>
> Swing and a miss ...
>
> I've just sent out v2 with - AFAICT - all the changes from all

.. all of you in this thread. Thanks a lot for the review!

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
                   ` (3 preceding siblings ...)
  2021-09-08 18:16 ` Daniel Vetter
@ 2023-08-03 15:47 ` Daniel Stone
  2023-08-03 19:47   ` James Jones
                     ` (2 more replies)
  2023-08-03 15:47 ` [PATCH v2 1/2] doc: dma-buf: Rewrite intro section a little Daniel Stone
  2023-08-03 15:47 ` [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics Daniel Stone
  6 siblings, 3 replies; 23+ messages in thread
From: Daniel Stone @ 2023-08-03 15:47 UTC (permalink / raw)
  To: dri-devel; +Cc: linaro-mm-sig, linux-media

Hi all,
This is v2 to the linked patch series; thanks to everyone for reviewing
the initial version. I've moved this out of a pure DRM scope and into
the general userspace-API design section. Hopefully it helps others and
answers a bunch of questions.

I think it'd be great to have input/links/reflections from other
subsystems as well here.

Cheers,
Daniel



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/2] doc: dma-buf: Rewrite intro section a little
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
                   ` (4 preceding siblings ...)
  2023-08-03 15:47 ` [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics Daniel Stone
@ 2023-08-03 15:47 ` Daniel Stone
  2023-08-03 15:47 ` [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics Daniel Stone
  6 siblings, 0 replies; 23+ messages in thread
From: Daniel Stone @ 2023-08-03 15:47 UTC (permalink / raw)
  To: dri-devel

Make it a little bit more clear what's going on and fix some formatting.

Signed-off-by: Daniel Stone <daniels@collabora.com>
---
 Documentation/driver-api/dma-buf.rst | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

v2: New.

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index f92a32d095d9..862dbc2759d0 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -5,14 +5,22 @@ The dma-buf subsystem provides the framework for sharing buffers for
 hardware (DMA) access across multiple device drivers and subsystems, and
 for synchronizing asynchronous hardware access.
 
-This is used, for example, by drm "prime" multi-GPU support, but is of
-course not limited to GPU use cases.
-
-The three main components of this are: (1) dma-buf, representing a
-sg_table and exposed to userspace as a file descriptor to allow passing
-between devices, (2) fence, which provides a mechanism to signal when
-one device has finished access, and (3) reservation, which manages the
-shared or exclusive fence(s) associated with the buffer.
+As an example, it is used extensively by the DRM subsystem to exchange
+buffers between processes, contexts, library APIs within the same
+process, and also to exchange buffers with other subsystems such as
+V4L2.
+
+This document describes the way in which kernel subsystems can use and
+interact with the three main primitives offered by dma-buf:
+
+ - dma-buf, representing a sg_table and exposed to userspace as a file
+   descriptor to allow passing between processes, subsystems, devices,
+   etc;
+ - dma-fence, providing a mechanism to signal when an asynchronous
+   hardware operation has completed; and
+ - dma-resv, which manages a set of dma-fences for a particular dma-buf
+   allowing implicit (kernel-ordered) synchronization of work to
+   preserve the illusion of coherent access
 
 Shared DMA Buffers
 ------------------
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics
  2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
                   ` (5 preceding siblings ...)
  2023-08-03 15:47 ` [PATCH v2 1/2] doc: dma-buf: Rewrite intro section a little Daniel Stone
@ 2023-08-03 15:47 ` Daniel Stone
  2023-08-18 15:37   ` [v2,2/2] " suijingfeng
  2023-08-21 13:33   ` [PATCH v2 2/2] " Daniel Vetter
  6 siblings, 2 replies; 23+ messages in thread
From: Daniel Stone @ 2023-08-03 15:47 UTC (permalink / raw)
  To: dri-devel

Since there's a lot of confusion around this, document both the rules
and the best practice around negotiating, allocating, importing, and
using buffers when crossing context/process/device/subsystem boundaries.

This ties up all of dma-buf, formats and modifiers, and their usage.

Signed-off-by: Daniel Stone <daniels@collabora.com>
---
 Documentation/driver-api/dma-buf.rst          |   8 +
 Documentation/gpu/drm-uapi.rst                |   7 +
 .../userspace-api/dma-buf-alloc-exchange.rst  | 384 ++++++++++++++++++
 Documentation/userspace-api/index.rst         |   1 +
 4 files changed, 400 insertions(+)
 create mode 100644 Documentation/userspace-api/dma-buf-alloc-exchange.rst

v2:
 - Moved to general uAPI section, cross-referenced from dma-buf/DRM
 - Added Pekka's suggested glossary with some small changes
 - Cleanups and clarifications from Simon and James

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 862dbc2759d0..0c153d79ccc4 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -22,6 +22,14 @@ interact with the three main primitives offered by dma-buf:
    allowing implicit (kernel-ordered) synchronization of work to
    preserve the illusion of coherent access
 
+
+Userspace API principles and use
+--------------------------------
+
+For more details on how to design your subsystem's API for dma-buf use, please
+see Documentation/userspace-api/dma-buf-alloc-exchange.rst.
+
+
 Shared DMA Buffers
 ------------------
 
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 65fb3036a580..eef5fd19bc92 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -486,3 +486,10 @@ and the CRTC index is its position in this array.
 
 .. kernel-doc:: include/uapi/drm/drm_mode.h
    :internal:
+
+
+dma-buf interoperability
+========================
+
+Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
+information on how dma-buf is integrated and exposed within DRM.
diff --git a/Documentation/userspace-api/dma-buf-alloc-exchange.rst b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
new file mode 100644
index 000000000000..090453d2ad78
--- /dev/null
+++ b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
@@ -0,0 +1,384 @@
+.. Copyright 2021-2023 Collabora Ltd.
+
+========================
+Exchanging pixel buffers
+========================
+
+As originally designed, the Linux graphics subsystem had extremely limited
+support for sharing pixel-buffer allocations between processes, devices, and
+subsystems. Modern systems require extensive integration between all three
+classes; this document details how applications and kernel subsystems should
+approach this sharing for two-dimensional image data.
+
+It is written with reference to the DRM subsystem for GPU and display devices,
+V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
+support, however any other subsystems should also follow this design and advice.
+
+
+Glossary of terms
+=================
+
+.. glossary::
+
+    image:
+      Conceptually a two-dimensional array of pixels. The pixels may be stored
+      in one or more memory buffers. Has width and height in pixels, pixel
+      format and modifier (implicit or explicit).
+
+    row:
+      A span along a single y-axis value, e.g. from co-ordinates (0,100) to
+      (200,100).
+
+    scanline:
+      Synonym for row.
+
+    column:
+      A span along a single x-axis value, e.g. from co-ordinates (100,0) to
+      (100,100).
+
+    memory buffer:
+      A piece of memory for storing (parts of) pixel data. Has stride and size
+      in bytes and at least one handle in some API. May contain one or more
+      planes.
+
+    plane:
+      A two-dimensional array of some or all of an image's color and alpha
+      channel values.
+
+    pixel:
+      A picture element. Has a single color value which is defined by one or
+      more color channels values, e.g. R, G and B, or Y, Cb and Cr. May also
+      have an alpha value as an additional channel.
+
+    pixel data:
+      Bytes or bits that represent some or all of the color/alpha channel values
+      of a pixel or an image. The data for one pixel may be spread over several
+      planes or memory buffers depending on format and modifier.
+
+    color value:
+      A tuple of numbers, representing a color. Each element in the tuple is a
+      color channel value.
+
+    color channel:
+      One of the dimensions in a color model. For example, RGB model has
+      channels R, G, and B. Alpha channel is sometimes counted as a color
+      channel as well.
+
+    pixel format:
+      A description of how pixel data represents the pixel's color and alpha
+      values.
+
+    modifier:
+      A description of how pixel data is laid out in memory buffers.
+
+    alpha:
+      A value that denotes the color coverage in a pixel. Sometimes used for
+      translucency instead.
+
+    stride:
+      A value that denotes the relationship between pixel-location co-ordinates
+      and byte-offset values. Typically used as the byte offset between two
+      pixels at the start of vertically-consecutive tiling blocks. For linear
+      layouts, the byte offset between two vertically-adjacent pixels.
+
+    pitch:
+      Synonym for stride.
+
+
+Formats and modifiers
+=====================
+
+Each buffer must have an underlying format. This format describes the color
+values provided for each pixel. Although each subsystem has its own format
+descriptions (e.g. V4L2 and fbdev), the ``DRM_FORMAT_*`` tokens should be reused
+wherever possible, as they are the standard descriptions used for interchange.
+These tokens are described in the ``drm_fourcc.h`` file, which is a part of
+DRM's uAPI.
+
+Each ``DRM_FORMAT_*`` token describes the translation between a pixel
+co-ordinate in an image, and the color values for that pixel contained within
+its memory buffers. The number and type of color channels are described:
+whether they are RGB or YUV, integer or floating-point, the size of each channel
+and their locations within the pixel memory, and the relationship between color
+planes.
+
+For example, ``DRM_FORMAT_ARGB8888`` describes a format in which each pixel has
+a single 32-bit value in memory. Alpha, red, green, and blue, color channels are
+available at 8-bit precision per channel, ordered respectively from most to
+least significant bits in little-endian storage. ``DRM_FORMAT_*`` is not
+affected by either CPU or device endianness; the byte pattern in memory is
+always as described in the format definition, which is usually little-endian.
+
+As a more complex example, ``DRM_FORMAT_NV12`` describes a format in which luma
+and chroma YUV samples are stored in separate planes, where the chroma plane is
+stored at half the resolution in both dimensions (i.e. one U/V chroma
+sample is stored for each 2x2 pixel grouping).
+
+Format modifiers describe a translation mechanism between these per-pixel memory
+samples, and the actual memory storage for the buffer. The most straightforward
+modifier is ``DRM_FORMAT_MOD_LINEAR``, describing a scheme in which each plane
+is laid out row-sequentially, from the top-left to the bottom-right corner.
+This is considered the baseline interchange format, and most convenient for CPU
+access.
+
+Modern hardware employs much more sophisticated access mechanisms, typically
+making use of tiled access and possibly also compression. For example, the
+``DRM_FORMAT_MOD_VIVANTE_TILED`` modifier describes memory storage where pixels
+are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
+a plane stores pixels (0,0) to (3,3) inclusive, and the second tile in a plane
+stores pixels (4,0) to (7,3) inclusive.
+
+Some modifiers may modify the number of planes required for an image; for
+example, the ``I915_FORMAT_MOD_Y_TILED_CCS`` modifier adds a second plane to RGB
+formats in which it stores data about the status of every tile, notably
+including whether the tile is fully populated with pixel data, or can be
+expanded from a single solid color.
+
+These extended layouts are highly vendor-specific, and even specific to
+particular generations or configurations of devices per-vendor. For this reason,
+support of modifiers must be explicitly enumerated and negotiated by all users
+in order to ensure a compatible and optimal pipeline, as discussed below.
+
+
+Dimensions and size
+===================
+
+Each pixel buffer must be accompanied by logical pixel dimensions. This refers
+to the number of unique samples which can be extracted from, or stored to, the
+underlying memory storage. For example, even though a 1920x1080
+``DRM_FORMAT_NV12`` buffer has a luma plane containing 1920x1080 samples for the Y
+component, and 960x540 samples for the U and V components, the overall buffer is
+still described as having dimensions of 1920x1080.
+
+The in-memory storage of a buffer is not guaranteed to begin immediately at the
+base address of the underlying memory, nor is it guaranteed that the memory
+storage is tightly clipped to either dimension.
+
+Each plane must therefore be described with an ``offset`` in bytes, which will be
+added to the base address of the memory storage before performing any per-pixel
+calculations. This may be used to combine multiple planes into a single memory
+buffer; for example, ``DRM_FORMAT_NV12`` may be stored in a single memory buffer
+where the luma plane's storage begins immediately at the start of the buffer
+with an offset of 0, and the chroma plane's storage follows within the same buffer
+beginning from the byte offset for that plane.
+
+Each plane must also have a ``stride`` in bytes, expressing the offset in memory
+between two contiguous row. For example, a ``DRM_FORMAT_MOD_LINEAR`` buffer
+with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
+order to allow for aligned access patterns. In this case, the buffer will still
+be described with a width of 1000, however the stride will be ``1024 * bpp``,
+indicating that there are 24 pixels at the positive extreme of the x axis whose
+values are not significant.
+
+Buffers may also be padded further in the y dimension, simply by allocating a
+larger area than would ordinarily be required. For example, many media decoders
+are not able to natively output buffers of height 1080, but instead require an
+effective height of 1088 pixels. In this case, the buffer continues to be
+described as having a height of 1080, with the memory allocation for each buffer
+being increased to account for the extra padding.
+
+
+Enumeration
+===========
+
+Every user of pixel buffers must be able to enumerate a set of supported formats
+and modifiers, described together. Within KMS, this is achieved with the
+``IN_FORMATS`` property on each DRM plane, listing the supported DRM formats, and
+the modifiers supported for each format. In userspace, this is supported through
+the `EGL_EXT_image_dma_buf_import_modifiers`_ extension entrypoints for EGL, the
+`VK_EXT_image_drm_format_modifier`_ extension for Vulkan, and the
+`zwp_linux_dmabuf_v1`_ extension for Wayland.
+
+Each of these interfaces allows users to query a set of supported
+format+modifier combinations.
+
+
+Negotiation
+===========
+
+It is the responsibility of userspace to negotiate an acceptable format+modifier
+combination for its usage. This is performed through a simple intersection of
+lists. For example, if a user wants to use Vulkan to render an image to be
+displayed on a KMS plane, it must:
+
+ - query KMS for the ``IN_FORMATS`` property for the given plane
+ - query Vulkan for the supported formats for its physical device, making sure
+   to pass the ``VkImageUsageFlagBits`` and ``VkImageCreateFlagBits``
+   corresponding to the intended rendering use
+ - intersect these formats to determine the most appropriate one
+ - for this format, intersect the lists of supported modifiers for both KMS and
+   Vulkan, to obtain a final list of acceptable modifiers for that format
+
+This intersection must be performed for all usages. For example, if the user
+also wishes to encode the image to a video stream, it must query the media API
+it intends to use for encoding for the set of modifiers it supports, and
+additionally intersect against this list.
+
+If the intersection of all lists is an empty list, it is not possible to share
+buffers in this way, and an alternate strategy must be considered (e.g. using
+CPU access routines to copy data between the different uses, with the
+corresponding performance cost).
+
+The resulting modifier list is unsorted; the order is not significant.
+
+
+Allocation
+==========
+
+Once userspace has determined an appropriate format, and corresponding list of
+acceptable modifiers, it must allocate the buffer. As there is no universal
+buffer-allocation interface available at either kernel or userspace level, the
+client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
+a media API.
+
+Each allocation request must take, at a minimum: the pixel format, a list of
+acceptable modifiers, and the buffer's width and height. Each API may extend
+this set of properties in different ways, such as allowing allocation in more
+than two dimensions, intended usage patterns, etc.
+
+The component which allocates the buffer will make an arbitrary choice of what
+it considers the 'best' modifier within the acceptable list for the requested
+allocation, any padding required, and further properties of the underlying
+memory buffers such as whether they are stored in system or device-specific
+memory, whether or not they are physically contiguous, and their cache mode.
+These properties of the memory buffer are not visible to userspace, however the
+``dma-heaps`` API is an effort to address this.
+
+After allocation, the client must query the allocator to determine the actual
+modifier selected for the buffer, as well as the per-plane offset and stride.
+Allocators are not permitted to vary the format in use, to select a modifier not
+provided within the acceptable list, nor to vary the pixel dimensions other than
+the padding expressed through offset, stride, and size.
+
+Communicating additional constraints, such as alignment of stride or offset,
+placement within a particular memory area, etc, is out of scope of dma-buf,
+and is not solved by format and modifier tokens.
+
+
+Import
+======
+
+To use a buffer within a different context, device, or subsystem, the user
+passes these parameters (format, modifier, width, height, and per-plane offset
+and stride) to an importing API.
+
+Each memory buffer is referred to by a buffer handle, which may be unique or
+duplicated within an image. For example, a ``DRM_FORMAT_NV12`` buffer may have
+the luma and chroma buffers combined into a single memory buffer by use of the
+per-plane offset parameters, or they may be completely separate allocations in
+memory. For this reason, each import and allocation API must provide a separate
+handle for each plane.
+
+Each kernel subsystem has its own types and interfaces for buffer management.
+DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
+are not portable between contexts, processes, devices, or subsystems.
+
+To address this, ``dma-buf`` handles are used as the universal interchange for
+buffers. Subsystem-specific operations are used to export native buffer handles
+to a ``dma-buf`` file descriptor, and to import those file descriptors into a
+native buffer handle. dma-buf file descriptors can be transferred between
+contexts, processes, devices, and subsystems.
+
+For example, a Wayland media player may use V4L2 to decode a video frame into a
+``DRM_FORMAT_NV12`` buffer. This will result in two memory planes (luma and
+chroma) being dequeued by the user from V4L2. These planes are then exported to
+one dma-buf file descriptor per plane, these descriptors are then sent along
+with the metadata (format, modifier, width, height, per-plane offset and stride)
+to the Wayland server. The Wayland server will then import these file
+descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
+through Vulkan, or a KMS framebuffer object; each of these import operations
+will take the same metadata and convert the dma-buf file descriptors into their
+native buffer handles.
+
+Having a non-empty intersection of supported modifiers does not guarantee that
+import will succeed into all consumers; they may have constraints beyond those
+impliied by modifiers which must be satisfied.
+
+
+Implicit modifiers
+==================
+
+The concept of modifiers post-dates all of the subsystems mentioned above. As
+such, it has been retrofitted into all of these APIs, and in order to ensure
+backwards compatibility, support is needed for drivers and userspace which do
+not (yet) support modifiers.
+
+As an example, GBM is used to allocate buffers to be shared between EGL for
+rendering and KMS for display. It has two entrypoints for allocating buffers:
+``gbm_bo_create`` which only takes the format, width, height, and a usage token,
+and ``gbm_bo_create_with_modifiers`` which extends this with a list of modifiers.
+
+In the latter case, the allocation is as discussed above, being provided with a
+list of acceptable modifiers that the implementation can choose from (or fail if
+it is not possible to allocate within those constraints). In the former case
+where modifiers are not provided, the GBM implementation must make its own
+choice as to what is likely to be the 'best' layout. Such a choice is entirely
+implementation-specific: some will internally use tiled layouts which are not
+CPU-accessible if the implementation decides that is a good idea through
+whatever heuristic. It is the implementation's responsibility to ensure that
+this choice is appropriate.
+
+To support this case where the layout is not known because there is no awareness
+of modifiers, a special ``DRM_FORMAT_MOD_INVALID`` token has been defined. This
+pseudo-modifier declares that the layout is not known, and that the driver
+should use its own logic to determine what the underlying layout may be.
+
+.. note::
+
+  ``DRM_FORMAT_MOD_INVALID`` is a non-zero value. The modifier value zero is
+  ``DRM_FORMAT_MOD_LINEAR``, which is an explicit guarantee that the image
+  has the linear layout. Care and attention should be taken to ensure that
+  zero as a default uninitialized value signals no modifier.
+
+There are four cases where this token may be used:
+  - during enumeration, an interface may return ``DRM_FORMAT_MOD_INVALID``, either
+    as the sole member of a modifier list to declare that explicit modifiers are
+    not supported, or as part of a larger list to declare that implicit modifiers
+    may be used
+  - during allocation, a user may supply ``DRM_FORMAT_MOD_INVALID``, either as the
+    sole member of a modifier list (equivalent to not supplying a modifier list
+    at all) to declare that explicit modifiers are not supported and must not be
+    used, or as part of a larger list to declare that an allocation using implicit
+    modifiers is acceptable
+  - in a post-allocation query, an implementation may return
+    ``DRM_FORMAT_MOD_INVALID`` as the modifier of the allocated buffer to declare
+    that the underlying layout is implementation-defined and that an explicit
+    modifier description is not available; per the above rules, this may only be
+    returned when the user has included ``DRM_FORMAT_MOD_INVALID`` as part of the
+    list of acceptable modifiers, or not provided a list
+  - when importing a buffer, the user may supply ``DRM_FORMAT_MOD_INVALID`` as the
+    buffer modifier (or not supply a modifier) to indicate that the modifier is
+    unknown for whatever reason; this is only acceptable when the buffer has
+    not been allocated with an explicit modifier
+
+It follows from this that for any single buffer, the complete chain of operations
+formed by the producer and all the consumers must be either fully implicit or fully
+explicit. For example, if a user wishes to allocate a buffer for use between
+GPU, display, and media, but the media API does not support modifiers, then the
+user **must not** allocate the buffer with explicit modifiers and attempt to
+import the buffer into the media API with no modifier, but either perform the
+allocation using implicit modifiers, or allocate the buffer for media use
+separately and copy between the two buffers.
+
+As one exception to the above, allocations may be 'upgraded' from implicit
+to explicit modifiers. For example, if the buffer is allocated with
+``gbm_bo_create`` (taking no modifiers), the user may then query the modifier with
+``gbm_bo_get_modifier`` and then use this modifier as an explicit modifier token
+if a valid modifier is returned.
+
+When allocating buffers for exchange between different users and modifiers are
+not available, implementations are strongly encouraged to use
+``DRM_FORMAT_MOD_LINEAR`` for their allocation, as this is the universal baseline
+for exchange. However, it is not guaranteed that this will result in the correct
+interpretation of buffer content, as implicit modifier operation may still be
+subject to driver-specific heuristics.
+
+Any new users - userspace programs and protocols, kernel subsystems, etc -
+wishing to exchange buffers must offer interoperability through dma-buf file
+descriptors for memory planes, DRM format tokens to describe the format, DRM
+format modifiers to describe the layout in memory, at least width and height for
+dimensions, and at least offset and stride for each memory plane.
+
+.. _zwp_linux_dmabuf_v1: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
+.. _VK_EXT_image_drm_format_modifier: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_image_drm_format_modifier.html
+.. _EGL_EXT_image_dma_buf_import_modifiers: https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt
\ No newline at end of file
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 72a65db0c498..031df47a7c19 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -22,6 +22,7 @@ place where this information is gathered.
    unshare
    spec_ctrl
    accelerators/ocxl
+   dma-buf-alloc-exchange
    ebpf/index
    ELF
    ioctl/index
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics
  2023-08-03 15:47 ` [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics Daniel Stone
@ 2023-08-03 19:47   ` James Jones
  2023-08-03 20:30     ` Sebastian Wick
  2023-08-29 13:30   ` [Linaro-mm-sig] " Christian König
  2 siblings, 0 replies; 23+ messages in thread
From: James Jones @ 2023-08-03 19:47 UTC (permalink / raw)
  To: Daniel Stone, dri-devel; +Cc: linaro-mm-sig, linux-media

On 8/3/23 08:47, Daniel Stone wrote:
> Hi all,
> This is v2 to the linked patch series; thanks to everyone for reviewing
> the initial version. I've moved this out of a pure DRM scope and into
> the general userspace-API design section. Hopefully it helps others and
> answers a bunch of questions.

Again, thanks for writing this up. I think it is great to have all this 
knowledge collected in one place.

For the series:

Reviewed-by: James Jones <jajones@nvidia.com>

> I think it'd be great to have input/links/reflections from other
> subsystems as well here.

Agreed, though I'll reiterate my comment on the v1 series from a few 
years ago: I hope this can be merged relatively soon with additional 
documentation added in follow-up patches as needed. While you can always 
note more interactions, details, etc., everything here appears to be 
correct from my understanding and is strictly an improvement over the 
current lack of documentation.

Thanks,
-James

> Cheers,
> Daniel
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics
  2023-08-03 15:47 ` [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics Daniel Stone
@ 2023-08-03 20:30     ` Sebastian Wick
  2023-08-03 20:30     ` Sebastian Wick
  2023-08-29 13:30   ` [Linaro-mm-sig] " Christian König
  2 siblings, 0 replies; 23+ messages in thread
From: Sebastian Wick @ 2023-08-03 20:30 UTC (permalink / raw)
  To: Daniel Stone; +Cc: linaro-mm-sig, dri-devel, linux-media

For what it's worth this series is

Reviewed-by: Sebastian Wick <sebastian.wick@redhat.com>

On Thu, Aug 3, 2023 at 5:49 PM Daniel Stone <daniels@collabora.com> wrote:
>
> Hi all,
> This is v2 to the linked patch series; thanks to everyone for reviewing
> the initial version. I've moved this out of a pure DRM scope and into
> the general userspace-API design section. Hopefully it helps others and
> answers a bunch of questions.
>
> I think it'd be great to have input/links/reflections from other
> subsystems as well here.
>
> Cheers,
> Daniel
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics
@ 2023-08-03 20:30     ` Sebastian Wick
  0 siblings, 0 replies; 23+ messages in thread
From: Sebastian Wick @ 2023-08-03 20:30 UTC (permalink / raw)
  To: Daniel Stone; +Cc: dri-devel, linaro-mm-sig, linux-media

For what it's worth this series is

Reviewed-by: Sebastian Wick <sebastian.wick@redhat.com>

On Thu, Aug 3, 2023 at 5:49 PM Daniel Stone <daniels@collabora.com> wrote:
>
> Hi all,
> This is v2 to the linked patch series; thanks to everyone for reviewing
> the initial version. I've moved this out of a pure DRM scope and into
> the general userspace-API design section. Hopefully it helps others and
> answers a bunch of questions.
>
> I think it'd be great to have input/links/reflections from other
> subsystems as well here.
>
> Cheers,
> Daniel
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [v2,2/2] doc: uapi: Add document describing dma-buf semantics
  2023-08-03 15:47 ` [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics Daniel Stone
@ 2023-08-18 15:37   ` suijingfeng
  2023-08-21 13:33   ` [PATCH v2 2/2] " Daniel Vetter
  1 sibling, 0 replies; 23+ messages in thread
From: suijingfeng @ 2023-08-18 15:37 UTC (permalink / raw)
  To: Daniel Stone, dri-devel

Hi,


On 2023/8/3 23:47, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and


Probably, best practices?


> using buffers when crossing context/process/device/subsystem boundaries.
>
> This ties up all of dma-buf, formats and modifiers, and their usage.
>
> Signed-off-by: Daniel Stone <daniels@collabora.com>
> ---
>   Documentation/driver-api/dma-buf.rst          |   8 +
>   Documentation/gpu/drm-uapi.rst                |   7 +
>   .../userspace-api/dma-buf-alloc-exchange.rst  | 384 ++++++++++++++++++
>   Documentation/userspace-api/index.rst         |   1 +
>   4 files changed, 400 insertions(+)
>   create mode 100644 Documentation/userspace-api/dma-buf-alloc-exchange.rst
>
> v2:
>   - Moved to general uAPI section, cross-referenced from dma-buf/DRM
>   - Added Pekka's suggested glossary with some small changes
>   - Cleanups and clarifications from Simon and James
>
> diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> index 862dbc2759d0..0c153d79ccc4 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -22,6 +22,14 @@ interact with the three main primitives offered by dma-buf:
>      allowing implicit (kernel-ordered) synchronization of work to
>      preserve the illusion of coherent access
>   
> +
> +Userspace API principles and use
> +--------------------------------
> +
> +For more details on how to design your subsystem's API for dma-buf use, please
> +see Documentation/userspace-api/dma-buf-alloc-exchange.rst.
> +
> +
>   Shared DMA Buffers
>   ------------------
>   
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 65fb3036a580..eef5fd19bc92 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -486,3 +486,10 @@ and the CRTC index is its position in this array.
>   
>   .. kernel-doc:: include/uapi/drm/drm_mode.h
>      :internal:
> +
> +
> +dma-buf interoperability
> +========================
> +
> +Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
> +information on how dma-buf is integrated and exposed within DRM.
> diff --git a/Documentation/userspace-api/dma-buf-alloc-exchange.rst b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
> new file mode 100644
> index 000000000000..090453d2ad78
> --- /dev/null
> +++ b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
> @@ -0,0 +1,384 @@
> +.. Copyright 2021-2023 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Glossary of terms
> +=================
> +
> +.. glossary::
> +
> +    image:
> +      Conceptually a two-dimensional array of pixels. The pixels may be stored
> +      in one or more memory buffers. Has width and height in pixels, pixel
> +      format and modifier (implicit or explicit).
> +
> +    row:
> +      A span along a single y-axis value, e.g. from co-ordinates (0,100) to
> +      (200,100).
> +
> +    scanline:
> +      Synonym for row.
> +
> +    column:
> +      A span along a single x-axis value, e.g. from co-ordinates (100,0) to
> +      (100,100).
> +
> +    memory buffer:
> +      A piece of memory for storing (parts of) pixel data. Has stride and size
> +      in bytes and at least one handle in some API. May contain one or more
> +      planes.
> +
> +    plane:
> +      A two-dimensional array of some or all of an image's color and alpha
> +      channel values.
> +
> +    pixel:
> +      A picture element. Has a single color value which is defined by one or
> +      more color channels values, e.g. R, G and B, or Y, Cb and Cr. May also
> +      have an alpha value as an additional channel.
> +
> +    pixel data:
> +      Bytes or bits that represent some or all of the color/alpha channel values
> +      of a pixel or an image. The data for one pixel may be spread over several
> +      planes or memory buffers depending on format and modifier.
> +
> +    color value:
> +      A tuple of numbers, representing a color. Each element in the tuple is a
> +      color channel value.
> +
> +    color channel:
> +      One of the dimensions in a color model. For example, RGB model has
> +      channels R, G, and B. Alpha channel is sometimes counted as a color
> +      channel as well.
> +
> +    pixel format:
> +      A description of how pixel data represents the pixel's color and alpha
> +      values.
> +
> +    modifier:
> +      A description of how pixel data is laid out in memory buffers.
> +
> +    alpha:
> +      A value that denotes the color coverage in a pixel. Sometimes used for
> +      translucency instead.
> +
> +    stride:
> +      A value that denotes the relationship between pixel-location co-ordinates
> +      and byte-offset values. Typically used as the byte offset between two
> +      pixels at the start of vertically-consecutive tiling blocks. For linear
> +      layouts, the byte offset between two vertically-adjacent pixels.
> +
> +    pitch:
> +      Synonym for stride.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the color
> +values provided for each pixel. Although each subsystem has its own format
> +descriptions (e.g. V4L2 and fbdev), the ``DRM_FORMAT_*`` tokens should be reused
> +wherever possible, as they are the standard descriptions used for interchange.
> +These tokens are described in the ``drm_fourcc.h`` file, which is a part of
> +DRM's uAPI.
> +
> +Each ``DRM_FORMAT_*`` token describes the translation between a pixel
> +co-ordinate in an image, and the color values for that pixel contained within
> +its memory buffers. The number and type of color channels are described:
> +whether they are RGB or YUV, integer or floating-point, the size of each channel
> +and their locations within the pixel memory, and the relationship between color
> +planes.
> +
> +For example, ``DRM_FORMAT_ARGB8888`` describes a format in which each pixel has
> +a single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-bit precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. ``DRM_FORMAT_*`` is not
> +affected by either CPU or device endianness; the byte pattern in memory is
> +always as described in the format definition, which is usually little-endian.
> +
> +As a more complex example, ``DRM_FORMAT_NV12`` describes a format in which luma
> +and chroma YUV samples are stored in separate planes, where the chroma plane is
> +stored at half the resolution in both dimensions (i.e. one U/V chroma
> +sample is stored for each 2x2 pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is ``DRM_FORMAT_MOD_LINEAR``, describing a scheme in which each plane
> +is laid out row-sequentially, from the top-left to the bottom-right corner.
> +This is considered the baseline interchange format, and most convenient for CPU
> +access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +``DRM_FORMAT_MOD_VIVANTE_TILED`` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +a plane stores pixels (0,0) to (3,3) inclusive, and the second tile in a plane
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of planes required for an image; for
> +example, the ``I915_FORMAT_MOD_Y_TILED_CCS`` modifier adds a second plane to RGB
> +formats in which it stores data about the status of every tile, notably
> +including whether the tile is fully populated with pixel data, or can be
> +expanded from a single solid color.
> +
> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +
> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +``DRM_FORMAT_NV12`` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an ``offset`` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single memory
> +buffer; for example, ``DRM_FORMAT_NV12`` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage follows within the same buffer
> +beginning from the byte offset for that plane.
> +
> +Each plane must also have a ``stride`` in bytes, expressing the offset in memory
> +between two contiguous row. For example, a ``DRM_FORMAT_MOD_LINEAR`` buffer
> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be ``1024 * bpp``,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.
> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +``IN_FORMATS`` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers`_ extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier`_ extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1`_ extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +
> + - query KMS for the ``IN_FORMATS`` property for the given plane
> + - query Vulkan for the supported formats for its physical device, making sure
> +   to pass the ``VkImageUsageFlagBits`` and ``VkImageCreateFlagBits``
> +   corresponding to the intended rendering use
> + - intersect these formats to determine the most appropriate one
> + - for this format, intersect the lists of supported modifiers for both KMS and
> +   Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +``dma-heaps`` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +Communicating additional constraints, such as alignment of stride or offset,
> +placement within a particular memory area, etc, is out of scope of dma-buf,
> +and is not solved by format and modifier tokens.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory buffer is referred to by a buffer handle, which may be unique or
> +duplicated within an image. For example, a ``DRM_FORMAT_NV12`` buffer may have
> +the luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.
> +
> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, ``dma-buf`` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a ``dma-buf`` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into a
> +``DRM_FORMAT_NV12`` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a KMS framebuffer object; each of these import operations
> +will take the same metadata and convert the dma-buf file descriptors into their
> +native buffer handles.
> +
> +Having a non-empty intersection of supported modifiers does not guarantee that
> +import will succeed into all consumers; they may have constraints beyond those
> +impliied by modifiers which must be satisfied.


s/impliied/implied


> +
> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +``gbm_bo_create`` which only takes the format, width, height, and a usage token,
> +and ``gbm_bo_create_with_modifiers`` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special ``DRM_FORMAT_MOD_INVALID`` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.
> +
> +.. note::
> +
> +  ``DRM_FORMAT_MOD_INVALID`` is a non-zero value. The modifier value zero is
> +  ``DRM_FORMAT_MOD_LINEAR``, which is an explicit guarantee that the image
> +  has the linear layout. Care and attention should be taken to ensure that
> +  zero as a default uninitialized value signals no modifier.
> +
> +There are four cases where this token may be used:
> +  - during enumeration, an interface may return ``DRM_FORMAT_MOD_INVALID``, either
> +    as the sole member of a modifier list to declare that explicit modifiers are
> +    not supported, or as part of a larger list to declare that implicit modifiers
> +    may be used
> +  - during allocation, a user may supply ``DRM_FORMAT_MOD_INVALID``, either as the
> +    sole member of a modifier list (equivalent to not supplying a modifier list
> +    at all) to declare that explicit modifiers are not supported and must not be
> +    used, or as part of a larger list to declare that an allocation using implicit
> +    modifiers is acceptable
> +  - in a post-allocation query, an implementation may return
> +    ``DRM_FORMAT_MOD_INVALID`` as the modifier of the allocated buffer to declare
> +    that the underlying layout is implementation-defined and that an explicit
> +    modifier description is not available; per the above rules, this may only be
> +    returned when the user has included ``DRM_FORMAT_MOD_INVALID`` as part of the
> +    list of acceptable modifiers, or not provided a list
> +  - when importing a buffer, the user may supply ``DRM_FORMAT_MOD_INVALID`` as the
> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> +    unknown for whatever reason; this is only acceptable when the buffer has
> +    not been allocated with an explicit modifier
> +
> +It follows from this that for any single buffer, the complete chain of operations
> +formed by the producer and all the consumers must be either fully implicit or fully
> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +``gbm_bo_create`` (taking no modifiers), the user may then query the modifier with
> +``gbm_bo_get_modifier`` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.
> +
> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +``DRM_FORMAT_MOD_LINEAR`` for their allocation, as this is the universal baseline
> +for exchange. However, it is not guaranteed that this will result in the correct
> +interpretation of buffer content, as implicit modifier operation may still be
> +subject to driver-specific heuristics.
> +
> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.
> +
> +.. _zwp_linux_dmabuf_v1: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
> +.. _VK_EXT_image_drm_format_modifier: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_image_drm_format_modifier.html
> +.. _EGL_EXT_image_dma_buf_import_modifiers: https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt
> \ No newline at end of file
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 72a65db0c498..031df47a7c19 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -22,6 +22,7 @@ place where this information is gathered.
>      unshare
>      spec_ctrl
>      accelerators/ocxl
> +   dma-buf-alloc-exchange
>      ebpf/index
>      ELF
>      ioctl/index


This doc contains rich knowledge, thanks for the writing.
I believe that this will helps to educate a crowd of newbies, including me.
But I know part of the content inside this document is correct.
Maybe, it need a more advance programmer to review.
Anyway, I hope this elegant document can be merged.

Reviewed-by: Sui Jingfeng <suijingfeng@loongson.cn>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics
  2023-08-03 15:47 ` [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics Daniel Stone
  2023-08-18 15:37   ` [v2,2/2] " suijingfeng
@ 2023-08-21 13:33   ` Daniel Vetter
  2023-08-21 17:17     ` Simon Ser
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2023-08-21 13:33 UTC (permalink / raw)
  To: Daniel Stone; +Cc: dri-devel

On Thu, Aug 03, 2023 at 04:47:29PM +0100, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
> 
> This ties up all of dma-buf, formats and modifiers, and their usage.
> 
> Signed-off-by: Daniel Stone <daniels@collabora.com>

On both patches Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> with the
minor nits on this one here address while applying or something like that.

> ---
>  Documentation/driver-api/dma-buf.rst          |   8 +
>  Documentation/gpu/drm-uapi.rst                |   7 +
>  .../userspace-api/dma-buf-alloc-exchange.rst  | 384 ++++++++++++++++++
>  Documentation/userspace-api/index.rst         |   1 +
>  4 files changed, 400 insertions(+)
>  create mode 100644 Documentation/userspace-api/dma-buf-alloc-exchange.rst
> 
> v2:
>  - Moved to general uAPI section, cross-referenced from dma-buf/DRM
>  - Added Pekka's suggested glossary with some small changes
>  - Cleanups and clarifications from Simon and James
> 
> diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> index 862dbc2759d0..0c153d79ccc4 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -22,6 +22,14 @@ interact with the three main primitives offered by dma-buf:
>     allowing implicit (kernel-ordered) synchronization of work to
>     preserve the illusion of coherent access
>  
> +
> +Userspace API principles and use
> +--------------------------------
> +
> +For more details on how to design your subsystem's API for dma-buf use, please
> +see Documentation/userspace-api/dma-buf-alloc-exchange.rst.
> +
> +
>  Shared DMA Buffers
>  ------------------
>  
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 65fb3036a580..eef5fd19bc92 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -486,3 +486,10 @@ and the CRTC index is its position in this array.
>  
>  .. kernel-doc:: include/uapi/drm/drm_mode.h
>     :internal:
> +
> +
> +dma-buf interoperability
> +========================
> +
> +Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
> +information on how dma-buf is integrated and exposed within DRM.
> diff --git a/Documentation/userspace-api/dma-buf-alloc-exchange.rst b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
> new file mode 100644
> index 000000000000..090453d2ad78
> --- /dev/null
> +++ b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
> @@ -0,0 +1,384 @@
> +.. Copyright 2021-2023 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Glossary of terms
> +=================
> +
> +.. glossary::
> +
> +    image:
> +      Conceptually a two-dimensional array of pixels. The pixels may be stored
> +      in one or more memory buffers. Has width and height in pixels, pixel
> +      format and modifier (implicit or explicit).
> +
> +    row:
> +      A span along a single y-axis value, e.g. from co-ordinates (0,100) to
> +      (200,100).
> +
> +    scanline:
> +      Synonym for row.
> +
> +    column:
> +      A span along a single x-axis value, e.g. from co-ordinates (100,0) to
> +      (100,100).
> +
> +    memory buffer:
> +      A piece of memory for storing (parts of) pixel data. Has stride and size
> +      in bytes and at least one handle in some API. May contain one or more
> +      planes.
> +
> +    plane:
> +      A two-dimensional array of some or all of an image's color and alpha
> +      channel values.
> +
> +    pixel:
> +      A picture element. Has a single color value which is defined by one or
> +      more color channels values, e.g. R, G and B, or Y, Cb and Cr. May also
> +      have an alpha value as an additional channel.
> +
> +    pixel data:
> +      Bytes or bits that represent some or all of the color/alpha channel values
> +      of a pixel or an image. The data for one pixel may be spread over several
> +      planes or memory buffers depending on format and modifier.
> +
> +    color value:
> +      A tuple of numbers, representing a color. Each element in the tuple is a
> +      color channel value.
> +
> +    color channel:
> +      One of the dimensions in a color model. For example, RGB model has
> +      channels R, G, and B. Alpha channel is sometimes counted as a color
> +      channel as well.
> +
> +    pixel format:
> +      A description of how pixel data represents the pixel's color and alpha
> +      values.
> +
> +    modifier:
> +      A description of how pixel data is laid out in memory buffers.
> +
> +    alpha:
> +      A value that denotes the color coverage in a pixel. Sometimes used for
> +      translucency instead.
> +
> +    stride:
> +      A value that denotes the relationship between pixel-location co-ordinates
> +      and byte-offset values. Typically used as the byte offset between two
> +      pixels at the start of vertically-consecutive tiling blocks. For linear
> +      layouts, the byte offset between two vertically-adjacent pixels.

Maybe add here:

"For non-linear formats the stride must be computed in a consistent way,
which usually is done as-if the linear."

This has resulted in some absolutely epic bikesheds since iirc arm fbc
spec defines the stride differently, and the resulting compat issues would
have been hilarious. This took almost forever to agree on.

> +
> +    pitch:
> +      Synonym for stride.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the color
> +values provided for each pixel. Although each subsystem has its own format
> +descriptions (e.g. V4L2 and fbdev), the ``DRM_FORMAT_*`` tokens should be reused
> +wherever possible, as they are the standard descriptions used for interchange.
> +These tokens are described in the ``drm_fourcc.h`` file, which is a part of
> +DRM's uAPI.
> +
> +Each ``DRM_FORMAT_*`` token describes the translation between a pixel
> +co-ordinate in an image, and the color values for that pixel contained within
> +its memory buffers. The number and type of color channels are described:
> +whether they are RGB or YUV, integer or floating-point, the size of each channel
> +and their locations within the pixel memory, and the relationship between color
> +planes.
> +
> +For example, ``DRM_FORMAT_ARGB8888`` describes a format in which each pixel has
> +a single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-bit precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. ``DRM_FORMAT_*`` is not
> +affected by either CPU or device endianness; the byte pattern in memory is
> +always as described in the format definition, which is usually little-endian.
> +
> +As a more complex example, ``DRM_FORMAT_NV12`` describes a format in which luma
> +and chroma YUV samples are stored in separate planes, where the chroma plane is
> +stored at half the resolution in both dimensions (i.e. one U/V chroma
> +sample is stored for each 2x2 pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is ``DRM_FORMAT_MOD_LINEAR``, describing a scheme in which each plane
> +is laid out row-sequentially, from the top-left to the bottom-right corner.
> +This is considered the baseline interchange format, and most convenient for CPU
> +access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +``DRM_FORMAT_MOD_VIVANTE_TILED`` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +a plane stores pixels (0,0) to (3,3) inclusive, and the second tile in a plane
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of planes required for an image; for
> +example, the ``I915_FORMAT_MOD_Y_TILED_CCS`` modifier adds a second plane to RGB
> +formats in which it stores data about the status of every tile, notably
> +including whether the tile is fully populated with pixel data, or can be
> +expanded from a single solid color.
> +
> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +
> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +``DRM_FORMAT_NV12`` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an ``offset`` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single memory
> +buffer; for example, ``DRM_FORMAT_NV12`` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage follows within the same buffer
> +beginning from the byte offset for that plane.
> +
> +Each plane must also have a ``stride`` in bytes, expressing the offset in memory
> +between two contiguous row. For example, a ``DRM_FORMAT_MOD_LINEAR`` buffer
> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be ``1024 * bpp``,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.
> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +``IN_FORMATS`` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers`_ extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier`_ extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1`_ extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +
> + - query KMS for the ``IN_FORMATS`` property for the given plane
> + - query Vulkan for the supported formats for its physical device, making sure
> +   to pass the ``VkImageUsageFlagBits`` and ``VkImageCreateFlagBits``
> +   corresponding to the intended rendering use
> + - intersect these formats to determine the most appropriate one
> + - for this format, intersect the lists of supported modifiers for both KMS and
> +   Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +``dma-heaps`` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +Communicating additional constraints, such as alignment of stride or offset,
> +placement within a particular memory area, etc, is out of scope of dma-buf,
> +and is not solved by format and modifier tokens.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory buffer is referred to by a buffer handle, which may be unique or
> +duplicated within an image. For example, a ``DRM_FORMAT_NV12`` buffer may have
> +the luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.
> +
> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, ``dma-buf`` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a ``dma-buf`` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into a
> +``DRM_FORMAT_NV12`` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a KMS framebuffer object; each of these import operations
> +will take the same metadata and convert the dma-buf file descriptors into their
> +native buffer handles.
> +
> +Having a non-empty intersection of supported modifiers does not guarantee that
> +import will succeed into all consumers; they may have constraints beyond those
> +impliied by modifiers which must be satisfied.
> +
> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +``gbm_bo_create`` which only takes the format, width, height, and a usage token,
> +and ``gbm_bo_create_with_modifiers`` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special ``DRM_FORMAT_MOD_INVALID`` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.
> +
> +.. note::
> +
> +  ``DRM_FORMAT_MOD_INVALID`` is a non-zero value. The modifier value zero is
> +  ``DRM_FORMAT_MOD_LINEAR``, which is an explicit guarantee that the image
> +  has the linear layout. Care and attention should be taken to ensure that
> +  zero as a default uninitialized value signals no modifier.

I think the last sentence here got a bit confused, and probably should be
replaced with:

Care and attention should be taken to ensure that zero as a default
uninitialized value is not mixed up with either no modifier or the linear
modifier. Also note that in some API the invalid modifier value is
specified with an out-of-band flag, like in the ADDFB2 IOCTL.

Cheers, Sima


> +
> +There are four cases where this token may be used:
> +  - during enumeration, an interface may return ``DRM_FORMAT_MOD_INVALID``, either
> +    as the sole member of a modifier list to declare that explicit modifiers are
> +    not supported, or as part of a larger list to declare that implicit modifiers
> +    may be used
> +  - during allocation, a user may supply ``DRM_FORMAT_MOD_INVALID``, either as the
> +    sole member of a modifier list (equivalent to not supplying a modifier list
> +    at all) to declare that explicit modifiers are not supported and must not be
> +    used, or as part of a larger list to declare that an allocation using implicit
> +    modifiers is acceptable
> +  - in a post-allocation query, an implementation may return
> +    ``DRM_FORMAT_MOD_INVALID`` as the modifier of the allocated buffer to declare
> +    that the underlying layout is implementation-defined and that an explicit
> +    modifier description is not available; per the above rules, this may only be
> +    returned when the user has included ``DRM_FORMAT_MOD_INVALID`` as part of the
> +    list of acceptable modifiers, or not provided a list
> +  - when importing a buffer, the user may supply ``DRM_FORMAT_MOD_INVALID`` as the
> +    buffer modifier (or not supply a modifier) to indicate that the modifier is
> +    unknown for whatever reason; this is only acceptable when the buffer has
> +    not been allocated with an explicit modifier
> +
> +It follows from this that for any single buffer, the complete chain of operations
> +formed by the producer and all the consumers must be either fully implicit or fully
> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +``gbm_bo_create`` (taking no modifiers), the user may then query the modifier with
> +``gbm_bo_get_modifier`` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.
> +
> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +``DRM_FORMAT_MOD_LINEAR`` for their allocation, as this is the universal baseline
> +for exchange. However, it is not guaranteed that this will result in the correct
> +interpretation of buffer content, as implicit modifier operation may still be
> +subject to driver-specific heuristics.
> +
> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.
> +
> +.. _zwp_linux_dmabuf_v1: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
> +.. _VK_EXT_image_drm_format_modifier: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_image_drm_format_modifier.html
> +.. _EGL_EXT_image_dma_buf_import_modifiers: https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt
> \ No newline at end of file
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 72a65db0c498..031df47a7c19 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -22,6 +22,7 @@ place where this information is gathered.
>     unshare
>     spec_ctrl
>     accelerators/ocxl
> +   dma-buf-alloc-exchange
>     ebpf/index
>     ELF
>     ioctl/index
> -- 
> 2.41.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics
  2023-08-21 13:33   ` [PATCH v2 2/2] " Daniel Vetter
@ 2023-08-21 17:17     ` Simon Ser
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Ser @ 2023-08-21 17:17 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Stone, dri-devel

Pushed to drm-misc-next with minor edits. Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics
  2023-08-03 15:47 ` [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics Daniel Stone
  2023-08-03 19:47   ` James Jones
  2023-08-03 20:30     ` Sebastian Wick
@ 2023-08-29 13:30   ` Christian König
  2 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2023-08-29 13:30 UTC (permalink / raw)
  To: Daniel Stone, dri-devel; +Cc: linaro-mm-sig, linux-media

Am 03.08.23 um 17:47 schrieb Daniel Stone:
> Hi all,
> This is v2 to the linked patch series; thanks to everyone for reviewing
> the initial version. I've moved this out of a pure DRM scope and into
> the general userspace-API design section. Hopefully it helps others and
> answers a bunch of questions.
>
> I think it'd be great to have input/links/reflections from other
> subsystems as well here.

Could you send that one to me once more. My mail client seems to have 
swallowed the patches.

Thanks,
Christian.

>
> Cheers,
> Daniel
>
>
> _______________________________________________
> Linaro-mm-sig mailing list -- linaro-mm-sig@lists.linaro.org
> To unsubscribe send an email to linaro-mm-sig-leave@lists.linaro.org


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-08-29 13:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-05 12:27 [PATCH] doc: gpu: Add document describing buffer exchange Daniel Stone
2021-09-06 12:28 ` Simon Ser
2021-11-09  0:18   ` James Jones
2021-11-09  9:13     ` Daniel Vetter
2021-11-09  9:22       ` Simon Ser
2023-08-03 15:46       ` Daniel Stone
2023-08-03 15:46         ` Daniel Stone
2021-09-06 17:13 ` Robert Beckett
2021-09-08  9:34 ` Pekka Paalanen
2021-09-08  9:44   ` Simon Ser
2021-11-09  0:21     ` James Jones
2021-11-09  9:12       ` Daniel Vetter
2021-09-08 18:16 ` Daniel Vetter
2023-08-03 15:47 ` [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics Daniel Stone
2023-08-03 19:47   ` James Jones
2023-08-03 20:30   ` Sebastian Wick
2023-08-03 20:30     ` Sebastian Wick
2023-08-29 13:30   ` [Linaro-mm-sig] " Christian König
2023-08-03 15:47 ` [PATCH v2 1/2] doc: dma-buf: Rewrite intro section a little Daniel Stone
2023-08-03 15:47 ` [PATCH v2 2/2] doc: uapi: Add document describing dma-buf semantics Daniel Stone
2023-08-18 15:37   ` [v2,2/2] " suijingfeng
2023-08-21 13:33   ` [PATCH v2 2/2] " Daniel Vetter
2023-08-21 17:17     ` Simon Ser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.