All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
       [not found] <20171220085151.6327051e@nvidia.com>
@ 2017-12-20 19:51 ` Daniel Vetter
  2017-12-20 19:54   ` Kristian Høgsberg
       [not found] ` <CAPj87rOmGsN+HZEk1G=gFx_uPyipzEURB7=bfqOxxmLDtWwPgw@mail.gmail.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Daniel Vetter @ 2017-12-20 19:51 UTC (permalink / raw)
  To: Miguel Angel Vico, dri-devel
  Cc: Rob Clark, Nicolai Hähnle, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul

Since this also involves the kernel let's add dri-devel ...

On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
> Hi all,
>
> As many of you already know, I've been working with James Jones on the
> Generic Device Allocator project lately. He started a discussion thread
> some weeks ago seeking feedback on the current prototype of the library
> and advice on how to move all this forward, from a prototype stage to
> production. For further reference, see:
>
>    https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
>
> From the thread above, we came up with very interesting high level
> design ideas for one of the currently missing parts in the library:
> Usage transitions. That's something I'll personally work on during the
> following weeks.
>
>
> In the meantime, I've been working on putting together an open source
> implementation of the allocator mechanisms using the Nouveau driver for
> all to be able to play with.
>
> Below I'm seeking feedback on a bunch of changes I had to make to
> different components of the graphics stack:
>
> ** Allocator **
>
>   An allocator driver implementation on top of Nouveau. The current
>   implementation only handles pitch linear layouts, but that's enough
>   to have the kmscube port working using the allocator and Nouveau
>   drivers.
>
>   You can pull these changes from
>
>       https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
>
> ** Mesa **
>
>   James's kmscube port to use the allocator relies on the
>   EXT_external_objects extension to import allocator allocations to
>   OpenGL as a texture object. However, the Nouveau implementation of
>   these mechanisms is missing in Mesa, so I went ahead and added them.
>
>   You can pull these changes from
>
>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
>
>   Also, James's kmscube port uses the NVX_unix_allocator_import
>   extension to attach allocator metadata to texture objects so the
>   driver knows how to deal with the imported memory.
>
>   Note that there isn't a formal spec for this extension yet. For now,
>   it just serves as an experimental mechanism to import allocator
>   memory in OpenGL, and attach metadata to texture objects.
>
>   You can pull these changes (written on top of the above) from:
>
>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
>
> ** kmscube **
>
>   Mostly minor fixes and improvements on top of James's port to use the
>   allocator. Main thing is the allocator initialization path will use
>   EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
>   by the underlying EGL implementation.
>
>   You can pull these changes from:
>
>       https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
>
>
> With all the above you should be able to get kmscube working using the
> allocator on top of the Nouveau driver.
>
>
> Another of the missing pieces before we can move this to production is
> importing allocations to DRM FB objects. This is probably one of the
> most sensitive parts of the project as it requires modification/addition
> of kernel driver interfaces.
>
> At XDC2017, James had several hallway conversations with several people
> about this, all having different opinions. I'd like to take this
> opportunity to also start a discussion about what's the best option to
> create a path to get allocator allocations added as DRM FB objects.
>
> These are the few options we've considered to start with:
>
>   A) Have vendor-private ioctls to set properties on GEM objects that
>      are inherited by the FB objects. This is how our (NVIDIA) desktop
>      DRM driver currently works. This would require every vendor to add
>      their own ioctl to process allocator metadata, but the metadata is
>      actually a vendor-agnostic object more like DRM modifiers. We'd
>      like to come up with a vendor-agnostic solutions that can be
>      integrated to core DRM.
>
>   B) Add a new drmModeAddFBWithMetadata() command that takes allocator
>      metadata blobs for each plane of the FB. Some people in the
>      community have mentioned this is their preferred design. This,
>      however, means we'd have to go through the exercise of adding
>      another metadata mechanism to the whole graphics stack.
>
>   C) Shove allocator metadata into DRM by defining it to be a separate
>      plane in the image, and using the existing DRM modifiers mechanism
>      to indicate there is another plane for each "real" plane added. It
>      isn't clear how this scales to surfaces that already need several
>      planes, but there are some people that see this as the only way
>      forward. Also, we would have to create a separate GEM buffer for
>      the metadatada itself, which seems excessive.
>
> We personally like option (B) better, and have already started to
> prototype the new path (which is actually very similar to the
> drmModeAddFB2() one). You can take a look at the new interfaces here:
>
>     https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
>
> There may be other options that haven't been explored yet that could be
> a better choice than the above, so any suggestion will be greatly
> appreciated.

What kind of metadata are we talking about here? Addfb has tons of
stuff already that's "metadata". The only thing I've spotted is
PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
userspace, but definitely not something addfb ever needs. addfb only
needs the resulting pitch that we actually allocated (and might decide
it doesn't like that, but that's a different issue).

And since there's no patches for nouveau itself I can't really say
anything beyond that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-20 19:51 ` [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces Daniel Vetter
@ 2017-12-20 19:54   ` Kristian Høgsberg
  2017-12-20 20:41     ` Miguel Angel Vico
  0 siblings, 1 reply; 21+ messages in thread
From: Kristian Høgsberg @ 2017-12-20 19:54 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, Miguel Angel Vico, dri-devel, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul, Nicolai Hähnle

On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> Since this also involves the kernel let's add dri-devel ...
>
> On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
>> Hi all,
>>
>> As many of you already know, I've been working with James Jones on the
>> Generic Device Allocator project lately. He started a discussion thread
>> some weeks ago seeking feedback on the current prototype of the library
>> and advice on how to move all this forward, from a prototype stage to
>> production. For further reference, see:
>>
>>    https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
>>
>> From the thread above, we came up with very interesting high level
>> design ideas for one of the currently missing parts in the library:
>> Usage transitions. That's something I'll personally work on during the
>> following weeks.
>>
>>
>> In the meantime, I've been working on putting together an open source
>> implementation of the allocator mechanisms using the Nouveau driver for
>> all to be able to play with.
>>
>> Below I'm seeking feedback on a bunch of changes I had to make to
>> different components of the graphics stack:
>>
>> ** Allocator **
>>
>>   An allocator driver implementation on top of Nouveau. The current
>>   implementation only handles pitch linear layouts, but that's enough
>>   to have the kmscube port working using the allocator and Nouveau
>>   drivers.
>>
>>   You can pull these changes from
>>
>>       https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
>>
>> ** Mesa **
>>
>>   James's kmscube port to use the allocator relies on the
>>   EXT_external_objects extension to import allocator allocations to
>>   OpenGL as a texture object. However, the Nouveau implementation of
>>   these mechanisms is missing in Mesa, so I went ahead and added them.
>>
>>   You can pull these changes from
>>
>>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
>>
>>   Also, James's kmscube port uses the NVX_unix_allocator_import
>>   extension to attach allocator metadata to texture objects so the
>>   driver knows how to deal with the imported memory.
>>
>>   Note that there isn't a formal spec for this extension yet. For now,
>>   it just serves as an experimental mechanism to import allocator
>>   memory in OpenGL, and attach metadata to texture objects.
>>
>>   You can pull these changes (written on top of the above) from:
>>
>>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
>>
>> ** kmscube **
>>
>>   Mostly minor fixes and improvements on top of James's port to use the
>>   allocator. Main thing is the allocator initialization path will use
>>   EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
>>   by the underlying EGL implementation.
>>
>>   You can pull these changes from:
>>
>>       https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
>>
>>
>> With all the above you should be able to get kmscube working using the
>> allocator on top of the Nouveau driver.
>>
>>
>> Another of the missing pieces before we can move this to production is
>> importing allocations to DRM FB objects. This is probably one of the
>> most sensitive parts of the project as it requires modification/addition
>> of kernel driver interfaces.
>>
>> At XDC2017, James had several hallway conversations with several people
>> about this, all having different opinions. I'd like to take this
>> opportunity to also start a discussion about what's the best option to
>> create a path to get allocator allocations added as DRM FB objects.
>>
>> These are the few options we've considered to start with:
>>
>>   A) Have vendor-private ioctls to set properties on GEM objects that
>>      are inherited by the FB objects. This is how our (NVIDIA) desktop
>>      DRM driver currently works. This would require every vendor to add
>>      their own ioctl to process allocator metadata, but the metadata is
>>      actually a vendor-agnostic object more like DRM modifiers. We'd
>>      like to come up with a vendor-agnostic solutions that can be
>>      integrated to core DRM.
>>
>>   B) Add a new drmModeAddFBWithMetadata() command that takes allocator
>>      metadata blobs for each plane of the FB. Some people in the
>>      community have mentioned this is their preferred design. This,
>>      however, means we'd have to go through the exercise of adding
>>      another metadata mechanism to the whole graphics stack.
>>
>>   C) Shove allocator metadata into DRM by defining it to be a separate
>>      plane in the image, and using the existing DRM modifiers mechanism
>>      to indicate there is another plane for each "real" plane added. It
>>      isn't clear how this scales to surfaces that already need several
>>      planes, but there are some people that see this as the only way
>>      forward. Also, we would have to create a separate GEM buffer for
>>      the metadatada itself, which seems excessive.
>>
>> We personally like option (B) better, and have already started to
>> prototype the new path (which is actually very similar to the
>> drmModeAddFB2() one). You can take a look at the new interfaces here:
>>
>>     https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
>>
>> There may be other options that haven't been explored yet that could be
>> a better choice than the above, so any suggestion will be greatly
>> appreciated.
>
> What kind of metadata are we talking about here? Addfb has tons of
> stuff already that's "metadata". The only thing I've spotted is
> PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
> userspace, but definitely not something addfb ever needs. addfb only
> needs the resulting pitch that we actually allocated (and might decide
> it doesn't like that, but that's a different issue).
>
> And since there's no patches for nouveau itself I can't really say
> anything beyond that.

I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.

Kristian

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-20 19:54   ` Kristian Høgsberg
@ 2017-12-20 20:41     ` Miguel Angel Vico
  2017-12-20 23:22       ` [Mesa-dev] " Kristian Kristensen
  0 siblings, 1 reply; 21+ messages in thread
From: Miguel Angel Vico @ 2017-12-20 20:41 UTC (permalink / raw)
  To: Kristian Høgsberg
  Cc: Rob Clark, Nicolai Hähnle, dri-devel, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul

Inline.

On Wed, 20 Dec 2017 11:54:10 -0800
Kristian Høgsberg <hoegsberg@gmail.com> wrote:

> On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > Since this also involves the kernel let's add dri-devel ...

Yeah, I forgot. Thanks Daniel!

> >
> > On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:  
> >> Hi all,
> >>
> >> As many of you already know, I've been working with James Jones on the
> >> Generic Device Allocator project lately. He started a discussion thread
> >> some weeks ago seeking feedback on the current prototype of the library
> >> and advice on how to move all this forward, from a prototype stage to
> >> production. For further reference, see:
> >>
> >>    https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
> >>
> >> From the thread above, we came up with very interesting high level
> >> design ideas for one of the currently missing parts in the library:
> >> Usage transitions. That's something I'll personally work on during the
> >> following weeks.
> >>
> >>
> >> In the meantime, I've been working on putting together an open source
> >> implementation of the allocator mechanisms using the Nouveau driver for
> >> all to be able to play with.
> >>
> >> Below I'm seeking feedback on a bunch of changes I had to make to
> >> different components of the graphics stack:
> >>
> >> ** Allocator **
> >>
> >>   An allocator driver implementation on top of Nouveau. The current
> >>   implementation only handles pitch linear layouts, but that's enough
> >>   to have the kmscube port working using the allocator and Nouveau
> >>   drivers.
> >>
> >>   You can pull these changes from
> >>
> >>       https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
> >>
> >> ** Mesa **
> >>
> >>   James's kmscube port to use the allocator relies on the
> >>   EXT_external_objects extension to import allocator allocations to
> >>   OpenGL as a texture object. However, the Nouveau implementation of
> >>   these mechanisms is missing in Mesa, so I went ahead and added them.
> >>
> >>   You can pull these changes from
> >>
> >>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
> >>
> >>   Also, James's kmscube port uses the NVX_unix_allocator_import
> >>   extension to attach allocator metadata to texture objects so the
> >>   driver knows how to deal with the imported memory.
> >>
> >>   Note that there isn't a formal spec for this extension yet. For now,
> >>   it just serves as an experimental mechanism to import allocator
> >>   memory in OpenGL, and attach metadata to texture objects.
> >>
> >>   You can pull these changes (written on top of the above) from:
> >>
> >>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
> >>
> >> ** kmscube **
> >>
> >>   Mostly minor fixes and improvements on top of James's port to use the
> >>   allocator. Main thing is the allocator initialization path will use
> >>   EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
> >>   by the underlying EGL implementation.
> >>
> >>   You can pull these changes from:
> >>
> >>       https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
> >>
> >>
> >> With all the above you should be able to get kmscube working using the
> >> allocator on top of the Nouveau driver.
> >>
> >>
> >> Another of the missing pieces before we can move this to production is
> >> importing allocations to DRM FB objects. This is probably one of the
> >> most sensitive parts of the project as it requires modification/addition
> >> of kernel driver interfaces.
> >>
> >> At XDC2017, James had several hallway conversations with several people
> >> about this, all having different opinions. I'd like to take this
> >> opportunity to also start a discussion about what's the best option to
> >> create a path to get allocator allocations added as DRM FB objects.
> >>
> >> These are the few options we've considered to start with:
> >>
> >>   A) Have vendor-private ioctls to set properties on GEM objects that
> >>      are inherited by the FB objects. This is how our (NVIDIA) desktop
> >>      DRM driver currently works. This would require every vendor to add
> >>      their own ioctl to process allocator metadata, but the metadata is
> >>      actually a vendor-agnostic object more like DRM modifiers. We'd
> >>      like to come up with a vendor-agnostic solutions that can be
> >>      integrated to core DRM.
> >>
> >>   B) Add a new drmModeAddFBWithMetadata() command that takes allocator
> >>      metadata blobs for each plane of the FB. Some people in the
> >>      community have mentioned this is their preferred design. This,
> >>      however, means we'd have to go through the exercise of adding
> >>      another metadata mechanism to the whole graphics stack.
> >>
> >>   C) Shove allocator metadata into DRM by defining it to be a separate
> >>      plane in the image, and using the existing DRM modifiers mechanism
> >>      to indicate there is another plane for each "real" plane added. It
> >>      isn't clear how this scales to surfaces that already need several
> >>      planes, but there are some people that see this as the only way
> >>      forward. Also, we would have to create a separate GEM buffer for
> >>      the metadatada itself, which seems excessive.
> >>
> >> We personally like option (B) better, and have already started to
> >> prototype the new path (which is actually very similar to the
> >> drmModeAddFB2() one). You can take a look at the new interfaces here:
> >>
> >>     https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
> >>
> >> There may be other options that haven't been explored yet that could be
> >> a better choice than the above, so any suggestion will be greatly
> >> appreciated.  
> >
> > What kind of metadata are we talking about here? Addfb has tons of
> > stuff already that's "metadata". The only thing I've spotted is
> > PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
> > userspace, but definitely not something addfb ever needs. addfb only
> > needs the resulting pitch that we actually allocated (and might decide
> > it doesn't like that, but that's a different issue).

Sorry I failed to make it clearer. Metadata here refers to all
allocation parameters the generic allocator was given to allocate
memory. That currently means the final capability set used for
the allocation, including all constraints (such as memory alignment,
pitch alignment, and others) and capabilities, describing allocation
properties like tiling formats, compression, and such.

> >
> > And since there's no patches for nouveau itself I can't really say
> > anything beyond that.  

I can work on implementing these interfaces for nouveau, maybe
partially, if that's going to help. I just thought it'd be better to
first start a discussion on what would be the right way to pass
allocator metadata to display drivers before starting to seriously
implement any of the proposed options.

> 
> I'd like to see concrete examples of actual display controllers
> supporting more format layouts than what can be specified with a 64
> bit modifier.

The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.

Thanks.

> 
> Kristian
> 
> > -Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel  
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


-- 
Miguel


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-20 20:41     ` Miguel Angel Vico
@ 2017-12-20 23:22       ` Kristian Kristensen
  2017-12-21  1:05         ` Ilia Mirkin
  2017-12-21  8:05         ` Daniel Vetter
  0 siblings, 2 replies; 21+ messages in thread
From: Kristian Kristensen @ 2017-12-20 23:22 UTC (permalink / raw)
  To: Miguel Angel Vico
  Cc: Rob Clark, Nicolai Hähnle, Kristian Høgsberg,
	dri-devel, Jason Ekstrand, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul


[-- Attachment #1.1: Type: text/plain, Size: 9437 bytes --]

On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com>
wrote:

> Inline.
>
> On Wed, 20 Dec 2017 11:54:10 -0800
> Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>
> > On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > Since this also involves the kernel let's add dri-devel ...
>
> Yeah, I forgot. Thanks Daniel!
>
> > >
> > > On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico <
> mvicomoya@nvidia.com> wrote:
> > >> Hi all,
> > >>
> > >> As many of you already know, I've been working with James Jones on the
> > >> Generic Device Allocator project lately. He started a discussion
> thread
> > >> some weeks ago seeking feedback on the current prototype of the
> library
> > >> and advice on how to move all this forward, from a prototype stage to
> > >> production. For further reference, see:
> > >>
> > >>    https://lists.freedesktop.org/archives/mesa-dev/2017-
> November/177632.html
> > >>
> > >> From the thread above, we came up with very interesting high level
> > >> design ideas for one of the currently missing parts in the library:
> > >> Usage transitions. That's something I'll personally work on during the
> > >> following weeks.
> > >>
> > >>
> > >> In the meantime, I've been working on putting together an open source
> > >> implementation of the allocator mechanisms using the Nouveau driver
> for
> > >> all to be able to play with.
> > >>
> > >> Below I'm seeking feedback on a bunch of changes I had to make to
> > >> different components of the graphics stack:
> > >>
> > >> ** Allocator **
> > >>
> > >>   An allocator driver implementation on top of Nouveau. The current
> > >>   implementation only handles pitch linear layouts, but that's enough
> > >>   to have the kmscube port working using the allocator and Nouveau
> > >>   drivers.
> > >>
> > >>   You can pull these changes from
> > >>
> > >>       https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/
> nouveau-driver
> > >>
> > >> ** Mesa **
> > >>
> > >>   James's kmscube port to use the allocator relies on the
> > >>   EXT_external_objects extension to import allocator allocations to
> > >>   OpenGL as a texture object. However, the Nouveau implementation of
> > >>   these mechanisms is missing in Mesa, so I went ahead and added them.
> > >>
> > >>   You can pull these changes from
> > >>
> > >>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_
> external_objects-nouveau
> > >>
> > >>   Also, James's kmscube port uses the NVX_unix_allocator_import
> > >>   extension to attach allocator metadata to texture objects so the
> > >>   driver knows how to deal with the imported memory.
> > >>
> > >>   Note that there isn't a formal spec for this extension yet. For now,
> > >>   it just serves as an experimental mechanism to import allocator
> > >>   memory in OpenGL, and attach metadata to texture objects.
> > >>
> > >>   You can pull these changes (written on top of the above) from:
> > >>
> > >>       https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_
> unix_allocator_import
> > >>
> > >> ** kmscube **
> > >>
> > >>   Mostly minor fixes and improvements on top of James's port to use
> the
> > >>   allocator. Main thing is the allocator initialization path will use
> > >>   EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
> > >>   by the underlying EGL implementation.
> > >>
> > >>   You can pull these changes from:
> > >>
> > >>       https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/
> allocator-nouveau
> > >>
> > >>
> > >> With all the above you should be able to get kmscube working using the
> > >> allocator on top of the Nouveau driver.
> > >>
> > >>
> > >> Another of the missing pieces before we can move this to production is
> > >> importing allocations to DRM FB objects. This is probably one of the
> > >> most sensitive parts of the project as it requires
> modification/addition
> > >> of kernel driver interfaces.
> > >>
> > >> At XDC2017, James had several hallway conversations with several
> people
> > >> about this, all having different opinions. I'd like to take this
> > >> opportunity to also start a discussion about what's the best option to
> > >> create a path to get allocator allocations added as DRM FB objects.
> > >>
> > >> These are the few options we've considered to start with:
> > >>
> > >>   A) Have vendor-private ioctls to set properties on GEM objects that
> > >>      are inherited by the FB objects. This is how our (NVIDIA) desktop
> > >>      DRM driver currently works. This would require every vendor to
> add
> > >>      their own ioctl to process allocator metadata, but the metadata
> is
> > >>      actually a vendor-agnostic object more like DRM modifiers. We'd
> > >>      like to come up with a vendor-agnostic solutions that can be
> > >>      integrated to core DRM.
> > >>
> > >>   B) Add a new drmModeAddFBWithMetadata() command that takes allocator
> > >>      metadata blobs for each plane of the FB. Some people in the
> > >>      community have mentioned this is their preferred design. This,
> > >>      however, means we'd have to go through the exercise of adding
> > >>      another metadata mechanism to the whole graphics stack.
> > >>
> > >>   C) Shove allocator metadata into DRM by defining it to be a separate
> > >>      plane in the image, and using the existing DRM modifiers
> mechanism
> > >>      to indicate there is another plane for each "real" plane added.
> It
> > >>      isn't clear how this scales to surfaces that already need several
> > >>      planes, but there are some people that see this as the only way
> > >>      forward. Also, we would have to create a separate GEM buffer for
> > >>      the metadatada itself, which seems excessive.
> > >>
> > >> We personally like option (B) better, and have already started to
> > >> prototype the new path (which is actually very similar to the
> > >> drmModeAddFB2() one). You can take a look at the new interfaces here:
> > >>
> > >>     https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_
> addfb_with_metadata__4.14-rc8
> > >>
> > >> There may be other options that haven't been explored yet that could
> be
> > >> a better choice than the above, so any suggestion will be greatly
> > >> appreciated.
> > >
> > > What kind of metadata are we talking about here? Addfb has tons of
> > > stuff already that's "metadata". The only thing I've spotted is
> > > PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
> > > userspace, but definitely not something addfb ever needs. addfb only
> > > needs the resulting pitch that we actually allocated (and might decide
> > > it doesn't like that, but that's a different issue).
>
> Sorry I failed to make it clearer. Metadata here refers to all
> allocation parameters the generic allocator was given to allocate
> memory. That currently means the final capability set used for
> the allocation, including all constraints (such as memory alignment,
> pitch alignment, and others) and capabilities, describing allocation
> properties like tiling formats, compression, and such.
>
> > >
> > > And since there's no patches for nouveau itself I can't really say
> > > anything beyond that.
>
> I can work on implementing these interfaces for nouveau, maybe
> partially, if that's going to help. I just thought it'd be better to
> first start a discussion on what would be the right way to pass
> allocator metadata to display drivers before starting to seriously
> implement any of the proposed options.
>
> >
> > I'd like to see concrete examples of actual display controllers
> > supporting more format layouts than what can be specified with a 64
> > bit modifier.
>
> The main problem is our tiling and other metadata parameters can't
> generally fit in a modifier, so we find passing a blob of metadata a
> more suitable mechanism.
>

I understand that you may have n knobs with a total of more than a total of
56 bits that configure your tiling/swizzling for color buffers. What I
don't buy is that you need all those combinations when passing buffers
around between codecs, cameras and display controllers. Even if you're
sharing between the same 3D drivers in different processes, I expect just
locking down, say, 64 different combinations (you can add more over time)
and assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.

If you want us the redesign KMS and the rest of the eco system around blobs
instead of the modifiers that are now moderately pervasive, you have to
justify it a little better than just "we didn't find it suitable".

Kristian


> Thanks.
>
> >
> > Kristian
> >
> > > -Daniel
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
> --
> Miguel
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 13327 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-20 23:22       ` [Mesa-dev] " Kristian Kristensen
@ 2017-12-21  1:05         ` Ilia Mirkin
  2017-12-21  8:05         ` Daniel Vetter
  1 sibling, 0 replies; 21+ messages in thread
From: Ilia Mirkin @ 2017-12-21  1:05 UTC (permalink / raw)
  To: Kristian Kristensen
  Cc: Rob Clark, Miguel Angel Vico, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Ben Skeggs, Chad Versace, mesa-dev, Lyude Paul,
	Nicolai Hähnle

On Wed, Dec 20, 2017 at 6:22 PM, Kristian Kristensen
<hoegsberg@google.com> wrote:
> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com>
> wrote:
>> On Wed, 20 Dec 2017 11:54:10 -0800
>> Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>> > I'd like to see concrete examples of actual display controllers
>> > supporting more format layouts than what can be specified with a 64
>> > bit modifier.
>>
>> The main problem is our tiling and other metadata parameters can't
>> generally fit in a modifier, so we find passing a blob of metadata a
>> more suitable mechanism.
>
>
> I understand that you may have n knobs with a total of more than a total of
> 56 bits that configure your tiling/swizzling for color buffers. What I don't
> buy is that you need all those combinations when passing buffers around
> between codecs, cameras and display controllers. Even if you're sharing
> between the same 3D drivers in different processes, I expect just locking
> down, say, 64 different combinations (you can add more over time) and
> assigning each a modifier would be sufficient. I doubt you'd extract
> meaningful performance gains from going all the way to a blob.

There's probably a world of stuff that we don't know about in nouveau,
but I have a hard time coming up with more than 64-bits worth of
tiling info for dGPU surfaces...

There's 8 bits (sorta, not fully populated, but might as well use
them) of "micro" tiling which is done at the PTE level by the memory
controller and includes compression settings, and then there's 4 bits
of tiling per dimension for macro blocks (which configures different
sizes for each dimension for tile sizes) -- that's only 20 bits. MSAA
level (which is part of the micro tiling setting usually, but may not
necessarily have to be) - another couple of bits, maybe something else
weird for another few bits. Anyways, this is *nowhere* close to 64
bits.

What am I missing?

  -ilia
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-20 23:22       ` [Mesa-dev] " Kristian Kristensen
  2017-12-21  1:05         ` Ilia Mirkin
@ 2017-12-21  8:05         ` Daniel Vetter
  2018-02-21  6:14           ` Chad Versace
  1 sibling, 1 reply; 21+ messages in thread
From: Daniel Vetter @ 2017-12-21  8:05 UTC (permalink / raw)
  To: Kristian Kristensen
  Cc: Rob Clark, Miguel Angel Vico, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Ben Skeggs, Chad Versace, mesa-dev, Lyude Paul,
	Nicolai Hähnle

On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen
<hoegsberg@google.com> wrote:
> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com>
> wrote:
>>
>> Inline.
>>
>> On Wed, 20 Dec 2017 11:54:10 -0800
>> Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>>
>> > On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> > > Since this also involves the kernel let's add dri-devel ...
>>
>> Yeah, I forgot. Thanks Daniel!
>>
>> > >
>> > > On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico
>> > > <mvicomoya@nvidia.com> wrote:
>> > >> Hi all,
>> > >>
>> > >> As many of you already know, I've been working with James Jones on
>> > >> the
>> > >> Generic Device Allocator project lately. He started a discussion
>> > >> thread
>> > >> some weeks ago seeking feedback on the current prototype of the
>> > >> library
>> > >> and advice on how to move all this forward, from a prototype stage to
>> > >> production. For further reference, see:
>> > >>
>> > >>
>> > >> https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
>> > >>
>> > >> From the thread above, we came up with very interesting high level
>> > >> design ideas for one of the currently missing parts in the library:
>> > >> Usage transitions. That's something I'll personally work on during
>> > >> the
>> > >> following weeks.
>> > >>
>> > >>
>> > >> In the meantime, I've been working on putting together an open source
>> > >> implementation of the allocator mechanisms using the Nouveau driver
>> > >> for
>> > >> all to be able to play with.
>> > >>
>> > >> Below I'm seeking feedback on a bunch of changes I had to make to
>> > >> different components of the graphics stack:
>> > >>
>> > >> ** Allocator **
>> > >>
>> > >>   An allocator driver implementation on top of Nouveau. The current
>> > >>   implementation only handles pitch linear layouts, but that's enough
>> > >>   to have the kmscube port working using the allocator and Nouveau
>> > >>   drivers.
>> > >>
>> > >>   You can pull these changes from
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
>> > >>
>> > >> ** Mesa **
>> > >>
>> > >>   James's kmscube port to use the allocator relies on the
>> > >>   EXT_external_objects extension to import allocator allocations to
>> > >>   OpenGL as a texture object. However, the Nouveau implementation of
>> > >>   these mechanisms is missing in Mesa, so I went ahead and added
>> > >> them.
>> > >>
>> > >>   You can pull these changes from
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
>> > >>
>> > >>   Also, James's kmscube port uses the NVX_unix_allocator_import
>> > >>   extension to attach allocator metadata to texture objects so the
>> > >>   driver knows how to deal with the imported memory.
>> > >>
>> > >>   Note that there isn't a formal spec for this extension yet. For
>> > >> now,
>> > >>   it just serves as an experimental mechanism to import allocator
>> > >>   memory in OpenGL, and attach metadata to texture objects.
>> > >>
>> > >>   You can pull these changes (written on top of the above) from:
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
>> > >>
>> > >> ** kmscube **
>> > >>
>> > >>   Mostly minor fixes and improvements on top of James's port to use
>> > >> the
>> > >>   allocator. Main thing is the allocator initialization path will use
>> > >>   EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
>> > >>   by the underlying EGL implementation.
>> > >>
>> > >>   You can pull these changes from:
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
>> > >>
>> > >>
>> > >> With all the above you should be able to get kmscube working using
>> > >> the
>> > >> allocator on top of the Nouveau driver.
>> > >>
>> > >>
>> > >> Another of the missing pieces before we can move this to production
>> > >> is
>> > >> importing allocations to DRM FB objects. This is probably one of the
>> > >> most sensitive parts of the project as it requires
>> > >> modification/addition
>> > >> of kernel driver interfaces.
>> > >>
>> > >> At XDC2017, James had several hallway conversations with several
>> > >> people
>> > >> about this, all having different opinions. I'd like to take this
>> > >> opportunity to also start a discussion about what's the best option
>> > >> to
>> > >> create a path to get allocator allocations added as DRM FB objects.
>> > >>
>> > >> These are the few options we've considered to start with:
>> > >>
>> > >>   A) Have vendor-private ioctls to set properties on GEM objects that
>> > >>      are inherited by the FB objects. This is how our (NVIDIA)
>> > >> desktop
>> > >>      DRM driver currently works. This would require every vendor to
>> > >> add
>> > >>      their own ioctl to process allocator metadata, but the metadata
>> > >> is
>> > >>      actually a vendor-agnostic object more like DRM modifiers. We'd
>> > >>      like to come up with a vendor-agnostic solutions that can be
>> > >>      integrated to core DRM.
>> > >>
>> > >>   B) Add a new drmModeAddFBWithMetadata() command that takes
>> > >> allocator
>> > >>      metadata blobs for each plane of the FB. Some people in the
>> > >>      community have mentioned this is their preferred design. This,
>> > >>      however, means we'd have to go through the exercise of adding
>> > >>      another metadata mechanism to the whole graphics stack.
>> > >>
>> > >>   C) Shove allocator metadata into DRM by defining it to be a
>> > >> separate
>> > >>      plane in the image, and using the existing DRM modifiers
>> > >> mechanism
>> > >>      to indicate there is another plane for each "real" plane added.
>> > >> It
>> > >>      isn't clear how this scales to surfaces that already need
>> > >> several
>> > >>      planes, but there are some people that see this as the only way
>> > >>      forward. Also, we would have to create a separate GEM buffer for
>> > >>      the metadatada itself, which seems excessive.
>> > >>
>> > >> We personally like option (B) better, and have already started to
>> > >> prototype the new path (which is actually very similar to the
>> > >> drmModeAddFB2() one). You can take a look at the new interfaces here:
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
>> > >>
>> > >> There may be other options that haven't been explored yet that could
>> > >> be
>> > >> a better choice than the above, so any suggestion will be greatly
>> > >> appreciated.
>> > >
>> > > What kind of metadata are we talking about here? Addfb has tons of
>> > > stuff already that's "metadata". The only thing I've spotted is
>> > > PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
>> > > userspace, but definitely not something addfb ever needs. addfb only
>> > > needs the resulting pitch that we actually allocated (and might decide
>> > > it doesn't like that, but that's a different issue).
>>
>> Sorry I failed to make it clearer. Metadata here refers to all
>> allocation parameters the generic allocator was given to allocate
>> memory. That currently means the final capability set used for
>> the allocation, including all constraints (such as memory alignment,
>> pitch alignment, and others) and capabilities, describing allocation
>> properties like tiling formats, compression, and such.

Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?

As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.

>> > >
>> > > And since there's no patches for nouveau itself I can't really say
>> > > anything beyond that.
>>
>> I can work on implementing these interfaces for nouveau, maybe
>> partially, if that's going to help. I just thought it'd be better to
>> first start a discussion on what would be the right way to pass
>> allocator metadata to display drivers before starting to seriously
>> implement any of the proposed options.

It's not so much wiring down the interfaces, but actually implementing
the features. "We need more than the 56bits of modifier" is a lot more
plausible when you have the full stack showing that you do actually
need it. Or well, not a full stack but at least a demo that shows what
you want to pull of but can't do right now.

>> > I'd like to see concrete examples of actual display controllers
>> > supporting more format layouts than what can be specified with a 64
>> > bit modifier.
>>
>> The main problem is our tiling and other metadata parameters can't
>> generally fit in a modifier, so we find passing a blob of metadata a
>> more suitable mechanism.
>
>
> I understand that you may have n knobs with a total of more than a total of
> 56 bits that configure your tiling/swizzling for color buffers. What I don't
> buy is that you need all those combinations when passing buffers around
> between codecs, cameras and display controllers. Even if you're sharing
> between the same 3D drivers in different processes, I expect just locking
> down, say, 64 different combinations (you can add more over time) and
> assigning each a modifier would be sufficient. I doubt you'd extract
> meaningful performance gains from going all the way to a blob.

Tegra just redesigned it's modifier space from an ungodly amount of
bits to just a few layouts. Not even just the ones in used, but simply
limiting to the ones that make sense (there's dependencies apparently)
Also note that the modifier alone doesn't need to describe the layout
precisely, it only makes sense together with a specific pixel format
and size. E.g. a bunch of the i915 layouts change layout depending
upon bpp.

> If you want us the redesign KMS and the rest of the eco system around blobs
> instead of the modifiers that are now moderately pervasive, you have to
> justify it a little better than just "we didn't find it suitable".

Given that this involves the kernel and hence the kernel's userspace
requirements for merging stuff (assuming of course you want to
establish this as an upstream interface), then I'd say a sufficient
demonstration would be actually running out of bits in nouveau
(kernel+mesa).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
       [not found]       ` <CAF6AEGtNsBwhJAiUGuUQFdKccurSzD-HnpHH-JcHSx0RUCnZGA@mail.gmail.com>
@ 2017-12-28 18:24         ` Miguel Angel Vico
  2018-01-03 14:53           ` [Mesa-dev] " Rob Clark
                             ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Miguel Angel Vico @ 2017-12-28 18:24 UTC (permalink / raw)
  To: Rob Clark
  Cc: Rob Clark, Nicolai Hähnle, dri-devel, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul, Brian Starkey

(Adding dri-devel back, and trying to respond to some comments from
the different forks)

James Jones wrote:

> Your worst case analysis above isn't far off from our HW, give or take
> some bits and axes here and there.  We've started an internal discussion
> about how to lay out all the bits we need.  It's hard to even enumerate
> them all without having a complete understanding of what capability sets
> are going to include, a fully-optimized implementation of the mechanism
> on our HW, and lot's of test scenarios though.  

(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.

We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.

Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.

If device-local properties are included, we might need a couple more
bits for caching.

We may also need to express locality information, which may take at
least another 2 or 3 bits.

If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.

So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.


Daniel Stone wrote:

> So I reflexively
> get a bit itchy when I see the kernel being used to transit magic
> blobs of data which are supplied by userspace, and only interpreted by
> different userspace. Having tiling formats hidden away means that
> we've had real-world bugs in AMD hardware, where we end up displaying
> garbage because we cannot generically reason about the buffer
> attributes.  

I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?


Daniel Vetter wrote:

> I think in the interim figuring out how to expose kms capabilities
> better (and necessarily standardizing at least some of them which
> matter at the compositor level, like size limits of framebuffers)
> feels like the place to push the ecosystem forward. In some way
> Miguel's proposal looks a bit backwards, since it adds the pitch
> capabilities to addfb, but at addfb time you've allocated everything
> already, so way too late to fix things up. With modifiers we've added
> a very simple per-plane property to list which modifiers can be
> combined with which pixel formats. Tiny start, but obviously very far
> from all that we'll need.  

Not sure whether I might be misunderstanding your statement, but one of
the allocator main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, find the optimal set of
capabilities, and use it for allocating a buffer. At the moment these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.


Rob Clark wrote:

> It does seem like, if possible, starting out with modifiers for now at
> the kernel interface would make life easier, vs trying to reinvent
> both kernel and userspace APIs at the same time.  Userspace APIs are
> easier to change or throw away.  Presumably by the time we get to the
> point of changing kernel uabi, we are already using, and pretty happy
> with, serialized liballoc data over the wire in userspace so it is
> only a matter of changing the kernel interface.  

I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how that
would translate to modifiers?

I assume display doesn't really care about a lot of the data capability
sets may encode, but is it correct to think of modifiers as things only
display needs? If we are to treat modifiers as a first-class citizen, I
would expect to use them beyond that.


Kristian Kristensen wrote:

> I agree and let me elaborate a bit. The problem we're seeing isn't that we
> need more that 2^56 modifiers for a future GPU. The problem is that flags
> like USE_SCANOUT (which your allocator proposal essentially keeps) is
> inadequate. The available tiling and compression formats vary with which
> (in KMS terms) CRTC you want to use, which plane you're on whether you want
> rotation or no and how much you want to scale etc. It's not realistic to
> think that we could model this in a centralized allocator library that's
> detached from the display driver. To be fair, this is not a point about
> blobs vs modifiers, it's saying that the use flags don't belong in the
> allocator, they belong in the APIs that will be using the buffer - and not
> as literal use flags, but as a way to discover supported modifiers for a
> given use case.  

Why detached from the display driver? I don't see why there couldn't be
an allocator driver with access to display capabilities that can be
used in the negotiation step to find the optimal set of allocation
parameters.


Kristian Kristensen wrote:

> I understand that you may have n knobs with a total of more than a total of
> 56 bits that configure your tiling/swizzling for color buffers. What I don't
> buy is that you need all those combinations when passing buffers around
> between codecs, cameras and display controllers. Even if you're sharing
> between the same 3D drivers in different processes, I expect just locking
> down, say, 64 different combinations (you can add more over time) and
> assigning each a modifier would be sufficient. I doubt you'd extract
> meaningful performance gains from going all the way to a blob.  

If someone has N knobs available, I don't understand why there
shouldn't be a mechanism that allows making use of them all, regardless
of performance numbers.


Daniel Vetter wrote:

> Yeah, that part was all clear. I'd want more details of what exact
> kind of metadata. fast-clear colors? tiling layouts? aux data for the
> compressor? hiz (or whatever you folks call it) tree?
>
> As you say, we've discussed massive amounts of different variants on
> this, and there's different answers for different questions. Consensus
> seems to be that bigger stuff (compression data, hiz, clear colors,
> ...) should be stored in aux planes, while the exact layout and what
> kind of aux planes you have are encoded in the modifier.  

My understanding is that capability sets may include all metadata you
mentioned. Besides tiling/swizzling layout and compression parameters,
things like zero-bandwidth-clears (I guess the same or similar to
fast-clear colors?), hiz-like data, device-local properties such as
caches, or locality information could/will be also included in a
capability set. We are even considering encoding some sort of usage
transition information in the capability set itself.


Thanks,
Miguel.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-28 18:24         ` Miguel Angel Vico
@ 2018-01-03 14:53           ` Rob Clark
  2018-01-03 19:26           ` James Jones
  2018-01-08  9:35           ` Daniel Vetter
  2 siblings, 0 replies; 21+ messages in thread
From: Rob Clark @ 2018-01-03 14:53 UTC (permalink / raw)
  To: Miguel Angel Vico
  Cc: Rob Clark, Nicolai Hähnle, dri-devel, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul

On Thu, Dec 28, 2017 at 1:24 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
> (Adding dri-devel back, and trying to respond to some comments from
> the different forks)
>
> James Jones wrote:
>
>> Your worst case analysis above isn't far off from our HW, give or take
>> some bits and axes here and there.  We've started an internal discussion
>> about how to lay out all the bits we need.  It's hard to even enumerate
>> them all without having a complete understanding of what capability sets
>> are going to include, a fully-optimized implementation of the mechanism
>> on our HW, and lot's of test scenarios though.
>
> (thanks James for most of the info below)
>
> To elaborate a bit, if we want to share an allocation across GPUs for 3D
> rendering, it seems we would need 12 bits to express our
> swizzling/tiling memory layouts for fermi+. In addition to that,
> maxwell uses 3 more bits for this, and we need an extra bit to identify
> pre-fermi representations.
>
> We also need one bit to differentiate between Tegra and desktop, and
> another one to indicate whether the layout is otherwise linear.
>
> Then things like whether compression is used (one more bit), and we can
> probably get by with 3 bits for the type of compression if we are
> creative. However, it'd be way easier to just track arch + page kind,
> which would be like 32 bits on its own.
>
> Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
> bits.
>
> If device-local properties are included, we might need a couple more
> bits for caching.
>
> We may also need to express locality information, which may take at
> least another 2 or 3 bits.
>
> If we want to share array textures too, you also need to pass the array
> pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
> its own.
>
> So yes, as James mentioned, with some effort, we could technically fit
> our current allocation parameters in a modifier, but I'm still not
> convinced this is as future proof as it could be as our hardware grows
> in capabilities.
>
>
> Daniel Stone wrote:
>
>> So I reflexively
>> get a bit itchy when I see the kernel being used to transit magic
>> blobs of data which are supplied by userspace, and only interpreted by
>> different userspace. Having tiling formats hidden away means that
>> we've had real-world bugs in AMD hardware, where we end up displaying
>> garbage because we cannot generically reason about the buffer
>> attributes.
>
> I'm a bit confused. Can't modifiers be specified by vendors and only
> interpreted by drivers? My understanding was that modifiers could
> actually be treated as opaque 64-bit data, in which case they would
> qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
> scalable. What am I missing?
>
>
> Daniel Vetter wrote:
>
>> I think in the interim figuring out how to expose kms capabilities
>> better (and necessarily standardizing at least some of them which
>> matter at the compositor level, like size limits of framebuffers)
>> feels like the place to push the ecosystem forward. In some way
>> Miguel's proposal looks a bit backwards, since it adds the pitch
>> capabilities to addfb, but at addfb time you've allocated everything
>> already, so way too late to fix things up. With modifiers we've added
>> a very simple per-plane property to list which modifiers can be
>> combined with which pixel formats. Tiny start, but obviously very far
>> from all that we'll need.
>
> Not sure whether I might be misunderstanding your statement, but one of
> the allocator main features is negotiation of nearly optimal allocation
> parameters given a set of uses on different devices/engines by the
> capability merge operation. A client should have queried what every
> device/engine is capable of for the given uses, find the optimal set of
> capabilities, and use it for allocating a buffer. At the moment these
> parameters are given to KMS, they are expected to be good. If they
> aren't, the client didn't do things right.
>
>
> Rob Clark wrote:
>
>> It does seem like, if possible, starting out with modifiers for now at
>> the kernel interface would make life easier, vs trying to reinvent
>> both kernel and userspace APIs at the same time.  Userspace APIs are
>> easier to change or throw away.  Presumably by the time we get to the
>> point of changing kernel uabi, we are already using, and pretty happy
>> with, serialized liballoc data over the wire in userspace so it is
>> only a matter of changing the kernel interface.
>
> I guess we can indeed start with modifiers for now, if that's what it
> takes to get the allocator mechanisms rolling. However, it seems to me
> that we won't be able to encode the same type of information included
> in capability sets with modifiers in all cases. For instance, if we end
> up encoding usage transition information in capability sets, how that
> would translate to modifiers?
>
> I assume display doesn't really care about a lot of the data capability
> sets may encode, but is it correct to think of modifiers as things only
> display needs? If we are to treat modifiers as a first-class citizen, I
> would expect to use them beyond that.
>

btw, the places where modifiers are used currently is limited to 2d
textures, without mipmap levels.  Basically scanout buffers, winsys
buffers, decoded frames of video, and that sort of thing.  I think we
can keep it that way, which avoids needing to encode additional info
(layer pitch, z tiling info for 3d textures, or whatever else).

So we just need to have something in userspace that translates the
relevant subset of capability set info to modifiers.

Maybe down the road, if capability sets are ubiquitous we can
"promote" that mechanism to kernel uabi.. although tbh I am not
entirely sure I can envision a use-case where kernel needs to know
about a cubemap array texture.

BR,
-R

>
> Kristian Kristensen wrote:
>
>> I agree and let me elaborate a bit. The problem we're seeing isn't that we
>> need more that 2^56 modifiers for a future GPU. The problem is that flags
>> like USE_SCANOUT (which your allocator proposal essentially keeps) is
>> inadequate. The available tiling and compression formats vary with which
>> (in KMS terms) CRTC you want to use, which plane you're on whether you want
>> rotation or no and how much you want to scale etc. It's not realistic to
>> think that we could model this in a centralized allocator library that's
>> detached from the display driver. To be fair, this is not a point about
>> blobs vs modifiers, it's saying that the use flags don't belong in the
>> allocator, they belong in the APIs that will be using the buffer - and not
>> as literal use flags, but as a way to discover supported modifiers for a
>> given use case.
>
> Why detached from the display driver? I don't see why there couldn't be
> an allocator driver with access to display capabilities that can be
> used in the negotiation step to find the optimal set of allocation
> parameters.
>
>
> Kristian Kristensen wrote:
>
>> I understand that you may have n knobs with a total of more than a total of
>> 56 bits that configure your tiling/swizzling for color buffers. What I don't
>> buy is that you need all those combinations when passing buffers around
>> between codecs, cameras and display controllers. Even if you're sharing
>> between the same 3D drivers in different processes, I expect just locking
>> down, say, 64 different combinations (you can add more over time) and
>> assigning each a modifier would be sufficient. I doubt you'd extract
>> meaningful performance gains from going all the way to a blob.
>
> If someone has N knobs available, I don't understand why there
> shouldn't be a mechanism that allows making use of them all, regardless
> of performance numbers.
>
>
> Daniel Vetter wrote:
>
>> Yeah, that part was all clear. I'd want more details of what exact
>> kind of metadata. fast-clear colors? tiling layouts? aux data for the
>> compressor? hiz (or whatever you folks call it) tree?
>>
>> As you say, we've discussed massive amounts of different variants on
>> this, and there's different answers for different questions. Consensus
>> seems to be that bigger stuff (compression data, hiz, clear colors,
>> ...) should be stored in aux planes, while the exact layout and what
>> kind of aux planes you have are encoded in the modifier.
>
> My understanding is that capability sets may include all metadata you
> mentioned. Besides tiling/swizzling layout and compression parameters,
> things like zero-bandwidth-clears (I guess the same or similar to
> fast-clear colors?), hiz-like data, device-local properties such as
> caches, or locality information could/will be also included in a
> capability set. We are even considering encoding some sort of usage
> transition information in the capability set itself.
>
>
> Thanks,
> Miguel.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-28 18:24         ` Miguel Angel Vico
  2018-01-03 14:53           ` [Mesa-dev] " Rob Clark
@ 2018-01-03 19:26           ` James Jones
  2018-01-03 20:36             ` Kristian Kristensen
  2018-01-08  9:35           ` Daniel Vetter
  2 siblings, 1 reply; 21+ messages in thread
From: James Jones @ 2018-01-03 19:26 UTC (permalink / raw)
  To: Miguel Angel Vico, Rob Clark
  Cc: Rob Clark, Nicolai Hähnle, dri-devel, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul

On 12/28/2017 10:24 AM, Miguel Angel Vico wrote:
> (Adding dri-devel back, and trying to respond to some comments from
> the different forks)
> 
> James Jones wrote:
> 
>> Your worst case analysis above isn't far off from our HW, give or take
>> some bits and axes here and there.  We've started an internal discussion
>> about how to lay out all the bits we need.  It's hard to even enumerate
>> them all without having a complete understanding of what capability sets
>> are going to include, a fully-optimized implementation of the mechanism
>> on our HW, and lot's of test scenarios though.
> 
> (thanks James for most of the info below)
> 
> To elaborate a bit, if we want to share an allocation across GPUs for 3D
> rendering, it seems we would need 12 bits to express our
> swizzling/tiling memory layouts for fermi+. In addition to that,
> maxwell uses 3 more bits for this, and we need an extra bit to identify
> pre-fermi representations.
> 
> We also need one bit to differentiate between Tegra and desktop, and
> another one to indicate whether the layout is otherwise linear.
> 
> Then things like whether compression is used (one more bit), and we can
> probably get by with 3 bits for the type of compression if we are
> creative. However, it'd be way easier to just track arch + page kind,
> which would be like 32 bits on its own.

Not clear if this is an NV-only term, so for those not familiar, page 
kind is very loosely the equivalent of a format modifier our HW uses 
internally in its memory management subsystem.  The value mappings vary 
a bit for each HW generation.

> Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
> bits.
> 
> If device-local properties are included, we might need a couple more
> bits for caching.
> 
> We may also need to express locality information, which may take at
> least another 2 or 3 bits.
> 
> If we want to share array textures too, you also need to pass the array
> pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
> its own.
> 
> So yes, as James mentioned, with some effort, we could technically fit
> our current allocation parameters in a modifier, but I'm still not
> convinced this is as future proof as it could be as our hardware grows
> in capabilities.
> 
> 
> Daniel Stone wrote:
> 
>> So I reflexively
>> get a bit itchy when I see the kernel being used to transit magic
>> blobs of data which are supplied by userspace, and only interpreted by
>> different userspace. Having tiling formats hidden away means that
>> we've had real-world bugs in AMD hardware, where we end up displaying
>> garbage because we cannot generically reason about the buffer
>> attributes.
> 
> I'm a bit confused. Can't modifiers be specified by vendors and only
> interpreted by drivers? My understanding was that modifiers could
> actually be treated as opaque 64-bit data, in which case they would
> qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
> scalable. What am I missing?
> 
> 
> Daniel Vetter wrote:
> 
>> I think in the interim figuring out how to expose kms capabilities
>> better (and necessarily standardizing at least some of them which
>> matter at the compositor level, like size limits of framebuffers)
>> feels like the place to push the ecosystem forward. In some way
>> Miguel's proposal looks a bit backwards, since it adds the pitch
>> capabilities to addfb, but at addfb time you've allocated everything
>> already, so way too late to fix things up. With modifiers we've added
>> a very simple per-plane property to list which modifiers can be
>> combined with which pixel formats. Tiny start, but obviously very far
>> from all that we'll need.
> 
> Not sure whether I might be misunderstanding your statement, but one of
> the allocator main features is negotiation of nearly optimal allocation
> parameters given a set of uses on different devices/engines by the
> capability merge operation. A client should have queried what every
> device/engine is capable of for the given uses, find the optimal set of
> capabilities, and use it for allocating a buffer. At the moment these
> parameters are given to KMS, they are expected to be good. If they
> aren't, the client didn't do things right.
> 
> 
> Rob Clark wrote:
> 
>> It does seem like, if possible, starting out with modifiers for now at
>> the kernel interface would make life easier, vs trying to reinvent
>> both kernel and userspace APIs at the same time.  Userspace APIs are
>> easier to change or throw away.  Presumably by the time we get to the
>> point of changing kernel uabi, we are already using, and pretty happy
>> with, serialized liballoc data over the wire in userspace so it is
>> only a matter of changing the kernel interface.
> 
> I guess we can indeed start with modifiers for now, if that's what it
> takes to get the allocator mechanisms rolling. However, it seems to me
> that we won't be able to encode the same type of information included
> in capability sets with modifiers in all cases. For instance, if we end
> up encoding usage transition information in capability sets, how that
> would translate to modifiers?
> 
> I assume display doesn't really care about a lot of the data capability
> sets may encode, but is it correct to think of modifiers as things only
> display needs? If we are to treat modifiers as a first-class citizen, I
> would expect to use them beyond that.

Right, this becomes a lot more interesting when modifiers or capability 
sets start getting used to share things from Vulkan<->Vulkan, for 
example.  Of course, we don't need to change kernel ABIs for that, but 
wayland protocols, Vulkan extensions, etc. might need modification. 
Regardless, I agree with Miguel's sentiment.  Let's at least defer this 
debate a bit until we know more about what capability sets look like. 
If modifiers alone still seem sufficient, so be it.

> 
> Kristian Kristensen wrote:
> 
>> I agree and let me elaborate a bit. The problem we're seeing isn't that we
>> need more that 2^56 modifiers for a future GPU. The problem is that flags
>> like USE_SCANOUT (which your allocator proposal essentially keeps) is
>> inadequate. The available tiling and compression formats vary with which
>> (in KMS terms) CRTC you want to use, which plane you're on whether you want
>> rotation or no and how much you want to scale etc. It's not realistic to
>> think that we could model this in a centralized allocator library that's
>> detached from the display driver. To be fair, this is not a point about
>> blobs vs modifiers, it's saying that the use flags don't belong in the
>> allocator, they belong in the APIs that will be using the buffer - and not
>> as literal use flags, but as a way to discover supported modifiers for a
>> given use case.
> 
> Why detached from the display driver? I don't see why there couldn't be
> an allocator driver with access to display capabilities that can be
> used in the negotiation step to find the optimal set of allocation
> parameters.

In addition, speaking to some other portions of your response, most of 
the usage in the prototype is placeholder stuff for testing. 
USE_SCANNOUT is partially expanded to include orientation as well, which 
helps in some cases on our hardware.  If there's more complex stuff for 
other display hardware, it needs to be expanded further, or that HW is 
free to expose a vendor-specific usage, since usage is extensible.  It's 
easy to mirror in all the relevant usage flags from other APIs or 
engines too.  That's a rather small amount of duplication.

The important part is the logic that selects optimal usage.  I don't 
think it's possible to select optimal usage with the queries spread 
around all the APIs.  Vulkan isn't going to know about video encode 
usage.  In many situations it won't know about display usage.  It just 
knows optimal texture/render usage.  Therefore it can't optimize 
parameters for usage it doesn't know about it.  A centralized allocator 
can, especially when all the usage ends up delegated to a single 
device/GPU.  It will have all the same information available to it on 
the back end because it can access DRM devices, v4l devices, etc. to 
query their capabilities via allocator backends, but it can have more 
information available on the front end from the app, and a more complete 
solution returned from a driver that is able to parse and consider that 
additional information.

Additionally, I again offer the goal of an optimal gralloc 
implementation built on top of the allocator mechanism.  I find it 
difficult to imagine building gralloc on top of Vulkan or EGL and DRM. 
Does such a solution seem feasible to you?  I've not researched this 
significantly myself, but Google Android engineers shared that concern 
when we had the initial discussions at XDC 2016.

> Kristian Kristensen wrote:
> 
>> I understand that you may have n knobs with a total of more than a total of
>> 56 bits that configure your tiling/swizzling for color buffers. What I don't
>> buy is that you need all those combinations when passing buffers around
>> between codecs, cameras and display controllers. Even if you're sharing
>> between the same 3D drivers in different processes, I expect just locking
>> down, say, 64 different combinations (you can add more over time) and
>> assigning each a modifier would be sufficient. I doubt you'd extract
>> meaningful performance gains from going all the way to a blob.
> 
> If someone has N knobs available, I don't understand why there
> shouldn't be a mechanism that allows making use of them all, regardless
> of performance numbers.
> 
> 
> Daniel Vetter wrote:
> 
>> Yeah, that part was all clear. I'd want more details of what exact
>> kind of metadata. fast-clear colors? tiling layouts? aux data for the
>> compressor? hiz (or whatever you folks call it) tree?
>>
>> As you say, we've discussed massive amounts of different variants on
>> this, and there's different answers for different questions. Consensus
>> seems to be that bigger stuff (compression data, hiz, clear colors,
>> ...) should be stored in aux planes, while the exact layout and what
>> kind of aux planes you have are encoded in the modifier.
> 
> My understanding is that capability sets may include all metadata you
> mentioned. Besides tiling/swizzling layout and compression parameters,
> things like zero-bandwidth-clears (I guess the same or similar to
> fast-clear colors?), hiz-like data, device-local properties such as
> caches, or locality information could/will be also included in a
> capability set. We are even considering encoding some sort of usage
> transition information in the capability set itself.

I think there's some nuance here.  The format of compression metadata 
would clearly be a capability set thing.  The compression data itself 
would indeed be in some auxiliary surface on most/all hardware.  Things 
like fast clears are harder to nail down because implementations seem 
more varied there.  It might be very awkward on some hardware to put the 
necessary meta-data in a DRM FB plane, while that might be the only 
reasonable way to accomplish it on other hardware.  I think we'll have 
to work through some corner cases across lots of hardware before this 
bottoms out.

Thanks,
-James

> Thanks,
> Miguel.
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-01-03 19:26           ` James Jones
@ 2018-01-03 20:36             ` Kristian Kristensen
  0 siblings, 0 replies; 21+ messages in thread
From: Kristian Kristensen @ 2018-01-03 20:36 UTC (permalink / raw)
  To: James Jones
  Cc: Rob Clark, Miguel Angel Vico, dri-devel, Jason Ekstrand,
	Ben Skeggs, Chad Versace, mesa-dev, Lyude Paul,
	Nicolai Hähnle


[-- Attachment #1.1: Type: text/plain, Size: 14138 bytes --]

On Wed, Jan 3, 2018 at 11:26 AM, James Jones <jajones@nvidia.com> wrote:

> On 12/28/2017 10:24 AM, Miguel Angel Vico wrote:
>
>> (Adding dri-devel back, and trying to respond to some comments from
>> the different forks)
>>
>> James Jones wrote:
>>
>> Your worst case analysis above isn't far off from our HW, give or take
>>> some bits and axes here and there.  We've started an internal discussion
>>> about how to lay out all the bits we need.  It's hard to even enumerate
>>> them all without having a complete understanding of what capability sets
>>> are going to include, a fully-optimized implementation of the mechanism
>>> on our HW, and lot's of test scenarios though.
>>>
>>
>> (thanks James for most of the info below)
>>
>> To elaborate a bit, if we want to share an allocation across GPUs for 3D
>> rendering, it seems we would need 12 bits to express our
>> swizzling/tiling memory layouts for fermi+. In addition to that,
>> maxwell uses 3 more bits for this, and we need an extra bit to identify
>> pre-fermi representations.
>>
>> We also need one bit to differentiate between Tegra and desktop, and
>> another one to indicate whether the layout is otherwise linear.
>>
>> Then things like whether compression is used (one more bit), and we can
>> probably get by with 3 bits for the type of compression if we are
>> creative. However, it'd be way easier to just track arch + page kind,
>> which would be like 32 bits on its own.
>>
>
> Not clear if this is an NV-only term, so for those not familiar, page kind
> is very loosely the equivalent of a format modifier our HW uses internally
> in its memory management subsystem.  The value mappings vary a bit for each
> HW generation.
>
>
> Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
>> bits.
>>
>> If device-local properties are included, we might need a couple more
>> bits for caching.
>>
>> We may also need to express locality information, which may take at
>> least another 2 or 3 bits.
>>
>> If we want to share array textures too, you also need to pass the array
>> pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
>> its own.
>>
>> So yes, as James mentioned, with some effort, we could technically fit
>> our current allocation parameters in a modifier, but I'm still not
>> convinced this is as future proof as it could be as our hardware grows
>> in capabilities.
>>
>>
>> Daniel Stone wrote:
>>
>> So I reflexively
>>> get a bit itchy when I see the kernel being used to transit magic
>>> blobs of data which are supplied by userspace, and only interpreted by
>>> different userspace. Having tiling formats hidden away means that
>>> we've had real-world bugs in AMD hardware, where we end up displaying
>>> garbage because we cannot generically reason about the buffer
>>> attributes.
>>>
>>
>> I'm a bit confused. Can't modifiers be specified by vendors and only
>> interpreted by drivers? My understanding was that modifiers could
>> actually be treated as opaque 64-bit data, in which case they would
>> qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
>> scalable. What am I missing?
>>
>>
>> Daniel Vetter wrote:
>>
>> I think in the interim figuring out how to expose kms capabilities
>>> better (and necessarily standardizing at least some of them which
>>> matter at the compositor level, like size limits of framebuffers)
>>> feels like the place to push the ecosystem forward. In some way
>>> Miguel's proposal looks a bit backwards, since it adds the pitch
>>> capabilities to addfb, but at addfb time you've allocated everything
>>> already, so way too late to fix things up. With modifiers we've added
>>> a very simple per-plane property to list which modifiers can be
>>> combined with which pixel formats. Tiny start, but obviously very far
>>> from all that we'll need.
>>>
>>
>> Not sure whether I might be misunderstanding your statement, but one of
>> the allocator main features is negotiation of nearly optimal allocation
>> parameters given a set of uses on different devices/engines by the
>> capability merge operation. A client should have queried what every
>> device/engine is capable of for the given uses, find the optimal set of
>> capabilities, and use it for allocating a buffer. At the moment these
>> parameters are given to KMS, they are expected to be good. If they
>> aren't, the client didn't do things right.
>>
>>
>> Rob Clark wrote:
>>
>> It does seem like, if possible, starting out with modifiers for now at
>>> the kernel interface would make life easier, vs trying to reinvent
>>> both kernel and userspace APIs at the same time.  Userspace APIs are
>>> easier to change or throw away.  Presumably by the time we get to the
>>> point of changing kernel uabi, we are already using, and pretty happy
>>> with, serialized liballoc data over the wire in userspace so it is
>>> only a matter of changing the kernel interface.
>>>
>>
>> I guess we can indeed start with modifiers for now, if that's what it
>> takes to get the allocator mechanisms rolling. However, it seems to me
>> that we won't be able to encode the same type of information included
>> in capability sets with modifiers in all cases. For instance, if we end
>> up encoding usage transition information in capability sets, how that
>> would translate to modifiers?
>>
>> I assume display doesn't really care about a lot of the data capability
>> sets may encode, but is it correct to think of modifiers as things only
>> display needs? If we are to treat modifiers as a first-class citizen, I
>> would expect to use them beyond that.
>>
>
> Right, this becomes a lot more interesting when modifiers or capability
> sets start getting used to share things from Vulkan<->Vulkan, for example.
> Of course, we don't need to change kernel ABIs for that, but wayland
> protocols, Vulkan extensions, etc. might need modification. Regardless, I
> agree with Miguel's sentiment.  Let's at least defer this debate a bit
> until we know more about what capability sets look like. If modifiers alone
> still seem sufficient, so be it.


Modifers aren't display only, but I suppose they are 2D color buffer only -
no mip maps, texture arrays, cube maps etc. But within that scope, they
should provide a mechanism for negotiating the optimal layout for a given
use case.

Another point here is that the modifier doesn't need to encode all the
thing you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifiers as
something you can just memcpy into a state packet, it identifies a family
of configurations - enough information to deterministically derive the full
exact configuration. The formula may change, for example for different
hardware or if it's determined to not be optimal, and in that case, we can
use a new modifier to represent to new formula.


>
>
>
>> Kristian Kristensen wrote:
>>
>> I agree and let me elaborate a bit. The problem we're seeing isn't that we
>>> need more that 2^56 modifiers for a future GPU. The problem is that flags
>>> like USE_SCANOUT (which your allocator proposal essentially keeps) is
>>> inadequate. The available tiling and compression formats vary with which
>>> (in KMS terms) CRTC you want to use, which plane you're on whether you
>>> want
>>> rotation or no and how much you want to scale etc. It's not realistic to
>>> think that we could model this in a centralized allocator library that's
>>> detached from the display driver. To be fair, this is not a point about
>>> blobs vs modifiers, it's saying that the use flags don't belong in the
>>> allocator, they belong in the APIs that will be using the buffer - and
>>> not
>>> as literal use flags, but as a way to discover supported modifiers for a
>>> given use case.
>>>
>>
>> Why detached from the display driver? I don't see why there couldn't be
>> an allocator driver with access to display capabilities that can be
>> used in the negotiation step to find the optimal set of allocation
>> parameters.
>>
>
> In addition, speaking to some other portions of your response, most of the
> usage in the prototype is placeholder stuff for testing. USE_SCANNOUT is
> partially expanded to include orientation as well, which helps in some
> cases on our hardware.  If there's more complex stuff for other display
> hardware, it needs to be expanded further, or that HW is free to expose a
> vendor-specific usage, since usage is extensible.  It's easy to mirror in
> all the relevant usage flags from other APIs or engines too.  That's a
> rather small amount of duplication.
>

I understand that it's an incomplete example, but even so I don't think
this duplication is feasible. It's not a matter of how many use cases we
have to duplicate at this point in time, it's that all these APIs are live,
evolving APIs and keeping the allocator uptodate as various APIs grow new
corner cases doesn't seem practical. Further, it's not orthogonal or
composable - the allocator has to know about all producers and consumers
and if I add a new piece of hardware I have to extend the allocator to
understands its new use cases. With the modifier model, I just ask the new
driver which modifiers it supports for the use case I'm interested in and
feed those modifiers to the allocator.


>
> The important part is the logic that selects optimal usage.  I don't think
> it's possible to select optimal usage with the queries spread around all
> the APIs.  Vulkan isn't going to know about video encode usage.  In many
> situations it won't know about display usage.  It just knows optimal
> texture/render usage.  Therefore it can't optimize parameters for usage it
> doesn't know about it.  A centralized allocator can, especially when all
> the usage ends up delegated to a single device/GPU.  It will have all the
> same information available to it on the back end because it can access DRM
> devices, v4l devices, etc. to query their capabilities via allocator
> backends, but it can have more information available on the front end from
> the app, and a more complete solution returned from a driver that is able
> to parse and consider that additional information.
>

Vulkan isn't expected to know about video encode usage. You ask the video
codec about supported modifiers for encode and you ask Vulkan for supported
modifiers for, say optimal render usage. The allocator determines the
optimal lowest common denominator and allocates the buffer. Maybe that's
linear, or if you've designed both parts, maybe there's a simple shared
tiled format that the encoder can source from.


>
> Additionally, I again offer the goal of an optimal gralloc implementation
> built on top of the allocator mechanism.  I find it difficult to imagine
> building gralloc on top of Vulkan or EGL and DRM. Does such a solution seem
> feasible to you?  I've not researched this significantly myself, but Google
> Android engineers shared that concern when we had the initial discussions
> at XDC 2016.
>
>
> Kristian Kristensen wrote:
>>
>> I understand that you may have n knobs with a total of more than a total
>>> of
>>> 56 bits that configure your tiling/swizzling for color buffers. What I
>>> don't
>>> buy is that you need all those combinations when passing buffers around
>>> between codecs, cameras and display controllers. Even if you're sharing
>>> between the same 3D drivers in different processes, I expect just locking
>>> down, say, 64 different combinations (you can add more over time) and
>>> assigning each a modifier would be sufficient. I doubt you'd extract
>>> meaningful performance gains from going all the way to a blob.
>>>
>>
>> If someone has N knobs available, I don't understand why there
>> shouldn't be a mechanism that allows making use of them all, regardless
>> of performance numbers.
>>
>>
>> Daniel Vetter wrote:
>>
>> Yeah, that part was all clear. I'd want more details of what exact
>>> kind of metadata. fast-clear colors? tiling layouts? aux data for the
>>> compressor? hiz (or whatever you folks call it) tree?
>>>
>>> As you say, we've discussed massive amounts of different variants on
>>> this, and there's different answers for different questions. Consensus
>>> seems to be that bigger stuff (compression data, hiz, clear colors,
>>> ...) should be stored in aux planes, while the exact layout and what
>>> kind of aux planes you have are encoded in the modifier.
>>>
>>
>> My understanding is that capability sets may include all metadata you
>> mentioned. Besides tiling/swizzling layout and compression parameters,
>> things like zero-bandwidth-clears (I guess the same or similar to
>> fast-clear colors?), hiz-like data, device-local properties such as
>> caches, or locality information could/will be also included in a
>> capability set. We are even considering encoding some sort of usage
>> transition information in the capability set itself.
>>
>
> I think there's some nuance here.  The format of compression metadata
> would clearly be a capability set thing.  The compression data itself would
> indeed be in some auxiliary surface on most/all hardware.  Things like fast
> clears are harder to nail down because implementations seem more varied
> there.  It might be very awkward on some hardware to put the necessary
> meta-data in a DRM FB plane, while that might be the only reasonable way to
> accomplish it on other hardware.  I think we'll have to work through some
> corner cases across lots of hardware before this bottoms out.
>

For modifiers and liballocator as well, the meta data is copied by value
(and passed through IPC) and as such can't model shared mutable
information. That means, fast colors, compression aux buffers and such, has
to be in a share BO plane.

Kristian


>
> Thanks,
> -James
>
> Thanks,
>> Miguel.
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>

[-- Attachment #1.2: Type: text/html, Size: 17378 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-28 18:24         ` Miguel Angel Vico
  2018-01-03 14:53           ` [Mesa-dev] " Rob Clark
  2018-01-03 19:26           ` James Jones
@ 2018-01-08  9:35           ` Daniel Vetter
  2 siblings, 0 replies; 21+ messages in thread
From: Daniel Vetter @ 2018-01-08  9:35 UTC (permalink / raw)
  To: Miguel Angel Vico
  Cc: Rob Clark, Nicolai Hähnle, dri-devel, Jason Ekstrand,
	Kristian H. Kristensen, Ben Skeggs, Chad Versace, mesa-dev,
	Lyude Paul

Just wanted to clarify this one thing here, otherwise I think Rob/krh
covered it all.

On Thu, Dec 28, 2017 at 10:24:38AM -0800, Miguel Angel Vico wrote:
> Daniel Vetter wrote:
> > I think in the interim figuring out how to expose kms capabilities
> > better (and necessarily standardizing at least some of them which
> > matter at the compositor level, like size limits of framebuffers)
> > feels like the place to push the ecosystem forward. In some way
> > Miguel's proposal looks a bit backwards, since it adds the pitch
> > capabilities to addfb, but at addfb time you've allocated everything
> > already, so way too late to fix things up. With modifiers we've added
> > a very simple per-plane property to list which modifiers can be
> > combined with which pixel formats. Tiny start, but obviously very far
> > from all that we'll need.  
> 
> Not sure whether I might be misunderstanding your statement, but one of
> the allocator main features is negotiation of nearly optimal allocation
> parameters given a set of uses on different devices/engines by the
> capability merge operation. A client should have queried what every
> device/engine is capable of for the given uses, find the optimal set of
> capabilities, and use it for allocating a buffer. At the moment these
> parameters are given to KMS, they are expected to be good. If they
> aren't, the client didn't do things right.

Your example code has a new capability for PITCH_ALIGNMENT. That looks
wrong for addfb (which should only received the the computed intersection
of all requirements, not the requirements itself). And since that was the
only thing in your example code besides the bare boilerplate to wire it
all up it looks a bit confused.

Maybe we need to distinguish capabilities into constraints on properties
(like pitch alignment, or power-of-two pitch) and properties (like pitch)
themselves.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2017-12-21  8:05         ` Daniel Vetter
@ 2018-02-21  6:14           ` Chad Versace
  2018-02-21 18:26             ` [Mesa-dev] " Daniel Vetter
  2018-02-22  0:00             ` Alex Deucher
  0 siblings, 2 replies; 21+ messages in thread
From: Chad Versace @ 2018-02-21  6:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, Miguel Angel Vico, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Kristian Kristensen, Ben Skeggs, mesa-dev,
	Lyude Paul, Nicolai Hähnle

On Thu 21 Dec 2017, Daniel Vetter wrote:
> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsberg@google.com> wrote:
>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>>>> I'd like to see concrete examples of actual display controllers
>>>> supporting more format layouts than what can be specified with a 64
>>>> bit modifier.
>>>
>>> The main problem is our tiling and other metadata parameters can't
>>> generally fit in a modifier, so we find passing a blob of metadata a
>>> more suitable mechanism.
>>
>> I understand that you may have n knobs with a total of more than a total of
>> 56 bits that configure your tiling/swizzling for color buffers. What I don't
>> buy is that you need all those combinations when passing buffers around
>> between codecs, cameras and display controllers. Even if you're sharing
>> between the same 3D drivers in different processes, I expect just locking
>> down, say, 64 different combinations (you can add more over time) and
>> assigning each a modifier would be sufficient. I doubt you'd extract
>> meaningful performance gains from going all the way to a blob.

I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilites.

I summarized this opinion in VK_EXT_image_drm_format_modifier,
where I wrote an "introdution to modifiers" section. Here's an excerpt:

    One goal of modifiers in the Linux ecosystem is to enumerate for each
    vendor a reasonably sized set of tiling formats that are appropriate for
    images shared across processes, APIs, and/or devices, where each
    participating component may possibly be from different vendors.
    A non-goal is to enumerate all tiling formats supported by all vendors.
    Some tiling formats used internally by vendors are inappropriate for
    sharing; no modifiers should be assigned to such tiling formats.

> Tegra just redesigned it's modifier space from an ungodly amount of
> bits to just a few layouts. Not even just the ones in used, but simply
> limiting to the ones that make sense (there's dependencies apparently)
> Also note that the modifier alone doesn't need to describe the layout
> precisely, it only makes sense together with a specific pixel format
> and size. E.g. a bunch of the i915 layouts change layout depending
> upon bpp.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-21  6:14           ` Chad Versace
@ 2018-02-21 18:26             ` Daniel Vetter
  2018-02-21 23:23               ` Chad Versace
  2018-02-22  0:00             ` Alex Deucher
  1 sibling, 1 reply; 21+ messages in thread
From: Daniel Vetter @ 2018-02-21 18:26 UTC (permalink / raw)
  To: Chad Versace, Daniel Vetter, Kristian Kristensen, Rob Clark,
	Miguel Angel Vico, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Ben Skeggs, mesa-dev, Lyude Paul,
	Nicolai Hähnle

On Tue, Feb 20, 2018 at 10:14:47PM -0800, Chad Versace wrote:
> On Thu 21 Dec 2017, Daniel Vetter wrote:
> > On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsberg@google.com> wrote:
> >> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
> >>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsberg@gmail.com> wrote:
> >>>> I'd like to see concrete examples of actual display controllers
> >>>> supporting more format layouts than what can be specified with a 64
> >>>> bit modifier.
> >>>
> >>> The main problem is our tiling and other metadata parameters can't
> >>> generally fit in a modifier, so we find passing a blob of metadata a
> >>> more suitable mechanism.
> >>
> >> I understand that you may have n knobs with a total of more than a total of
> >> 56 bits that configure your tiling/swizzling for color buffers. What I don't
> >> buy is that you need all those combinations when passing buffers around
> >> between codecs, cameras and display controllers. Even if you're sharing
> >> between the same 3D drivers in different processes, I expect just locking
> >> down, say, 64 different combinations (you can add more over time) and
> >> assigning each a modifier would be sufficient. I doubt you'd extract
> >> meaningful performance gains from going all the way to a blob.
> 
> I agree with Kristian above. In my opinion, choosing to encode in
> modifiers a precise description of every possible tiling/compression
> layout is not technically incorrect, but I believe it misses the point.
> The intention behind modifiers is not to exhaustively describe all
> possibilites.
> 
> I summarized this opinion in VK_EXT_image_drm_format_modifier,
> where I wrote an "introdution to modifiers" section. Here's an excerpt:
> 
>     One goal of modifiers in the Linux ecosystem is to enumerate for each
>     vendor a reasonably sized set of tiling formats that are appropriate for
>     images shared across processes, APIs, and/or devices, where each
>     participating component may possibly be from different vendors.
>     A non-goal is to enumerate all tiling formats supported by all vendors.
>     Some tiling formats used internally by vendors are inappropriate for
>     sharing; no modifiers should be assigned to such tiling formats.

fwiw (since the source of truth wrt modifiers is the kernel's uapi
header):

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I'm happy to merge modifier #define additions for pretty much anything
where there's a need for sharing across devices/drivers/apis, explicitly
including stuff that's only relevant for userspace and which the kernel
nevers sees (in e.g. a kms addfb2 call). Trying to preemptively enumerate
everything that's possible doesn't seem like a wise idea. But even then we
can probably spare the oddball vendor prefix is a driver team really
insists that this is what they want, best using some code that makes the
case for them.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-21 18:26             ` [Mesa-dev] " Daniel Vetter
@ 2018-02-21 23:23               ` Chad Versace
  0 siblings, 0 replies; 21+ messages in thread
From: Chad Versace @ 2018-02-21 23:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, Miguel Angel Vico, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Kristian Kristensen, Ben Skeggs, mesa-dev,
	Lyude Paul, Nicolai Hähnle

On Wed 21 Feb 2018, Daniel Vetter wrote:
> On Tue, Feb 20, 2018 at 10:14:47PM -0800, Chad Versace wrote:
> > On Thu 21 Dec 2017, Daniel Vetter wrote:
> > > On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsberg@google.com> wrote:
> > >> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
> > >>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsberg@gmail.com> wrote:
> > >>>> I'd like to see concrete examples of actual display controllers
> > >>>> supporting more format layouts than what can be specified with a 64
> > >>>> bit modifier.
> > >>>
> > >>> The main problem is our tiling and other metadata parameters can't
> > >>> generally fit in a modifier, so we find passing a blob of metadata a
> > >>> more suitable mechanism.
> > >>
> > >> I understand that you may have n knobs with a total of more than a total of
> > >> 56 bits that configure your tiling/swizzling for color buffers. What I don't
> > >> buy is that you need all those combinations when passing buffers around
> > >> between codecs, cameras and display controllers. Even if you're sharing
> > >> between the same 3D drivers in different processes, I expect just locking
> > >> down, say, 64 different combinations (you can add more over time) and
> > >> assigning each a modifier would be sufficient. I doubt you'd extract
> > >> meaningful performance gains from going all the way to a blob.
> > 
> > I agree with Kristian above. In my opinion, choosing to encode in
> > modifiers a precise description of every possible tiling/compression
> > layout is not technically incorrect, but I believe it misses the point.
> > The intention behind modifiers is not to exhaustively describe all
> > possibilites.
> > 
> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
> > where I wrote an "introdution to modifiers" section. Here's an excerpt:
> > 
> >     One goal of modifiers in the Linux ecosystem is to enumerate for each
> >     vendor a reasonably sized set of tiling formats that are appropriate for
> >     images shared across processes, APIs, and/or devices, where each
> >     participating component may possibly be from different vendors.
> >     A non-goal is to enumerate all tiling formats supported by all vendors.
> >     Some tiling formats used internally by vendors are inappropriate for
> >     sharing; no modifiers should be assigned to such tiling formats.
> 
> fwiw (since the source of truth wrt modifiers is the kernel's uapi
> header):
> 
> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Linux would eventually encounter big problems if the kernel and Vulkan
disagreed on the fundamental, unspoken Theory of Modifiers. So your
acked-by is definitely worth something here. Thanks for confirming.

> 
> I'm happy to merge modifier #define additions for pretty much anything
> where there's a need for sharing across devices/drivers/apis, explicitly
> including stuff that's only relevant for userspace and which the kernel
> nevers sees (in e.g. a kms addfb2 call). Trying to preemptively enumerate
> everything that's possible doesn't seem like a wise idea. But even then we
> can probably spare the oddball vendor prefix is a driver team really
> insists that this is what they want, best using some code that makes the
> case for them.

Yep. I believe Jason Ekstrand has tentative plans for such a modifier
that improves performance for interop in GL and Vulkan but the kernel
and Intel display hw wouldn't understand: a modifier for CCS_E images
that are fully compressed.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-21  6:14           ` Chad Versace
  2018-02-21 18:26             ` [Mesa-dev] " Daniel Vetter
@ 2018-02-22  0:00             ` Alex Deucher
  2018-02-22 18:04               ` Kristian Høgsberg
  1 sibling, 1 reply; 21+ messages in thread
From: Alex Deucher @ 2018-02-22  0:00 UTC (permalink / raw)
  To: Chad Versace, Daniel Vetter, Kristian Kristensen, Rob Clark,
	Miguel Angel Vico, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Ben Skeggs, mesa-dev, Lyude Paul,
	Nicolai Hähnle

On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org> wrote:
> On Thu 21 Dec 2017, Daniel Vetter wrote:
>> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsberg@google.com> wrote:
>>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya@nvidia.com> wrote:
>>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>>>>> I'd like to see concrete examples of actual display controllers
>>>>> supporting more format layouts than what can be specified with a 64
>>>>> bit modifier.
>>>>
>>>> The main problem is our tiling and other metadata parameters can't
>>>> generally fit in a modifier, so we find passing a blob of metadata a
>>>> more suitable mechanism.
>>>
>>> I understand that you may have n knobs with a total of more than a total of
>>> 56 bits that configure your tiling/swizzling for color buffers. What I don't
>>> buy is that you need all those combinations when passing buffers around
>>> between codecs, cameras and display controllers. Even if you're sharing
>>> between the same 3D drivers in different processes, I expect just locking
>>> down, say, 64 different combinations (you can add more over time) and
>>> assigning each a modifier would be sufficient. I doubt you'd extract
>>> meaningful performance gains from going all the way to a blob.
>
> I agree with Kristian above. In my opinion, choosing to encode in
> modifiers a precise description of every possible tiling/compression
> layout is not technically incorrect, but I believe it misses the point.
> The intention behind modifiers is not to exhaustively describe all
> possibilites.
>
> I summarized this opinion in VK_EXT_image_drm_format_modifier,
> where I wrote an "introdution to modifiers" section. Here's an excerpt:
>
>     One goal of modifiers in the Linux ecosystem is to enumerate for each
>     vendor a reasonably sized set of tiling formats that are appropriate for
>     images shared across processes, APIs, and/or devices, where each
>     participating component may possibly be from different vendors.
>     A non-goal is to enumerate all tiling formats supported by all vendors.
>     Some tiling formats used internally by vendors are inappropriate for
>     sharing; no modifiers should be assigned to such tiling formats.

Where it gets tricky is how to select that subset?  Our tiling mode
are defined more by the asic specific constraints than the tiling mode
itself.  At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible) all of which are asic config specific)

I guess we could do something like:
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.

We only probably need 40 bits to encode all of the tiling parameters
so we could do family, plus tiling encoding that still seems unwieldy
to deal with from an application perspective.  All of the parameters
affect the alignment requirements.

Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-22  0:00             ` Alex Deucher
@ 2018-02-22 18:04               ` Kristian Høgsberg
  2018-02-22 18:49                 ` Bas Nieuwenhuizen
  2018-02-22 19:21                 ` [Mesa-dev] " Eric Anholt
  0 siblings, 2 replies; 21+ messages in thread
From: Kristian Høgsberg @ 2018-02-22 18:04 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Rob Clark, Chad Versace, Nicolai Hähnle, dri-devel,
	Jason Ekstrand, Kristian Høgsberg, Ben Skeggs, mesa-dev,
	Lyude Paul, Miguel Angel Vico

On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:

> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org>
wrote:
> > On Thu 21 Dec 2017, Daniel Vetter wrote:
> >> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
hoegsberg@google.com> wrote:
> >>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
mvicomoya@nvidia.com> wrote:
> >>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
hoegsberg@gmail.com> wrote:
> >>>>> I'd like to see concrete examples of actual display controllers
> >>>>> supporting more format layouts than what can be specified with a 64
> >>>>> bit modifier.
> >>>>
> >>>> The main problem is our tiling and other metadata parameters can't
> >>>> generally fit in a modifier, so we find passing a blob of metadata a
> >>>> more suitable mechanism.
> >>>
> >>> I understand that you may have n knobs with a total of more than a
total of
> >>> 56 bits that configure your tiling/swizzling for color buffers. What
I don't
> >>> buy is that you need all those combinations when passing buffers
around
> >>> between codecs, cameras and display controllers. Even if you're
sharing
> >>> between the same 3D drivers in different processes, I expect just
locking
> >>> down, say, 64 different combinations (you can add more over time) and
> >>> assigning each a modifier would be sufficient. I doubt you'd extract
> >>> meaningful performance gains from going all the way to a blob.
> >
> > I agree with Kristian above. In my opinion, choosing to encode in
> > modifiers a precise description of every possible tiling/compression
> > layout is not technically incorrect, but I believe it misses the point.
> > The intention behind modifiers is not to exhaustively describe all
> > possibilites.
> >
> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
> > where I wrote an "introdution to modifiers" section. Here's an excerpt:
> >
> >     One goal of modifiers in the Linux ecosystem is to enumerate for
each
> >     vendor a reasonably sized set of tiling formats that are
appropriate for
> >     images shared across processes, APIs, and/or devices, where each
> >     participating component may possibly be from different vendors.
> >     A non-goal is to enumerate all tiling formats supported by all
vendors.
> >     Some tiling formats used internally by vendors are inappropriate for
> >     sharing; no modifiers should be assigned to such tiling formats.

> Where it gets tricky is how to select that subset?  Our tiling mode
> are defined more by the asic specific constraints than the tiling mode
> itself.  At a high level we have basically 3 tiling modes (out of 16
> possible) that would be the minimum we'd want to expose for gfx6-8.
> gfx9 uses a completely new scheme.
> 1. Linear (per asic stride requirements, not usable by many hw blocks)
> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
> tile split (7 possible), sample split (4 possible), num banks (4
> possible), bank width (4 possible), bank height (4 possible), macro
> tile aspect (4 possible) all of which are asic config specific)

> I guess we could do something like:
> AMD_GFX6_LINEAR_ALIGNED_64B
> AMD_GFX6_LINEAR_ALIGNED_256B
> AMD_GFX6_LINEAR_ALIGNED_512B
> AMD_GFX6_1D_THIN_DISPLAY
> AMD_GFX6_1D_THIN_DEPTH
> AMD_GFX6_1D_THIN_ROTATED
> AMD_GFX6_1D_THIN_THIN
> AMD_GFX6_1D_THIN_THICK

AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> etc.

> We only probably need 40 bits to encode all of the tiling parameters
> so we could do family, plus tiling encoding that still seems unwieldy
> to deal with from an application perspective.  All of the parameters
> affect the alignment requirements.

We discussed this earlier in the thread, here's what I said:

Another point here is that the modifier doesn't need to encode all the
thing you have to communicate to the HW. For a given width, height, format,
compression type and maybe a few other high-level parameters, I'm skeptical
that the remaining tile parameters aren't just mechanically derivable using
a fixed table or formula. So instead of thinking of the modifiers as
something you can just memcpy into a state packet, it identifies a family
of configurations - enough information to deterministically derive the full
exact configuration. The formula may change, for example for different
hardware or if it's determined to not be optimal, and in that case, we can
use a new modifier to represent to new formula.

Kristian


> Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-22 18:04               ` Kristian Høgsberg
@ 2018-02-22 18:49                 ` Bas Nieuwenhuizen
  2018-02-22 21:16                   ` Alex Deucher
  2018-02-22 19:21                 ` [Mesa-dev] " Eric Anholt
  1 sibling, 1 reply; 21+ messages in thread
From: Bas Nieuwenhuizen @ 2018-02-22 18:49 UTC (permalink / raw)
  To: Kristian Høgsberg
  Cc: Rob Clark, Chad Versace, dri-devel, Jason Ekstrand,
	Kristian Høgsberg, Ben Skeggs, Miguel Angel Vico, mesa-dev,
	Lyude Paul, Nicolai Hähnle

On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg <hoegsberg@gmail.com> wrote:
> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org>
> wrote:
>> > On Thu 21 Dec 2017, Daniel Vetter wrote:
>> >> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
> hoegsberg@google.com> wrote:
>> >>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
> mvicomoya@nvidia.com> wrote:
>> >>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
> hoegsberg@gmail.com> wrote:
>> >>>>> I'd like to see concrete examples of actual display controllers
>> >>>>> supporting more format layouts than what can be specified with a 64
>> >>>>> bit modifier.
>> >>>>
>> >>>> The main problem is our tiling and other metadata parameters can't
>> >>>> generally fit in a modifier, so we find passing a blob of metadata a
>> >>>> more suitable mechanism.
>> >>>
>> >>> I understand that you may have n knobs with a total of more than a
> total of
>> >>> 56 bits that configure your tiling/swizzling for color buffers. What
> I don't
>> >>> buy is that you need all those combinations when passing buffers
> around
>> >>> between codecs, cameras and display controllers. Even if you're
> sharing
>> >>> between the same 3D drivers in different processes, I expect just
> locking
>> >>> down, say, 64 different combinations (you can add more over time) and
>> >>> assigning each a modifier would be sufficient. I doubt you'd extract
>> >>> meaningful performance gains from going all the way to a blob.
>> >
>> > I agree with Kristian above. In my opinion, choosing to encode in
>> > modifiers a precise description of every possible tiling/compression
>> > layout is not technically incorrect, but I believe it misses the point.
>> > The intention behind modifiers is not to exhaustively describe all
>> > possibilites.
>> >
>> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
>> > where I wrote an "introdution to modifiers" section. Here's an excerpt:
>> >
>> >     One goal of modifiers in the Linux ecosystem is to enumerate for
> each
>> >     vendor a reasonably sized set of tiling formats that are
> appropriate for
>> >     images shared across processes, APIs, and/or devices, where each
>> >     participating component may possibly be from different vendors.
>> >     A non-goal is to enumerate all tiling formats supported by all
> vendors.
>> >     Some tiling formats used internally by vendors are inappropriate for
>> >     sharing; no modifiers should be assigned to such tiling formats.
>
>> Where it gets tricky is how to select that subset?  Our tiling mode
>> are defined more by the asic specific constraints than the tiling mode
>> itself.  At a high level we have basically 3 tiling modes (out of 16
>> possible) that would be the minimum we'd want to expose for gfx6-8.
>> gfx9 uses a completely new scheme.
>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
>> tile split (7 possible), sample split (4 possible), num banks (4
>> possible), bank width (4 possible), bank height (4 possible), macro
>> tile aspect (4 possible) all of which are asic config specific)
>
>> I guess we could do something like:
>> AMD_GFX6_LINEAR_ALIGNED_64B
>> AMD_GFX6_LINEAR_ALIGNED_256B
>> AMD_GFX6_LINEAR_ALIGNED_512B
>> AMD_GFX6_1D_THIN_DISPLAY
>> AMD_GFX6_1D_THIN_DEPTH
>> AMD_GFX6_1D_THIN_ROTATED
>> AMD_GFX6_1D_THIN_THIN
>> AMD_GFX6_1D_THIN_THICK
>
> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>> etc.
>
>> We only probably need 40 bits to encode all of the tiling parameters
>> so we could do family, plus tiling encoding that still seems unwieldy
>> to deal with from an application perspective.  All of the parameters
>> affect the alignment requirements.
>
> We discussed this earlier in the thread, here's what I said:
>
> Another point here is that the modifier doesn't need to encode all the
> thing you have to communicate to the HW. For a given width, height, format,
> compression type and maybe a few other high-level parameters, I'm skeptical
> that the remaining tile parameters aren't just mechanically derivable using
> a fixed table or formula. So instead of thinking of the modifiers as
> something you can just memcpy into a state packet, it identifies a family
> of configurations - enough information to deterministically derive the full
> exact configuration. The formula may change, for example for different
> hardware or if it's determined to not be optimal, and in that case, we can
> use a new modifier to represent to new formula.

I think this is not so much about being able to dump it in a state
packet, but about sharing between different GPUs of AMD. We have
basically only a few interesting tiling modes if you look at a single
GPU, but checking if those are equal depends on the other bits  which
may or may not be different per chip for the same conceptual tiling
mode. We could just put a chip identifier in, but that would preclude
any sharing while I think we can do some.

- Bas
>
> Kristian
>
>
>> Alex
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-22 18:04               ` Kristian Høgsberg
  2018-02-22 18:49                 ` Bas Nieuwenhuizen
@ 2018-02-22 19:21                 ` Eric Anholt
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Anholt @ 2018-02-22 19:21 UTC (permalink / raw)
  To: Kristian Høgsberg, Alex Deucher
  Cc: Rob Clark, Chad Versace, dri-devel, Jason Ekstrand,
	Kristian Høgsberg, Ben Skeggs, Miguel Angel Vico, mesa-dev,
	Lyude Paul, Nicolai Hähnle


[-- Attachment #1.1: Type: text/plain, Size: 6355 bytes --]

Kristian Høgsberg <hoegsberg@gmail.com> writes:

> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org>
> wrote:
>> > On Thu 21 Dec 2017, Daniel Vetter wrote:
>> >> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
> hoegsberg@google.com> wrote:
>> >>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
> mvicomoya@nvidia.com> wrote:
>> >>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
> hoegsberg@gmail.com> wrote:
>> >>>>> I'd like to see concrete examples of actual display controllers
>> >>>>> supporting more format layouts than what can be specified with a 64
>> >>>>> bit modifier.
>> >>>>
>> >>>> The main problem is our tiling and other metadata parameters can't
>> >>>> generally fit in a modifier, so we find passing a blob of metadata a
>> >>>> more suitable mechanism.
>> >>>
>> >>> I understand that you may have n knobs with a total of more than a
> total of
>> >>> 56 bits that configure your tiling/swizzling for color buffers. What
> I don't
>> >>> buy is that you need all those combinations when passing buffers
> around
>> >>> between codecs, cameras and display controllers. Even if you're
> sharing
>> >>> between the same 3D drivers in different processes, I expect just
> locking
>> >>> down, say, 64 different combinations (you can add more over time) and
>> >>> assigning each a modifier would be sufficient. I doubt you'd extract
>> >>> meaningful performance gains from going all the way to a blob.
>> >
>> > I agree with Kristian above. In my opinion, choosing to encode in
>> > modifiers a precise description of every possible tiling/compression
>> > layout is not technically incorrect, but I believe it misses the point.
>> > The intention behind modifiers is not to exhaustively describe all
>> > possibilites.
>> >
>> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
>> > where I wrote an "introdution to modifiers" section. Here's an excerpt:
>> >
>> >     One goal of modifiers in the Linux ecosystem is to enumerate for
> each
>> >     vendor a reasonably sized set of tiling formats that are
> appropriate for
>> >     images shared across processes, APIs, and/or devices, where each
>> >     participating component may possibly be from different vendors.
>> >     A non-goal is to enumerate all tiling formats supported by all
> vendors.
>> >     Some tiling formats used internally by vendors are inappropriate for
>> >     sharing; no modifiers should be assigned to such tiling formats.
>
>> Where it gets tricky is how to select that subset?  Our tiling mode
>> are defined more by the asic specific constraints than the tiling mode
>> itself.  At a high level we have basically 3 tiling modes (out of 16
>> possible) that would be the minimum we'd want to expose for gfx6-8.
>> gfx9 uses a completely new scheme.
>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
>> tile split (7 possible), sample split (4 possible), num banks (4
>> possible), bank width (4 possible), bank height (4 possible), macro
>> tile aspect (4 possible) all of which are asic config specific)
>
>> I guess we could do something like:
>> AMD_GFX6_LINEAR_ALIGNED_64B
>> AMD_GFX6_LINEAR_ALIGNED_256B
>> AMD_GFX6_LINEAR_ALIGNED_512B
>> AMD_GFX6_1D_THIN_DISPLAY
>> AMD_GFX6_1D_THIN_DEPTH
>> AMD_GFX6_1D_THIN_ROTATED
>> AMD_GFX6_1D_THIN_THIN
>> AMD_GFX6_1D_THIN_THICK
>
> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>
> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>> etc.
>
>> We only probably need 40 bits to encode all of the tiling parameters
>> so we could do family, plus tiling encoding that still seems unwieldy
>> to deal with from an application perspective.  All of the parameters
>> affect the alignment requirements.
>
> We discussed this earlier in the thread, here's what I said:
>
> Another point here is that the modifier doesn't need to encode all the
> thing you have to communicate to the HW. For a given width, height, format,
> compression type and maybe a few other high-level parameters, I'm skeptical
> that the remaining tile parameters aren't just mechanically derivable using
> a fixed table or formula. So instead of thinking of the modifiers as
> something you can just memcpy into a state packet, it identifies a family
> of configurations - enough information to deterministically derive the full
> exact configuration. The formula may change, for example for different
> hardware or if it's determined to not be optimal, and in that case, we can
> use a new modifier to represent to new formula.

Agreed.  For Broadcom's VC5+ stuff, our tiling layout depends on the
number of SDRAM banks and bank size, but all users of buffers will know
what those are, so I'm not planning on including those in the modifier.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-22 18:49                 ` Bas Nieuwenhuizen
@ 2018-02-22 21:16                   ` Alex Deucher
  2018-02-27  6:10                     ` James Jones
  2018-03-07 17:23                     ` Daniel Vetter
  0 siblings, 2 replies; 21+ messages in thread
From: Alex Deucher @ 2018-02-22 21:16 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Rob Clark, Chad Versace, Kristian Høgsberg, dri-devel,
	Jason Ekstrand, Kristian Høgsberg, Ben Skeggs,
	Miguel Angel Vico, mesa-dev, Lyude Paul, Nicolai Hähnle

On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>>
>>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org>
>> wrote:
>>> > On Thu 21 Dec 2017, Daniel Vetter wrote:
>>> >> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
>> hoegsberg@google.com> wrote:
>>> >>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
>> mvicomoya@nvidia.com> wrote:
>>> >>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
>> hoegsberg@gmail.com> wrote:
>>> >>>>> I'd like to see concrete examples of actual display controllers
>>> >>>>> supporting more format layouts than what can be specified with a 64
>>> >>>>> bit modifier.
>>> >>>>
>>> >>>> The main problem is our tiling and other metadata parameters can't
>>> >>>> generally fit in a modifier, so we find passing a blob of metadata a
>>> >>>> more suitable mechanism.
>>> >>>
>>> >>> I understand that you may have n knobs with a total of more than a
>> total of
>>> >>> 56 bits that configure your tiling/swizzling for color buffers. What
>> I don't
>>> >>> buy is that you need all those combinations when passing buffers
>> around
>>> >>> between codecs, cameras and display controllers. Even if you're
>> sharing
>>> >>> between the same 3D drivers in different processes, I expect just
>> locking
>>> >>> down, say, 64 different combinations (you can add more over time) and
>>> >>> assigning each a modifier would be sufficient. I doubt you'd extract
>>> >>> meaningful performance gains from going all the way to a blob.
>>> >
>>> > I agree with Kristian above. In my opinion, choosing to encode in
>>> > modifiers a precise description of every possible tiling/compression
>>> > layout is not technically incorrect, but I believe it misses the point.
>>> > The intention behind modifiers is not to exhaustively describe all
>>> > possibilites.
>>> >
>>> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
>>> > where I wrote an "introdution to modifiers" section. Here's an excerpt:
>>> >
>>> >     One goal of modifiers in the Linux ecosystem is to enumerate for
>> each
>>> >     vendor a reasonably sized set of tiling formats that are
>> appropriate for
>>> >     images shared across processes, APIs, and/or devices, where each
>>> >     participating component may possibly be from different vendors.
>>> >     A non-goal is to enumerate all tiling formats supported by all
>> vendors.
>>> >     Some tiling formats used internally by vendors are inappropriate for
>>> >     sharing; no modifiers should be assigned to such tiling formats.
>>
>>> Where it gets tricky is how to select that subset?  Our tiling mode
>>> are defined more by the asic specific constraints than the tiling mode
>>> itself.  At a high level we have basically 3 tiling modes (out of 16
>>> possible) that would be the minimum we'd want to expose for gfx6-8.
>>> gfx9 uses a completely new scheme.
>>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
>>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
>>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
>>> tile split (7 possible), sample split (4 possible), num banks (4
>>> possible), bank width (4 possible), bank height (4 possible), macro
>>> tile aspect (4 possible) all of which are asic config specific)
>>
>>> I guess we could do something like:
>>> AMD_GFX6_LINEAR_ALIGNED_64B
>>> AMD_GFX6_LINEAR_ALIGNED_256B
>>> AMD_GFX6_LINEAR_ALIGNED_512B
>>> AMD_GFX6_1D_THIN_DISPLAY
>>> AMD_GFX6_1D_THIN_DEPTH
>>> AMD_GFX6_1D_THIN_ROTATED
>>> AMD_GFX6_1D_THIN_THIN
>>> AMD_GFX6_1D_THIN_THICK
>>
>> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>
>> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>> etc.
>>
>>> We only probably need 40 bits to encode all of the tiling parameters
>>> so we could do family, plus tiling encoding that still seems unwieldy
>>> to deal with from an application perspective.  All of the parameters
>>> affect the alignment requirements.
>>
>> We discussed this earlier in the thread, here's what I said:
>>
>> Another point here is that the modifier doesn't need to encode all the
>> thing you have to communicate to the HW. For a given width, height, format,
>> compression type and maybe a few other high-level parameters, I'm skeptical
>> that the remaining tile parameters aren't just mechanically derivable using
>> a fixed table or formula. So instead of thinking of the modifiers as
>> something you can just memcpy into a state packet, it identifies a family
>> of configurations - enough information to deterministically derive the full
>> exact configuration. The formula may change, for example for different
>> hardware or if it's determined to not be optimal, and in that case, we can
>> use a new modifier to represent to new formula.
>
> I think this is not so much about being able to dump it in a state
> packet, but about sharing between different GPUs of AMD. We have
> basically only a few interesting tiling modes if you look at a single
> GPU, but checking if those are equal depends on the other bits  which
> may or may not be different per chip for the same conceptual tiling
> mode. We could just put a chip identifier in, but that would preclude
> any sharing while I think we can do some.

Right.  And the 2D ones, while they are the most complicated, are also
the most interesting from a performance perspective so ideally you'd
find a match on one of those.  If you don't expose the 2D modes,
there's not much point in supporting modifiers at all.

Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-22 21:16                   ` Alex Deucher
@ 2018-02-27  6:10                     ` James Jones
  2018-03-07 17:23                     ` Daniel Vetter
  1 sibling, 0 replies; 21+ messages in thread
From: James Jones @ 2018-02-27  6:10 UTC (permalink / raw)
  To: Alex Deucher, Bas Nieuwenhuizen
  Cc: Rob Clark, Chad Versace, Nicolai Hähnle,
	Kristian Høgsberg, dri-devel, Jason Ekstrand,
	Kristian Høgsberg, Ben Skeggs, mesa-dev, Lyude Paul,
	Miguel Angel Vico

On 02/22/2018 01:16 PM, Alex Deucher wrote:
> On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
>> On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg <hoegsberg@gmail.com> wrote:
>>> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>>>
>>>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org>
>>> wrote:
>>>>> On Thu 21 Dec 2017, Daniel Vetter wrote:
>>>>>> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
>>> hoegsberg@google.com> wrote:
>>>>>>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
>>> mvicomoya@nvidia.com> wrote:
>>>>>>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <
>>> hoegsberg@gmail.com> wrote:
>>>>>>>>> I'd like to see concrete examples of actual display controllers
>>>>>>>>> supporting more format layouts than what can be specified with a 64
>>>>>>>>> bit modifier.
>>>>>>>>
>>>>>>>> The main problem is our tiling and other metadata parameters can't
>>>>>>>> generally fit in a modifier, so we find passing a blob of metadata a
>>>>>>>> more suitable mechanism.
>>>>>>>
>>>>>>> I understand that you may have n knobs with a total of more than a
>>> total of
>>>>>>> 56 bits that configure your tiling/swizzling for color buffers. What
>>> I don't
>>>>>>> buy is that you need all those combinations when passing buffers
>>> around
>>>>>>> between codecs, cameras and display controllers. Even if you're
>>> sharing
>>>>>>> between the same 3D drivers in different processes, I expect just
>>> locking
>>>>>>> down, say, 64 different combinations (you can add more over time) and
>>>>>>> assigning each a modifier would be sufficient. I doubt you'd extract
>>>>>>> meaningful performance gains from going all the way to a blob.
>>>>>
>>>>> I agree with Kristian above. In my opinion, choosing to encode in
>>>>> modifiers a precise description of every possible tiling/compression
>>>>> layout is not technically incorrect, but I believe it misses the point.
>>>>> The intention behind modifiers is not to exhaustively describe all
>>>>> possibilites.
>>>>>
>>>>> I summarized this opinion in VK_EXT_image_drm_format_modifier,
>>>>> where I wrote an "introdution to modifiers" section. Here's an excerpt:
>>>>>
>>>>>      One goal of modifiers in the Linux ecosystem is to enumerate for
>>> each
>>>>>      vendor a reasonably sized set of tiling formats that are
>>> appropriate for
>>>>>      images shared across processes, APIs, and/or devices, where each
>>>>>      participating component may possibly be from different vendors.
>>>>>      A non-goal is to enumerate all tiling formats supported by all
>>> vendors.
>>>>>      Some tiling formats used internally by vendors are inappropriate for
>>>>>      sharing; no modifiers should be assigned to such tiling formats.
>>>
>>>> Where it gets tricky is how to select that subset?  Our tiling mode
>>>> are defined more by the asic specific constraints than the tiling mode
>>>> itself.  At a high level we have basically 3 tiling modes (out of 16
>>>> possible) that would be the minimum we'd want to expose for gfx6-8.
>>>> gfx9 uses a completely new scheme.
>>>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
>>>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
>>>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
>>>> tile split (7 possible), sample split (4 possible), num banks (4
>>>> possible), bank width (4 possible), bank height (4 possible), macro
>>>> tile aspect (4 possible) all of which are asic config specific)
>>>
>>>> I guess we could do something like:
>>>> AMD_GFX6_LINEAR_ALIGNED_64B
>>>> AMD_GFX6_LINEAR_ALIGNED_256B
>>>> AMD_GFX6_LINEAR_ALIGNED_512B
>>>> AMD_GFX6_1D_THIN_DISPLAY
>>>> AMD_GFX6_1D_THIN_DEPTH
>>>> AMD_GFX6_1D_THIN_ROTATED
>>>> AMD_GFX6_1D_THIN_THIN
>>>> AMD_GFX6_1D_THIN_THICK
>>>
>>> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>> etc.
>>>
>>>> We only probably need 40 bits to encode all of the tiling parameters
>>>> so we could do family, plus tiling encoding that still seems unwieldy
>>>> to deal with from an application perspective.  All of the parameters
>>>> affect the alignment requirements.
>>>
>>> We discussed this earlier in the thread, here's what I said:
>>>
>>> Another point here is that the modifier doesn't need to encode all the
>>> thing you have to communicate to the HW. For a given width, height, format,
>>> compression type and maybe a few other high-level parameters, I'm skeptical
>>> that the remaining tile parameters aren't just mechanically derivable using
>>> a fixed table or formula. So instead of thinking of the modifiers as
>>> something you can just memcpy into a state packet, it identifies a family
>>> of configurations - enough information to deterministically derive the full
>>> exact configuration. The formula may change, for example for different
>>> hardware or if it's determined to not be optimal, and in that case, we can
>>> use a new modifier to represent to new formula.
>>
>> I think this is not so much about being able to dump it in a state
>> packet, but about sharing between different GPUs of AMD. We have
>> basically only a few interesting tiling modes if you look at a single
>> GPU, but checking if those are equal depends on the other bits  which
>> may or may not be different per chip for the same conceptual tiling
>> mode. We could just put a chip identifier in, but that would preclude
>> any sharing while I think we can do some.
> 
> Right.  And the 2D ones, while they are the most complicated, are also
> the most interesting from a performance perspective so ideally you'd
> find a match on one of those.  If you don't expose the 2D modes,
> there's not much point in supporting modifiers at all.

This is essentially the problem I keep running into when trying to work 
up something based on the suggestions here as well.  Yes, for a given 
build of our driver on a single device, we can re-derive exactly the 
same tiling parameters given a few manageable constraints.  That was the 
essence of the design of the Vulkan external objects framework, and it 
comes with all the limitations I'm trying to avoid by introducing the 
more complex allocator framework:

-We want to share across GPUs.

-We potentially want to share across non-version-locked driver 
components, even potentially between Nouveau-driven/Tegra-DRM driven 
GPUs and NVIDIA proprietary driven GPUs.  There's no way we can assure 
the drivers use the same algorithm there.

Taking it further than even I would like to, in a discussion over DRM 
format modifier usage in Vulkan, it was recently proposed that DRM 
format modifiers be used to serialize data in a pre-tiled format.  I 
personally don't think DRM format modifiers should be used for this at 
all, but something like extended allocator meta-data might be appropriate.

At this point I've heard engineers from Intel, AMD, and of course myself 
at NVIDIA saying that while DRM format modifiers solve many more cases 
than assuming pitch-linear or doing magic to pass around metadata, they 
don't solve all the cases necessary to make optimal use of any of our HW 
in at least some interesting cases.  Hence it seems reasonable to 
continue to improve the design of these mechanisms.

Responding to some earlier points that fell off my mail retention limit 
while I was on paternity leave:

> I understand that it's an incomplete example, but even so I don't think
> this duplication is feasible. It's not a matter of how many use cases we
> have to duplicate at this point in time, it's that all these APIs are live,
> evolving APIs and keeping the allocator uptodate as various APIs grow new
> corner cases doesn't seem practical. Further, it's not orthogonal or
> composable - the allocator has to know about all producers and consumers
> and if I add a new piece of hardware I have to extend the allocator to
> understands its new use cases. With the modifier model, I just ask the new
> driver which modifiers it supports for the use case I'm interested in and
> feed those modifiers to the allocator.

There are currently 3 complete modern low-level 3D graphics APIs along 
with some slightly longer in the tooth higher-level alternatives being 
actively maintained at more or less the same feature level, countless 
video decode/encode APIs with more or less equivalent functionality, and 
more mode setting APIs than anyone wants.  If that much total duplicated 
effort is possible, it seems feasible to maintain a list of layouts and 
related properties, most of which will see some re-use between all these 
APIs.

Further, the central library doesn't need to be burdened by all of these 
use cases unless they become cross-vendor.  The usage itself is 
vendor-extensible, so if AMD had wanted to add a bunch of Mantle-only 
usage bits, they could have done so without cluttering the shared 
library code or namespace.

> Vulkan isn't expected to know about video encode usage. You ask the video
> codec about supported modifiers for encode and you ask Vulkan for supported
> modifiers for, say optimal render usage. The allocator determines the
> optimal lowest common denominator and allocates the buffer. Maybe that's
> linear, or if you've designed both parts, maybe there's a simple shared
> tiled format that the encoder can source from.

It was determined early on in attempts to design this mechanism that 
such LCD intersection doesn't produce the optimal result.  Only 
considering the usage holistically can produce optimal layouts.

> For modifiers and liballocator as well, the meta data is copied by value
> (and passed through IPC) and as such can't model shared mutable
> information. That means, fast colors, compression aux buffers and such, has
> to be in a share BO plane.

Again, this is making large design assumptions.  Fast clear color data, 
for example would be a very reasonable thing to include in static 
metadata given our driver+HW architecture.

Thanks,
-James

> Alex
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
  2018-02-22 21:16                   ` Alex Deucher
  2018-02-27  6:10                     ` James Jones
@ 2018-03-07 17:23                     ` Daniel Vetter
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Vetter @ 2018-03-07 17:23 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Rob Clark, Chad Versace, Nicolai H??hnle, Kristian H??gsberg,
	dri-devel, Jason Ekstrand, Kristian H??gsberg, Ben Skeggs,
	mesa-dev, Lyude Paul, Miguel Angel Vico

On Thu, Feb 22, 2018 at 04:16:52PM -0500, Alex Deucher wrote:
> On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
> > On Thu, Feb 22, 2018 at 7:04 PM, Kristian H??gsberg <hoegsberg@gmail.com> wrote:
> >> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >>
> >>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary@chromium.org>
> >> wrote:
> >>> > On Thu 21 Dec 2017, Daniel Vetter wrote:
> >>> >> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <
> >> hoegsberg@google.com> wrote:
> >>> >>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <
> >> mvicomoya@nvidia.com> wrote:
> >>> >>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian H??gsberg <
> >> hoegsberg@gmail.com> wrote:
> >>> >>>>> I'd like to see concrete examples of actual display controllers
> >>> >>>>> supporting more format layouts than what can be specified with a 64
> >>> >>>>> bit modifier.
> >>> >>>>
> >>> >>>> The main problem is our tiling and other metadata parameters can't
> >>> >>>> generally fit in a modifier, so we find passing a blob of metadata a
> >>> >>>> more suitable mechanism.
> >>> >>>
> >>> >>> I understand that you may have n knobs with a total of more than a
> >> total of
> >>> >>> 56 bits that configure your tiling/swizzling for color buffers. What
> >> I don't
> >>> >>> buy is that you need all those combinations when passing buffers
> >> around
> >>> >>> between codecs, cameras and display controllers. Even if you're
> >> sharing
> >>> >>> between the same 3D drivers in different processes, I expect just
> >> locking
> >>> >>> down, say, 64 different combinations (you can add more over time) and
> >>> >>> assigning each a modifier would be sufficient. I doubt you'd extract
> >>> >>> meaningful performance gains from going all the way to a blob.
> >>> >
> >>> > I agree with Kristian above. In my opinion, choosing to encode in
> >>> > modifiers a precise description of every possible tiling/compression
> >>> > layout is not technically incorrect, but I believe it misses the point.
> >>> > The intention behind modifiers is not to exhaustively describe all
> >>> > possibilites.
> >>> >
> >>> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
> >>> > where I wrote an "introdution to modifiers" section. Here's an excerpt:
> >>> >
> >>> >     One goal of modifiers in the Linux ecosystem is to enumerate for
> >> each
> >>> >     vendor a reasonably sized set of tiling formats that are
> >> appropriate for
> >>> >     images shared across processes, APIs, and/or devices, where each
> >>> >     participating component may possibly be from different vendors.
> >>> >     A non-goal is to enumerate all tiling formats supported by all
> >> vendors.
> >>> >     Some tiling formats used internally by vendors are inappropriate for
> >>> >     sharing; no modifiers should be assigned to such tiling formats.
> >>
> >>> Where it gets tricky is how to select that subset?  Our tiling mode
> >>> are defined more by the asic specific constraints than the tiling mode
> >>> itself.  At a high level we have basically 3 tiling modes (out of 16
> >>> possible) that would be the minimum we'd want to expose for gfx6-8.
> >>> gfx9 uses a completely new scheme.
> >>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
> >>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
> >>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
> >>> tile split (7 possible), sample split (4 possible), num banks (4
> >>> possible), bank width (4 possible), bank height (4 possible), macro
> >>> tile aspect (4 possible) all of which are asic config specific)
> >>
> >>> I guess we could do something like:
> >>> AMD_GFX6_LINEAR_ALIGNED_64B
> >>> AMD_GFX6_LINEAR_ALIGNED_256B
> >>> AMD_GFX6_LINEAR_ALIGNED_512B
> >>> AMD_GFX6_1D_THIN_DISPLAY
> >>> AMD_GFX6_1D_THIN_DEPTH
> >>> AMD_GFX6_1D_THIN_ROTATED
> >>> AMD_GFX6_1D_THIN_THIN
> >>> AMD_GFX6_1D_THIN_THICK
> >>
> >> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>> etc.
> >>
> >>> We only probably need 40 bits to encode all of the tiling parameters
> >>> so we could do family, plus tiling encoding that still seems unwieldy
> >>> to deal with from an application perspective.  All of the parameters
> >>> affect the alignment requirements.
> >>
> >> We discussed this earlier in the thread, here's what I said:
> >>
> >> Another point here is that the modifier doesn't need to encode all the
> >> thing you have to communicate to the HW. For a given width, height, format,
> >> compression type and maybe a few other high-level parameters, I'm skeptical
> >> that the remaining tile parameters aren't just mechanically derivable using
> >> a fixed table or formula. So instead of thinking of the modifiers as
> >> something you can just memcpy into a state packet, it identifies a family
> >> of configurations - enough information to deterministically derive the full
> >> exact configuration. The formula may change, for example for different
> >> hardware or if it's determined to not be optimal, and in that case, we can
> >> use a new modifier to represent to new formula.
> >
> > I think this is not so much about being able to dump it in a state
> > packet, but about sharing between different GPUs of AMD. We have
> > basically only a few interesting tiling modes if you look at a single
> > GPU, but checking if those are equal depends on the other bits  which
> > may or may not be different per chip for the same conceptual tiling
> > mode. We could just put a chip identifier in, but that would preclude
> > any sharing while I think we can do some.
> 
> Right.  And the 2D ones, while they are the most complicated, are also
> the most interesting from a performance perspective so ideally you'd
> find a match on one of those.  If you don't expose the 2D modes,
> there's not much point in supporting modifiers at all.

1. Make sure you have a test farm covering all your use cases and hw.

2. Create a struct that encodes everything. Make it a few kb big if it has
to be, whatever it takes.

3. Do a little library that contains a huge table mapping modifiers to
these structs, and one function that returns you the unique modifier for
the given tiling layout description struct. We can have that in the kernel
sources, or just delegate the entire AMD modifier block to some userspace
library you're managing (with just the few modifiers the kernel needs in
the uapi/drm_fourcc.h header). If the lib doesn't find the modifier, make
it crash with a nice loud backtrace.

4. Add modifiers to that lib until you stop failing on the test farm.

5 optional: Make the lib faster with hashing/compressing/whatever if it
turns out to be a bottleneck somewhere. Since you'll only ever need it on
import/export, add a small cache with the relevant few entries for the
device instance at hand and I don't expect this will be a problem, ever.

I'm pretty sure you'll finish step 4 before you run out of modifiers. If
you don't, then we suck it up, admit sheepishly that modifiers turned out
to be a stupid idea and rev the kernel's uapi. We know how to do that, but
I also don't want to rev uapi just for fun.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-03-07 17:23 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20171220085151.6327051e@nvidia.com>
2017-12-20 19:51 ` [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces Daniel Vetter
2017-12-20 19:54   ` Kristian Høgsberg
2017-12-20 20:41     ` Miguel Angel Vico
2017-12-20 23:22       ` [Mesa-dev] " Kristian Kristensen
2017-12-21  1:05         ` Ilia Mirkin
2017-12-21  8:05         ` Daniel Vetter
2018-02-21  6:14           ` Chad Versace
2018-02-21 18:26             ` [Mesa-dev] " Daniel Vetter
2018-02-21 23:23               ` Chad Versace
2018-02-22  0:00             ` Alex Deucher
2018-02-22 18:04               ` Kristian Høgsberg
2018-02-22 18:49                 ` Bas Nieuwenhuizen
2018-02-22 21:16                   ` Alex Deucher
2018-02-27  6:10                     ` James Jones
2018-03-07 17:23                     ` Daniel Vetter
2018-02-22 19:21                 ` [Mesa-dev] " Eric Anholt
     [not found] ` <CAPj87rOmGsN+HZEk1G=gFx_uPyipzEURB7=bfqOxxmLDtWwPgw@mail.gmail.com>
     [not found]   ` <b1d78126-fb83-d3c0-290f-8a5406ab1c79@nvidia.com>
     [not found]     ` <CAKMK7uHGBhMge8ayqcJUXysesVL8yc1dNk3rdH6C_N2DpO2OyQ@mail.gmail.com>
     [not found]       ` <CAF6AEGtNsBwhJAiUGuUQFdKccurSzD-HnpHH-JcHSx0RUCnZGA@mail.gmail.com>
2017-12-28 18:24         ` Miguel Angel Vico
2018-01-03 14:53           ` [Mesa-dev] " Rob Clark
2018-01-03 19:26           ` James Jones
2018-01-03 20:36             ` Kristian Kristensen
2018-01-08  9:35           ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.