* dma-resv ongoing discussion
@ 2021-05-24  2:03 Dave Airlie
  2021-05-25 13:18 ` Christian König
  0 siblings, 1 reply; 2+ messages in thread
From: Dave Airlie @ 2021-05-24  2:03 UTC (permalink / raw)
  To: dri-devel, Christian König, Daniel Vetter, Jason Ekstrand

I'd like to try and summarise where I feel we are all at with respect
to the dma-buf discussions. I think I've gotten a fairly good idea of
how things stand, but I'm not sure we are really getting to the "how do
we move things forward" stage, which is probably where I need to step
in. Thanks for keeping this as respectful as it has been; I understand
it can be difficult. I also think we are starting to find that we moved
the knob too far towards acceleration features being developed in
company silos, and hopefully with this and the TTM work etc. we can
start to push back to upstream-first designs.

I think Jason[1] summed up my feelings on this the best. We have a
dma-buf inter-driver contract that has a design issue. We didn't fix
that initially, and now we have amdgpu as the outlier in a world where
everyone else agreed to the contract.

a) Christian wants to try and move forward with fixing the world of
dma-buf design across all drivers, but hasn't come up with a plan for
doing so apart from amdgpu/i915. I think one strength Daniel has here
is that he's good at coming up with plans that change the ecosystem.
I'd really like to see some concrete effort to work out how much work
fixing this across the ecosystem is and whether it is possible. I
expect Daniel's big huge monster commit message summary of the current
drivers is a great place to start for this. That is, if we can agree
dma-buf is broken and on what dma-buf should look like tomorrow.

b) Daniel is coming from the side of let's bring amdgpu into the fold
first, then if the problem exists we can move everything forward
together. He intends to point out how alone amdgpu is here, and
wants to try and create a uapi that at least mitigates the biggest
problems with moving amdgpu to the common model first. I'd like to
know if this is at least a possibility as an alternate route. I
understand AMD have some goals to reach here but I think we've dug a
massive hole here and paying off the tech debt is going to have to
delay those goals if we are to keep upstream sane.

I'm slowly paging in all of the technical details as I go. I'd like to
see more thought around Daniel's idea of fixing the amdgpu oversync
with TLB flushing, as it really doesn't make much sense to me that TLB
flushing on process teardown is going to stall out other processes
using the shared buffer; it should only stall out moving the pages. If
that then allows aligning amdgpu for now and we can work out how to
fix (a), then that would rock.

Please correct me where I'm wrong here and definitely if I've
misrepresented anyone's positions.

Dave.


[1] https://lore.kernel.org/dri-devel/a1925038-5c3c-0193-1870-27488caa2577@gmail.com/T/#md800f00476ca1869a81b02a28cb2fabc1028c6be


* Re: dma-resv ongoing discussion
  2021-05-24  2:03 dma-resv ongoing discussion Dave Airlie
@ 2021-05-25 13:18 ` Christian König
  0 siblings, 0 replies; 2+ messages in thread
From: Christian König @ 2021-05-25 13:18 UTC (permalink / raw)
  To: Dave Airlie, dri-devel, Daniel Vetter, Jason Ekstrand

Hi Dave and of course everybody else,

On 24.05.21 at 04:03, Dave Airlie wrote:
> I'd like to try and summarise where I feel we are all at with respect
> to the dma-buf discussions. I think I've gotten a fairly good idea of
> how things stand, but I'm not sure we are really getting to the "how do
> we move things forward" stage, which is probably where I need to step
> in. Thanks for keeping this as respectful as it has been; I understand
> it can be difficult. I also think we are starting to find that we moved
> the knob too far towards acceleration features being developed in
> company silos, and hopefully with this and the TTM work etc. we can
> start to push back to upstream-first designs.
>
> I think Jason[1] summed up my feelings on this the best. We have a
> dma-buf inter-driver contract that has a design issue. We didn't fix
> that initially, and now we have amdgpu as the outlier in a world where
> everyone else agreed to the contract.
>
> a) Christian wants to try and move forward with fixing the world of
> dma-buf design across all drivers, but hasn't come up with a plan for
> doing so apart from amdgpu/i915. I think one strength Daniel has here
> is that he's good at coming up with plans that change the ecosystem.
> I'd really like to see some concrete effort to work out how much work
> fixing this across the ecosystem is and whether it is possible. I
> expect Daniel's big huge monster commit message summary of the current
> drivers is a great place to start for this. That is, if we can agree
> dma-buf is broken and on what dma-buf should look like tomorrow.

Well, to clarify: I don't want to move forward in order to implement new
features, but rather to fix existing shortcomings.

From my point of view the main purpose of the dma_resv object is to
provide a container for dma_fence objects for different use cases.

Those use cases are:
1. Resource management.
2. Implicit synchronization.
3. Information about current operations.

Now I think I can summarize the problem I'm seeing as this: the design
is focused too much on a single use case.

For example, for resource management alone I need to be able to add any 
fence at any time to the resv object without any restriction.
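
To illustrate the container view, here is a minimal sketch in plain
userspace C. It is not kernel code and not the dma_resv API: the names
toy_container, toy_tagged_fence, toy_usage and the helper functions are
all hypothetical. It assumes each fence is tagged with the use case it
was added for, that memory management has to consider every fence, and
that implicit synchronization only needs the fences added for it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FENCES 16

/* Hypothetical tags for the three use cases listed above. */
enum toy_usage {
	TOY_USAGE_RESOURCE_MGMT,
	TOY_USAGE_IMPLICIT_SYNC,
	TOY_USAGE_INFO,
};

/* Hypothetical stand-in for a fence; not struct dma_fence. */
struct toy_tagged_fence {
	int id;
	enum toy_usage usage;
};

/* Hypothetical stand-in for the container; not struct dma_resv. */
struct toy_container {
	struct toy_tagged_fence fences[TOY_MAX_FENCES];
	size_t count;
};

/* Add any fence at any time; the only limit is the toy array size. */
static void toy_container_add(struct toy_container *c, int id,
			      enum toy_usage usage)
{
	assert(c->count < TOY_MAX_FENCES);
	c->fences[c->count].id = id;
	c->fences[c->count].usage = usage;
	c->count++;
}

/* Assumption: memory management waits for every fence in the container. */
static size_t toy_waits_for_resource_mgmt(const struct toy_container *c)
{
	return c->count;
}

/* Assumption: implicit sync waits only for fences tagged for it. */
static size_t toy_waits_for_implicit_sync(const struct toy_container *c)
{
	size_t n = 0;

	for (size_t i = 0; i < c->count; i++)
		if (c->fences[i].usage == TOY_USAGE_IMPLICIT_SYNC)
			n++;
	return n;
}
```

In this toy model, adding a resource-management fence never changes what
an implicit-sync waiter has to wait for, which is the separation of use
cases described above.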

> b) Daniel is coming from the side of let's bring amdgpu into the fold
> first, then if the problem exists we can move everything forward
> together. He intends to point out how alone amdgpu is here, and
> wants to try and create a uapi that at least mitigates the biggest
> problems with moving amdgpu to the common model first. I'd like to
> know if this is at least a possibility as an alternate route. I
> understand AMD have some goals to reach here but I think we've dug a
> massive hole here and paying off the tech debt is going to have to
> delay those goals if we are to keep upstream sane.

I don't think we can do this so easily without breaking uAPI.

Userspace, in the form of both RADV and AMDVLK, depends on that
behavior, and we still have the original video decode use case this was
invented for.

> I'm slowly paging in all of the technical details as I go. I'd like to
> see more thought around Daniel's idea of fixing the amdgpu oversync
> with TLB flushing, as it really doesn't make much sense to me that TLB
> flushing on process teardown is going to stall out other processes
> using the shared buffer; it should only stall out moving the pages. If
> that then allows aligning amdgpu for now and we can work out how to
> fix (a), then that would rock.

Well, this is exactly what I've been trying to do by adding those flags
to the shared fences, but Daniel already convinced me that this is too
invasive as a first step.

And while this over-synchronization is annoying, it has been there for a
very long time and only affects the case where the BO is shared between
devices.

So for the moment I'm pondering what the absolute minimum change would
be to get amdgpu to use the exclusive fence in the same way other
drivers do.

And I think I can summarize this into two things:
1. We make it possible to add shared fences which are not synchronized
to the exclusive fence.
2. We make it possible to replace the exclusive fence without removing
all the shared fences.

With that in place I'm able to change amdgpu so that we can fill in the 
exclusive fence during CS with chain nodes and keep the synchronization 
model for existing amdgpu uAPI the same.
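
As a rough illustration of those two points, here is a sketch in plain
userspace C. Nothing in it is the real dma_resv API: toy_resv, toy_fence
and all the function names are hypothetical. It assumes a baseline in
which installing a new exclusive fence drops every shared fence, and
contrasts that with a replace operation that leaves them in place. The
chain nodes mentioned above are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SHARED 8

/* Hypothetical stand-in for a fence; not struct dma_fence. */
struct toy_fence {
	int id;
	bool syncs_to_exclusive; /* point 1: false is allowed for shared fences */
};

/* Hypothetical stand-in for a reservation object; not struct dma_resv. */
struct toy_resv {
	const struct toy_fence *exclusive; /* NULL while unset */
	const struct toy_fence *shared[TOY_MAX_SHARED];
	size_t shared_count;
};

/* Point 1: accept a shared fence whether or not it syncs to the exclusive one. */
static void toy_add_shared(struct toy_resv *resv, const struct toy_fence *f)
{
	assert(resv->shared_count < TOY_MAX_SHARED);
	resv->shared[resv->shared_count++] = f;
}

/* Assumed baseline: a new exclusive fence empties the shared list. */
static void toy_set_exclusive_drop_shared(struct toy_resv *resv,
					  const struct toy_fence *f)
{
	resv->exclusive = f;
	resv->shared_count = 0;
}

/* Point 2: swap the exclusive fence and leave the shared fences alone. */
static void toy_replace_exclusive_keep_shared(struct toy_resv *resv,
					      const struct toy_fence *f)
{
	resv->exclusive = f;
}

/* Count shared fences that are not synchronized to the exclusive fence. */
static size_t toy_count_unsynced_shared(const struct toy_resv *resv)
{
	size_t n = 0;

	for (size_t i = 0; i < resv->shared_count; i++)
		if (!resv->shared[i]->syncs_to_exclusive)
			n++;
	return n;
}
```

With the keep-shared variant, a fence that was added purely for resource
management survives a later change of the exclusive fence, while the
drop-shared variant would discard it.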

Regards,
Christian.

> Please correct me where I'm wrong here and definitely if I've
> misrepresented anyone's positions.
>
> Dave.
>
>
> [1] https://lore.kernel.org/dri-devel/a1925038-5c3c-0193-1870-27488caa2577@gmail.com/T/#md800f00476ca1869a81b02a28cb2fabc1028c6be

