* Re: Threaded submission & semaphore sharing
@ 2019-08-02  6:10 Koenig, Christian
  2019-08-02  6:27 ` Lionel Landwerlin
  0 siblings, 1 reply; 14+ messages in thread
From: Koenig, Christian @ 2019-08-02  6:10 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Jason Ekstrand, dri-devel





On 02.08.2019 07:38, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
On 02/08/2019 08:21, Koenig, Christian wrote:


On 02.08.2019 07:17, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
On 02/08/2019 08:08, Koenig, Christian wrote:
Hi Lionel,

Well that looks more like your test case is buggy.

According to the code the ctx1 queue always waits for sem1 and ctx2 queue always waits for sem2.


That's supposed to be the same underlying syncobj because it's exported from one VkDevice as opaque FD from sem1 and imported into sem2.

Well then that's still buggy and won't synchronize at all.

When ctx1 waits for a semaphore and then signals the same semaphore there is no guarantee that ctx2 will run in between jobs.

It's perfectly valid in this case to first run all jobs from ctx1 and then all jobs from ctx2.


That's not really how I see the semaphores working.

The spec describes VkSemaphore as an interface to an internal payload opaque to the application.


When ctx1 waits on the semaphore, it waits on the payload put there by the previous iteration.

And who says that it's not waiting for its own previous payload?

See, if the payload is a counter, this won't work either. Keep in mind that this has the semantics of a semaphore: whoever grabs the semaphore first wins and can run; everybody else has to wait.
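Christian's objection can be made concrete with a toy model (nothing here is driver code; a plain counting semaphore stands in for a counter payload). With a single shared semaphore, running all of ctx1's jobs before any of ctx2's is a perfectly legal schedule, because each wait can be satisfied by the same context's own previous signal:

```python
import threading

# Toy model: the shared payload as a plain counting semaphore.
# "Whoever grabs the semaphore first wins": a schedule that runs all
# of ctx1's jobs before any of ctx2's satisfies every wait/signal pair.
sem = threading.Semaphore(1)
order = []

def run_jobs(ctx, jobs):
    for _ in range(jobs):
        sem.acquire()        # wait on the semaphore
        order.append(ctx)    # run the job
        sem.release()        # signal the semaphore again

run_jobs("ctx1", 3)          # ctx1 keeps consuming its own signals...
run_jobs("ctx2", 3)          # ...so ctx2 only runs once ctx1 is done
print(order)                 # ['ctx1', 'ctx1', 'ctx1', 'ctx2', 'ctx2', 'ctx2']
```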


Then it proceeds to signal it by replacing the internal payload.

That's an implementation detail of our sync objects, but I don't think that this behavior is part of the Vulkan specification.

Regards,
Christian.


ctx2 then waits on that and replaces the payload again with the new internal synchronization object.


The internal payload is a dma fence in our case and signaling just replaces a dma fence by another or puts one where there was none before.

So we should have created a dependency link between all the submissions, and they should then be executed in the order of the QueueSubmit() calls.
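The dependency chain described here can be sketched as a toy model (illustrative names only, not the actual syncobj implementation): each submit consumes the fence currently stored as the payload and installs a fresh one for its own completion.

```python
import threading

class Fence:
    """Toy stand-in for a dma_fence."""
    def __init__(self):
        self._done = threading.Event()
    def signal(self):
        self._done.set()
    def wait(self):
        self._done.wait()

class Syncobj:
    """Toy syncobj: the payload is just a replaceable fence pointer."""
    def __init__(self):
        self.fence = None

order = []
sem = Syncobj()

def queue_submit(ctx, sem):
    # Wait on the payload put there by the previous iteration...
    prev, sem.fence = sem.fence, Fence()
    if prev is not None:
        prev.wait()
    # ...run the job, then signal by completing the new payload.
    order.append(ctx)
    sem.fence.signal()

for _ in range(2):           # alternate ctx1, ctx2 as the test does
    queue_submit("ctx1", sem)
    queue_submit("ctx2", sem)
print(order)                 # ['ctx1', 'ctx2', 'ctx1', 'ctx2']
```

Note the chain only alternates because the wait-and-replace happens in submission order from one thread; that is exactly what the per-device submission threads break.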


-Lionel


It only prevents running both at the same time and as far as I can see that still works even with threaded submission.

You need at least two semaphores for a tandem submission.

Regards,
Christian.


This way there can't be any synchronization between the two.

Regards,
Christian.

On 02.08.2019 06:55, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
Hey Christian,

The problem boils down to the fact that we don't immediately create dma fences when calling vkQueueSubmit().
This is delayed to a thread.

From a single application thread, you can QueueSubmit() to 2 queues from 2 different devices.
Each QueueSubmit to one queue has a dependency on the previous QueueSubmit on the other queue through an exported/imported semaphore.

From the API point of view the state of the semaphore should be changed after each QueueSubmit().
The problem is that it isn't, because of the thread, and because you might have those 2 submission threads tied to different VkDevice/VkInstance or even different applications (synchronizing themselves outside the Vulkan API).
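The ordering loss is easy to reproduce in a toy model (pure illustration; the sleep stands in for scheduler jitter between two per-device submission threads):

```python
import queue
import threading
import time

log = []
lock = threading.Lock()

def submission_thread(name, q, delay):
    # Each device drains its own queue; nothing orders the two
    # threads against each other once work is handed off.
    while True:
        item = q.get()
        if item is None:
            return
        time.sleep(delay)            # stand-in for scheduling jitter
        with lock:
            log.append((name, item))

qa, qb = queue.Queue(), queue.Queue()
ta = threading.Thread(target=submission_thread, args=("devA", qa, 0.2))
tb = threading.Thread(target=submission_thread, args=("devB", qb, 0.0))
ta.start(); tb.start()

qa.put("submit 1")                   # application submits to device A first...
qb.put("submit 2")                   # ...then to device B
qa.put(None); qb.put(None)
ta.join(); tb.join()
print(log)                           # devB's work lands first
```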

Hope that makes sense.
It's not really easy to explain by mail; the best explanation is probably reading the test: https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788

Like David mentioned you're not running into that issue right now, because you only dispatch to the thread under specific conditions.
But I could build a case to force that and likely run into the same issue.

-Lionel

On 02/08/2019 07:33, Koenig, Christian wrote:
Hi Lionel,

Well could you describe once more what the problem is?

Because I don't fully understand why a rather normal tandem submission with two semaphores should fail in any way.

Regards,
Christian.

On 02.08.2019 06:28, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
There aren't CTS tests covering the issue I was mentioning.
But we could add them.

I don't have all the details regarding your implementation but even with
the "semaphore thread", I could see it running into the same issues.
What if a mix of binary & timeline semaphores is handed to vkQueueSubmit()?

For example with queueA & queueB from 2 different VkDevice:
     vkQueueSubmit(queueA, signal semA);
     vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with timelineSemB triggering a wait before signal.
     vkQueueSubmit(queueB, signal semA);


-Lionel

On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> Hi Lionel,
>
> The Queue Thread is a heavy thread, always resident in the driver while the application is running, and our team doesn't like that. So we switched to a Semaphore Thread: only when a wait-before-signal (waitBeforeSignal) on a timeline happens do we spawn a thread to handle that wait. So we don't have this issue.
> By the way, I already pass all your CTS cases now. I suggest you switch to a Semaphore Thread instead of a Queue Thread as well. It works very well.
>
> -David
>
> -----Original Message-----
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Sent: Friday, August 2, 2019 4:52 AM
> To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
> Subject: Threaded submission & semaphore sharing
>
> Hi Christian, David,
>
> Sorry to report this so late in the process, but I think we found an issue not directly related to syncobj timelines themselves but to a side effect of the threaded submissions.
>
> Essentially we're failing a test in crucible:
> func.sync.semaphore-fd.opaque-fd
> This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue.
> Then in a loop it proceeds to submit workloads alternating between the 2 VkQueue, with each submit depending on the previous one.
> It does so by waiting on the VkSemaphore signaled in the previous iteration and resignaling it.
>
> The problem for us is that once things are dispatched to the submission threads, the ordering of the submissions is lost, because we have 2 devices and they both have their own submission thread.
>
> Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload.
> This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.
>
> The scenario would look like this :
>       - vkQueueSubmit(queueA, signal on semA);
>           - in the caller thread, this would increment the syncobj's additional u64 payload and return it to userspace.
>           - at some point the submission thread of queueA submits the workload and signals the syncobj of semA with the value returned in the caller thread of vkQueueSubmit().
>       - vkQueueSubmit(queueB, wait on semA);
>           - in the caller thread, this would read the syncobj's additional u64 payload
>           - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step above
>
> Because we want the binary semaphores to be shared across processes and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj.
> It would need an additional ioctl to read & increment the value.
>
> What do you think?
>
> -Lionel
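The flow proposed above can be modeled roughly like this (assumed semantics for illustration; reserve_point/signal_point/wait_available are hypothetical names, not an existing ioctl interface): the caller thread reserves a monotonic point at vkQueueSubmit() time, and the consumer's submission thread blocks until the producer's submission thread has made that point available.

```python
import threading

class Syncobj:
    """Toy syncobj carrying the extra monotonic u64 payload proposed above."""
    def __init__(self):
        self.point = 0          # last point handed out in a caller thread
        self.available = 0      # highest point a submission thread reached
        self.cond = threading.Condition()

    def reserve_point(self):
        # vkQueueSubmit() caller thread: bump and return the point this
        # submission will signal (the hypothetical read & increment ioctl).
        with self.cond:
            self.point += 1
            return self.point

    def signal_point(self, point):
        # Submission thread: mark the point as submitted/available.
        with self.cond:
            self.available = max(self.available, point)
            self.cond.notify_all()

    def wait_available(self, point):
        # WAIT_FOR_AVAILABLE: block until the producer's submission
        # thread has reached the given point.
        with self.cond:
            self.cond.wait_for(lambda: self.available >= point)

sem_a = Syncobj()
log = []

p1 = sem_a.reserve_point()      # caller: vkQueueSubmit(queueA, signal semA)

def queue_a_thread():           # queueA's submission thread (may run late)
    log.append("queueA submits")
    sem_a.signal_point(p1)

def queue_b_thread():           # queueB's submission thread
    sem_a.wait_available(p1)    # first wait for semA's point to be available
    log.append("queueB submits")

tb = threading.Thread(target=queue_b_thread); tb.start()
ta = threading.Thread(target=queue_a_thread); ta.start()
ta.join(); tb.join()
print(log)                      # ['queueA submits', 'queueB submits']
```

Even though queueB's thread starts first, it cannot submit before queueA's thread has reached the reserved point, so the vkQueueSubmit() ordering is reestablished.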









_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: Threaded submission & semaphore sharing
  2019-08-02  6:10 Threaded submission & semaphore sharing Koenig, Christian
@ 2019-08-02  6:27 ` Lionel Landwerlin
  2019-08-02  9:11   ` zhoucm1
  0 siblings, 1 reply; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-02  6:27 UTC (permalink / raw)
  To: Koenig, Christian; +Cc: Jason Ekstrand, dri-devel



On 02/08/2019 09:10, Koenig, Christian wrote:
>
>
> On 02.08.2019 07:38, Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>
>     On 02/08/2019 08:21, Koenig, Christian wrote:
>
>
>
>         On 02.08.2019 07:17, Lionel Landwerlin
>         <lionel.g.landwerlin@intel.com> wrote:
>
>             On 02/08/2019 08:08, Koenig, Christian wrote:
>
>                 Hi Lionel,
>
>                 Well that looks more like your test case is buggy.
>
>                 According to the code the ctx1 queue always waits for
>                 sem1 and ctx2 queue always waits for sem2.
>
>
>             That's supposed to be the same underlying syncobj because
>             it's exported from one VkDevice as opaque FD from sem1 and
>             imported into sem2.
>
>
>         Well then that's still buggy and won't synchronize at all.
>
>         When ctx1 waits for a semaphore and then signals the same
>         semaphore there is no guarantee that ctx2 will run in between
>         jobs.
>
>         It's perfectly valid in this case to first run all jobs from
>         ctx1 and then all jobs from ctx2.
>
>
>     That's not really how I see the semaphores working.
>
>     The spec describes VkSemaphore as an interface to an internal
>     payload opaque to the application.
>
>
>     When ctx1 waits on the semaphore, it waits on the payload put
>     there by the previous iteration.
>
>
> And who says that it's not waiting for its own previous payload?


That's what I understood from your previous comment: "there is no
guarantee that ctx2 will run in between jobs".


>
> See, if the payload is a counter, this won't work either. Keep in mind
> that this has the semantics of a semaphore: whoever grabs the semaphore
> first wins and can run; everybody else has to wait.


What performs the "grab" here?

I thought that would be vkQueueSubmit().

Since that occurs from a single application thread, the execution should
then be ordered ctx1, ctx2, ctx1, ...


Thanks for your time on this,


-Lionel







* Re: Threaded submission & semaphore sharing
  2019-08-02  6:27 ` Lionel Landwerlin
@ 2019-08-02  9:11   ` zhoucm1
  2019-08-02  9:41     ` Lionel Landwerlin
  0 siblings, 1 reply; 14+ messages in thread
From: zhoucm1 @ 2019-08-02  9:11 UTC (permalink / raw)
  To: Lionel Landwerlin, Koenig, Christian; +Cc: Jason Ekstrand, dri-devel



Hi Lionel,

For a binary semaphore, I guess everyone will assume that the application guarantees the wait happens after the signal, whether the semaphore is shared across processes or used within a single process.

I think below two options can fix your problem:

a. Can we extend vkWaitForFences so that it can wait for a fence to become available? If the fence is available, then it's safe to do the semaphore wait in vkQueueSubmit().

b. Make wait-before-signal valid for binary semaphores as well; that way, it is reasonable to add wait/signal counting for binary syncobjs.


-David


On 2019年08月02日 14:27, Lionel Landwerlin wrote:
> On 02/08/2019 09:10, Koenig, Christian wrote:
>>
>>
>> Am 02.08.2019 07:38 schrieb Lionel Landwerlin 
>> <lionel.g.landwerlin@intel.com>:
>>
>>     On 02/08/2019 08:21, Koenig, Christian wrote:
>>
>>
>>
>>         Am 02.08.2019 07:17 schrieb Lionel Landwerlin
>>         <lionel.g.landwerlin@intel.com>
>>         <mailto:lionel.g.landwerlin@intel.com>:
>>
>>             On 02/08/2019 08:08, Koenig, Christian wrote:
>>
>>                 Hi Lionel,
>>
>>                 Well that looks more like your test case is buggy.
>>
>>                 According to the code the ctx1 queue always waits for
>>                 sem1 and ctx2 queue always waits for sem2.
>>
>>
>>             That's supposed to be the same underlying syncobj because
>>             it's exported from one VkDevice as opaque FD from sem1
>>             and imported into sem2.
>>
>>
>>         Well than that's still buggy and won't synchronize at all.
>>
>>         When ctx1 waits for a semaphore and then signals the same
>>         semaphore there is no guarantee that ctx2 will run in between
>>         jobs.
>>
>>         It's perfectly valid in this case to first run all jobs from
>>         ctx1 and then all jobs from ctx2.
>>
>>
>>     That's not really how I see the semaphores working.
>>
>>     The spec describe VkSemaphore as an interface to an internal
>>     payload opaque to the application.
>>
>>
>>     When ctx1 waits on the semaphore, it waits on the payload put
>>     there by the previous iteration.
>>
>>
>> And who says that it's not waiting for it's own previous payload?
>
>
> That's was I understood from you previous comment : "there is no 
> guarantee that ctx2 will run in between jobs"
>
>
>>
>> See if the payload is a counter this won't work either. Keep in mind 
>> that this has the semantic of a semaphore. Whoever grabs the 
>> semaphore first wins and can run, everybody else has to wait.
>
>
> What performs the "grab" here?
>
> I thought that would be vkQueueSubmit().
>
> Since that occuring from a single application thread, that should then 
> be ordered in execution of ctx1,ctx2,ctx1,...
>
>
> Thanks for your time on this,
>
>
> -Lionel
>
>
>>
>>     Then it proceeds to signal it by replacing the internal payload.
>>
>>
>> That's an implementation detail of our sync objects, but I don't 
>> think that this behavior is part of the Vulkan specification.
>>
>> Regards,
>> Christian.
>>
>>
>>     ctx2 then waits on that and replaces the payload again with the
>>     new internal synchronization object.
>>
>>
>>     The internal payload is a dma fence in our case and signaling
>>     just replaces a dma fence by another or puts one where there was
>>     none before.
>>
>>     So we should have created a dependecy link between all the
>>     submissions and then should be executed in the order of
>>     QueueSubmit() calls.
>>
>>
>>     -Lionel
>>
>>
>>
>>         It only prevents running both at the same time and as far as
>>         I can see that still works even with threaded submission.
>>
>>         You need at least two semaphores for a tandem submission.
>>
>>         Regards,
>>         Christian.
>>
>>
>>
>>                 This way there can't be any Synchronisation between
>>                 the two.
>>
>>                 Regards,
>>                 Christian.
>>
>>                 Am 02.08.2019 06:55 schrieb Lionel Landwerlin
>>                 <lionel.g.landwerlin@intel.com>
>>                 <mailto:lionel.g.landwerlin@intel.com>:
>>                 Hey Christian,
>>
>>                 The problem boils down to the fact that we don't
>>                 immediately create dma fences when calling
>>                 vkQueueSubmit().
>>                 This is delayed to a thread.
>>
>>                 From a single application thread, you can
>>                 QueueSubmit() to 2 queues from 2 different devices.
>>                 Each QueueSubmit to one queue has a dependency on the
>>                 previous QueueSubmit on the other queue through an
>>                 exported/imported semaphore.
>>
>>                 From the API point of view the state of the semaphore
>>                 should be changed after each QueueSubmit().
>>                 The problem is that it's not because of the thread
>>                 and because you might have those 2 submission threads
>>                 tied to different VkDevice/VkInstance or even
>>                 different applications (synchronizing themselves
>>                 outside the vulkan API).
>>
>>                 Hope that makes sense.
>>                 It's not really easy to explain by mail, the best
>>                 explanation is probably reading the test :
>>                 https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>
>>                 Like David mentioned you're not running into that
>>                 issue right now, because you only dispatch to the
>>                 thread under specific conditions.
>>                 But I could build a case to force that and likely run
>>                 into the same issue.
>>
>>                 -Lionel
>>
>>                 On 02/08/2019 07:33, Koenig, Christian wrote:
>>
>>                     Hi Lionel,
>>
>>                     Well could you describe once more what the
>>                     problem is?
>>
>>                     Cause I don't fully understand why a rather
>>                     normal tandem submission with two semaphores
>>                     should fail in any way.
>>
>>                     Regards,
>>                     Christian.
>>
>>                     Am 02.08.2019 06:28 schrieb Lionel Landwerlin
>>                     <lionel.g.landwerlin@intel.com>
>>                     <mailto:lionel.g.landwerlin@intel.com>:
>>                     There aren't CTS tests covering the issue I was
>>                     mentioning.
>>                     But we could add them.
>>
>>                     I don't have all the details regarding your
>>                     implementation but even with
>>                     the "semaphore thread", I could see it running
>>                     into the same issues.
>>                     What if a mix of binary & timeline semaphores are
>>                     handed to vkQueueSubmit()?
>>
>>                     For example with queueA & queueB from 2 different
>>                     VkDevice :
>>                          vkQueueSubmit(queueA, signal semA);
>>                          vkQueueSubmit(queueA, wait on [semA,
>>                     timelineSemB]); with
>>                     timelineSemB triggering a wait before signal.
>>                          vkQueueSubmit(queueB, signal semA);
>>
>>
>>                     -Lionel
>>
>>                     On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>                     > Hi Lionel,
>>                     >
>>                     > By the Queue thread is a heavy thread, which is
>>                     always resident in driver during application
>>                     running, our guys don't like that. So we switch
>>                     to Semaphore Thread, only when waitBeforeSignal
>>                     of timeline happens, we spawn a thread to handle
>>                     that wait. So we don't have your this issue.
>>                     > By the way, I already pass all your CTS cases
>>                     for now. I suggest you to switch to Semaphore
>>                     Thread instead of Queue Thread as well. It works
>>                     very well.
>>                     >
>>                     > -David
>>                     >
>>                     > -----Original Message-----
>>                     > From: Lionel Landwerlin
>>                     <lionel.g.landwerlin@intel.com>
>>                     <mailto:lionel.g.landwerlin@intel.com>
>>                     > Sent: Friday, August 2, 2019 4:52 AM
>>                     > To: dri-devel <dri-devel@lists.freedesktop.org>
>>                     <mailto:dri-devel@lists.freedesktop.org>; Koenig,
>>                     Christian <Christian.Koenig@amd.com>
>>                     <mailto:Christian.Koenig@amd.com>; Zhou,
>>                     David(ChunMing) <David1.Zhou@amd.com>
>>                     <mailto:David1.Zhou@amd.com>; Jason Ekstrand
>>                     <jason@jlekstrand.net> <mailto:jason@jlekstrand.net>
>>                     > Subject: Threaded submission & semaphore sharing
>>                     >
>>                     > Hi Christian, David,
>>                     >
>>                     > Sorry to report this so late in the process,
>>                     but I think we found an issue not directly
>>                     related to syncobj timelines themselves but with
>>                     a side effect of the threaded submissions.
>>                     >
>>                     > Essentially we're failing a test in crucible :
>>                     > func.sync.semaphore-fd.opaque-fd
>>                     > This test create a single binary semaphore,
>>                     shares it between 2 VkDevice/VkQueue.
>>                     > Then in a loop it proceeds to submit workload
>>                     alternating between the 2 VkQueue with one submit
>>                     depending on the other.
>>                     > It does so by waiting on the VkSemaphore
>>                     signaled in the previous iteration and
>>                     resignaling it.
>>                     >
>>                     > The problem for us is that once things are
>>                     dispatched to the submission thread, the ordering
>>                     of the submission is lost.
>>                     > Because we have 2 devices and they both have
>>                     their own submission thread.
>>                     >
>>                     > Jason suggested that we reestablish the
>>                     ordering by having semaphores/syncobjs carry an
>>                     additional uint64_t payload.
>>                     > This 64bit integer would represent be an
>>                     identifier that submission threads will
>>                     WAIT_FOR_AVAILABLE on.
>>                     >
>>                     > The scenario would look like this :
>>                     >       - vkQueueSubmit(queueA, signal on semA);
>>                     >           - in the caller thread, this would
>>                     increment the syncobj additional u64 payload and
>>                     return it to userspace.
>>                     >           - at some point the submission thread
>>                     of queueA submits the workload and signal the
>>                     syncobj of semA with value returned in the caller
>>                     thread of vkQueueSubmit().
>>                     >       - vkQueueSubmit(queueB, wait on semA);
>>                     >           - in the caller thread, this would
>>                     read the syncobj additional
>>                     > u64 payload
>>                     >           - at some point the submission thread
>>                     of queueB will try to submit the work, but first
>>                     it will WAIT_FOR_AVAILABLE the u64 value returned
>>                     in the step above
>>                     >
>>                     > Because we want the binary semaphores to be
>>                     shared across processes and would like this to
>>                     remain a single FD, the simplest location to
>>                     store this additional u64 payload would be the
>>                     DRM syncobj.
>>                     > It would need an additional ioctl to read &
>>                     increment the value.
>>                     >
>>                     > What do you think?
>>                     >
>>                     > -Lionel
>>
>>
>>
>>
>>
>>
>>
>
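The scenario above can be sketched as a toy user-space model. Everything here is illustrative: `ModelSyncobj`, `reserve_point`, `signal_point` and `wait_available` are made-up names standing in for the extra u64 payload and the proposed read & increment / WAIT_FOR_AVAILABLE ioctls; this is not driver code.

```python
import threading

class ModelSyncobj:
    """Toy model of the proposed syncobj extension: an extra u64
    "submission point" counter next to the normal fence payload."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next = 0       # last point handed out in vkQueueSubmit()
        self._signaled = 0   # last point actually signaled

    def reserve_point(self):
        # Caller thread of vkQueueSubmit(signal): increment & return.
        with self._cond:
            self._next += 1
            return self._next

    def signal_point(self, point):
        # Submission thread: signal once the job is really submitted.
        with self._cond:
            self._signaled = max(self._signaled, point)
            self._cond.notify_all()

    def wait_available(self, point):
        # Waiter's submission thread: WAIT_FOR_AVAILABLE on the point.
        with self._cond:
            self._cond.wait_for(lambda: self._signaled >= point)

sem = ModelSyncobj()

# vkQueueSubmit(queueA, signal on semA): caller thread reserves point 1.
point = sem.reserve_point()

# vkQueueSubmit(queueB, wait on semA): caller thread reads the point to
# wait for *before* either submission thread has run.
wait_for = point

# queueA's submission thread eventually signals the reserved point...
threading.Timer(0.01, sem.signal_point, args=(point,)).start()
# ...and queueB's submission thread blocks until it is available.
sem.wait_available(wait_for)
assert sem._signaled == 1
```

The key property is that the ordering is fixed in the caller thread (where vkQueueSubmit() calls are sequential), not in the submission threads.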


[-- Attachment #1.2: Type: text/html, Size: 31721 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  9:11   ` zhoucm1
@ 2019-08-02  9:41     ` Lionel Landwerlin
  2019-08-02 10:01       ` zhoucm1
  0 siblings, 1 reply; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-02  9:41 UTC (permalink / raw)
  To: zhoucm1, Koenig, Christian; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 13330 bytes --]

Hey David,

On 02/08/2019 12:11, zhoucm1 wrote:
>
> Hi Lionel,
>
> For a binary semaphore, I guess everyone will assume the application 
> guarantees the wait comes after the signal, whether the semaphore is shared 
> or used intra-process.
>
> I think below two options can fix your problem:
>
> a. Can we extend vkWaitForFence so that it is able to wait on 
> fence-available? If the fence is available, then it's safe to do the 
> semaphore wait in vkQueueSubmit.
>

I'm sorry, but I don't understand what vkWaitForFence() has to do with 
this problem.

The test case we're struggling with doesn't use that API.


Can you maybe explain a bit more how it relates?


> b. Make waitBeforeSignal valid for binary semaphores as well; that 
> way, it is reasonable to add wait/signal counting for binary syncobj.
>

Yeah essentially the change we're proposing internally makes binary 
semaphores use syncobj timelines.

There is just another u64 associated with them.


-Lionel


>
> -David
>
>
> On 2019年08月02日 14:27, Lionel Landwerlin wrote:
>> On 02/08/2019 09:10, Koenig, Christian wrote:
>>>
>>>
>>> Am 02.08.2019 07:38 schrieb Lionel Landwerlin 
>>> <lionel.g.landwerlin@intel.com>:
>>>
>>>     On 02/08/2019 08:21, Koenig, Christian wrote:
>>>
>>>
>>>
>>>         Am 02.08.2019 07:17 schrieb Lionel Landwerlin
>>>         <lionel.g.landwerlin@intel.com>
>>>         <mailto:lionel.g.landwerlin@intel.com>:
>>>
>>>             On 02/08/2019 08:08, Koenig, Christian wrote:
>>>
>>>                 Hi Lionel,
>>>
>>>                 Well that looks more like your test case is buggy.
>>>
>>>                 According to the code the ctx1 queue always waits
>>>                 for sem1 and ctx2 queue always waits for sem2.
>>>
>>>
>>>             That's supposed to be the same underlying syncobj
>>>             because it's exported from one VkDevice as opaque FD
>>>             from sem1 and imported into sem2.
>>>
>>>
>>>         Well then that's still buggy and won't synchronize at all.
>>>
>>>         When ctx1 waits for a semaphore and then signals the same
>>>         semaphore there is no guarantee that ctx2 will run in
>>>         between jobs.
>>>
>>>         It's perfectly valid in this case to first run all jobs from
>>>         ctx1 and then all jobs from ctx2.
>>>
>>>
>>>     That's not really how I see the semaphores working.
>>>
>>>     The spec describes VkSemaphore as an interface to an internal
>>>     payload opaque to the application.
>>>
>>>
>>>     When ctx1 waits on the semaphore, it waits on the payload put
>>>     there by the previous iteration.
>>>
>>>
>>> And who says that it's not waiting for its own previous payload?
>>
>>
>> That's what I understood from your previous comment : "there is no 
>> guarantee that ctx2 will run in between jobs"
>>
>>
>>>
>>> See if the payload is a counter this won't work either. Keep in mind 
>>> that this has the semantic of a semaphore. Whoever grabs the 
>>> semaphore first wins and can run, everybody else has to wait.
>>
>>
>> What performs the "grab" here?
>>
>> I thought that would be vkQueueSubmit().
>>
>> Since that's occurring from a single application thread, that should 
>> then be ordered in execution of ctx1,ctx2,ctx1,...
>>
>>
>> Thanks for your time on this,
>>
>>
>> -Lionel
>>
>>
>>>
>>>     Then it proceeds to signal it by replacing the internal payload.
>>>
>>>
>>> That's an implementation detail of our sync objects, but I don't 
>>> think that this behavior is part of the Vulkan specification.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>     ctx2 then waits on that and replaces the payload again with the
>>>     new internal synchronization object.
>>>
>>>
>>>     The internal payload is a dma fence in our case and signaling
>>>     just replaces a dma fence by another or puts one where there was
>>>     none before.
>>>
>>>     So we should have created a dependency link between all the
>>>     submissions, and they should be executed in the order of
>>>     QueueSubmit() calls.
>>>
>>>
>>>     -Lionel
>>>
>>>
>>>
>>>         It only prevents running both at the same time and as far as
>>>         I can see that still works even with threaded submission.
>>>
>>>         You need at least two semaphores for a tandem submission.
>>>
>>>         Regards,
>>>         Christian.
>>>
>>>
>>>
>>>                 This way there can't be any synchronisation between
>>>                 the two.
>>>
>>>                 Regards,
>>>                 Christian.
>>>
>>>                 Am 02.08.2019 06:55 schrieb Lionel Landwerlin
>>>                 <lionel.g.landwerlin@intel.com>
>>>                 <mailto:lionel.g.landwerlin@intel.com>:
>>>                 Hey Christian,
>>>
>>>                 The problem boils down to the fact that we don't
>>>                 immediately create dma fences when calling
>>>                 vkQueueSubmit().
>>>                 This is delayed to a thread.
>>>
>>>                 From a single application thread, you can
>>>                 QueueSubmit() to 2 queues from 2 different devices.
>>>                 Each QueueSubmit to one queue has a dependency on
>>>                 the previous QueueSubmit on the other queue through
>>>                 an exported/imported semaphore.
>>>
>>>                 From the API point of view the state of the
>>>                 semaphore should be changed after each QueueSubmit().
>>>                 The problem is that it's not because of the thread
>>>                 and because you might have those 2 submission
>>>                 threads tied to different VkDevice/VkInstance or
>>>                 even different applications (synchronizing
>>>                 themselves outside the vulkan API).
>>>
>>>                 Hope that makes sense.
>>>                 It's not really easy to explain by mail, the best
>>>                 explanation is probably reading the test :
>>>                 https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>>
>>>                 Like David mentioned you're not running into that
>>>                 issue right now, because you only dispatch to the
>>>                 thread under specific conditions.
>>>                 But I could build a case to force that and likely
>>>                 run into the same issue.
>>>
>>>                 -Lionel
>>>
>>>                 On 02/08/2019 07:33, Koenig, Christian wrote:
>>>
>>>                     Hi Lionel,
>>>
>>>                     Well could you describe once more what the
>>>                     problem is?
>>>
>>>                     Cause I don't fully understand why a rather
>>>                     normal tandem submission with two semaphores
>>>                     should fail in any way.
>>>
>>>                     Regards,
>>>                     Christian.
>>>
>>>                     Am 02.08.2019 06:28 schrieb Lionel Landwerlin
>>>                     <lionel.g.landwerlin@intel.com>
>>>                     <mailto:lionel.g.landwerlin@intel.com>:
>>>                     There aren't CTS tests covering the issue I was
>>>                     mentioning.
>>>                     But we could add them.
>>>
>>>                     I don't have all the details regarding your
>>>                     implementation but even with
>>>                     the "semaphore thread", I could see it running
>>>                     into the same issues.
>>>                     What if a mix of binary & timeline semaphores
>>>                     are handed to vkQueueSubmit()?
>>>
>>>                     For example with queueA & queueB from 2
>>>                     different VkDevice :
>>>                     vkQueueSubmit(queueA, signal semA);
>>>                     vkQueueSubmit(queueA, wait on [semA,
>>>                     timelineSemB]); with
>>>                     timelineSemB triggering a wait before signal.
>>>                     vkQueueSubmit(queueB, signal semA);
>>>
>>>
>>>                     -Lionel
>>>
>>>                     On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>>                     > Hi Lionel,
>>>                     >
>>>                     > Because the Queue thread is a heavy thread,
>>>                     which is always resident in the driver while the
>>>                     application runs, our guys don't like that. So we
>>>                     switched to a Semaphore Thread: only when a
>>>                     waitBeforeSignal of a timeline happens do we
>>>                     spawn a thread to handle that wait. So we don't
>>>                     have this issue.
>>>                     > By the way, I already pass all your CTS cases
>>>                     now. I suggest you switch to a Semaphore Thread
>>>                     instead of a Queue Thread as well. It works very
>>>                     well.
>>>                     >
>>>                     > -David
>>>                     >
>>>                     > -----Original Message-----
>>>                     > From: Lionel Landwerlin
>>>                     <lionel.g.landwerlin@intel.com>
>>>                     <mailto:lionel.g.landwerlin@intel.com>
>>>                     > Sent: Friday, August 2, 2019 4:52 AM
>>>                     > To: dri-devel
>>>                     <dri-devel@lists.freedesktop.org>
>>>                     <mailto:dri-devel@lists.freedesktop.org>;
>>>                     Koenig, Christian <Christian.Koenig@amd.com>
>>>                     <mailto:Christian.Koenig@amd.com>; Zhou,
>>>                     David(ChunMing) <David1.Zhou@amd.com>
>>>                     <mailto:David1.Zhou@amd.com>; Jason Ekstrand
>>>                     <jason@jlekstrand.net> <mailto:jason@jlekstrand.net>
>>>                     > Subject: Threaded submission & semaphore sharing
>>>                     >
>>>                     > Hi Christian, David,
>>>                     >
>>>                     > Sorry to report this so late in the process,
>>>                     but I think we found an issue not directly
>>>                     related to syncobj timelines themselves but with
>>>                     a side effect of the threaded submissions.
>>>                     >
>>>                     > Essentially we're failing a test in crucible :
>>>                     > func.sync.semaphore-fd.opaque-fd
>>>                     > This test create a single binary semaphore,
>>>                     shares it between 2 VkDevice/VkQueue.
>>>                     > Then in a loop it proceeds to submit workload
>>>                     alternating between the 2 VkQueue with one
>>>                     submit depending on the other.
>>>                     > It does so by waiting on the VkSemaphore
>>>                     signaled in the previous iteration and
>>>                     resignaling it.
>>>                     >
>>>                     > The problem for us is that once things are
>>>                     dispatched to the submission thread, the
>>>                     ordering of the submission is lost.
>>>                     > Because we have 2 devices and they both have
>>>                     their own submission thread.
>>>                     >
>>>                     > Jason suggested that we reestablish the
>>>                     ordering by having semaphores/syncobjs carry an
>>>                     additional uint64_t payload.
>>>                     > This 64bit integer would represent be an
>>>                     identifier that submission threads will
>>>                     WAIT_FOR_AVAILABLE on.
>>>                     >
>>>                     > The scenario would look like this :
>>>                     >       - vkQueueSubmit(queueA, signal on semA);
>>>                     >           - in the caller thread, this would
>>>                     increment the syncobj additional u64 payload and
>>>                     return it to userspace.
>>>                     >           - at some point the submission
>>>                     thread of queueA submits the workload and signal
>>>                     the syncobj of semA with value returned in the
>>>                     caller thread of vkQueueSubmit().
>>>                     >       - vkQueueSubmit(queueB, wait on semA);
>>>                     >           - in the caller thread, this would
>>>                     read the syncobj additional
>>>                     > u64 payload
>>>                     >           - at some point the submission
>>>                     thread of queueB will try to submit the work,
>>>                     but first it will WAIT_FOR_AVAILABLE the u64
>>>                     value returned in the step above
>>>                     >
>>>                     > Because we want the binary semaphores to be
>>>                     shared across processes and would like this to
>>>                     remain a single FD, the simplest location to
>>>                     store this additional u64 payload would be the
>>>                     DRM syncobj.
>>>                     > It would need an additional ioctl to read &
>>>                     increment the value.
>>>                     >
>>>                     > What do you think?
>>>                     >
>>>                     > -Lionel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
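Christian's point about needing at least two semaphores for a tandem submission can be illustrated with a small Python model (illustrative only, not driver code): each queue waits on its own semaphore and signals the other's, which forces strict alternation — something waiting and re-signaling one shared binary semaphore cannot guarantee.

```python
import threading

N = 8
# Two binary semaphores for the tandem: ctx1 may run first.
sem_a = threading.Semaphore(1)
sem_b = threading.Semaphore(0)
order = []
order_lock = threading.Lock()

def run_queue(ctx_id, wait_sem, signal_sem):
    # Each "queue" waits on its own semaphore, runs a job,
    # then hands over to the peer.
    for _ in range(N):
        wait_sem.acquire()        # wait (and reset) the binary semaphore
        with order_lock:
            order.append(ctx_id)  # the job "executes"
        signal_sem.release()      # signal the peer's semaphore

t1 = threading.Thread(target=run_queue, args=(1, sem_a, sem_b))
t2 = threading.Thread(target=run_queue, args=(2, sem_b, sem_a))
t1.start()
t2.start()
t1.join()
t2.join()

# With two semaphores the interleaving is forced: 1, 2, 1, 2, ...
assert order == [1, 2] * N
```

With a single shared semaphore, by contrast, nothing stops one context from re-acquiring its own signal, so running all of ctx1's jobs before any of ctx2's is a legal schedule — which is exactly Christian's objection.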


[-- Attachment #1.2: Type: text/html, Size: 34593 bytes --]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  9:41     ` Lionel Landwerlin
@ 2019-08-02 10:01       ` zhoucm1
  2019-08-02 12:12         ` Lionel Landwerlin
  0 siblings, 1 reply; 14+ messages in thread
From: zhoucm1 @ 2019-08-02 10:01 UTC (permalink / raw)
  To: Lionel Landwerlin, Koenig, Christian; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 14067 bytes --]



On 2019年08月02日 17:41, Lionel Landwerlin wrote:
> Hey David,
>
> On 02/08/2019 12:11, zhoucm1 wrote:
>>
>> Hi Lionel,
>>
>> For a binary semaphore, I guess everyone will assume the application 
>> guarantees the wait comes after the signal, whether the semaphore is shared 
>> or used intra-process.
>>
>> I think below two options can fix your problem:
>>
>> a. Can we extend vkWaitForFence so that it is able to wait on 
>> fence-available? If the fence is available, then it's safe to do the 
>> semaphore wait in vkQueueSubmit.
>>
>
> I'm sorry, but I don't understand what vkWaitForFence() has to do with 
> this problem.
>
> The test case we're struggling with doesn't use that API.
>
>
> Can you maybe explain a bit more how it relates?
>
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()

Sorry, but that could make the application program very ugly.
>
>
>> b. Make waitBeforeSignal valid for binary semaphores as well; that 
>> way, it is reasonable to add wait/signal counting for binary 
>> syncobj.
>>
>
> Yeah essentially the change we're proposing internally makes binary 
> semaphores use syncobj timelines.
>
Will you raise an MR to add language for waitBeforeSignal support 
of binary semaphores to the Vulkan spec?

-David
>
> There is just another u64 associated with them.
>
>
> -Lionel
>
>
>>
>> -David
>>
>>
>> On 2019年08月02日 14:27, Lionel Landwerlin wrote:
>>> On 02/08/2019 09:10, Koenig, Christian wrote:
>>>>
>>>>
>>>> Am 02.08.2019 07:38 schrieb Lionel Landwerlin 
>>>> <lionel.g.landwerlin@intel.com>:
>>>>
>>>>     On 02/08/2019 08:21, Koenig, Christian wrote:
>>>>
>>>>
>>>>
>>>>         Am 02.08.2019 07:17 schrieb Lionel Landwerlin
>>>>         <lionel.g.landwerlin@intel.com>
>>>>         <mailto:lionel.g.landwerlin@intel.com>:
>>>>
>>>>             On 02/08/2019 08:08, Koenig, Christian wrote:
>>>>
>>>>                 Hi Lionel,
>>>>
>>>>                 Well that looks more like your test case is buggy.
>>>>
>>>>                 According to the code the ctx1 queue always waits
>>>>                 for sem1 and ctx2 queue always waits for sem2.
>>>>
>>>>
>>>>             That's supposed to be the same underlying syncobj
>>>>             because it's exported from one VkDevice as opaque FD
>>>>             from sem1 and imported into sem2.
>>>>
>>>>
>>>>         Well then that's still buggy and won't synchronize at all.
>>>>
>>>>         When ctx1 waits for a semaphore and then signals the same
>>>>         semaphore there is no guarantee that ctx2 will run in
>>>>         between jobs.
>>>>
>>>>         It's perfectly valid in this case to first run all jobs
>>>>         from ctx1 and then all jobs from ctx2.
>>>>
>>>>
>>>>     That's not really how I see the semaphores working.
>>>>
>>>>     The spec describes VkSemaphore as an interface to an internal
>>>>     payload opaque to the application.
>>>>
>>>>
>>>>     When ctx1 waits on the semaphore, it waits on the payload put
>>>>     there by the previous iteration.
>>>>
>>>>
>>>> And who says that it's not waiting for its own previous payload?
>>>
>>>
>>> That's what I understood from your previous comment : "there is no 
>>> guarantee that ctx2 will run in between jobs"
>>>
>>>
>>>>
>>>> See if the payload is a counter this won't work either. Keep in 
>>>> mind that this has the semantic of a semaphore. Whoever grabs the 
>>>> semaphore first wins and can run, everybody else has to wait.
>>>
>>>
>>> What performs the "grab" here?
>>>
>>> I thought that would be vkQueueSubmit().
>>>
>>> Since that's occurring from a single application thread, that should 
>>> then be ordered in execution of ctx1,ctx2,ctx1,...
>>>
>>>
>>> Thanks for your time on this,
>>>
>>>
>>> -Lionel
>>>
>>>
>>>>
>>>>     Then it proceeds to signal it by replacing the internal payload.
>>>>
>>>>
>>>> That's an implementation detail of our sync objects, but I don't 
>>>> think that this behavior is part of the Vulkan specification.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>     ctx2 then waits on that and replaces the payload again with the
>>>>     new internal synchronization object.
>>>>
>>>>
>>>>     The internal payload is a dma fence in our case and signaling
>>>>     just replaces a dma fence by another or puts one where there
>>>>     was none before.
>>>>
>>>>     So we should have created a dependency link between all the
>>>>     submissions, and they should be executed in the order of
>>>>     QueueSubmit() calls.
>>>>
>>>>
>>>>     -Lionel
>>>>
>>>>
>>>>
>>>>         It only prevents running both at the same time and as far
>>>>         as I can see that still works even with threaded submission.
>>>>
>>>>         You need at least two semaphores for a tandem submission.
>>>>
>>>>         Regards,
>>>>         Christian.
>>>>
>>>>
>>>>
>>>>                 This way there can't be any synchronisation between
>>>>                 the two.
>>>>
>>>>                 Regards,
>>>>                 Christian.
>>>>
>>>>                 Am 02.08.2019 06:55 schrieb Lionel Landwerlin
>>>>                 <lionel.g.landwerlin@intel.com>
>>>>                 <mailto:lionel.g.landwerlin@intel.com>:
>>>>                 Hey Christian,
>>>>
>>>>                 The problem boils down to the fact that we don't
>>>>                 immediately create dma fences when calling
>>>>                 vkQueueSubmit().
>>>>                 This is delayed to a thread.
>>>>
>>>>                 From a single application thread, you can
>>>>                 QueueSubmit() to 2 queues from 2 different devices.
>>>>                 Each QueueSubmit to one queue has a dependency on
>>>>                 the previous QueueSubmit on the other queue through
>>>>                 an exported/imported semaphore.
>>>>
>>>>                 From the API point of view the state of the
>>>>                 semaphore should be changed after each QueueSubmit().
>>>>                 The problem is that it's not because of the thread
>>>>                 and because you might have those 2 submission
>>>>                 threads tied to different VkDevice/VkInstance or
>>>>                 even different applications (synchronizing
>>>>                 themselves outside the vulkan API).
>>>>
>>>>                 Hope that makes sense.
>>>>                 It's not really easy to explain by mail, the best
>>>>                 explanation is probably reading the test :
>>>>                 https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>>>
>>>>                 Like David mentioned you're not running into that
>>>>                 issue right now, because you only dispatch to the
>>>>                 thread under specific conditions.
>>>>                 But I could build a case to force that and likely
>>>>                 run into the same issue.
>>>>
>>>>                 -Lionel
>>>>
>>>>                 On 02/08/2019 07:33, Koenig, Christian wrote:
>>>>
>>>>                     Hi Lionel,
>>>>
>>>>                     Well could you describe once more what the
>>>>                     problem is?
>>>>
>>>>                     Cause I don't fully understand why a rather
>>>>                     normal tandem submission with two semaphores
>>>>                     should fail in any way.
>>>>
>>>>                     Regards,
>>>>                     Christian.
>>>>
>>>>                     Am 02.08.2019 06:28 schrieb Lionel Landwerlin
>>>>                     <lionel.g.landwerlin@intel.com>
>>>>                     <mailto:lionel.g.landwerlin@intel.com>:
>>>>                     There aren't CTS tests covering the issue I was
>>>>                     mentioning.
>>>>                     But we could add them.
>>>>
>>>>                     I don't have all the details regarding your
>>>>                     implementation but even with
>>>>                     the "semaphore thread", I could see it running
>>>>                     into the same issues.
>>>>                     What if a mix of binary & timeline semaphores
>>>>                     are handed to vkQueueSubmit()?
>>>>
>>>>                     For example with queueA & queueB from 2
>>>>                     different VkDevice :
>>>>                     vkQueueSubmit(queueA, signal semA);
>>>>                     vkQueueSubmit(queueA, wait on [semA,
>>>>                     timelineSemB]); with
>>>>                     timelineSemB triggering a wait before signal.
>>>>                     vkQueueSubmit(queueB, signal semA);
>>>>
>>>>
>>>>                     -Lionel
>>>>
>>>>                     On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>>>                     > Hi Lionel,
>>>>                     >
>>>>                     > Because the Queue thread is a heavy thread,
>>>>                     which is always resident in the driver while the
>>>>                     application runs, our guys don't like that. So
>>>>                     we switched to a Semaphore Thread: only when a
>>>>                     waitBeforeSignal of a timeline happens do we
>>>>                     spawn a thread to handle that wait. So we don't
>>>>                     have this issue.
>>>>                     > By the way, I already pass all your CTS cases
>>>>                     now. I suggest you switch to a Semaphore
>>>>                     Thread instead of a Queue Thread as well. It
>>>>                     works very well.
>>>>                     >
>>>>                     > -David
>>>>                     >
>>>>                     > -----Original Message-----
>>>>                     > From: Lionel Landwerlin
>>>>                     <lionel.g.landwerlin@intel.com>
>>>>                     <mailto:lionel.g.landwerlin@intel.com>
>>>>                     > Sent: Friday, August 2, 2019 4:52 AM
>>>>                     > To: dri-devel
>>>>                     <dri-devel@lists.freedesktop.org>
>>>>                     <mailto:dri-devel@lists.freedesktop.org>;
>>>>                     Koenig, Christian <Christian.Koenig@amd.com>
>>>>                     <mailto:Christian.Koenig@amd.com>; Zhou,
>>>>                     David(ChunMing) <David1.Zhou@amd.com>
>>>>                     <mailto:David1.Zhou@amd.com>; Jason Ekstrand
>>>>                     <jason@jlekstrand.net>
>>>>                     <mailto:jason@jlekstrand.net>
>>>>                     > Subject: Threaded submission & semaphore sharing
>>>>                     >
>>>>                     > Hi Christian, David,
>>>>                     >
>>>>                     > Sorry to report this so late in the process,
>>>>                     but I think we found an issue not directly
>>>>                     related to syncobj timelines themselves but
>>>>                     with a side effect of the threaded submissions.
>>>>                     >
>>>>                     > Essentially we're failing a test in crucible :
>>>>                     > func.sync.semaphore-fd.opaque-fd
>>>>                     > This test creates a single binary semaphore,
>>>>                     shares it between 2 VkDevice/VkQueue.
>>>>                     > Then in a loop it proceeds to submit workload
>>>>                     alternating between the 2 VkQueue with one
>>>>                     submit depending on the other.
>>>>                     > It does so by waiting on the VkSemaphore
>>>>                     signaled in the previous iteration and
>>>>                     resignaling it.
>>>>                     >
>>>>                     > The problem for us is that once things are
>>>>                     dispatched to the submission thread, the
>>>>                     ordering of the submission is lost.
>>>>                     > Because we have 2 devices and they both have
>>>>                     their own submission thread.
>>>>                     >
>>>>                     > Jason suggested that we reestablish the
>>>>                     ordering by having semaphores/syncobjs carry an
>>>>                     additional uint64_t payload.
>>>>                     > This 64-bit integer would be an
>>>>                     identifier that submission threads will
>>>>                     WAIT_FOR_AVAILABLE on.
>>>>                     >
>>>>                     > The scenario would look like this :
>>>>                     >       - vkQueueSubmit(queueA, signal on semA);
>>>>                     >           - in the caller thread, this would
>>>>                     increment the syncobj additional u64 payload
>>>>                     and return it to userspace.
>>>>                     >           - at some point the submission
>>>>                     thread of queueA submits the workload and
>>>>                     signal the syncobj of semA with value returned
>>>>                     in the caller thread of vkQueueSubmit().
>>>>                     >       - vkQueueSubmit(queueB, wait on semA);
>>>>                     >           - in the caller thread, this would
>>>>                     read the syncobj additional
>>>>                     > u64 payload
>>>>                     >           - at some point the submission
>>>>                     thread of queueB will try to submit the work,
>>>>                     but first it will WAIT_FOR_AVAILABLE the u64
>>>>                     value returned in the step above
>>>>                     >
>>>>                     > Because we want the binary semaphores to be
>>>>                     shared across processes and would like this to
>>>>                     remain a single FD, the simplest location to
>>>>                     store this additional u64 payload would be the
>>>>                     DRM syncobj.
>>>>                     > It would need an additional ioctl to read &
>>>>                     increment the value.
>>>>                     >
>>>>                     > What do you think?
>>>>                     >
>>>>                     > -Lionel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>


[-- Attachment #1.2: Type: text/html, Size: 37411 bytes --]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02 10:01       ` zhoucm1
@ 2019-08-02 12:12         ` Lionel Landwerlin
  0 siblings, 0 replies; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-02 12:12 UTC (permalink / raw)
  To: zhoucm1, Koenig, Christian; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 14969 bytes --]

Went through the spec VUs again and there is a bit that says that binary 
semaphores must have all of their dependencies submitted for execution.
So essentially it means that in our implementation of QueueSubmit() we should 
be able to do a short wait_for_available on all of those, which does 
ensure everything is properly ordered.

So I think you're right on this David, waitBeforeSignal on binary 
semaphore is out of scope.

Again sorry for the noise.

-Lionel
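That "short wait_for_available on all of those" can be sketched as follows. This is a toy Python model with made-up names (`BinarySemState`, `mark_signal_submitted`); an actual implementation would presumably wait on the DRM syncobj with an available/submitted-style flag rather than a userspace condition variable.

```python
import threading

class BinarySemState:
    """Toy model of a binary semaphore's 'dependency submitted' state,
    checked by vkQueueSubmit() before queuing work that waits on it."""

    def __init__(self):
        self.cond = threading.Condition()
        self.pending_signal = False  # a signal op has been submitted

    def mark_signal_submitted(self):
        # Called when the signaling submission is handed to the queue.
        with self.cond:
            self.pending_signal = True
            self.cond.notify_all()

    def wait_available(self, timeout):
        # Short bounded wait: returns True once the dependency has been
        # submitted for execution, False on timeout.
        with self.cond:
            return self.cond.wait_for(lambda: self.pending_signal,
                                      timeout)

sem = BinarySemState()

# The signaling vkQueueSubmit() lands shortly after on another thread...
threading.Timer(0.01, sem.mark_signal_submitted).start()

# ...so the waiting vkQueueSubmit() only needs a short wait before it
# can safely queue its own work.
available = sem.wait_available(timeout=1.0)
assert available
```

Since the spec requires the dependency to already be submitted, the wait is expected to complete almost immediately in any valid application.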

On 02/08/2019 13:01, zhoucm1 wrote:
>
>
>
> On 2019年08月02日 17:41, Lionel Landwerlin wrote:
>> Hey David,
>>
>> On 02/08/2019 12:11, zhoucm1 wrote:
>>>
>>> Hi Lionel,
>>>
>>> For a binary semaphore, I guess everyone will assume the application 
>>> guarantees the wait comes after the signal, whether the semaphore is 
>>> shared or used intra-process.
>>>
>>> I think below two options can fix your problem:
>>>
>>> a. Can we extend vkWaitForFence so that it is able to wait on 
>>> fence-available? If the fence is available, then it's safe to do the 
>>> semaphore wait in vkQueueSubmit.
>>>
>>
>> I'm sorry, but I don't understand what vkWaitForFence() has to do 
>> with this problem.
>>
>> The test case we're struggling with doesn't use that API.
>>
>>
>> Can you maybe explain a bit more how it relates?
>>
> vkQueueSubmit()
> vkWaitForFenceAvailable()
> vkQueueSubmit()
> vkWaitForFenceAvailable()
> vkQueueSubmit()
> vkWaitForFenceAvailable()
>
> Sorry, but that would make the application program very ugly.
>>
>>
>>> b. Make waitBeforeSignal valid for binary semaphores as well; that 
>>> way, it is reasonable to add wait/signal counting for binary 
>>> syncobjs.
>>>
>>
>> Yeah essentially the change we're proposing internally makes binary 
>> semaphores use syncobj timelines.
>>
> Will you raise an MR to add the language for waitBeforeSignal support 
> of binary semaphores to the Vulkan spec?
>
> -David
>>
>> There is just another u64 associated with them.
>>
>>
>> -Lionel
>>
>>
>>>
>>> -David
>>>
>>>
>>> On 02/08/2019 14:27, Lionel Landwerlin wrote:
>>>> On 02/08/2019 09:10, Koenig, Christian wrote:
>>>>>
>>>>>
>>>>> On 02.08.2019 07:38, Lionel Landwerlin 
>>>>> <lionel.g.landwerlin@intel.com> wrote:
>>>>>
>>>>>     On 02/08/2019 08:21, Koenig, Christian wrote:
>>>>>
>>>>>
>>>>>
>>>>>         On 02.08.2019 07:17, Lionel Landwerlin
>>>>>         <lionel.g.landwerlin@intel.com> wrote:
>>>>>
>>>>>             On 02/08/2019 08:08, Koenig, Christian wrote:
>>>>>
>>>>>                 Hi Lionel,
>>>>>
>>>>>                 Well that looks more like your test case is buggy.
>>>>>
>>>>>                 According to the code the ctx1 queue always waits
>>>>>                 for sem1 and ctx2 queue always waits for sem2.
>>>>>
>>>>>
>>>>>             That's supposed to be the same underlying syncobj
>>>>>             because it's exported from one VkDevice as opaque FD
>>>>>             from sem1 and imported into sem2.
>>>>>
>>>>>
>>>>>         Well then that's still buggy and won't synchronize at all.
>>>>>
>>>>>         When ctx1 waits for a semaphore and then signals the same
>>>>>         semaphore there is no guarantee that ctx2 will run in
>>>>>         between jobs.
>>>>>
>>>>>         It's perfectly valid in this case to first run all jobs
>>>>>         from ctx1 and then all jobs from ctx2.
>>>>>
>>>>>
>>>>>     That's not really how I see the semaphores working.
>>>>>
>>>>>     The spec describes VkSemaphore as an interface to an internal
>>>>>     payload opaque to the application.
>>>>>
>>>>>
>>>>>     When ctx1 waits on the semaphore, it waits on the payload put
>>>>>     there by the previous iteration.
>>>>>
>>>>>
>>>>> And who says that it's not waiting for its own previous payload?
>>>>
>>>>
>>>> That was what I understood from your previous comment: "there is no 
>>>> guarantee that ctx2 will run in between jobs"
>>>>
>>>>
>>>>>
>>>>> See if the payload is a counter this won't work either. Keep in 
>>>>> mind that this has the semantic of a semaphore. Whoever grabs the 
>>>>> semaphore first wins and can run, everybody else has to wait.
>>>>
>>>>
>>>> What performs the "grab" here?
>>>>
>>>> I thought that would be vkQueueSubmit().
>>>>
>>>> Since that is occurring from a single application thread, the 
>>>> execution should then be ordered ctx1, ctx2, ctx1, ...
>>>>
>>>>
>>>> Thanks for your time on this,
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>>
>>>>>     Then it proceeds to signal it by replacing the internal payload.
>>>>>
>>>>>
>>>>> That's an implementation detail of our sync objects, but I don't 
>>>>> think that this behavior is part of the Vulkan specification.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>     ctx2 then waits on that and replaces the payload again with
>>>>>     the new internal synchronization object.
>>>>>
>>>>>
>>>>>     The internal payload is a dma fence in our case and signaling
>>>>>     just replaces a dma fence by another or puts one where there
>>>>>     was none before.
>>>>>
>>>>>     So we should have created a dependency link between all the
>>>>>     submissions, and they should then be executed in the order of
>>>>>     the QueueSubmit() calls.
>>>>>
>>>>>
>>>>>     -Lionel
>>>>>
>>>>>
>>>>>
>>>>>         It only prevents running both at the same time and as far
>>>>>         as I can see that still works even with threaded submission.
>>>>>
>>>>>         You need at least two semaphores for a tandem submission.
>>>>>
>>>>>         Regards,
>>>>>         Christian.
>>>>>
>>>>>
>>>>>
>>>>>                 This way there can't be any synchronisation
>>>>>                 between the two.
>>>>>
>>>>>                 Regards,
>>>>>                 Christian.
>>>>>
>>>>>                 On 02.08.2019 06:55, Lionel Landwerlin
>>>>>                 <lionel.g.landwerlin@intel.com> wrote:
>>>>>                 Hey Christian,
>>>>>
>>>>>                 The problem boils down to the fact that we don't
>>>>>                 immediately create dma fences when calling
>>>>>                 vkQueueSubmit().
>>>>>                 This is delayed to a thread.
>>>>>
>>>>>                 From a single application thread, you can
>>>>>                 QueueSubmit() to 2 queues from 2 different devices.
>>>>>                 Each QueueSubmit to one queue has a dependency on
>>>>>                 the previous QueueSubmit on the other queue
>>>>>                 through an exported/imported semaphore.
>>>>>
>>>>>                 From the API point of view the state of the
>>>>>                 semaphore should be changed after each QueueSubmit().
>>>>>                 The problem is that it's not because of the thread
>>>>>                 and because you might have those 2 submission
>>>>>                 threads tied to different VkDevice/VkInstance or
>>>>>                 even different applications (synchronizing
>>>>>                 themselves outside the vulkan API).
>>>>>
>>>>>                 Hope that makes sense.
>>>>>                 It's not really easy to explain by mail, the best
>>>>>                 explanation is probably reading the test :
>>>>>                 https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>>>>
>>>>>                 Like David mentioned you're not running into that
>>>>>                 issue right now, because you only dispatch to the
>>>>>                 thread under specific conditions.
>>>>>                 But I could build a case to force that and likely
>>>>>                 run into the same issue.
>>>>>
>>>>>                 -Lionel
>>>>>
>>>>>                 On 02/08/2019 07:33, Koenig, Christian wrote:
>>>>>
>>>>>                     Hi Lionel,
>>>>>
>>>>>                     Well could you describe once more what the
>>>>>                     problem is?
>>>>>
>>>>>                     Cause I don't fully understand why a rather
>>>>>                     normal tandem submission with two semaphores
>>>>>                     should fail in any way.
>>>>>
>>>>>                     Regards,
>>>>>                     Christian.
>>>>>
>>>>>                     On 02.08.2019 06:28, Lionel Landwerlin
>>>>>                     <lionel.g.landwerlin@intel.com> wrote:
>>>>>                     There aren't CTS tests covering the issue I
>>>>>                     was mentioning.
>>>>>                     But we could add them.
>>>>>
>>>>>                     I don't have all the details regarding your
>>>>>                     implementation but even with
>>>>>                     the "semaphore thread", I could see it running
>>>>>                     into the same issues.
>>>>>                     What if a mix of binary & timeline semaphores
>>>>>                     are handed to vkQueueSubmit()?
>>>>>
>>>>>                     For example with queueA & queueB from 2
>>>>>                     different VkDevice :
>>>>>                     vkQueueSubmit(queueA, signal semA);
>>>>>                     vkQueueSubmit(queueA, wait on [semA,
>>>>>                     timelineSemB]); with
>>>>>                     timelineSemB triggering a wait before signal.
>>>>>                     vkQueueSubmit(queueB, signal semA);
>>>>>
>>>>>
>>>>>                     -Lionel
>>>>>
>>>>>                     On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>>>>                     > Hi Lionel,
>>>>>                     >
>>>>>                     > By the Queue thread is a heavy thread, which
>>>>>                     is always resident in driver during
>>>>>                     application running, our guys don't like that.
>>>>>                     So we switch to Semaphore Thread, only when
>>>>>                     waitBeforeSignal of timeline happens, we spawn
>>>>>                     a thread to handle that wait. So we don't have
>>>>>                     your this issue.
>>>>>                     > By the way, I already pass all your CTS
>>>>>                     cases for now. I suggest you to switch to
>>>>>                     Semaphore Thread instead of Queue Thread as
>>>>>                     well. It works very well.
>>>>>                     >
>>>>>                     > -David
>>>>>                     >
>>>>>                     > -----Original Message-----
>>>>>                     > From: Lionel Landwerlin
>>>>>                     <lionel.g.landwerlin@intel.com>
>>>>>                     > Sent: Friday, August 2, 2019 4:52 AM
>>>>>                     > To: dri-devel
>>>>>                     <dri-devel@lists.freedesktop.org>; Koenig,
>>>>>                     Christian <Christian.Koenig@amd.com>; Zhou,
>>>>>                     David(ChunMing) <David1.Zhou@amd.com>; Jason
>>>>>                     Ekstrand <jason@jlekstrand.net>
>>>>>                     > Subject: Threaded submission & semaphore sharing
>>>>>                     >
>>>>>                     > Hi Christian, David,
>>>>>                     >
>>>>>                     > Sorry to report this so late in the process,
>>>>>                     but I think we found an issue not directly
>>>>>                     related to syncobj timelines themselves but
>>>>>                     with a side effect of the threaded submissions.
>>>>>                     >
>>>>>                     > Essentially we're failing a test in crucible :
>>>>>                     > func.sync.semaphore-fd.opaque-fd
>>>>>                     > This test creates a single binary semaphore
>>>>>                     and shares it between 2 VkDevice/VkQueue.
>>>>>                     > Then in a loop it proceeds to submit
>>>>>                     workload alternating between the 2 VkQueue
>>>>>                     with one submit depending on the other.
>>>>>                     > It does so by waiting on the VkSemaphore
>>>>>                     signaled in the previous iteration and
>>>>>                     resignaling it.
>>>>>                     >
>>>>>                     > The problem for us is that once things are
>>>>>                     dispatched to the submission thread, the
>>>>>                     ordering of the submission is lost.
>>>>>                     > Because we have 2 devices and they both have
>>>>>                     their own submission thread.
>>>>>                     >
>>>>>                     > Jason suggested that we reestablish the
>>>>>                     ordering by having semaphores/syncobjs carry
>>>>>                     an additional uint64_t payload.
>>>>>                     > This 64-bit integer would be an
>>>>>                     identifier that submission threads will
>>>>>                     WAIT_FOR_AVAILABLE on.
>>>>>                     >
>>>>>                     > The scenario would look like this :
>>>>>                     >       - vkQueueSubmit(queueA, signal on semA);
>>>>>                     >           - in the caller thread, this would
>>>>>                     increment the syncobj additional u64 payload
>>>>>                     and return it to userspace.
>>>>>                     >           - at some point the submission
>>>>>                     thread of queueA submits the workload and
>>>>>                     signal the syncobj of semA with value returned
>>>>>                     in the caller thread of vkQueueSubmit().
>>>>>                     >       - vkQueueSubmit(queueB, wait on semA);
>>>>>                     >           - in the caller thread, this would
>>>>>                     read the syncobj additional
>>>>>                     > u64 payload
>>>>>                     >           - at some point the submission
>>>>>                     thread of queueB will try to submit the work,
>>>>>                     but first it will WAIT_FOR_AVAILABLE the u64
>>>>>                     value returned in the step above
>>>>>                     >
>>>>>                     > Because we want the binary semaphores to be
>>>>>                     shared across processes and would like this to
>>>>>                     remain a single FD, the simplest location to
>>>>>                     store this additional u64 payload would be the
>>>>>                     DRM syncobj.
>>>>>                     > It would need an additional ioctl to read &
>>>>>                     increment the value.
>>>>>                     >
>>>>>                     > What do you think?
>>>>>                     >
>>>>>                     > -Lionel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


[-- Attachment #1.2: Type: text/html, Size: 41328 bytes --]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
@ 2019-08-02  5:21 Koenig, Christian
  0 siblings, 0 replies; 14+ messages in thread
From: Koenig, Christian @ 2019-08-02  5:21 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 6375 bytes --]



On 02.08.2019 07:17, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
On 02/08/2019 08:08, Koenig, Christian wrote:
Hi Lionel,

Well that looks more like your test case is buggy.

According to the code the ctx1 queue always waits for sem1 and ctx2 queue always waits for sem2.


That's supposed to be the same underlying syncobj because it's exported from one VkDevice as opaque FD from sem1 and imported into sem2.

Well then that's still buggy and won't synchronize at all.

When ctx1 waits for a semaphore and then signals the same semaphore there is no guarantee that ctx2 will run in between jobs.

It's perfectly valid in this case to first run all jobs from ctx1 and then all jobs from ctx2.

It only prevents running both at the same time and as far as I can see that still works even with threaded submission.

You need at least two semaphores for a tandem submission.

Regards,
Christian.
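For illustration, here is a minimal sketch (plain POSIX threads, purely a model, not driver code) of the tandem submission Christian describes: with two semaphores, ctx A waits on its own semaphore and signals the other's, which forces the strict A,B,A,B alternation that a single shared semaphore cannot guarantee:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

#define ITERS 4

static sem_t sem_a, sem_b;
static char order[2 * ITERS + 1];          /* records who "submitted" */
static int pos;
static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;

static void record(char who)
{
    pthread_mutex_lock(&order_lock);
    order[pos++] = who;
    pthread_mutex_unlock(&order_lock);
}

static void *ctx_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        sem_wait(&sem_a);   /* wait for B (or the initial signal) */
        record('A');        /* "submit" A's job */
        sem_post(&sem_b);   /* let B run */
    }
    return NULL;
}

static void *ctx_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        sem_wait(&sem_b);   /* wait for A */
        record('B');
        sem_post(&sem_a);   /* let A run */
    }
    return NULL;
}

/* Runs both contexts to completion and returns the submission order. */
const char *tandem_order(void)
{
    pthread_t a, b;

    sem_init(&sem_a, 0, 1);  /* A gets to run first */
    sem_init(&sem_b, 0, 0);
    pthread_create(&a, NULL, ctx_a, NULL);
    pthread_create(&b, NULL, ctx_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return order;
}
```

With a single semaphore instead (both contexts waiting on and signaling the same one), "AAAABBBB" would be just as legal an outcome as "ABABABAB", which is exactly Christian's point.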


This way there can't be any synchronisation between the two.

Regards,
Christian.

On 02.08.2019 06:55, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
Hey Christian,

The problem boils down to the fact that we don't immediately create dma fences when calling vkQueueSubmit().
This is delayed to a thread.

From a single application thread, you can QueueSubmit() to 2 queues from 2 different devices.
Each QueueSubmit to one queue has a dependency on the previous QueueSubmit on the other queue through an exported/imported semaphore.

From the API point of view the state of the semaphore should be changed after each QueueSubmit().
The problem is that it's not because of the thread and because you might have those 2 submission threads tied to different VkDevice/VkInstance or even different applications (synchronizing themselves outside the vulkan API).

Hope that makes sense.
It's not really easy to explain by mail, the best explanation is probably reading the test : https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788

Like David mentioned you're not running into that issue right now, because you only dispatch to the thread under specific conditions.
But I could build a case to force that and likely run into the same issue.

-Lionel

On 02/08/2019 07:33, Koenig, Christian wrote:
Hi Lionel,

Well could you describe once more what the problem is?

Cause I don't fully understand why a rather normal tandem submission with two semaphores should fail in any way.

Regards,
Christian.

On 02.08.2019 06:28, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
There aren't CTS tests covering the issue I was mentioning.
But we could add them.

I don't have all the details regarding your implementation but even with
the "semaphore thread", I could see it running into the same issues.
What if a mix of binary & timeline semaphores are handed to vkQueueSubmit()?

For example with queueA & queueB from 2 different VkDevice :
     vkQueueSubmit(queueA, signal semA);
     vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with
timelineSemB triggering a wait before signal.
     vkQueueSubmit(queueB, signal semA);


-Lionel

On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> Hi Lionel,
>
> Because the Queue thread is a heavy thread, which is always resident in the driver while the application is running, our guys don't like that. So we switched to a Semaphore Thread: only when a waitBeforeSignal of a timeline happens do we spawn a thread to handle that wait. So we don't have this issue of yours.
> By the way, I already pass all your CTS cases for now. I suggest you switch to a Semaphore Thread instead of a Queue Thread as well. It works very well.
>
> -David
>
> -----Original Message-----
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Sent: Friday, August 2, 2019 4:52 AM
> To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
> Subject: Threaded submission & semaphore sharing
>
> Hi Christian, David,
>
> Sorry to report this so late in the process, but I think we found an issue not directly related to syncobj timelines themselves but with a side effect of the threaded submissions.
>
> Essentially we're failing a test in crucible :
> func.sync.semaphore-fd.opaque-fd
> This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue.
> Then in a loop it proceeds to submit workload alternating between the 2 VkQueue with one submit depending on the other.
> It does so by waiting on the VkSemaphore signaled in the previous iteration and resignaling it.
>
> The problem for us is that once things are dispatched to the submission thread, the ordering of the submission is lost.
> Because we have 2 devices and they both have their own submission thread.
>
> Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload.
> This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.
>
> The scenario would look like this :
>       - vkQueueSubmit(queueA, signal on semA);
>           - in the caller thread, this would increment the syncobj additional u64 payload and return it to userspace.
>           - at some point the submission thread of queueA submits the workload and signal the syncobj of semA with value returned in the caller thread of vkQueueSubmit().
>       - vkQueueSubmit(queueB, wait on semA);
>           - in the caller thread, this would read the syncobj additional
> u64 payload
>           - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step above
>
> Because we want the binary semaphores to be shared across processes and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj.
> It would need an additional ioctl to read & increment the value.
>
> What do you think?
>
> -Lionel
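As a rough user-space model of the proposal quoted above (the struct layout, function names, and locking scheme are all assumptions for illustration; the real mechanism would live in a DRM syncobj with a new ioctl, which does not exist at this point):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Model of a syncobj carrying the extra monotonically increasing u64
 * "submit point" alongside its dma fence slot. */
struct syncobj {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    uint64_t submitted;   /* bumped in the vkQueueSubmit() caller thread */
    uint64_t signaled;    /* advanced by the submission thread */
};

static struct syncobj demo_syncobj = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0
};

/* Caller thread of vkQueueSubmit(queueA, signal on semA):
 * reserve the next submit point for the pending signal and return it. */
uint64_t syncobj_reserve_signal(struct syncobj *s)
{
    pthread_mutex_lock(&s->lock);
    uint64_t point = ++s->submitted;
    pthread_mutex_unlock(&s->lock);
    return point;
}

/* Submission thread of queueA: the signal for 'point' has actually been
 * submitted to the hardware, so wake anyone waiting for availability. */
void syncobj_signal(struct syncobj *s, uint64_t point)
{
    pthread_mutex_lock(&s->lock);
    if (point > s->signaled)
        s->signaled = point;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Submission thread of queueB: WAIT_FOR_AVAILABLE on the point read in
 * the caller thread before consuming the fence and submitting. */
void syncobj_wait_available(struct syncobj *s, uint64_t point)
{
    pthread_mutex_lock(&s->lock);
    while (s->signaled < point)
        pthread_cond_wait(&s->cond, &s->lock);
    pthread_mutex_unlock(&s->lock);
}
```

Because the submit point is reserved in the caller thread, the ordering of the vkQueueSubmit() calls is captured before anything is handed to the per-device submission threads, which is what reestablishes the ordering that the threads lose.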






[-- Attachment #1.2: Type: text/html, Size: 9407 bytes --]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  5:08         ` Koenig, Christian
@ 2019-08-02  5:16           ` Lionel Landwerlin
  0 siblings, 0 replies; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-02  5:16 UTC (permalink / raw)
  To: Koenig, Christian; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5848 bytes --]

On 02/08/2019 08:08, Koenig, Christian wrote:
> Hi Lionel,
>
> Well that looks more like your test case is buggy.
>
> According to the code the ctx1 queue always waits for sem1 and ctx2 
> queue always waits for sem2.


That's supposed to be the same underlying syncobj because it's exported 
from one VkDevice as opaque FD from sem1 and imported into sem2.


>
> This way there can't be any synchronisation between the two.
>
> Regards,
> Christian.
>
> On 02.08.2019 06:55, Lionel Landwerlin 
> <lionel.g.landwerlin@intel.com> wrote:
> Hey Christian,
>
> The problem boils down to the fact that we don't immediately create 
> dma fences when calling vkQueueSubmit().
> This is delayed to a thread.
>
> From a single application thread, you can QueueSubmit() to 2 queues 
> from 2 different devices.
> Each QueueSubmit to one queue has a dependency on the previous 
> QueueSubmit on the other queue through an exported/imported semaphore.
>
> From the API point of view the state of the semaphore should be 
> changed after each QueueSubmit().
> The problem is that it's not because of the thread and because you 
> might have those 2 submission threads tied to different 
> VkDevice/VkInstance or even different applications (synchronizing 
> themselves outside the vulkan API).
>
> Hope that makes sense.
> It's not really easy to explain by mail, the best explanation is 
> probably reading the test : 
> https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>
> Like David mentioned you're not running into that issue right now, 
> because you only dispatch to the thread under specific conditions.
> But I could build a case to force that and likely run into the same issue.
>
> -Lionel
>
> On 02/08/2019 07:33, Koenig, Christian wrote:
>> Hi Lionel,
>>
>> Well could you describe once more what the problem is?
>>
>> Cause I don't fully understand why a rather normal tandem submission 
>> with two semaphores should fail in any way.
>>
>> Regards,
>> Christian.
>>
>> On 02.08.2019 06:28, Lionel Landwerlin 
>> <lionel.g.landwerlin@intel.com> wrote:
>> There aren't CTS tests covering the issue I was mentioning.
>> But we could add them.
>>
>> I don't have all the details regarding your implementation but even with
>> the "semaphore thread", I could see it running into the same issues.
>> What if a mix of binary & timeline semaphores are handed to 
>> vkQueueSubmit()?
>>
>> For example with queueA & queueB from 2 different VkDevice :
>>      vkQueueSubmit(queueA, signal semA);
>>      vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with
>> timelineSemB triggering a wait before signal.
>>      vkQueueSubmit(queueB, signal semA);
>>
>>
>> -Lionel
>>
>> On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>> > Hi Lionel,
>> >
>> > Because the Queue thread is a heavy thread, which is always resident 
>> in the driver while the application is running, our guys don't like 
>> that. So we switched to a Semaphore Thread: only when a waitBeforeSignal 
>> of a timeline happens do we spawn a thread to handle that wait. So we 
>> don't have this issue of yours.
>> > By the way, I already pass all your CTS cases for now. I suggest you 
>> switch to a Semaphore Thread instead of a Queue Thread as well. It 
>> works very well.
>> >
>> > -David
>> >
>> > -----Original Message-----
>> > From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> > Sent: Friday, August 2, 2019 4:52 AM
>> > To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian 
>> <Christian.Koenig@amd.com>; Zhou, David(ChunMing) 
>> <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
>> > Subject: Threaded submission & semaphore sharing
>> >
>> > Hi Christian, David,
>> >
>> > Sorry to report this so late in the process, but I think we found 
>> an issue not directly related to syncobj timelines themselves but 
>> with a side effect of the threaded submissions.
>> >
>> > Essentially we're failing a test in crucible :
>> > func.sync.semaphore-fd.opaque-fd
>> > This test creates a single binary semaphore and shares it between 2 
>> VkDevice/VkQueue.
>> > Then in a loop it proceeds to submit workload alternating between 
>> the 2 VkQueue with one submit depending on the other.
>> > It does so by waiting on the VkSemaphore signaled in the previous 
>> iteration and resignaling it.
>> >
>> > The problem for us is that once things are dispatched to the 
>> submission thread, the ordering of the submission is lost.
>> > Because we have 2 devices and they both have their own submission 
>> thread.
>> >
>> > Jason suggested that we reestablish the ordering by having 
>> semaphores/syncobjs carry an additional uint64_t payload.
>> > This 64-bit integer would be an identifier that submission 
>> threads will WAIT_FOR_AVAILABLE on.
>> >
>> > The scenario would look like this :
>> >       - vkQueueSubmit(queueA, signal on semA);
>> >           - in the caller thread, this would increment the syncobj 
>> additional u64 payload and return it to userspace.
>> >           - at some point the submission thread of queueA submits 
>> the workload and signal the syncobj of semA with value returned in 
>> the caller thread of vkQueueSubmit().
>> >       - vkQueueSubmit(queueB, wait on semA);
>> >           - in the caller thread, this would read the syncobj 
>> additional
>> > u64 payload
>> >           - at some point the submission thread of queueB will try 
>> to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 
>> value returned in the step above
>> >
>> > Because we want the binary semaphores to be shared across processes 
>> and would like this to remain a single FD, the simplest location to 
>> store this additional u64 payload would be the DRM syncobj.
>> > It would need an additional ioctl to read & increment the value.
>> >
>> > What do you think?
>> >
>> > -Lionel
>>
>>
>


[-- Attachment #1.2: Type: text/html, Size: 12208 bytes --]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  4:55       ` Lionel Landwerlin
@ 2019-08-02  5:08         ` Koenig, Christian
  2019-08-02  5:16           ` Lionel Landwerlin
  0 siblings, 1 reply; 14+ messages in thread
From: Koenig, Christian @ 2019-08-02  5:08 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5564 bytes --]

Hi Lionel,

Well that looks more like your test case is buggy.

According to the code the ctx1 queue always waits for sem1 and ctx2 queue always waits for sem2.

This way there can't be any synchronisation between the two.

Regards,
Christian.

On 02.08.2019 06:55, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
Hey Christian,

The problem boils down to the fact that we don't immediately create dma fences when calling vkQueueSubmit().
This is delayed to a thread.

From a single application thread, you can QueueSubmit() to 2 queues from 2 different devices.
Each QueueSubmit to one queue has a dependency on the previous QueueSubmit on the other queue through an exported/imported semaphore.

From the API point of view the state of the semaphore should be changed after each QueueSubmit().
The problem is that it's not because of the thread and because you might have those 2 submission threads tied to different VkDevice/VkInstance or even different applications (synchronizing themselves outside the vulkan API).

Hope that makes sense.
It's not really easy to explain by mail, the best explanation is probably reading the test : https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788

Like David mentioned you're not running into that issue right now, because you only dispatch to the thread under specific conditions.
But I could build a case to force that and likely run into the same issue.

-Lionel

On 02/08/2019 07:33, Koenig, Christian wrote:
Hi Lionel,

Well could you describe once more what the problem is?

Cause I don't fully understand why a rather normal tandem submission with two semaphores should fail in any way.

Regards,
Christian.

On 02.08.2019 06:28, Lionel Landwerlin <lionel.g.landwerlin@intel.com> wrote:
There aren't CTS tests covering the issue I was mentioning.
But we could add them.

I don't have all the details regarding your implementation but even with
the "semaphore thread", I could see it running into the same issues.
What if a mix of binary & timeline semaphores are handed to vkQueueSubmit()?

For example with queueA & queueB from 2 different VkDevice :
     vkQueueSubmit(queueA, signal semA);
     vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with
timelineSemB triggering a wait before signal.
     vkQueueSubmit(queueB, signal semA);


-Lionel

On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> Hi Lionel,
>
> Because the Queue thread is a heavy thread, which is always resident in the driver while the application is running, our guys don't like that. So we switched to a Semaphore Thread: only when a waitBeforeSignal of a timeline happens do we spawn a thread to handle that wait. So we don't have this issue of yours.
> By the way, I already pass all your CTS cases for now. I suggest you switch to a Semaphore Thread instead of a Queue Thread as well. It works very well.
>
> -David
>
> -----Original Message-----
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Sent: Friday, August 2, 2019 4:52 AM
> To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
> Subject: Threaded submission & semaphore sharing
>
> Hi Christian, David,
>
> Sorry to report this so late in the process, but I think we found an issue not directly related to syncobj timelines themselves but with a side effect of the threaded submissions.
>
> Essentially we're failing a test in crucible :
> func.sync.semaphore-fd.opaque-fd
> This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue.
> Then in a loop it proceeds to submit workload alternating between the 2 VkQueue with one submit depending on the other.
> It does so by waiting on the VkSemaphore signaled in the previous iteration and resignaling it.
>
> The problem for us is that once things are dispatched to the submission thread, the ordering of the submission is lost.
> Because we have 2 devices and they both have their own submission thread.
>
> Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload.
> This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.
>
> The scenario would look like this :
>       - vkQueueSubmit(queueA, signal on semA);
>           - in the caller thread, this would increment the syncobj additional u64 payload and return it to userspace.
>           - at some point the submission thread of queueA submits the workload and signal the syncobj of semA with value returned in the caller thread of vkQueueSubmit().
>       - vkQueueSubmit(queueB, wait on semA);
>           - in the caller thread, this would read the syncobj additional
> u64 payload
>           - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step above
>
> Because we want the binary semaphores to be shared across processes and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj.
> It would need an additional ioctl to read & increment the value.
>
> What do you think?
>
> -Lionel




[-- Attachment #1.2: Type: text/html, Size: 8611 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  4:33     ` Koenig, Christian
@ 2019-08-02  4:55       ` Lionel Landwerlin
  2019-08-02  5:08         ` Koenig, Christian
  0 siblings, 1 reply; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-02  4:55 UTC (permalink / raw)
  To: Koenig, Christian; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5154 bytes --]

Hey Christian,

The problem boils down to the fact that we don't immediately create dma 
fences when calling vkQueueSubmit().
This is delayed to a thread.

From a single application thread, you can QueueSubmit() to 2 queues 
from 2 different devices.
Each QueueSubmit to one queue has a dependency on the previous 
QueueSubmit on the other queue through an exported/imported semaphore.

From the API point of view, the state of the semaphore should be changed 
after each QueueSubmit().
The problem is that it isn't, because of the thread, and because those 2 
submission threads might be tied to different VkDevices/VkInstances or 
even different applications (synchronizing themselves outside the Vulkan 
API).

Hope that makes sense.
It's not really easy to explain by mail; the best explanation is 
probably reading the test: 
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
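
As a rough illustration of that ordering loss (plain Python threads standing in for the two per-device submission threads; every name here is illustrative, none of it is driver code):

```python
import queue
import threading
import time

order = []  # order in which "workloads" actually reach the kernel

def submission_thread(jobs):
    # Each VkDevice drains its own submission queue on its own thread.
    while True:
        item = jobs.get()
        if item is None:
            return
        name, delay = item
        time.sleep(delay)   # scheduling noise: device A's thread runs late
        order.append(name)  # the point where a dma-fence would be created

queue_a, queue_b = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=submission_thread, args=(q,))
           for q in (queue_a, queue_b)]
for t in threads:
    t.start()

# Application thread: vkQueueSubmit() to queueA first, then to queueB.
queue_a.put(("queueA submit", 0.2))
queue_b.put(("queueB submit", 0.0))

for q in (queue_a, queue_b):
    q.put(None)
for t in threads:
    t.join()

print(order)  # ['queueB submit', 'queueA submit'] -- the API order is lost
```

The dependency between the two submits cannot be expressed at vkQueueSubmit() time precisely because no dma-fence exists yet at that point.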

As David mentioned, you're not running into that issue right now, 
because you only dispatch to the thread under specific conditions.
But I could build a case to force that and likely run into the same issue.

-Lionel

On 02/08/2019 07:33, Koenig, Christian wrote:
> Hi Lionel,
>
> Well could you describe once more what the problem is?
>
> Cause I don't fully understand why a rather normal tandem submission 
> with two semaphores should fail in any way.
>
> Regards,
> Christian.
>
> Am 02.08.2019 06:28 schrieb Lionel Landwerlin 
> <lionel.g.landwerlin@intel.com>:
> There aren't CTS tests covering the issue I was mentioning.
> But we could add them.
>
> I don't have all the details regarding your implementation but even with
> the "semaphore thread", I could see it running into the same issues.
> What if a mix of binary & timeline semaphores is handed to 
> vkQueueSubmit()?
>
> For example with queueA & queueB from 2 different VkDevice :
>      vkQueueSubmit(queueA, signal semA);
>      vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with
> timelineSemB triggering a wait before signal.
>      vkQueueSubmit(queueB, signal semA);
>
>
> -Lionel
>
> On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> > Hi Lionel,
> >
> > The Queue thread is a heavy thread which is always resident in the 
> driver while the application is running, and our team doesn't like 
> that. So we switched to a Semaphore Thread: only when a 
> waitBeforeSignal of a timeline happens do we spawn a thread to handle 
> that wait. So we don't have this issue.
> > By the way, I already pass all your CTS cases now. I suggest you 
> switch to a Semaphore Thread instead of a Queue Thread as well. It 
> works very well.
> >
> > -David
> >
> > -----Original Message-----
> > From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > Sent: Friday, August 2, 2019 4:52 AM
> > To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian 
> <Christian.Koenig@amd.com>; Zhou, David(ChunMing) 
> <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
> > Subject: Threaded submission & semaphore sharing
> >
> > Hi Christian, David,
> >
> > Sorry to report this so late in the process, but I think we found an 
> issue not directly related to syncobj timelines themselves but with a 
> side effect of the threaded submissions.
> >
> > Essentially we're failing a test in crucible :
> > func.sync.semaphore-fd.opaque-fd
> > This test creates a single binary semaphore and shares it between 2 
> VkDevice/VkQueue.
> > Then in a loop it proceeds to submit workload alternating between 
> the 2 VkQueue with one submit depending on the other.
> > It does so by waiting on the VkSemaphore signaled in the previous 
> iteration and resignaling it.
> >
> > The problem for us is that once things are dispatched to the 
> submission thread, the ordering of the submission is lost.
> > Because we have 2 devices and they both have their own submission 
> thread.
> >
> > Jason suggested that we reestablish the ordering by having 
> semaphores/syncobjs carry an additional uint64_t payload.
> > This 64-bit integer would be an identifier that submission 
> threads will WAIT_FOR_AVAILABLE on.
> >
> > The scenario would look like this :
> >       - vkQueueSubmit(queueA, signal on semA);
> >           - in the caller thread, this would increment the syncobj 
> additional u64 payload and return it to userspace.
> >           - at some point the submission thread of queueA submits 
> the workload and signal the syncobj of semA with value returned in the 
> caller thread of vkQueueSubmit().
> >       - vkQueueSubmit(queueB, wait on semA);
> >           - in the caller thread, this would read the syncobj additional
> > u64 payload
> >           - at some point the submission thread of queueB will try 
> to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value 
> returned in the step above
> >
> > Because we want the binary semaphores to be shared across processes 
> and would like this to remain a single FD, the simplest location to 
> store this additional u64 payload would be the DRM syncobj.
> > It would need an additional ioctl to read & increment the value.
> >
> > What do you think?
> >
> > -Lionel
>
>


[-- Attachment #1.2: Type: text/html, Size: 9383 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  4:28   ` Lionel Landwerlin
@ 2019-08-02  4:33     ` Koenig, Christian
  2019-08-02  4:55       ` Lionel Landwerlin
  0 siblings, 1 reply; 14+ messages in thread
From: Koenig, Christian @ 2019-08-02  4:33 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: Jason Ekstrand, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3811 bytes --]

Hi Lionel,

Well could you describe once more what the problem is?

Cause I don't fully understand why a rather normal tandem submission with two semaphores should fail in any way.

Regards,
Christian.

Am 02.08.2019 06:28 schrieb Lionel Landwerlin <lionel.g.landwerlin@intel.com>:
There aren't CTS tests covering the issue I was mentioning.
But we could add them.

I don't have all the details regarding your implementation but even with
the "semaphore thread", I could see it running into the same issues.
What if a mix of binary & timeline semaphores is handed to vkQueueSubmit()?

For example with queueA & queueB from 2 different VkDevice :
     vkQueueSubmit(queueA, signal semA);
     vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with
timelineSemB triggering a wait before signal.
     vkQueueSubmit(queueB, signal semA);


-Lionel

On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> Hi Lionel,
>
> The Queue thread is a heavy thread which is always resident in the driver while the application is running, and our team doesn't like that. So we switched to a Semaphore Thread: only when a waitBeforeSignal of a timeline happens do we spawn a thread to handle that wait. So we don't have this issue.
> By the way, I already pass all your CTS cases now. I suggest you switch to a Semaphore Thread instead of a Queue Thread as well. It works very well.
>
> -David
>
> -----Original Message-----
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Sent: Friday, August 2, 2019 4:52 AM
> To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
> Subject: Threaded submission & semaphore sharing
>
> Hi Christian, David,
>
> Sorry to report this so late in the process, but I think we found an issue not directly related to syncobj timelines themselves but with a side effect of the threaded submissions.
>
> Essentially we're failing a test in crucible :
> func.sync.semaphore-fd.opaque-fd
> This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue.
> Then in a loop it proceeds to submit workload alternating between the 2 VkQueue with one submit depending on the other.
> It does so by waiting on the VkSemaphore signaled in the previous iteration and resignaling it.
>
> The problem for us is that once things are dispatched to the submission thread, the ordering of the submission is lost.
> Because we have 2 devices and they both have their own submission thread.
>
> Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload.
> This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.
>
> The scenario would look like this :
>       - vkQueueSubmit(queueA, signal on semA);
>           - in the caller thread, this would increment the syncobj additional u64 payload and return it to userspace.
>           - at some point the submission thread of queueA submits the workload and signal the syncobj of semA with value returned in the caller thread of vkQueueSubmit().
>       - vkQueueSubmit(queueB, wait on semA);
>           - in the caller thread, this would read the syncobj additional
> u64 payload
>           - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step above
>
> Because we want the binary semaphores to be shared across processes and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj.
> It would need an additional ioctl to read & increment the value.
>
> What do you think?
>
> -Lionel



[-- Attachment #1.2: Type: text/html, Size: 5257 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Threaded submission & semaphore sharing
  2019-08-02  3:18 ` Zhou, David(ChunMing)
@ 2019-08-02  4:28   ` Lionel Landwerlin
  2019-08-02  4:33     ` Koenig, Christian
  0 siblings, 1 reply; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-02  4:28 UTC (permalink / raw)
  To: Zhou, David(ChunMing), dri-devel, Koenig, Christian, Jason Ekstrand

There aren't CTS tests covering the issue I was mentioning.
But we could add them.

I don't have all the details regarding your implementation but even with 
the "semaphore thread", I could see it running into the same issues.
What if a mix of binary & timeline semaphores is handed to vkQueueSubmit()?

For example with queueA & queueB from 2 different VkDevice :
     vkQueueSubmit(queueA, signal semA);
     vkQueueSubmit(queueA, wait on [semA, timelineSemB]); with 
timelineSemB triggering a wait before signal.
     vkQueueSubmit(queueB, signal semA);


-Lionel

On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> Hi Lionel,
>
> The Queue thread is a heavy thread which is always resident in the driver while the application is running, and our team doesn't like that. So we switched to a Semaphore Thread: only when a waitBeforeSignal of a timeline happens do we spawn a thread to handle that wait. So we don't have this issue.
> By the way, I already pass all your CTS cases now. I suggest you switch to a Semaphore Thread instead of a Queue Thread as well. It works very well.
>
> -David
>
> -----Original Message-----
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Sent: Friday, August 2, 2019 4:52 AM
> To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
> Subject: Threaded submission & semaphore sharing
>
> Hi Christian, David,
>
> Sorry to report this so late in the process, but I think we found an issue not directly related to syncobj timelines themselves but with a side effect of the threaded submissions.
>
> Essentially we're failing a test in crucible :
> func.sync.semaphore-fd.opaque-fd
> This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue.
> Then in a loop it proceeds to submit workload alternating between the 2 VkQueue with one submit depending on the other.
> It does so by waiting on the VkSemaphore signaled in the previous iteration and resignaling it.
>
> The problem for us is that once things are dispatched to the submission thread, the ordering of the submission is lost.
> Because we have 2 devices and they both have their own submission thread.
>
> Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload.
> This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.
>
> The scenario would look like this :
>       - vkQueueSubmit(queueA, signal on semA);
>           - in the caller thread, this would increment the syncobj additional u64 payload and return it to userspace.
>           - at some point the submission thread of queueA submits the workload and signal the syncobj of semA with value returned in the caller thread of vkQueueSubmit().
>       - vkQueueSubmit(queueB, wait on semA);
>           - in the caller thread, this would read the syncobj additional
> u64 payload
>           - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step above
>
> Because we want the binary semaphores to be shared across processes and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj.
> It would need an additional ioctl to read & increment the value.
>
> What do you think?
>
> -Lionel



^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Threaded submission & semaphore sharing
  2019-08-01 20:52 Lionel Landwerlin
@ 2019-08-02  3:18 ` Zhou, David(ChunMing)
  2019-08-02  4:28   ` Lionel Landwerlin
  0 siblings, 1 reply; 14+ messages in thread
From: Zhou, David(ChunMing) @ 2019-08-02  3:18 UTC (permalink / raw)
  To: Lionel Landwerlin, dri-devel, Koenig, Christian, Jason Ekstrand

Hi Lionel,

The Queue thread is a heavy thread which is always resident in the driver while the application is running, and our team doesn't like that. So we switched to a Semaphore Thread: only when a waitBeforeSignal of a timeline happens do we spawn a thread to handle that wait. So we don't have this issue.
By the way, I already pass all your CTS cases now. I suggest you switch to a Semaphore Thread instead of a Queue Thread as well. It works very well.

-David

-----Original Message-----
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com> 
Sent: Friday, August 2, 2019 4:52 AM
To: dri-devel <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; Jason Ekstrand <jason@jlekstrand.net>
Subject: Threaded submission & semaphore sharing

Hi Christian, David,

Sorry to report this so late in the process, but I think we found an issue not directly related to syncobj timelines themselves but with a side effect of the threaded submissions.

Essentially we're failing a test in crucible : 
func.sync.semaphore-fd.opaque-fd
This test creates a single binary semaphore and shares it between 2 VkDevice/VkQueue.
Then in a loop it proceeds to submit workload alternating between the 2 VkQueue with one submit depending on the other.
It does so by waiting on the VkSemaphore signaled in the previous iteration and resignaling it.

The problem for us is that once things are dispatched to the submission thread, the ordering of the submission is lost.
Because we have 2 devices and they both have their own submission thread.

Jason suggested that we reestablish the ordering by having semaphores/syncobjs carry an additional uint64_t payload.
This 64-bit integer would be an identifier that submission threads will WAIT_FOR_AVAILABLE on.

The scenario would look like this :
     - vkQueueSubmit(queueA, signal on semA);
         - in the caller thread, this would increment the syncobj additional u64 payload and return it to userspace.
         - at some point the submission thread of queueA submits the workload and signal the syncobj of semA with value returned in the caller thread of vkQueueSubmit().
     - vkQueueSubmit(queueB, wait on semA);
         - in the caller thread, this would read the syncobj additional
u64 payload
         - at some point the submission thread of queueB will try to submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step above

Because we want the binary semaphores to be shared across processes and would like this to remain a single FD, the simplest location to store this additional u64 payload would be the DRM syncobj.
It would need an additional ioctl to read & increment the value.

What do you think?

-Lionel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Threaded submission & semaphore sharing
@ 2019-08-01 20:52 Lionel Landwerlin
  2019-08-02  3:18 ` Zhou, David(ChunMing)
  0 siblings, 1 reply; 14+ messages in thread
From: Lionel Landwerlin @ 2019-08-01 20:52 UTC (permalink / raw)
  To: dri-devel, Koenig, Christian, Zhou, David(ChunMing), Jason Ekstrand

Hi Christian, David,

Sorry to report this so late in the process, but I think we found an 
issue not directly related to syncobj timelines themselves but with a 
side effect of the threaded submissions.

Essentially we're failing a test in crucible : 
func.sync.semaphore-fd.opaque-fd
This test creates a single binary semaphore and shares it between 2 
VkDevice/VkQueue.
Then in a loop it proceeds to submit workload alternating between the 2 
VkQueue with one submit depending on the other.
It does so by waiting on the VkSemaphore signaled in the previous 
iteration and resignaling it.

The problem for us is that once things are dispatched to the submission 
thread, the ordering of the submission is lost.
Because we have 2 devices and they both have their own submission thread.

Jason suggested that we reestablish the ordering by having 
semaphores/syncobjs carry an additional uint64_t payload.
This 64-bit integer would be an identifier that submission 
threads will WAIT_FOR_AVAILABLE on.

The scenario would look like this :
     - vkQueueSubmit(queueA, signal on semA);
         - in the caller thread, this would increment the syncobj 
additional u64 payload and return it to userspace.
         - at some point the submission thread of queueA submits the 
workload and signal the syncobj of semA with value returned in the 
caller thread of vkQueueSubmit().
     - vkQueueSubmit(queueB, wait on semA);
         - in the caller thread, this would read the syncobj additional 
u64 payload
         - at some point the submission thread of queueB will try to 
submit the work, but first it will WAIT_FOR_AVAILABLE the u64 value 
returned in the step above
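
The scheme above can be simulated in userspace as follows (a sketch only: a Condition-protected counter stands in for the proposed syncobj u64 payload and its read-&-increment ioctl, and wait_available() stands in for WAIT_FOR_AVAILABLE; all names are hypothetical, not existing DRM API):

```python
import threading

class SyncobjPayload:
    """Stand-in for the proposed extra u64 payload on a shared DRM syncobj."""
    def __init__(self):
        self._cv = threading.Condition()
        self._next = 0       # bumped in the vkQueueSubmit() caller thread
        self._available = 0  # highest value actually signaled so far

    def reserve(self):
        # Caller thread: the proposed read-&-increment ioctl.
        with self._cv:
            self._next += 1
            return self._next

    def signal(self, value):
        # Submission thread: publish the payload once the work is submitted.
        with self._cv:
            self._available = max(self._available, value)
            self._cv.notify_all()

    def wait_available(self, value):
        # Submission thread: WAIT_FOR_AVAILABLE on the value read above.
        with self._cv:
            self._cv.wait_for(lambda: self._available >= value)

order = []

def submission_thread(name, payload, wait_value, signal_value):
    if wait_value is not None:
        payload.wait_available(wait_value)
    order.append(name)  # "submit" the workload
    payload.signal(signal_value)

sem_a = SyncobjPayload()
# Caller thread: vkQueueSubmit(queueA, signal semA), then
# vkQueueSubmit(queueB, wait on semA).
a_value = sem_a.reserve()   # payload for queueA's signal
b_wait = a_value            # payload read by queueB's wait
b_value = sem_a.reserve()   # payload for queueB's own re-signal

tb = threading.Thread(target=submission_thread,
                      args=("queueB", sem_a, b_wait, b_value))
ta = threading.Thread(target=submission_thread,
                      args=("queueA", sem_a, None, a_value))
tb.start()  # queueB's submission thread runs first...
ta.start()
ta.join(); tb.join()
print(order)  # ['queueA', 'queueB'] -- ...but queueA's submit still lands first
```

Even though queueB's submission thread is scheduled first, it blocks in wait_available() until queueA's thread has signaled the payload reserved in the caller thread, restoring the API submission order.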

Because we want the binary semaphores to be shared across processes and 
would like this to remain a single FD, the simplest location to store 
this additional u64 payload would be the DRM syncobj.
It would need an additional ioctl to read & increment the value.

What do you think?

-Lionel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-08-02 12:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-02  6:10 Threaded submission & semaphore sharing Koenig, Christian
2019-08-02  6:27 ` Lionel Landwerlin
2019-08-02  9:11   ` zhoucm1
2019-08-02  9:41     ` Lionel Landwerlin
2019-08-02 10:01       ` zhoucm1
2019-08-02 12:12         ` Lionel Landwerlin
  -- strict thread matches above, loose matches on Subject: below --
2019-08-02  5:21 Koenig, Christian
2019-08-01 20:52 Lionel Landwerlin
2019-08-02  3:18 ` Zhou, David(ChunMing)
2019-08-02  4:28   ` Lionel Landwerlin
2019-08-02  4:33     ` Koenig, Christian
2019-08-02  4:55       ` Lionel Landwerlin
2019-08-02  5:08         ` Koenig, Christian
2019-08-02  5:16           ` Lionel Landwerlin
