Hi,
Since Christian believes that we can't deadlock
the kernel with some changes there, we just need
to make everything nice for userspace too. Instead
of explaining how it will work, I will explain the
cases where future hardware (and its kernel
driver) will break existing userspace in order to
protect everybody from deadlocks. Anything that
uses implicit sync will be spared, so X and
Wayland will be fine, assuming they don't
import/export fences. Use cases that do
import/export fences may or may not work,
depending on how the fences are used.
One of the requirements is that all fences will
become future fences. The semantics of
imported/exported fences will change completely,
and their usage will be subject to new
restrictions. The restrictions are:
1) Android sync files will be impossible to
support, so they won't be supported (they don't
allow future fences).
2) Implicit sync and explicit sync will be
mutually exclusive between processes. A process
can use either one or the other, but not both.
This is meant to prevent a deadlock condition
with future fences where any process can
maliciously deadlock the execution of any other
process, even a higher-privileged one. The kernel
will impose the following restrictions to protect
against the deadlock:
a) a process with an implicitly-sync'd
imported/exported buffer can't import/export a
fence from/to another process
b) a process with an imported/exported fence
can't import/export an implicitly-sync'd buffer
from/to another process
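To make restrictions a) and b) concrete, here is
a minimal Python sketch of the per-process
bookkeeping they imply. It is only an
illustration, not a proposed implementation:
SyncPolicy, share_implicit_buffer() and
share_fence() are hypothetical names, not an
existing kernel or driver interface, and "share"
stands for either import or export.

```python
from enum import Enum, auto


class SyncMode(Enum):
    NONE = auto()      # nothing shared with other processes yet
    IMPLICIT = auto()  # shared an implicitly-sync'd buffer with another process
    EXPLICIT = auto()  # shared a fence with another process


class CrossProcessSyncError(PermissionError):
    """Raised when a process tries to mix implicit and explicit cross-process sync."""


class SyncPolicy:
    """Hypothetical per-process bookkeeping for restrictions a) and b)."""

    def __init__(self) -> None:
        self._mode: dict[int, SyncMode] = {}

    def mode(self, pid: int) -> SyncMode:
        return self._mode.get(pid, SyncMode.NONE)

    def _claim(self, pid: int, wanted: SyncMode) -> None:
        current = self.mode(pid)
        if current not in (SyncMode.NONE, wanted):
            # The mode is left unchanged when the request is refused.
            raise CrossProcessSyncError(
                f"pid {pid} already uses {current.name} sync across processes; "
                f"{wanted.name} sharing is refused"
            )
        self._mode[pid] = wanted

    def share_implicit_buffer(self, pid: int) -> None:
        # Restriction b): refused if the process already shares fences.
        self._claim(pid, SyncMode.IMPLICIT)

    def share_fence(self, pid: int) -> None:
        # Restriction a): refused if the process already shares
        # implicitly-sync'd buffers.
        self._claim(pid, SyncMode.EXPLICIT)
```

In this sketch, the first kind of cross-process
sharing a process does decides its mode, and any
later attempt at the other kind is refused.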
Alternative: A higher-privileged process could
enforce both restrictions itself, instead of the
kernel, to protect itself from the deadlock, but
this would be a can of worms for existing
userspace. It would be better if the kernel just
broke unsafe userspace on future hw, the same way
sync files get broken.
If both implicit and explicit sync are allowed
to occur simultaneously, sending a future fence
that will never signal to any process will
deadlock that process after it acquires the
implicit sync lock. That lock is a sequence number
that the process must write to memory, along with
sending an interrupt from the GPU, within a finite
time. This is how the deadlock can happen:
* The process gets sequence number N from the
kernel for an implicitly-sync'd buffer.
* The process inserts (into the GPU
user-mapped queue) a wait for sequence number
N-1.
* The process inserts a wait for a fence, but it
doesn't know that it will never signal ==>
deadlock.
...
* The process inserts a command to write
sequence number N to a predetermined memory
location (which will make the buffer idle and
send an interrupt to the kernel).
...
* The kernel will terminate the process because it
never received the interrupt (i.e., a
less-privileged process just killed a
more-privileged process).
The termination was caused by the implicit-sync
interrupt that never arrived, and the only way
another process can cause that is by sending a
fence that will never signal. Thus,
importing/exporting fences from/to other processes
can't be allowed simultaneously with implicit
sync.
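The deadlock sequence in 2) can be modeled with a
toy Python simulation. This is a hypothetical
sketch, not real hardware or driver behavior:
UserQueue, Fence, kernel_check() and simulate()
are made-up names, the queue executes commands
strictly in order, and the kernel's "finite time"
is reduced to a single pass over the queue.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    # An imported fence; a malicious exporter may never set this to True.
    signaled: bool = False


@dataclass
class UserQueue:
    """Toy GPU user-mapped queue: commands execute strictly in order."""

    seqno_mem: int = 0  # the predetermined memory location the kernel watches
    commands: list = field(default_factory=list)

    def run(self) -> None:
        """Execute commands until the queue is empty or the head is blocked."""
        while self.commands:
            op, arg = self.commands[0]
            if op == "wait_seqno" and self.seqno_mem < arg:
                return  # blocked until an earlier user of the buffer finishes
            if op == "wait_fence" and not arg.signaled:
                return  # blocked; forever if the fence never signals
            if op == "write_seqno":
                self.seqno_mem = arg  # on real hw this also interrupts the kernel
            self.commands.pop(0)


def kernel_check(queue: UserQueue, n: int) -> str:
    """Give the GPU its 'finite time' (one pass here), then check sequence number N."""
    queue.run()
    return "ok" if queue.seqno_mem >= n else "terminate"


def simulate(fence_signals: bool, n: int = 8) -> str:
    # The previous user of the implicitly-sync'd buffer already wrote N-1.
    queue = UserQueue(seqno_mem=n - 1)
    fence = Fence(signaled=fence_signals)
    queue.commands = [
        ("wait_seqno", n - 1),  # wait for sequence number N-1
        ("wait_fence", fence),  # wait for a fence imported from another process
        ("write_seqno", n),     # release the implicit sync "lock"
    ]
    return kernel_check(queue, n)


print(simulate(fence_signals=True))   # ok
print(simulate(fence_signals=False))  # terminate
```

In this model the kernel only observes the missing
sequence number, so it ends up terminating the
process that waited on the fence, not the process
that exported it.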
3) Compositors (as well as other privileged
processes and display flipping) can't trust
imported/exported fences. They need a timeout
recovery mechanism from the beginning. Some
possible solutions to timeouts are:
a) use a CPU wait with a small absolute
timeout, and display the previous content on
timeout
b) use a GPU wait with a small absolute
timeout, and conditional rendering will choose
between the latest content (if signalled) and the
previous content (if timed out)
The result would be that the desktop can run
close to 60 fps even if an app runs at 1 fps.
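Option a) could look roughly like the following
Python sketch. It is an illustration under stated
assumptions rather than compositor code: a
threading.Event stands in for an imported fence,
and pick_frame() and compose() are hypothetical
helpers. Option b) (GPU wait plus conditional
rendering) isn't sketched here.

```python
import threading
import time


def pick_frame(fence: threading.Event, latest, previous, deadline: float):
    """CPU wait on an untrusted fence until an absolute deadline.

    `deadline` is an absolute time.monotonic() value, e.g. the next
    vblank minus the compositor's own rendering budget (an assumption).
    """
    remaining = max(0.0, deadline - time.monotonic())
    if fence.wait(timeout=remaining):
        return latest    # the client finished in time: show its new frame
    return previous      # timed out: keep showing the last good frame


def compose(clients, deadline: float):
    """Pick one frame per (fence, latest, previous) client tuple.

    All clients share one absolute deadline, so the total wait is bounded
    by a single frame budget no matter how many fences never signal.
    """
    return [pick_frame(f, new, old, deadline) for f, new, old in clients]


if __name__ == "__main__":
    done = threading.Event()
    done.set()                   # a fence that already signalled
    never = threading.Event()    # a fence that will never signal
    deadline = time.monotonic() + (1 / 60) / 4
    print(compose([(done, "app1 new", "app1 old"),
                   (never, "app2 new", "app2 old")], deadline))
    # ['app1 new', 'app2 old']
```

Using one absolute deadline rather than a relative
timeout per fence is what keeps a hung client from
adding its timeout on top of everyone else's.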
Redefining imported/exported fences and
breaking some users/OSs is the only way to have
userspace GPU command submission, and the
deadlock example here is the counterexample
proving that there is no other way.
So, what are the chances this is going to fly
with the ecosystem?
Thanks,
Marek