On 03/27/2015 01:45 AM, Daniel Vetter wrote:
On Thu, Mar 26, 2015 at 12:41:13PM -0700, yu.dai@intel.com wrote:
> From: Dave Gordon <david.s.gordon@intel.com>
> 
> In order to fully initialise the default contexts, we have to execute
> batchbuffer commands on the GPU engines. But we can't do that until any
> required firmware has been loaded, which may not be possible during
> driver load, because the filesystem(s) containing the firmware may not
> be mounted until later.
> 
> Therefore, we now allow the first call to the firmware-loading code to
> return -EAGAIN to indicate that it's not yet ready, and that it should
> be retried when the device is first opened from user code, by which
> time we expect that all required filesystems will have been mounted.
> The late-retry code will then re-attempt to load the firmware if the
> early attempt failed.
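
The deferral described above can be sketched in plain userspace C. This
is only an illustration under stated assumptions, not the patch's code:
struct fake_dev, enum fw_state, try_load_firmware(), dev_load() and
dev_open() are all hypothetical names, and "filesystem mounted" is
modelled as a simple flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical firmware states for this sketch. */
enum fw_state { FW_NOT_TRIED, FW_PENDING_RETRY, FW_LOADED, FW_FAILED };

/* Hypothetical device: fs_mounted stands in for "firmware file reachable". */
struct fake_dev {
	bool fs_mounted;
	enum fw_state fw;
};

/*
 * Returns 0 on success. Only the first call may return -EAGAIN, meaning
 * "not ready yet, retry later"; a later failure is final (-ENOENT).
 */
static int try_load_firmware(struct fake_dev *dev, bool first_call)
{
	if (dev->fs_mounted) {
		dev->fw = FW_LOADED;
		return 0;
	}
	if (first_call) {
		dev->fw = FW_PENDING_RETRY;
		return -EAGAIN;
	}
	dev->fw = FW_FAILED;
	return -ENOENT;
}

/* Driver load: a deferred firmware load is not treated as an error. */
static int dev_load(struct fake_dev *dev)
{
	int ret = try_load_firmware(dev, true);

	return ret == -EAGAIN ? 0 : ret;
}

/* First open from user code: retry only if the early attempt deferred. */
static int dev_open(struct fake_dev *dev)
{
	assert(dev->fw != FW_NOT_TRIED);	/* dev_load() must run first */

	if (dev->fw == FW_PENDING_RETRY)
		return try_load_firmware(dev, false);
	return dev->fw == FW_LOADED ? 0 : -ENOENT;
}
```

In this sketch the retry happens at most once, so a firmware file that
is still missing at first open becomes a hard failure rather than
another deferral.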

We've tried a similar approach a while back and it doesn't work well in
conjunction with rps - the hw tends to fall over if the context state
isn't properly initialized when going into rc6.

I believe patch 18/18 of this series notifies the GuC when RC6 is turned
on and off. I don't know the details of the firmware's implementation,
but I believe the GuC will handle this properly.

Why exactly can't we load that firmware right at boot-up, or at least
stall correctly until it's there?

Dave G. wrote a very good comment about this. Sorry, I lost it while
squashing patches. Here is a copy of it; I will restore the comment in
the next version.

The GuC loader uses an asynchronous thread to fetch the firmware image
(aka "binary blob") from a file. This thread then had to wait for
the mainline driver loading code to complete GEM initialisation before
it can convert the blob into a GEM object and transfer it to the GuC's
memory.

Unfortunately, with this scheme, the GuC loading was occurring *after*
the internally-generated batches used to initialise contexts had already
been submitted (using direct access to the ELSP, since the GuC wasn't
ready).

In addition, the one-way synchronisation mechanism resulted in the
firmware image being transferred to the GuC at an indeterminate time
(just "sometime" after the mainline thread releases the device's
struct_mutex), with consequent confusion.

Rather than complicate the loader further by adding a second sync point
and arranging the handover of the struct_mutex from one thread to the
other, this commit reverses the synchronisation so that the mainline
thread waits for the asynchronous thread rather than vice versa. The
firmware loader now only has to save the reference to the blob and
signal a completion; the mainline thread can then continue with the rest
of the loading process when it catches up. The result is a much simpler
(and fully deterministic) process for loading the GuC firmware.
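
As a rough illustration of the reversed synchronisation, here is a
userspace sketch using pthreads in place of the kernel's completion
primitive. It is not the driver code: struct fw_completion, struct
fw_loader, fw_fetch_thread() and fw_load_mainline() are hypothetical
names, and the "firmware" is a four-byte dummy array rather than a file
read from disk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a completion (hypothetical, pthread-based). */
struct fw_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical loader state shared by the two threads. */
struct fw_loader {
	struct fw_completion fetched;
	const void *blob;	/* reference saved by the async thread */
	size_t blob_size;
};

static void fw_completion_init(struct fw_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void fw_complete(struct fw_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void fw_wait_for_completion(struct fw_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Async thread: only saves the blob reference and signals completion. */
static void *fw_fetch_thread(void *arg)
{
	static const unsigned char fake_blob[] = { 0xde, 0xad, 0xbe, 0xef };
	struct fw_loader *l = arg;

	l->blob = fake_blob;
	l->blob_size = sizeof(fake_blob);
	fw_complete(&l->fetched);
	return NULL;
}

/* Mainline thread: waits for the fetch, then owns the rest of the load. */
static size_t fw_load_mainline(struct fw_loader *l)
{
	pthread_t tid;

	fw_completion_init(&l->fetched);
	l->blob = NULL;
	l->blob_size = 0;

	if (pthread_create(&tid, NULL, fw_fetch_thread, l) != 0)
		return 0;

	/* ... the rest of the mainline initialisation would run here ... */

	fw_wait_for_completion(&l->fetched);
	pthread_join(tid, NULL);

	/* The blob is valid from here on, so the transfer point is fixed. */
	assert(l->blob != NULL);
	return l->blob_size;
}
```

Because only the mainline thread touches the blob after the wait, no
lock has to be handed from one thread to the other in this sketch.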

-Alex