The GuC loader uses an asynchronous thread to fetch the firmware image
(aka "binary blob") from a file. Until now, this thread had to wait for
the mainline driver loading code to complete GEM initialisation before
it can convert the blob into a GEM object and transfer it to the GuC's
memory.

Unfortunately, with this scheme, the GuC loading was occurring *after*
the internally-generated batches used to initialise contexts had already
been submitted (using direct access to the ELSP, since the GuC wasn't
ready).

In addition, the one-way synchronisation mechanism resulted in the
firmware image being transferred to the GuC at an indeterminate time
(just "sometime" after the mainline thread released the device's
struct_mutex), with consequent confusion.

Rather than complicate the loader further by adding a second sync point
and arranging the handover of the struct_mutex from one thread to the
other, this commit reverses the synchronisation so that the mainline
thread waits for the asynchronous thread rather than vice versa. The
firmware loader now only has to save the reference to the blob and
signal a completion; the mainline thread can then continue with the rest
of the loading process when it catches up. The result is a much simpler
(and fully deterministic) process for loading the GuC firmware.
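
The reversed handshake can be illustrated with a minimal userspace
sketch. This is not the driver code: it assumes POSIX threads in place
of the kernel's completion mechanism, and every name in it
(completion_sketch, fw_sketch, fetch_thread, load_firmware_sketch) is
hypothetical. The fetch thread only stores the blob reference and
signals; the mainline thread waits at a single, known point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical loader state: the blob reference plus its completion. */
struct fw_sketch {
	const char *blob;
	struct completion_sketch fetched;
};

static void completion_init(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void completion_signal(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Asynchronous thread: save the blob reference, signal, and exit. */
static void *fetch_thread(void *arg)
{
	struct fw_sketch *fw = arg;

	fw->blob = "firmware-image";	/* placeholder for the fetched file */
	completion_signal(&fw->fetched);
	return NULL;
}

/* Mainline thread: returns the blob once the fetch has completed. */
const char *load_firmware_sketch(void)
{
	struct fw_sketch fw = { .blob = NULL };
	pthread_t tid;

	completion_init(&fw.fetched);
	if (pthread_create(&tid, NULL, fetch_thread, &fw) != 0)
		return NULL;

	/* The rest of initialisation would run here, unblocked. */

	completion_wait(&fw.fetched);	/* the single sync point */
	assert(fw.blob != NULL);
	/* The transfer to the device would happen here, in this thread. */

	pthread_join(tid, NULL);
	pthread_cond_destroy(&fw.fetched.cond);
	pthread_mutex_destroy(&fw.fetched.lock);
	return fw.blob;
}
```

Because the waiting side is the mainline thread, the transfer step
always runs at the same place in the initialisation sequence, however
long the fetch takes.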