* [PATCH 00/15] Batch submission via GuC
@ 2015-06-15 18:36 Dave Gordon
  2015-06-15 18:36 ` [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c Dave Gordon
                   ` (17 more replies)
  0 siblings, 18 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Vinit Azad

This patch series enables command submission via the GuC. In this mode,
instead of the host CPU driving the execlist port directly, it hands
over work items to the GuC, using a doorbell mechanism to tell the GuC
that new items have been added to its work queue. The GuC then dispatches
contexts to the various GPU engines and manages the resulting
context-switch interrupts. Completion of a batch is, however, still
signalled to the CPU; the GuC is not involved in handling user interrupts.
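
As a rough illustration of the hand-off described above, the userspace
sketch below models a work queue plus a "doorbell" counter that the
producer bumps after queuing an item. Every name in it (toy_wq,
toy_submit, toy_drain, TOY_WQ_SLOTS) is hypothetical, and the layout is
not the real GuC interface; it only shows the queue-then-ring pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WQ_SLOTS 8u	/* hypothetical queue depth (power of two) */

/* Toy model only: single-threaded, so no memory barriers are shown. */
struct toy_wq {
	uint32_t items[TOY_WQ_SLOTS];	/* queued work items */
	uint32_t head;			/* next slot the consumer reads */
	uint32_t tail;			/* next slot the producer writes */
	uint32_t doorbell;		/* bumped once per submitted item */
};

/* Producer side: queue one item, then ring the doorbell. */
static bool toy_submit(struct toy_wq *wq, uint32_t item)
{
	if (wq->tail - wq->head == TOY_WQ_SLOTS)
		return false;		/* queue full, caller must retry */

	wq->items[wq->tail % TOY_WQ_SLOTS] = item;
	wq->tail++;
	wq->doorbell++;			/* signal that new work is queued */
	return true;
}

/* Consumer side: take queued items in order; returns how many were taken. */
static unsigned int toy_drain(struct toy_wq *wq, uint32_t *out,
			      unsigned int max)
{
	unsigned int n = 0;

	while (wq->head != wq->tail && n < max) {
		out[n++] = wq->items[wq->head % TOY_WQ_SLOTS];
		wq->head++;
	}
	return n;
}
```

In this model the host plays the producer and the consumer stands in for
the GuC; a real implementation would also need ordering guarantees
between the queue write and the doorbell write.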

There are three subsequences within the patch series:

  drm/i915: Add i915_gem_object_write() to i915_gem.c
  drm/i915: Embedded microcontroller (uC) firmware loading support

These first two patches provide a generic framework for fetching the
firmware that may be required by any embedded microcontroller from a
file, using an asynchronous thread so that driver initialisation can
continue while the firmware is being fetched. It is hoped that this
framework is sufficiently general that it can be used for all current
and future microcontrollers.

  drm/i915: Add GuC-related module parameters
  drm/i915: Add GuC-related header files
  drm/i915: GuC-specific firmware loader
  drm/i915: Debugfs interface to read GuC load status

These four patches complete the GuC loader. At this point in the sequence
we can load and activate the GuC firmware, but not submit any batches
through it. (This is nonetheless a potentially useful state, as the GuC
can do other useful work even when not handling batch submissions.)

  drm/i915: Defer default hardware context initialisation until first
  drm/i915: Move execlists defines from .c to .h
  drm/i915: GuC submission setup, phase 1
  drm/i915: Enable GuC firmware log
  drm/i915: Implementation of GuC client
  drm/i915: Interrupt routing for GuC submission
  drm/i915: Integrate GuC-based command submission
  drm/i915: Debugfs interface for GuC submission statistics
  Documentation/drm: kerneldoc for GuC
  drm/i915: Enable GuC submission, where supported

In the final section, we implement the GuC submission mechanism, link
it into the (execlist-based) submission path, and finally enable it
(on supported platforms). On platforms where there is no GuC, or if
the GuC firmware cannot be found or is invalid, batch submission will
revert to using the execlist mechanism directly.

The GuC firmware itself is not included in this patchset; it is or will
be available for download from https://01.org/linuxgraphics/downloads/
This driver works with and requires GuC firmware revision 3.x. It will
not work with any firmware version 1.x, as the GuC protocol in those
revisions was incompatible and is no longer supported.
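
The fallback and version rules above can be sketched as a single
decision function. This is only a model: the struct, its field names,
and the rule that any major revision other than 3 is rejected are
assumptions made for illustration (the text only says that 3.x is
required and 1.x will not work).

```c
#include <assert.h>
#include <stdbool.h>

enum toy_backend { TOY_EXECLIST, TOY_GUC };

/* Hypothetical summary of what the driver knows after firmware loading. */
struct toy_guc_state {
	bool has_guc;		/* platform has a GuC at all */
	bool fw_valid;		/* firmware was found and passed validation */
	unsigned int fw_major;	/* major revision of the loaded firmware */
	bool user_enabled;	/* GuC submission was requested */
};

#define TOY_REQUIRED_FW_MAJOR 3u	/* "requires GuC firmware revision 3.x" */

/* Pick the submission path, falling back to execlists when in doubt. */
static enum toy_backend toy_pick_backend(const struct toy_guc_state *s)
{
	if (!s->user_enabled || !s->has_guc)
		return TOY_EXECLIST;
	if (!s->fw_valid || s->fw_major != TOY_REQUIRED_FW_MAJOR)
		return TOY_EXECLIST;
	return TOY_GUC;
}
```

Keeping the decision in one place means every failure mode (no GuC,
missing or invalid firmware, wrong revision) lands on the same
execlist path.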

Prerequisites: GuC submission will expose inadequacies in some of the
existing codepaths unless certain other patches are applied.
In particular we will require some version of Michel Thierry's patch
  drm/i915/lrc: Update PDPx registers with lri commands
(because the GuC supports light-restore, which execlist mode doesn't),
and my own
  drm/i915: Allocate OLR more safely (workaround until OLR goes away)
because otherwise the changed timing means that there is an increased
risk of writing to a ringbuffer that is not currently pinned & mapped,
causing a kernel OOPS.

Alex Dai (10):
  drm/i915: Add i915_gem_object_write() to i915_gem.c
  drm/i915: Add GuC-related module parameters
  drm/i915: Add GuC-related header files
  drm/i915: GuC-specific firmware loader
  drm/i915: Debugfs interface to read GuC load status
  drm/i915: GuC submission setup, phase 1
  drm/i915: Enable GuC firmware log
  drm/i915: Implementation of GuC client
  drm/i915: Integrate GuC-based command submission
  Documentation/drm: kerneldoc for GuC

Dave Gordon (5):
  drm/i915: Embedded microcontroller (uC) firmware loading support
  drm/i915: Defer default hardware context initialisation until first
  drm/i915: Interrupt routing for GuC submission
  drm/i915: Debugfs interface for GuC submission statistics
  drm/i915: Enable GuC submission, where supported

Michael H. Nguyen (1):
  drm/i915: Move execlists defines from .c to .h

Ben Widawsky
Vinit Azad
  created the original versions on which some of these patches are based.

 Documentation/DocBook/drm.tmpl             |   19 +
 drivers/gpu/drm/i915/Makefile              |    7 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  109 +++-
 drivers/gpu/drm/i915/i915_dma.c            |    4 +
 drivers/gpu/drm/i915/i915_drv.h            |   17 +
 drivers/gpu/drm/i915/i915_gem.c            |   39 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   52 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  873 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_irq.c            |   48 ++
 drivers/gpu/drm/i915/i915_params.c         |    9 +
 drivers/gpu/drm/i915/i915_reg.h            |   92 ++-
 drivers/gpu/drm/i915/intel_guc.h           |  184 ++++++
 drivers/gpu/drm/i915/intel_guc_api.h       |  227 ++++++++
 drivers/gpu/drm/i915/intel_guc_loader.c    |  498 ++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c           |  128 ++--
 drivers/gpu/drm/i915/intel_lrc.h           |    8 +
 drivers/gpu/drm/i915/intel_uc_loader.c     |  312 ++++++++++
 drivers/gpu/drm/i915/intel_uc_loader.h     |   82 +++
 18 files changed, 2607 insertions(+), 101 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_guc_submission.c
 create mode 100644 drivers/gpu/drm/i915/intel_guc.h
 create mode 100644 drivers/gpu/drm/i915/intel_guc_api.h
 create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
 create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
 create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h

-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 20:09   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support Dave Gordon
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

i915_gem_object_write() is a generic function to copy data from a plain
linear buffer to a paged gem object.

We will need this for the microcontroller firmware loading support code.
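
To make the "linear buffer to paged object" idea concrete, here is a
small userspace model of such a copy. It is not the kernel code: the
page array, TOY_PAGE_SIZE and both function names are hypothetical
stand-ins for the object's scatterlist and sg_copy_from_buffer(), and
the -1 return merely stands in for the patch's -EIO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u	/* hypothetical page size for the model */

/*
 * Copy up to @size bytes from the linear buffer @data into @npages
 * separately allocated pages. Returns the number of bytes copied.
 */
static size_t toy_copy_to_pages(unsigned char **pages, size_t npages,
				const void *data, size_t size)
{
	const unsigned char *src = data;
	size_t copied = 0;

	for (size_t i = 0; i < npages && copied < size; i++) {
		size_t chunk = size - copied;

		if (chunk > TOY_PAGE_SIZE)
			chunk = TOY_PAGE_SIZE;
		memcpy(pages[i], src + copied, chunk);
		copied += chunk;
	}
	return copied;
}

/* Same convention as the patch: 0 on success, negative on a short copy. */
static int toy_object_write(unsigned char **pages, size_t npages,
			    const void *data, size_t size)
{
	return toy_copy_to_pages(pages, npages, data, size) == size ? 0 : -1;
}
```

As in the patch, a short copy is treated as an error rather than a
partial success, since a truncated firmware image is of no use.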

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |    2 ++
 drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 611fbd8..9094c06 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
 void i915_gem_object_free(struct drm_i915_gem_object *obj);
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			 const struct drm_i915_gem_object_ops *ops);
+int i915_gem_object_write(struct drm_i915_gem_object *obj,
+			  const void *data, size_t size);
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 void i915_init_vm(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be35f04..75d63c2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
 	return false;
 }
 
+/* Fill the @obj with the @size amount of @data */
+int i915_gem_object_write(struct drm_i915_gem_object *obj,
+			const void *data, size_t size)
+{
+	struct sg_table *sg;
+	size_t bytes;
+	int ret;
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	sg = obj->pages;
+
+	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
+
+	i915_gem_object_unpin_pages(obj);
+
+	if (WARN_ON(bytes != size)) {
+		DRM_ERROR("Incomplete copy, wrote %zu of %zu\n", bytes, size);
+		i915_gem_object_put_pages(obj);
+		return -EIO;
+	}
+
+	return 0;
+}
-- 
1.7.9.5


* [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
  2015-06-15 18:36 ` [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-17 12:05   ` Daniel Vetter
  2015-06-15 18:36 ` [PATCH 03/15] drm/i915: Add GuC-related module parameters Dave Gordon
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

Current devices may contain one or more programmable microcontrollers
that need to have a firmware image (aka "binary blob") loaded from an
external medium and transferred to the device's memory.

This file provides generic support functions for doing this; they can
then be used by each uC-specific loader, thus reducing code duplication
and testing effort.
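
The fetch-status handling in this patch can be modelled in a few lines
of userspace C. The sketch below is an assumption-laden toy: all toy_*
names are hypothetical, the asynchronous fetch is simulated by the
caller setting 'blob' directly, and the GEM object step is left out. It
only shows how a NULL path, an empty path, and a PENDING fetch resolve,
and that the validator runs at most once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_fw_status {
	TOY_FW_FAIL = -1,
	TOY_FW_NONE = 0,
	TOY_FW_PENDING,
	TOY_FW_SUCCESS
};

struct toy_fw {
	const char *path;
	const void *blob;		/* set by the (simulated) fetch */
	enum toy_fw_status fetch_status;
	unsigned int validations;	/* how often the validator ran */
};

/* Example validators, standing in for a uC-specific callback. */
static bool toy_accept_all(struct toy_fw *fw) { (void)fw; return true; }
static bool toy_reject_all(struct toy_fw *fw) { (void)fw; return false; }

static void toy_fw_init(struct toy_fw *fw, const char *path)
{
	fw->path = path;
	fw->blob = NULL;
	fw->validations = 0;
	if (path == NULL)
		fw->fetch_status = TOY_FW_NONE;		/* none expected */
	else if (*path == '\0')
		fw->fetch_status = TOY_FW_FAIL;		/* needed, unknown */
	else
		fw->fetch_status = TOY_FW_PENDING;	/* fetch in flight */
}

/* Resolve a PENDING fetch once; later calls reuse the recorded outcome. */
static int toy_fw_check(struct toy_fw *fw, bool (*validate)(struct toy_fw *))
{
	if (fw->fetch_status == TOY_FW_PENDING) {
		bool ok = fw->blob != NULL;

		if (ok && validate) {
			fw->validations++;
			ok = validate(fw);
		}
		fw->fetch_status = ok ? TOY_FW_SUCCESS : TOY_FW_FAIL;
	}
	return fw->fetch_status == TOY_FW_FAIL ? -EIO : 0;
}
```

Recording the outcome in the status field is what lets repeated checks
stay cheap: only the first call after a fetch does any real work.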

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 drivers/gpu/drm/i915/Makefile          |    3 +
 drivers/gpu/drm/i915/intel_uc_loader.c |  312 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_uc_loader.h |   82 +++++++++
 3 files changed, 397 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
 create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b7ddf48..607fa2a 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -38,6 +38,9 @@ i915-y += i915_cmd_parser.o \
 	  intel_ringbuffer.o \
 	  intel_uncore.o
 
+# generic ancillary microcontroller support
+i915-y += intel_uc_loader.o
+
 # autogenerated null render state
 i915-y += intel_renderstate_gen6.o \
 	  intel_renderstate_gen7.o \
diff --git a/drivers/gpu/drm/i915/intel_uc_loader.c b/drivers/gpu/drm/i915/intel_uc_loader.c
new file mode 100644
index 0000000..26f0fbe
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_uc_loader.c
@@ -0,0 +1,312 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Author:
+ *	Dave Gordon <david.s.gordon@intel.com>
+ */
+#include <linux/firmware.h>
+#include "i915_drv.h"
+#include "intel_uc_loader.h"
+
+/**
+ * DOC: Generic embedded microcontroller (uC) firmware loading support
+ *
+ * The functions in this file provide a generic way to load the firmware that
+ * may be required by an embedded microcontroller (uC).
+ *
+ * The function intel_uc_fw_init() should be called early, and will initiate
+ * an asynchronous request to fetch the firmware image (aka "binary blob").
+ * When the image has been fetched into memory, the kernel will call back to
+ * uc_fw_fetch_callback() whose function is simply to record the completion
+ * status, and stash the firmware blob for later.
+ *
+ * At some convenient point after GEM initialisation, the driver should call
+ * intel_uc_fw_check(); this will check whether the asynchronous thread has
+ * completed and wait for it if not, check whether the image was successfully
+ * fetched; and then allow the callback() function (if provided) to validate
+ * the image and/or save the data in a GEM object.
+ *
+ * Thereafter the uC-specific code can transfer the data in the GEM object
+ * to the uC's memory (in some uC-specific way, not handled here).
+ *
+ * During driver shutdown, or if driver load is aborted, intel_uc_fw_fini()
+ * should be called to release any remaining resources.
+ */
+
+
+/*
+ * Called once per uC, late in driver initialisation. GEM is now ready, and so
+ * we can now create a GEM object to hold the uC firmware. But first, we must
+ * synchronise with the firmware-fetching thread that was initiated during
+ * early driver load, in intel_uc_fw_init(), and see whether it successfully
+ * fetched the firmware blob.
+ */
+static void
+uc_fw_fetch_wait(struct intel_uc_fw *uc_fw,
+		 bool callback(struct intel_uc_fw *))
+{
+	struct drm_device *dev = uc_fw->uc_dev;
+	struct drm_i915_gem_object *obj;
+	const struct firmware *fw;
+
+	DRM_DEBUG_DRIVER("before waiting: %s fw fetch status %d, fw %p\n",
+		uc_fw->uc_name, uc_fw->uc_fw_fetch_status, uc_fw->uc_fw_blob);
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+	WARN_ON(uc_fw->uc_fw_fetch_status != INTEL_UC_FIRMWARE_PENDING);
+
+	wait_for_completion(&uc_fw->uc_fw_fetched);
+
+	DRM_DEBUG_DRIVER("after waiting: %s fw fetch status %d, fw %p\n",
+		uc_fw->uc_name, uc_fw->uc_fw_fetch_status, uc_fw->uc_fw_blob);
+
+	fw = uc_fw->uc_fw_blob;
+	if (!fw) {
+		/* no firmware found; try again in case FS was not mounted */
+		DRM_DEBUG_DRIVER("retry fetching %s fw from <%s>\n",
+			uc_fw->uc_name, uc_fw->uc_fw_path);
+		if (request_firmware(&fw, uc_fw->uc_fw_path, &dev->pdev->dev))
+			goto fail;
+		if (!fw)
+			goto fail;
+		DRM_DEBUG_DRIVER("fetch %s fw from <%s> succeeded, fw %p\n",
+			uc_fw->uc_name, uc_fw->uc_fw_path, fw);
+		uc_fw->uc_fw_blob = fw;
+	}
+
+	/* Callback to the optional uC-specific function, if supplied */
+	if (callback && !callback(uc_fw))
+		goto fail;
+
+	/* Callback may have done the object allocation & write itself */
+	obj = uc_fw->uc_fw_obj;
+	if (!obj) {
+		size_t pages = round_up(fw->size, PAGE_SIZE);
+		obj = i915_gem_alloc_object(dev, pages);
+		if (!obj)
+			goto fail;
+
+		uc_fw->uc_fw_obj = obj;
+		uc_fw->uc_fw_size = fw->size;
+		if (i915_gem_object_write(obj, fw->data, fw->size))
+			goto fail;
+	}
+
+	DRM_DEBUG_DRIVER("%s fw fetch status SUCCESS\n", uc_fw->uc_name);
+	release_firmware(fw);
+	uc_fw->uc_fw_blob = NULL;
+	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_SUCCESS;
+	return;
+
+fail:
+	DRM_DEBUG_DRIVER("%s fw fetch status FAIL; fw %p, obj %p\n",
+		uc_fw->uc_name, fw, uc_fw->uc_fw_obj);
+	DRM_ERROR("Failed to fetch %s firmware from <%s>\n",
+		  uc_fw->uc_name, uc_fw->uc_fw_path);
+
+	obj = uc_fw->uc_fw_obj;
+	if (obj)
+		drm_gem_object_unreference(&obj->base);
+	uc_fw->uc_fw_obj = NULL;
+
+	release_firmware(fw);		/* OK even if fw is NULL */
+	uc_fw->uc_fw_blob = NULL;
+	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
+}
+
+/**
+ * intel_uc_fw_check() - check the status of the firmware fetching process
+ * @uc_fw:	intel_uc_fw structure
+ * @callback:	optional callback function to validate and/or save the image
+ *
+ * If the fetch is still PENDING, wait for completion first, then check and
+ * return the outcome. Subsequent calls will just return the same outcome
+ * based on the recorded fetch status, without triggering another fetch
+ * and without calling @callback().
+ *
+ * After this call, @uc_fw->uc_fw_fetch_status will show whether the firmware
+ * image was successfully fetched and transferred to a GEM object. If it is
+ * INTEL_UC_FIRMWARE_SUCCESS, @uc_fw->uc_fw_obj will point to the GEM
+ * object, and the size of the image will be in @uc_fw->uc_fw_size.  For any
+ * other status value, these members are undefined.
+ *
+ * The @callback() parameter allows the uC-specific code to validate the
+ * image before it is saved, and also to override the default save mechanism
+ * if required. When it is called, @uc_fw->uc_fw_blob refers to the fetched
+ * firmware image, and @uc_fw->uc_fw_obj is NULL.
+ *
+ * If @callback() returns FALSE, the fetched image is considered invalid.
+ * The fetch status will be set to FAIL, and this function will return -EIO.
+ *
+ * If @callback() returns TRUE but doesn't set @uc_fw->uc_fw_obj, the image
+ * is considered good; it will be saved in a GEM object as described above.
+ * This is the default if no @callback() is supplied.
+ *
+ * If @callback() returns TRUE after setting @uc_fw->uc_fw_obj, this means
+ * that the image has already been saved by @callback() itself. This allows
+ * @callback() to customise the format of the data in the GEM object, for
+ * example if it needs to save only a portion of the loaded image.
+ *
+ * In all cases the firmware blob is released before this function returns.
+ *
+ * Return:	non-zero code on error
+ */
+int
+intel_uc_fw_check(struct intel_uc_fw *uc_fw,
+		  bool callback(struct intel_uc_fw *))
+{
+	WARN_ON(!mutex_is_locked(&uc_fw->uc_dev->struct_mutex));
+
+	if (uc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING) {
+		/* We only come here once */
+		uc_fw_fetch_wait(uc_fw, callback);
+		/* state must now be FAIL or SUCCESS */
+	}
+
+	DRM_DEBUG_DRIVER("%s fw fetch status %d\n",
+		uc_fw->uc_name, uc_fw->uc_fw_fetch_status);
+
+	switch (uc_fw->uc_fw_fetch_status) {
+	case INTEL_UC_FIRMWARE_FAIL:
+		/* something went wrong :( */
+		return -EIO;
+
+	case INTEL_UC_FIRMWARE_NONE:
+		/* no firmware, nothing to do (not an error) */
+		return 0;
+
+	case INTEL_UC_FIRMWARE_PENDING:
+	default:
+		/* "can't happen" */
+		WARN_ONCE(1, "%s fw <%s> invalid uc_fw_fetch_status %d!\n",
+			uc_fw->uc_name, uc_fw->uc_fw_path,
+			uc_fw->uc_fw_fetch_status);
+		return -ENXIO;
+
+	case INTEL_UC_FIRMWARE_SUCCESS:
+		return 0;
+	}
+}
+
+/*
+ * Callback from the kernel's asynchronous firmware-fetching subsystem.
+ * All we have to do here is stash the blob and signal completion.
+ * Error checking (e.g. no firmware found) is left to mainline code.
+ * We don't have (and don't want or need to acquire) the struct_mutex here.
+ */
+static void
+uc_fw_fetch_callback(const struct firmware *fw, void *context)
+{
+	struct intel_uc_fw *uc_fw = context;
+
+	WARN_ON(uc_fw->uc_fw_fetch_status != INTEL_UC_FIRMWARE_PENDING);
+	DRM_DEBUG_DRIVER("%s firmware fetch from <%s> status %d, fw %p\n",
+			uc_fw->uc_name, uc_fw->uc_fw_path,
+			uc_fw->uc_fw_fetch_status, fw);
+
+	uc_fw->uc_fw_blob = fw;
+	complete(&uc_fw->uc_fw_fetched);
+}
+
+/**
+ * intel_uc_fw_init() - initiate the fetching of firmware
+ * @dev:	drm device
+ * @uc_fw:	intel_uc_fw structure
+ * @name:	human-readable device name (e.g. "GuC") for messages
+ * @fw_path:	(trailing parts of) path to firmware (e.g. "i915/guc_fw.bin")
+ * 		@fw_path == NULL means "no firmware expected" (not an error),
+ * 		@fw_path == "" (empty string) means "firmware unknown" i.e.
+ * 		the uC requires firmware, but the driver doesn't know where
+ * 		to find the proper version. This will be logged as an error.
+ *
+ * This is called just once per uC, during driver loading. It is therefore
+ * automatically single-threaded and does not need to acquire any mutexes
+ * or spinlocks. OTOH, GEM is not yet fully initialised, so we can't do
+ * very much here.
+ *
+ * The main task here is to initiate the fetching of the uC firmware into
+ * memory, using the standard kernel firmware fetching support.  The actual
+ * fetching will then proceed asynchronously and in parallel with the rest
+ * of driver initialisation; later in the loading process we will synchronise
+ * with the firmware-fetching thread before transferring the firmware image
+ * firstly into a GEM object and then into the uC's memory.
+ */
+void
+intel_uc_fw_init(struct drm_device *dev, struct intel_uc_fw *uc_fw,
+		 const char *name, const char *fw_path)
+{
+	uc_fw->uc_dev = dev;
+	uc_fw->uc_name = name;
+	uc_fw->uc_fw_path = fw_path;
+	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_NONE;
+	uc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_NONE;
+	init_completion(&uc_fw->uc_fw_fetched);
+
+	if (fw_path == NULL)
+		return;
+
+	if (*fw_path == '\0') {
+		DRM_ERROR("No %s firmware known for this platform\n", name);
+		uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
+		return;
+	}
+
+	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_PENDING;
+
+	if (request_firmware_nowait(THIS_MODULE, true, fw_path,
+				    &dev->pdev->dev,
+				    GFP_KERNEL, uc_fw,
+				    uc_fw_fetch_callback)) {
+		DRM_ERROR("Failed to request %s firmware from <%s>\n",
+			  name, fw_path);
+		uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
+		return;
+	}
+
+	/* firmware fetch initiated, callback will signal completion */
+	DRM_DEBUG_DRIVER("initiated fetching %s firmware from <%s>\n",
+		name, fw_path);
+}
+
+/**
+ * intel_uc_fw_fini() - clean up all uC firmware-related data
+ * @uc_fw:	intel_uc_fw structure
+ */
+void
+intel_uc_fw_fini(struct intel_uc_fw *uc_fw)
+{
+	WARN_ON(!mutex_is_locked(&uc_fw->uc_dev->struct_mutex));
+
+	/*
+	 * Generally, the blob should have been released earlier, but
+	 * if the driver load were aborted after the fetch had been
+	 * initiated but not completed it might still be around
+	 */
+	if (uc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING)
+		wait_for_completion(&uc_fw->uc_fw_fetched);
+	release_firmware(uc_fw->uc_fw_blob);	/* OK even if NULL */
+	uc_fw->uc_fw_blob = NULL;
+
+	if (uc_fw->uc_fw_obj)
+		drm_gem_object_unreference(&uc_fw->uc_fw_obj->base);
+	uc_fw->uc_fw_obj = NULL;
+}
diff --git a/drivers/gpu/drm/i915/intel_uc_loader.h b/drivers/gpu/drm/i915/intel_uc_loader.h
new file mode 100644
index 0000000..22502ea
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_uc_loader.h
@@ -0,0 +1,82 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Author:
+ *	Dave Gordon <david.s.gordon@intel.com>
+ */
+#ifndef _INTEL_UC_LOADER_H
+#define _INTEL_UC_LOADER_H
+
+/*
+ * Microcontroller (uC) firmware loading support
+ */
+
+/*
+ * These values are used to track the stages of getting the required firmware
+ * into an onboard microcontroller. The common code tracks the phases of
+ * fetching the firmware (aka "binary blob") from an external file into a GEM
+ * object in the 'uc_fw_fetch_status' field below; the uC-specific DMA code
+ * uses the 'uc_fw_load_status' field to track the transfer from GEM object
+ * to uC memory.
+ *
+ * For the first (fetch) stage, the interpretation of the values is:
+ * NONE - no firmware is being fetched e.g. because there is no uC
+ * PENDING - firmware fetch initiated; callback will complete 'uc_fw_fetched'
+ * SUCCESS - uC firmware fetched into a GEM object and ready for use
+ * FAIL - something went wrong; uC firmware is not available
+ *
+ * The second (load) stage is simpler as there is no asynchronous handoff:
+ * NONE - no firmware is being loaded e.g. because there is no uC
+ * PENDING - firmware DMA load in progress
+ * SUCCESS - uC firmware loaded into uC memory and ready for use
+ * FAIL - something went wrong; uC firmware is not available
+ */
+enum intel_uc_fw_status {
+	INTEL_UC_FIRMWARE_FAIL = -1,
+	INTEL_UC_FIRMWARE_NONE = 0,
+	INTEL_UC_FIRMWARE_PENDING,
+	INTEL_UC_FIRMWARE_SUCCESS
+};
+
+/*
+ * This structure encapsulates all the data needed during the process of
+ * fetching, caching, and loading the firmware image into the uC.
+ */
+struct intel_uc_fw {
+	struct drm_device *		uc_dev;
+	const char *			uc_name;
+	const char *			uc_fw_path;
+	const struct firmware *		uc_fw_blob;
+	struct completion		uc_fw_fetched;
+	size_t				uc_fw_size;
+	struct drm_i915_gem_object *	uc_fw_obj;
+	enum intel_uc_fw_status		uc_fw_fetch_status;
+	enum intel_uc_fw_status		uc_fw_load_status;
+};
+
+void intel_uc_fw_init(struct drm_device *dev, struct intel_uc_fw *uc_fw,
+		const char *uc_name, const char *fw_path);
+int intel_uc_fw_check(struct intel_uc_fw *uc_fw,
+		bool callback(struct intel_uc_fw *));
+void intel_uc_fw_fini(struct intel_uc_fw *uc_fw);
+
+#endif
-- 
1.7.9.5


* [PATCH 03/15] drm/i915: Add GuC-related module parameters
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
  2015-06-15 18:36 ` [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c Dave Gordon
  2015-06-15 18:36 ` [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 18:36 ` [PATCH 04/15] drm/i915: Add GuC-related header files Dave Gordon
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

Two new module parameters: "enable_guc_submission" which will turn
on submission of batchbuffers via the GuC (when implemented), and
"guc_log_level" which controls the level of debugging logged by the
GuC and captured by the host.
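
One way the "guc_log_level" value might be interpreted is sketched
below. The struct, the helper name, and in particular the clamping of
values above 3 are assumptions for illustration; this patch only
defines the parameter and its documented range (-1 disabled, 0-3
enabled).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_GUC_LOG_MAX 3	/* highest level named in the parameter text */

struct toy_log_cfg {
	bool enabled;
	int verbosity;		/* 0..TOY_GUC_LOG_MAX when enabled */
};

/* Map the raw module parameter onto an enabled flag plus a verbosity. */
static struct toy_log_cfg toy_parse_log_level(int guc_log_level)
{
	struct toy_log_cfg cfg = { .enabled = false, .verbosity = 0 };

	if (guc_log_level < 0)
		return cfg;	/* -1 (the default): logging disabled */

	cfg.enabled = true;
	/* Assumption: out-of-range values are clamped, not rejected. */
	cfg.verbosity = guc_log_level > TOY_GUC_LOG_MAX ?
			TOY_GUC_LOG_MAX : guc_log_level;
	return cfg;
}
```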

Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h    |    2 ++
 drivers/gpu/drm/i915/i915_params.c |    9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9094c06..731a1c8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2564,6 +2564,8 @@ struct i915_params {
 	bool reset;
 	bool disable_display;
 	bool disable_vtd_wa;
+	bool enable_guc_submission;
+	int guc_log_level;
 	int use_mmio_flip;
 	int mmio_debug;
 	bool verbose_state_checks;
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8ac5a1b..5134095 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -54,6 +54,8 @@ struct i915_params i915 __read_mostly = {
 	.verbose_state_checks = 1,
 	.nuclear_pageflip = 0,
 	.edp_vswing = 0,
+	.enable_guc_submission = false,
+	.guc_log_level = -1,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -192,3 +194,10 @@ MODULE_PARM_DESC(edp_vswing,
 		 "Ignore/Override vswing pre-emph table selection from VBT "
 		 "(0=use value from vbt [default], 1=low power swing(200mV),"
 		 "2=default swing(400mV))");
+
+module_param_named(enable_guc_submission, i915.enable_guc_submission, bool, 0400);
+MODULE_PARM_DESC(enable_guc_submission, "Enable GuC submission (default:false)");
+
+module_param_named(guc_log_level, i915.guc_log_level, int, 0400);
+MODULE_PARM_DESC(guc_log_level,
+	"GuC firmware logging level (-1:disabled (default), 0-3:enabled)");
-- 
1.7.9.5


* [PATCH 04/15] drm/i915: Add GuC-related header files
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (2 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 03/15] drm/i915: Add GuC-related module parameters Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 20:20   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

intel_guc_api.h contains the subset of the GuC interface that we
will need for submission of commands through the GuC. These MUST
be kept in sync with the definitions used by the GuC firmware.

intel_guc.h defines structures and parameters relevant to loading
the GuC firmware and setting it running. Some of these also need
to be kept in sync with the firmware.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h      |    4 +-
 drivers/gpu/drm/i915/intel_guc.h     |  169 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_api.h |  227 ++++++++++++++++++++++++++++++++++
 3 files changed, 399 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/intel_guc.h
 create mode 100644 drivers/gpu/drm/i915/intel_guc_api.h

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0f72c0e..0e4589e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6762,7 +6762,9 @@ enum skl_disp_power_wells {
 #define   GEN9_PGCTL_SSB_EU311_ACK	(1 << 14)
 
 #define GEN7_MISCCPCTL			(0x9424)
-#define   GEN7_DOP_CLOCK_GATE_ENABLE	(1<<0)
+#define   GEN7_DOP_CLOCK_GATE_ENABLE		(1<<0)
+#define   GEN8_DOP_CLOCK_GATE_CFCLK_ENABLE	(1<<2)
+#define   GEN8_DOP_CLOCK_GATE_GUC_ENABLE	(1<<4)
 
 /* IVYBRIDGE DPF */
 #define GEN7_L3CDERRST1			0xB008 /* L3CD Error Status 1 */
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
new file mode 100644
index 0000000..82367c9
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -0,0 +1,169 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+#ifndef _INTEL_GUC_H_
+#define _INTEL_GUC_H_
+
+#include "intel_guc_api.h"
+#include "intel_uc_loader.h"
+
+#define GUC_DB_SIZE	PAGE_SIZE
+#define GUC_WQ_SIZE	(PAGE_SIZE * 2)
+
+struct i915_guc_client {
+	spinlock_t wq_lock;
+	struct drm_i915_gem_object *client_obj;
+	u32 priority;
+	off_t doorbell_offset;
+	off_t proc_desc_offset;
+	off_t wq_offset;
+	uint16_t doorbell_id;
+	uint32_t ctx_index;
+	uint32_t wq_size;
+	uint32_t wq_tail;
+	uint32_t cookie;
+
+	/* GuC submission statistics & status */
+	uint64_t submissions;
+	uint32_t q_fail;
+	uint32_t b_fail;
+	int retcode;
+};
+
+#define I915_MAX_DOORBELLS	256
+#define INVALID_DOORBELL_ID	I915_MAX_DOORBELLS
+
+#define INVALID_CTX_ID		(MAX_GUC_GPU_CONTEXTS+1)
+
+struct intel_guc {
+	/* Generic uC firmware management */
+	struct intel_uc_fw guc_fw;
+
+	/* GuC-specific additions */
+	uint32_t fw_ver_major;
+	uint32_t fw_ver_minor;
+
+	spinlock_t host2guc_lock;
+
+	struct drm_i915_gem_object *ctx_pool_obj;
+	struct drm_i915_gem_object *log_obj;
+	struct i915_guc_client *execbuf_client;
+
+	struct ida ctx_ids;
+	uint32_t log_flags;
+	int db_cacheline;
+	DECLARE_BITMAP(doorbell_bitmap, I915_MAX_DOORBELLS);
+
+	/* Action status & statistics */
+	uint64_t action_count;		/* Total commands issued	*/
+	uint32_t action_cmd;		/* Last command word		*/
+	uint32_t action_status;		/* Last return status		*/
+	uint32_t action_fail;		/* Total number of failures	*/
+	int32_t action_err;		/* Last error code		*/
+};
+
+/* Sizes of the parts of the GuC log object (if required) */
+#define GUC_LOG_DPC_PAGES	3
+#define GUC_LOG_ISR_PAGES	3
+#define GUC_LOG_CRASH_PAGES	1
+
+#define GUC_STATUS		0xc000
+#define   GS_BOOTROM_SHIFT	1
+#define   GS_BOOTROM_MASK	(0x7F << GS_BOOTROM_SHIFT)
+#define   GS_BOOTROM_RSA_FAILED	(0x50 << GS_BOOTROM_SHIFT)
+#define   GS_UKERNEL_SHIFT	8
+#define   GS_UKERNEL_MASK	(0xFF << GS_UKERNEL_SHIFT)
+#define   GS_UKERNEL_LAPIC_DONE	(0x30 << GS_UKERNEL_SHIFT)
+#define   GS_UKERNEL_DPC_ERROR	(0x60 << GS_UKERNEL_SHIFT)
+#define   GS_UKERNEL_READY	(0xF0 << GS_UKERNEL_SHIFT)
+#define   GS_MIA_SHIFT		16
+#define   GS_MIA_MASK		(0x7 << GS_MIA_SHIFT)
+
+#define GUC_WOPCM_SIZE		0xc050
+#define   GUC_WOPCM_SIZE_VALUE  (0x80 << 12)	/* 512KB */
+#define   GUC_WOPCM_OFFSET	0x80000		/* 512KB */
+#define SOFT_SCRATCH(n)		(0xc180 + ((n) * 4))
+
+#define UOS_CSS_HEADER_OFFSET	0
+#define UOS_CSS_HEADER_SIZE	0x80
+#define   UOS_VER_MINOR_OFFSET	0x44
+#define   UOS_VER_MAJOR_OFFSET	0x46
+#define UOS_RSA_SIG_SIZE	0x100
+#define UOS_CSS_SIGNING_SIZE	0x204
+
+#define UOS_RSA_SCRATCH_0	0xc200
+#define DMA_ADDR_0_LOW		0xc300
+#define DMA_ADDR_0_HIGH		0xc304
+#define DMA_ADDR_1_LOW		0xc308
+#define DMA_ADDR_1_HIGH		0xc30c
+#define   DMA_ADDRESS_SPACE_WOPCM	(7 << 16)
+#define   DMA_ADDRESS_SPACE_GTT		(8 << 16)
+#define DMA_COPY_SIZE		0xc310
+#define DMA_CTRL		0xc314
+#define   UOS_MOVE		(1<<4)
+#define   START_DMA		(1<<0)
+#define DMA_GUC_WOPCM_OFFSET	0xc340
+
+#define GEN8_GT_PM_CONFIG		0x138140
+#define GEN9_GT_PM_CONFIG		0x13816c
+#define   GEN8_GT_DOORBELL_ENABLE	(1<<0)
+
+#define GEN8_GTCR 0x4274
+#define   GEN8_GTCR_INVALIDATE (1<<0)
+
+#define GUC_ARAT_C6DIS		0xA178
+
+#define GUC_SHIM_CONTROL	(0xc064)
+#define   GUC_DISABLE_SRAM_INIT_TO_ZEROES	(1<<0)
+#define   GUC_ENABLE_READ_CACHE_LOGIC		(1<<1)
+#define   GUC_ENABLE_MIA_CACHING		(1<<2)
+#define   GUC_GEN10_MSGCH_ENABLE		(1<<4)
+#define   GUC_ENABLE_READ_CACHE_FOR_SRAM_DATA	(1<<9)
+#define   GUC_ENABLE_READ_CACHE_FOR_WOPCM_DATA	(1<<10)
+#define   GUC_ENABLE_MIA_CLOCK_GATING		(1<<15)
+#define   GUC_GEN10_SHIM_WC_ENABLE		(1<<21)
+
+#define GUC_SHIM_CONTROL_VALUE	(GUC_DISABLE_SRAM_INIT_TO_ZEROES | \
+				 GUC_ENABLE_READ_CACHE_LOGIC | \
+				 GUC_ENABLE_MIA_CACHING | \
+				 GUC_ENABLE_READ_CACHE_FOR_SRAM_DATA | \
+				 GUC_ENABLE_READ_CACHE_FOR_WOPCM_DATA)
+
+#define HOST2GUC_INTERRUPT	0xc4c8
+#define   HOST2GUC_TRIGGER	(1<<0)
+
+#define DRBMISC1		0x1984
+#define   DOORBELL_ENABLE	(1<<0)
+
+#define GEN8_DRBREGL(x) (0x1000 + (x) * 8)
+#define   GEN8_DRB_VALID (1<<0)
+#define GEN8_DRBREGU(x) (0x1000 + (x) * 8 + 4)
+
+#define DE_GUCRMR		0x44054
+
+#define GUC_BCS_RCS_IER		0xC550
+#define GUC_VCS2_VCS1_IER	0xC554
+#define GUC_WD_VECS_IER		0xC558
+#define GUC_PM_P24C_IER		0xC55C
+
+#endif
diff --git a/drivers/gpu/drm/i915/intel_guc_api.h b/drivers/gpu/drm/i915/intel_guc_api.h
new file mode 100644
index 0000000..6f2cff2
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_guc_api.h
@@ -0,0 +1,227 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+#ifndef _INTEL_GUC_API_H
+#define _INTEL_GUC_API_H
+
+#define GFXCORE_FAMILY_GEN8		11
+#define GFXCORE_FAMILY_GEN9		12
+#define GFXCORE_FAMILY_FORCE_ULONG	0x7fffffff
+
+#define GUC_CTX_PRIORITY_CRITICAL	0
+#define GUC_CTX_PRIORITY_HIGH		1
+#define GUC_CTX_PRIORITY_NORMAL		2
+#define GUC_CTX_PRIORITY_LOW		3
+
+#define MAX_GUC_GPU_CONTEXTS		1024
+
+/* Work queue item header definitions */
+#define WQ_STATUS_ACTIVE		1
+#define WQ_STATUS_SUSPENDED		2
+#define WQ_STATUS_CMD_ERROR		3
+#define WQ_STATUS_ENGINE_ID_NOT_USED	4
+#define WQ_STATUS_SUSPENDED_FROM_RESET	5
+#define WQ_TYPE_SHIFT			0
+#define   WQ_TYPE_BATCH_BUF		(0x1 << WQ_TYPE_SHIFT)
+#define   WQ_TYPE_PSEUDO		(0x2 << WQ_TYPE_SHIFT)
+#define   WQ_TYPE_INORDER		(0x3 << WQ_TYPE_SHIFT)
+#define WQ_TARGET_SHIFT			10
+#define WQ_LEN_SHIFT			16
+#define WQ_NO_WCFLUSH_WAIT		(1 << 27)
+#define WQ_PRESENT_WORKLOAD		(1 << 28)
+#define WQ_WORKLOAD_SHIFT		29
+#define   WQ_WORKLOAD_GENERAL		(0 << WQ_WORKLOAD_SHIFT)
+#define   WQ_WORKLOAD_GPGPU		(1 << WQ_WORKLOAD_SHIFT)
+#define   WQ_WORKLOAD_TOUCH		(2 << WQ_WORKLOAD_SHIFT)
+
+#define WQ_RING_TAIL_SHIFT		20
+#define WQ_RING_TAIL_MASK		(0x7FF << WQ_RING_TAIL_SHIFT)
+
+#define GUC_DOORBELL_ENABLED		1
+#define GUC_DOORBELL_DISABLED		0
+
+#define GUC_CTX_DESC_ATTR_ACTIVE	(1 << 0)
+#define GUC_CTX_DESC_ATTR_PENDING_DB	(1 << 1)
+#define GUC_CTX_DESC_ATTR_KERNEL	(1 << 2)
+#define GUC_CTX_DESC_ATTR_PREEMPT	(1 << 3)
+#define GUC_CTX_DESC_ATTR_RESET		(1 << 4)
+#define GUC_CTX_DESC_ATTR_WQLOCKED	(1 << 5)
+#define GUC_CTX_DESC_ATTR_PCH		(1 << 6)
+
+/* The guc control data is 10 DWORDs */
+#define GUC_CTL_CTXINFO			0
+#define   GUC_CTL_CTXNUM_IN16_SHIFT	0
+#define   GUC_CTL_BASE_ADDR_SHIFT	12
+#define GUC_CTL_ARAT_HIGH		1
+#define GUC_CTL_ARAT_LOW		2
+#define GUC_CTL_DEVICE_INFO		3
+#define   GUC_CTL_GTTYPE_SHIFT		0
+#define   GUC_CTL_COREFAMILY_SHIFT	7
+#define GUC_CTL_LOG_PARAMS		4
+#define   GUC_LOG_VALID			(1 << 0)
+#define   GUC_LOG_NOTIFY_ON_HALF_FULL	(1 << 1)
+#define   GUC_LOG_ALLOC_IN_MEGABYTE	(1 << 3)
+#define   GUC_LOG_CRASH_SHIFT		4
+#define   GUC_LOG_DPC_SHIFT		6
+#define   GUC_LOG_ISR_SHIFT		9
+#define   GUC_LOG_BUF_ADDR_SHIFT	12
+#define GUC_CTL_PAGE_FAULT_CONTROL	5
+#define GUC_CTL_WA			6
+#define   GUC_CTL_WA_UK_BY_DRIVER	(1 << 3)
+#define GUC_CTL_FEATURE			7
+#define   GUC_CTL_VCS2_ENABLED		(1 << 0)
+#define   GUC_CTL_KERNEL_SUBMISSIONS	(1 << 1)
+#define   GUC_CTL_FEATURE2		(1 << 2)
+#define   GUC_CTL_POWER_GATING		(1 << 3)
+#define   GUC_CTL_DISABLE_SCHEDULER	(1 << 4)
+#define   GUC_CTL_PREEMPTION_LOG	(1 << 5)
+#define   GUC_CTL_ENABLE_SLPC		(1 << 7)
+#define GUC_CTL_DEBUG			8
+#define   GUC_LOG_VERBOSITY_SHIFT	0
+#define   GUC_LOG_VERBOSITY_LOW		(0 << GUC_LOG_VERBOSITY_SHIFT)
+#define   GUC_LOG_VERBOSITY_MED		(1 << GUC_LOG_VERBOSITY_SHIFT)
+#define   GUC_LOG_VERBOSITY_HIGH	(2 << GUC_LOG_VERBOSITY_SHIFT)
+#define   GUC_LOG_VERBOSITY_ULTRA	(3 << GUC_LOG_VERBOSITY_SHIFT)
+/* Verbosity range-check limits, without the shift */
+#define	  GUC_LOG_VERBOSITY_MIN		0
+#define	  GUC_LOG_VERBOSITY_MAX		3
+
+#define GUC_CTL_MAX_DWORDS		(GUC_CTL_DEBUG + 1)
+
+struct guc_doorbell_info {
+	u32 db_status;
+	u32 cookie;
+	u32 reserved[14];
+} __packed;
+
+union guc_doorbell_qw {
+	struct {
+		u32 db_status;
+		u32 cookie;
+	};
+	u64 value_qw;
+} __packed;
+
+struct guc_process_desc {
+	u32 context_id;
+	u64 db_base_addr;
+	u32 head;
+	u32 tail;
+	u32 error_offset;
+	u64 wq_base_addr;
+	u32 wq_size_bytes;
+	u32 wq_status;
+	u32 engine_presence;
+	u32 priority;
+	u32 reserved[30];
+} __packed;
+
+/* Work item for submitting workloads into work queue of GuC. */
+struct guc_wq_item {
+	u32 header;
+	u32 context_desc;
+	u32 ring_tail;
+	u32 fence_id;
+} __packed;
+
+/* engine id and context id are packed into guc_execlist_context.context_id */
+#define GUC_ELC_CTXID_OFFSET		0
+#define GUC_ELC_ENGINE_OFFSET		29
+
+/* The execlist context including software and HW information */
+struct guc_execlist_context {
+	u32 context_desc;
+	u32 context_id;
+	u32 ring_status;
+	u32 ring_lcra;
+	u32 ring_begin;
+	u32 ring_end;
+	u32 ring_next_free_location;
+	u32 ring_current_tail_pointer_value;
+	u8 engine_state_submit_value;
+	u8 engine_state_wait_value;
+	u16 pagefault_count;
+	u16 engine_submit_queue_count;
+} __packed;
+
+/* Context descriptor for communicating between uKernel and Driver */
+struct guc_context_desc {
+	u32 sched_common_area;
+	u32 context_id;
+	u32 pas_id;
+	u8 engines_used;
+	u64 db_trigger_cpu;
+	u32 db_trigger_uk;
+	u64 db_trigger_phy;
+	u16 db_id;
+
+	struct guc_execlist_context lrc[I915_NUM_RINGS];
+
+	u8 attribute;
+
+	u32 priority;
+
+	u32 wq_sampled_tail_offset;
+	u32 wq_total_submit_enqueues;
+
+	u32 process_desc;
+	u32 wq_addr;
+	u32 wq_size;
+
+	u32 engine_presence;
+
+	u32 reserved0[1];
+	u64 reserved1[1];
+
+	u64 desc_private;
+} __packed;
+
+/* This Action will be programmed in C180 - SOFT_SCRATCH_O_REG */
+enum host2guc_action {
+	HOST2GUC_ACTION_DEFAULT = 0x0,
+	HOST2GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
+	HOST2GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
+	HOST2GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
+	HOST2GUC_ACTION_SLPC_REQUEST = 0x3003,
+	HOST2GUC_ACTION_LIMIT
+};
+
+/*
+ * The GuC sends its response to a command by overwriting the
+ * command in SS0. The response is distinguishable from a command
+ * by the fact that all the MASK bits are set. The remaining bits
+ * give more detail.
+ */
+#define	GUC2HOST_RESPONSE_MASK	0xF0000000
+#define	GUC2HOST_IS_RESPONSE(x) \
+	(((x) & GUC2HOST_RESPONSE_MASK) == GUC2HOST_RESPONSE_MASK)
+#define	GUC2HOST_STATUS(x)	(GUC2HOST_RESPONSE_MASK | (x))
+
+/* GUC will return status back to SOFT_SCRATCH_O_REG */
+enum guc2host_status {
+	GUC2HOST_STATUS_SUCCESS = GUC2HOST_STATUS(0x0),
+	GUC2HOST_STATUS_ALLOCATE_DOORBELL_FAIL = GUC2HOST_STATUS(0x10),
+	GUC2HOST_STATUS_DEALLOCATE_DOORBELL_FAIL = GUC2HOST_STATUS(0x20),
+	GUC2HOST_STATUS_GENERIC_FAIL = GUC2HOST_STATUS(0x0000F000)
+};
+
+#endif
-- 
1.7.9.5

* [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (3 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 04/15] drm/i915: Add GuC-related header files Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 20:30   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status Dave Gordon
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

This uses the unified firmware loader to fetch the firmware image,
then loads it into the GuC's memory via a dedicated DMA engine.

This patch is derived from GuC loading work originally done by
Vinit Azad and Ben Widawsky. It has been reworked to fit the
unified firmware loading mechanism by Dave Gordon, as well as
the new firmware layout.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/Makefile           |    3 +
 drivers/gpu/drm/i915/i915_dma.c         |    4 +
 drivers/gpu/drm/i915/i915_drv.h         |   11 +
 drivers/gpu/drm/i915/i915_gem.c         |    2 +
 drivers/gpu/drm/i915/intel_guc.h        |    5 +
 drivers/gpu/drm/i915/intel_guc_loader.c |  416 +++++++++++++++++++++++++++++++
 6 files changed, 441 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
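
As a reading aid for the firmware layout used by guc_ucode_xfer_dma()
and guc_ucode_check(), here is a standalone sketch of the same
arithmetic: the last UOS_CSS_SIGNING_SIZE (0x204 = 256 + 256 + 4) bytes
of the blob hold the RSA signature, public key and modulus, everything
before that is DMA'd as one chunk, and the two version fields are u16
values inside the CSS header. The UOS_* constants are copied from
intel_guc.h. The names struct guc_fw_layout, guc_fw_parse() and
guc_fw_version_ok() are hypothetical and not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copied from intel_guc.h in this series. */
#define UOS_VER_MINOR_OFFSET	0x44
#define UOS_VER_MAJOR_OFFSET	0x46
#define UOS_RSA_SIG_SIZE	0x100
#define UOS_CSS_SIGNING_SIZE	0x204

/* Hypothetical summary of what the loader derives from the blob. */
struct guc_fw_layout {
	size_t dma_size;	/* CSS header + uCode, copied by DMA */
	size_t rsa_offset;	/* start of the RSA signature */
	uint16_t ver_major;
	uint16_t ver_minor;
};

/* Returns false if the blob is too small to contain any uCode. */
static bool guc_fw_parse(const uint8_t *blob, size_t size,
			 struct guc_fw_layout *out)
{
	if (size <= UOS_CSS_SIGNING_SIZE)
		return false;

	/* The signature starts exactly where the DMA'd chunk ends. */
	out->dma_size = size - UOS_CSS_SIGNING_SIZE;
	out->rsa_offset = out->dma_size;

	/* Version fields are two bytes each, read in host byte order
	 * as the patch does; memcpy avoids unaligned access. */
	memcpy(&out->ver_major, blob + UOS_VER_MAJOR_OFFSET, 2);
	memcpy(&out->ver_minor, blob + UOS_VER_MINOR_OFFSET, 2);
	return true;
}

/* Same rule as guc_ucode_check(): exact major, minimum minor. */
static bool guc_fw_version_ok(uint16_t major, uint16_t minor,
			      uint16_t want_major, uint16_t min_minor)
{
	return major == want_major && minor >= min_minor;
}
```

For a 0x384-byte blob this gives a DMA size and RSA offset of 0x180,
and a 3.1 image passes a 3.0 requirement while a 4.0 image does not.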

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 607fa2a..15818df 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -41,6 +41,9 @@ i915-y += i915_cmd_parser.o \
 # generic ancilliary microcontroller support
 i915-y += intel_uc_loader.o
 
+# general-purpose microcontroller (GuC) support
+i915-y += intel_guc_loader.o
+
 # autogenerated null render state
 i915-y += intel_renderstate_gen6.o \
 	  intel_renderstate_gen7.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 3424863..028dbff 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -465,6 +465,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 
 cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
+	intel_guc_ucode_fini(dev);
 	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
 	mutex_unlock(&dev->struct_mutex);
@@ -862,6 +863,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 
 	intel_uncore_init(dev);
 
+	intel_guc_ucode_init(dev);
+
 	/* Load CSR Firmware for SKL */
 	intel_csr_ucode_init(dev);
 
@@ -1113,6 +1116,7 @@ int i915_driver_unload(struct drm_device *dev)
 	flush_workqueue(dev_priv->wq);
 
 	mutex_lock(&dev->struct_mutex);
+	intel_guc_ucode_fini(dev);
 	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
 	mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 731a1c8..f47cde7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -50,6 +50,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include "intel_guc.h"
 
 /* General customization:
  */
@@ -1669,6 +1670,8 @@ struct drm_i915_private {
 
 	struct intel_gmbus gmbus[GMBUS_NUM_PINS];
 
+	struct intel_guc guc;
+
 	/** gmbus_mutex protects against concurrent usage of the single hw gmbus
 	 * controller on different i2c buses. */
 	struct mutex gmbus_mutex;
@@ -1913,6 +1916,11 @@ static inline struct drm_i915_private *dev_to_i915(struct device *dev)
 	return to_i915(dev_get_drvdata(dev));
 }
 
+static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
+{
+	return container_of(guc, struct drm_i915_private, guc);
+}
+
 /* Iterate over initialised rings */
 #define for_each_ring(ring__, dev_priv__, i__) \
 	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
@@ -2503,6 +2511,9 @@ struct drm_i915_cmd_table {
 
 #define HAS_CSR(dev)	(IS_SKYLAKE(dev))
 
+#define HAS_GUC_UCODE(dev)	(IS_GEN9(dev))
+#define HAS_GUC_SCHED(dev)	(IS_GEN9(dev))
+
 #define INTEL_PCH_DEVICE_ID_MASK		0xff00
 #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
 #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 75d63c2..cd4a865 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5023,6 +5023,8 @@ i915_gem_init_hw(struct drm_device *dev)
 		i915_gem_cleanup_ringbuffer(dev);
 	}
 
+	/* We can't enable contexts until all firmware is loaded */
+	ret = intel_guc_ucode_load(dev, false);
 	ret = i915_gem_context_enable(dev_priv);
 	if (ret && ret != -EIO) {
 		DRM_ERROR("Context enable failed %d\n", ret);
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 82367c9..0b44265 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -166,4 +166,9 @@ struct intel_guc {
 #define GUC_WD_VECS_IER		0xC558
 #define GUC_PM_P24C_IER		0xC55C
 
+/* intel_guc_loader.c */
+extern void intel_guc_ucode_init(struct drm_device *dev);
+extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
+extern void intel_guc_ucode_fini(struct drm_device *dev);
+
 #endif
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
new file mode 100644
index 0000000..16eef4c
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -0,0 +1,416 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Vinit Azad <vinit.azad@intel.com>
+ *    Ben Widawsky <ben@bwidawsk.net>
+ *    Dave Gordon <david.s.gordon@intel.com>
+ *    Alex Dai <yu.dai@intel.com>
+ */
+#include <linux/firmware.h>
+#include "i915_drv.h"
+#include "intel_guc.h"
+
+/**
+ * DOC: GuC
+ *
+ * intel_guc:
+ * Top level structure of the GuC. It handles firmware loading and manages the
+ * client pool and doorbells. intel_guc owns an i915_guc_client to replace the
+ * legacy ExecList submission.
+ *
+ * Firmware versioning:
+ * The firmware build process generates a version header file that defines the
+ * major and minor versions. The versions are built into the CSS header of the
+ * firmware. The i915 kernel driver sets the minimum firmware version required
+ * per platform. The firmware installation package installs (as a symbolic
+ * link) the proper version of the firmware.
+ *
+ * GuC address space:
+ * GuC does not allow any gfx GGTT address that falls into range [0, WOPCM_TOP),
+ * which is reserved for Boot ROM, SRAM and WOPCM. Currently this top address is
+ * 512K. In order to exclude the 0-512K address space from the GGTT, all gfx
+ * objects used by the GuC are pinned with PIN_OFFSET_BIAS plus the WOPCM size.
+ *
+ * Firmware log:
+ * The firmware log is enabled by setting i915.guc_log_level to a non-negative
+ * value. Log data is printed by reading debugfs i915_guc_log_dump. Reading from
+ * i915_guc_load_status prints the firmware loading status and the values of the
+ * scratch registers.
+ *
+ */
+
+#define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
+MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
+
+static u32 get_gttype(struct drm_device *dev)
+{
+	/* XXX: GT type based on PCI device ID? field seems unused by fw */
+	return 0;
+}
+
+static u32 get_core_family(struct drm_device *dev)
+{
+	switch (INTEL_INFO(dev)->gen) {
+	case 8:
+		return GFXCORE_FAMILY_GEN8;
+	case 9:
+		return GFXCORE_FAMILY_GEN9;
+	default:
+		DRM_ERROR("GUC: unknown gen for scheduler init\n");
+		return GFXCORE_FAMILY_FORCE_ULONG;
+	}
+}
+
+static void set_guc_init_params(struct drm_i915_private *dev_priv)
+{
+	struct intel_guc *guc = &dev_priv->guc;
+	u32 params[GUC_CTL_MAX_DWORDS];
+	int i;
+
+	memset(&params, 0, sizeof(params));
+
+	params[GUC_CTL_DEVICE_INFO] |=
+		(get_gttype(dev_priv->dev) << GUC_CTL_GTTYPE_SHIFT) |
+		(get_core_family(dev_priv->dev) << GUC_CTL_COREFAMILY_SHIFT);
+
+	/* GuC ARAT increment is 10 ns. GuC default scheduler quantum is one
+	 * second. The ARAT value is calculated by:
+	 * Scheduler-Quantum-in-ns / ARAT-increment-in-ns = 1000000000 / 10
+	 */
+	params[GUC_CTL_ARAT_HIGH] = 0;
+	params[GUC_CTL_ARAT_LOW] = 100000000;
+
+	params[GUC_CTL_WA] |= GUC_CTL_WA_UK_BY_DRIVER;
+
+	params[GUC_CTL_FEATURE] |= GUC_CTL_DISABLE_SCHEDULER |
+			GUC_CTL_VCS2_ENABLED;
+
+	if (i915.guc_log_level >= 0) {
+		params[GUC_CTL_LOG_PARAMS] = guc->log_flags;
+		params[GUC_CTL_DEBUG] =
+			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
+	}
+
+	I915_WRITE(SOFT_SCRATCH(0), 0);
+
+	for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
+		I915_WRITE(SOFT_SCRATCH(1 + i), params[i]);
+}
+
+/* Read GuC status register (GUC_STATUS)
+ * Return true if it shows a success code from a normal boot or an RC6 boot
+ */
+static inline bool i915_guc_get_status(struct drm_i915_private *dev_priv,
+					u32 *status)
+{
+	*status = I915_READ(GUC_STATUS);
+	return (((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_READY ||
+		((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE);
+}
+
+/* Transfers the firmware image to RAM for execution by the microcontroller.
+ *
+ * GuC Firmware layout:
+ * +-------------------------------+  ----
+ * |          CSS header           |  128B
+ * +-------------------------------+  ----
+ * |             uCode             |
+ * +-------------------------------+  ----
+ * |         RSA signature         |  256B
+ * +-------------------------------+  ----
+ * |         RSA public Key        |  256B
+ * +-------------------------------+  ----
+ * |       Public key modulus      |    4B
+ * +-------------------------------+  ----
+ *
+ * Architecturally, the DMA engine is bidirectional, and it can potentially
+ * even transfer between GTT locations. This functionality is left out of the
+ * API for now as there is no need for it.
+ *
+ * Note that the GuC needs the CSS header plus uKernel code to be copied as one
+ * chunk of data. The RSA signature data is loaded via MMIO.
+ */
+static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
+{
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+	struct drm_i915_gem_object *fw_obj = guc_fw->uc_fw_obj;
+	unsigned long offset;
+	struct sg_table *sg = fw_obj->pages;
+	u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)];
+	int i, ret = 0;
+
+	/* uCode size, which is also where the RSA signature starts */
+	offset = ucode_size = guc_fw->uc_fw_size - UOS_CSS_SIGNING_SIZE;
+
+	/* Copy RSA signature from the fw image to HW for verification */
+	sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, UOS_RSA_SIG_SIZE, offset);
+	for (i = 0; i < UOS_RSA_SIG_SIZE / sizeof(u32); i++)
+		I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]);
+
+	/* Set the source address for the new blob */
+	offset = i915_gem_obj_ggtt_offset(fw_obj);
+	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
+	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
+
+	/* Set the destination. Current uCode expects an 8k stack starting from
+	 * offset 0. */
+	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
+
+	/* XXX: The image is automatically transferred to SRAM after the RSA
+	 * verification. This is why the address space is chosen as such. */
+	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
+
+	I915_WRITE(DMA_COPY_SIZE, ucode_size);
+
+	/* Finally start the DMA */
+	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
+
+	/*
+	 * Spin-wait for the DMA to complete & the GuC to start up.
+	 * NB: Docs recommend not using the interrupt for completion.
+	 * FIXME: what's a valid timeout?
+	 */
+	ret = wait_for_atomic(i915_guc_get_status(dev_priv, &status), 10);
+
+	DRM_DEBUG_DRIVER("DMA status = 0x%x, GuC status 0x%x\n",
+			I915_READ(DMA_CTRL), status);
+
+	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
+		DRM_ERROR("%s firmware signature verification failed\n",
+			guc_fw->uc_name);
+		ret = -ENOEXEC;
+	}
+
+	DRM_DEBUG_DRIVER("GuC fw load status %s %d\n",
+			ret ? "FAIL" : "SUCCESS", ret);
+
+	return ret;
+}
+
+/*
+ * Loads the GuC firmware blob into the MinuteIA.
+ */
+static int guc_ucode_xfer(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+	bool pinned = false;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	ret = i915_gem_obj_ggtt_pin(guc_fw->uc_fw_obj, 0, 0);
+	if (ret)
+		goto out;
+	pinned = true;
+
+	/* init WOPCM */
+	I915_WRITE(GUC_WOPCM_SIZE, GUC_WOPCM_SIZE_VALUE);
+	I915_WRITE(DMA_GUC_WOPCM_OFFSET, GUC_WOPCM_OFFSET);
+
+	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
+	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+
+	/* Set MMIO/WA for GuC init */
+	I915_WRITE(DRBMISC1, DOORBELL_ENABLE);
+
+	/* Enable MIA caching. GuC clock gating is disabled. */
+	I915_WRITE(GUC_SHIM_CONTROL, GUC_SHIM_CONTROL_VALUE);
+
+	/* WaC6DisallowByGfxPause */
+	I915_WRITE(GEN6_GFXPAUSE, 0x30FFF);
+
+	if (IS_SKYLAKE(dev))
+		I915_WRITE(GEN9_GT_PM_CONFIG, GEN8_GT_DOORBELL_ENABLE);
+	else
+		I915_WRITE(GEN8_GT_PM_CONFIG, GEN8_GT_DOORBELL_ENABLE);
+
+	if (IS_GEN9(dev)) {
+		/* DOP Clock Gating Enable for GuC clocks */
+		I915_WRITE(GEN7_MISCCPCTL, (GEN8_DOP_CLOCK_GATE_GUC_ENABLE |
+					    I915_READ(GEN7_MISCCPCTL)));
+
+		/* allows for 5us before GT can go to RC6 */
+		I915_WRITE(GUC_ARAT_C6DIS, 0x1FF);
+	}
+
+	set_guc_init_params(dev_priv);
+
+	ret = guc_ucode_xfer_dma(dev_priv);
+
+	/* We can free the object pages now, and we would, except we might as
+	 * well keep it around for suspend/resume. Instead, we just wait for the
+	 * DMA to complete, and unpin the object
+	 */
+
+out:
+	if (pinned)
+		i915_gem_object_ggtt_unpin(guc_fw->uc_fw_obj);
+	else
+		DRM_DEBUG_DRIVER("pin failed %d\n", ret);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+
+	return ret;
+}
+
+/*
+ * Check the firmware that was found; if it's the wrong size or the wrong
+ * version, return FALSE. If it's OK, save the data in a GEM object and
+ * return TRUE.
+ *
+ * The GuC firmware image has the version number embedded at a well-known
+ * offset within the firmware blob; note that major / minor version are
+ * TWO bytes each (i.e. u16), although all pointers and offsets are defined
+ * in terms of bytes (u8).
+ */
+static bool
+guc_ucode_check(struct intel_uc_fw *guc_fw)
+{
+	struct intel_guc *guc = container_of(guc_fw, struct intel_guc, guc_fw);
+	const u8 *css_header = guc_fw->uc_fw_blob->data + UOS_CSS_HEADER_OFFSET;
+	uint32_t major, minor;
+
+	DRM_DEBUG_DRIVER("firmware file size %zu (minimum %u)\n",
+		guc_fw->uc_fw_blob->size, UOS_CSS_SIGNING_SIZE);
+
+	/* Check the size of the blob first */
+	if (guc_fw->uc_fw_blob->size <= UOS_CSS_SIGNING_SIZE)
+		return false;
+
+	major = *(u16 *)(css_header + UOS_VER_MAJOR_OFFSET);
+	minor = *(u16 *)(css_header + UOS_VER_MINOR_OFFSET);
+
+	if (major != guc->fw_ver_major || minor < guc->fw_ver_minor) {
+		DRM_ERROR("GuC firmware version %d.%d, required %d.%d\n",
+			 major, minor, guc->fw_ver_major, guc->fw_ver_minor);
+		return false;
+	}
+
+	DRM_DEBUG_DRIVER("firmware version %d.%d OK (minimum %d.%d)\n",
+		 major, minor, guc->fw_ver_major, guc->fw_ver_minor);
+
+	/* Override default GEM object allocation-and-save here, if needed */
+	return true;
+}
+
+/**
+ * intel_guc_ucode_init() - initiate a firmware loading request
+ *
+ * Called early during driver load, before GEM is initialised.
+ * Driver is single threaded, so no mutex is required.
+ */
+void intel_guc_ucode_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct intel_uc_fw *guc_fw = &guc->guc_fw;
+	const char *path;
+
+	if (!HAS_GUC_SCHED(dev))
+		i915.enable_guc_submission = false;
+
+	if (!HAS_GUC_UCODE(dev)) {
+		path = NULL;
+	} else if (IS_SKYLAKE(dev)) {
+		path = I915_SKL_GUC_UCODE;
+		guc->fw_ver_major = 3;
+		guc->fw_ver_minor = 0;
+	} else {
+		i915.enable_guc_submission = false;
+		path = "";	/* unknown device */
+	}
+
+	intel_uc_fw_init(dev, guc_fw, "GuC", path);
+}
+
+/**
+ * intel_guc_ucode_load() - load GuC uCode into the device
+ *
+ * Called from gem_init_hw() during driver loading and also after a GPU reset.
+ * Checks that the firmware fetching process has succeeded, and if so transfers
+ * the loaded image to the hardware.
+ *
+ * However, there are a few checks to do first. The very first call should have
+ * (wait == FALSE), but the fetch_state will still be PENDING as the firmware may
+ * not be available that early. Therefore, on this first call, we just return.
+ *
+ * The second call should come from the first open of the device (wait == TRUE).
+ * This is a good time to load the firmware into the device, as by this point it
+ * must be available.
+ *
+ * Any subsequent calls are expected to have wait == FALSE, and indicate that the
+ * hardware has been reset and so the firmware should be reloaded.
+ */
+int intel_guc_ucode_load(struct drm_device *dev, bool wait)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+	int err;
+
+	DRM_DEBUG_DRIVER("GuC: wait %d, fetch status %d, load status %d\n",
+		wait, guc_fw->uc_fw_fetch_status, guc_fw->uc_fw_load_status);
+
+	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING && !wait)
+		return -EAGAIN;
+
+	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_NONE)
+		return 0;
+
+	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_SUCCESS &&
+	    guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_FAIL)
+		return -ENOEXEC;
+
+	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_PENDING;
+	err = intel_uc_fw_check(guc_fw, guc_ucode_check);
+	if (err)
+		goto fail;
+
+	err = guc_ucode_xfer(dev);
+	if (err)
+		goto fail;
+
+	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_SUCCESS;
+
+	return 0;
+
+fail:
+	if (guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_PENDING)
+		guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_FAIL;
+
+	DRM_ERROR("Failed to initialize GuC, error %d\n", err);
+
+	return err;
+}
+
+/**
+ * intel_guc_ucode_fini() - clean up all allocated resources
+ */
+void intel_guc_ucode_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+
+	intel_uc_fw_fini(guc_fw);
+}
-- 
1.7.9.5
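
Note for readers of the archive: the early-exit checks at the top of
intel_guc_ucode_load() can be modelled in user space as below. This is only
a sketch under stated assumptions. The enum, struct and function names are
hypothetical stand-ins, not driver symbols, and xfer_err stands in for the
combined result of the image check and the transfer to hardware. The model
also does not resolve a PENDING fetch when wait is true, which the real
loader presumably does inside intel_uc_fw_check().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the INTEL_UC_FIRMWARE_* status values. */
enum fw_status { FW_NONE, FW_PENDING, FW_SUCCESS, FW_FAIL };

struct fw_model {
	enum fw_status fetch;
	enum fw_status load;
};

/*
 * Mirrors the order of the checks in intel_guc_ucode_load().
 * xfer_err is an assumption: 0 for success, or a negative errno that
 * the check/transfer steps would have returned.
 */
static int model_ucode_load(struct fw_model *fw, bool wait, int xfer_err)
{
	if (fw->fetch == FW_PENDING && !wait)
		return -EAGAIN;		/* too early, caller retries later */

	if (fw->fetch == FW_NONE)
		return 0;		/* nothing to load */

	if (fw->fetch == FW_SUCCESS && fw->load == FW_FAIL)
		return -ENOEXEC;	/* an earlier load already failed */

	fw->load = FW_PENDING;
	if (xfer_err) {
		fw->load = FW_FAIL;
		return xfer_err;
	}

	fw->load = FW_SUCCESS;
	return 0;
}
```

In this model, once a load has failed, later calls return -ENOEXEC without
retrying, which matches the third check in the function above.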

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (4 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-16  9:40   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open Dave Gordon
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

The new node provides access to the status of the common uC loader
code and the GuC-specific loader; also the scratch registers used
for communication between the i915 driver and the GuC firmware.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
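
Note: the debugfs node decodes each status field with a mask-then-shift
step. A minimal user-space sketch of that step follows. The EX_* layout is
hypothetical and chosen only for illustration; the real GS_* masks and
shifts are defined in the GuC header files added earlier in this series.

```c
#include <stdint.h>

/* Hypothetical field layout, NOT the real GS_* definitions. */
#define EX_BOOTROM_SHIFT	1
#define EX_BOOTROM_MASK		(0x7Fu << EX_BOOTROM_SHIFT)
#define EX_UKERNEL_SHIFT	8
#define EX_UKERNEL_MASK		(0xFFu << EX_UKERNEL_SHIFT)

/* Extract one field: mask off the other bits, then shift down to bit 0. */
static uint32_t ex_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}
```

With this layout, a register value of 0xF054 decodes to a bootrom field of
0x2A and a uKernel field of 0xF0.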
 drivers/gpu/drm/i915/i915_debugfs.c |   37 +++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 47636f3..c52a745 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2352,6 +2352,42 @@ static int i915_llc(struct seq_file *m, void *data)
 	return 0;
 }
 
+static void i915_uc_load_status_info(struct seq_file *m, struct intel_uc_fw *uc_fw)
+{
+	seq_printf(m, "%s firmware status:\n\tpath: <%s>\n\tfetch: %d\n\tload: %d\n",
+			uc_fw->uc_name,
+			uc_fw->uc_fw_path,
+			uc_fw->uc_fw_fetch_status,
+			uc_fw->uc_fw_load_status);
+}
+
+static int i915_guc_load_status_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_i915_private *dev_priv = node->minor->dev->dev_private;
+	u32 tmp, i;
+
+	if (!HAS_GUC_UCODE(dev_priv->dev))
+		return 0;
+
+	i915_uc_load_status_info(m, &dev_priv->guc.guc_fw);
+
+	tmp = I915_READ(GUC_STATUS);
+
+	seq_printf(m, "\nGuC status 0x%08x:\n", tmp);
+	seq_printf(m, "\tBootrom status = 0x%x\n",
+		(tmp & GS_BOOTROM_MASK) >> GS_BOOTROM_SHIFT);
+	seq_printf(m, "\tuKernel status = 0x%x\n",
+		(tmp & GS_UKERNEL_MASK) >> GS_UKERNEL_SHIFT);
+	seq_printf(m, "\tMIA Core status = 0x%x\n",
+		(tmp & GS_MIA_MASK) >> GS_MIA_SHIFT);
+	seq_puts(m, "\nScratch registers value:\n");
+	for (i = 0; i < 16; i++)
+		seq_printf(m, "\t%2d: \t0x%x\n", i, I915_READ(SOFT_SCRATCH(i)));
+
+	return 0;
+}
+
 static int i915_edp_psr_status(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -5046,6 +5082,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_hws_bsd", i915_hws_info, 0, (void *)VCS},
 	{"i915_gem_hws_vebox", i915_hws_info, 0, (void *)VECS},
 	{"i915_gem_batch_pool", i915_gem_batch_pool_info, 0},
+	{"i915_guc_load_status", i915_guc_load_status_info, 0},
 	{"i915_frequency_info", i915_frequency_info, 0},
 	{"i915_hangcheck_info", i915_hangcheck_info, 0},
 	{"i915_drpc_info", i915_drpc_info, 0},
-- 
1.7.9.5


* [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (5 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-16  9:35   ` Chris Wilson
  2015-06-17 12:18   ` Daniel Vetter
  2015-06-15 18:36 ` [PATCH 08/15] drm/i915: Move execlists defines from .c to .h Dave Gordon
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

In order to fully initialise the default contexts, we have to execute
batchbuffer commands on the GPU engines. But in the case of GuC-based
batch submission, we can't do that until any required firmware has
been loaded, which may not be possible during driver load, because the
filesystem(s) containing the firmware may not be mounted until later.

Therefore, we now allow the first call to the firmware-loading code to
return -EAGAIN to indicate that it's not yet ready, and that it should
be retried when the device is first opened from user code, by which
time we expect that all required filesystems will have been mounted.
The late-retry code will then re-attempt to load the firmware if the
early attempt failed.

If the late retry fails, the current open-in-progress will fail, but
the recovery code will disable GuC submission and reset the GPU and
driver. The next open will therefore be in non-GuC mode, and will be
allowed to complete even if the GuC cannot be loaded or used.

Issue: VIZ-4884
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
---
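
Note: the open-time flow described above (fail the first open, fall back to
non-GuC mode, let the next open succeed) can be sketched as a small
user-space model. The struct and function here are hypothetical
simplifications, not driver code; fw_load_err is an assumed input standing
in for whatever intel_guc_ucode_load(dev, true) would return, and the GPU
reset and context enabling are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; field names echo the patch but are simplified. */
struct dev_model {
	bool guc_submission;	/* like i915.enable_guc_submission */
	bool contexts_ready;	/* like dev_priv->contexts_ready */
	int  fw_load_err;	/* assumed result of the firmware load */
};

/* Model of the open path: complete late initialisation on first use. */
static int model_open(struct dev_model *d)
{
	if (d->contexts_ready)
		return 0;

	if (d->fw_load_err && d->guc_submission) {
		/* Recovery: drop to non-GuC mode, but still fail this open. */
		d->guc_submission = false;
		return d->fw_load_err;
	}

	/* Either the load worked or GuC submission is off: finish init. */
	d->contexts_ready = true;
	return 0;
}
```

Starting from a device with GuC submission requested and a failing load,
the first model_open() returns the error and the second returns 0.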
 drivers/gpu/drm/i915/i915_drv.h         |    2 ++
 drivers/gpu/drm/i915/i915_gem.c         |    9 +++++-
 drivers/gpu/drm/i915/i915_gem_context.c |   52 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   48 ++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f47cde7..a1fc278 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1837,6 +1837,7 @@ struct drm_i915_private {
 	/* hda/i915 audio component */
 	bool audio_component_registered;
 
+	bool contexts_ready;
 	uint32_t hw_context_size;
 	struct list_head context_list;
 
@@ -2614,6 +2615,7 @@ void i915_queue_hangcheck(struct drm_device *dev);
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
 		       const char *fmt, ...);
+void i915_handle_guc_error(struct drm_device *dev, int err);
 
 extern void intel_irq_init(struct drm_i915_private *dev_priv);
 extern void intel_hpd_init(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cd4a865..d1a8862 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5025,8 +5025,15 @@ i915_gem_init_hw(struct drm_device *dev)
 
 	/* We can't enable contexts until all firmware is loaded */
 	ret = intel_guc_ucode_load(dev, false);
+	if (ret == -EAGAIN) {
+		ret = 0;
+		goto out;		/* too early */
+	}
+
 	ret = i915_gem_context_enable(dev_priv);
-	if (ret && ret != -EIO) {
+	if (ret == 0) {
+		dev_priv->contexts_ready = true;
+	} else if (ret && ret != -EIO) {
 		DRM_ERROR("Context enable failed %d\n", ret);
 		i915_gem_cleanup_ringbuffer(dev);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 133afcf..debbfc9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -447,23 +447,65 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+/* Complete any late initialisation here */
+static int i915_gem_context_first_open(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	/*
+	 * We can't enable contexts until all firmware is loaded. This
+	 * call shouldn't return -EAGAIN because we pass wait=true, but
+	 * it can still fail with code -EIO if the GuC doesn't respond,
+	 * or -ENOEXEC if the GuC firmware image is invalid.
+	 */
+	ret = intel_guc_ucode_load(dev, true);
+	WARN_ON(ret == -EAGAIN);
+
+	/*
+	 * If an error occurred and GuC submission has been requested, we can
+	 * attempt recovery by disabling GuC submission and reinitialising
+	 * the GPU and driver. We then fail this open() anyway, but the next
+	 * attempt will find that GuC submission is already disabled, and so
+	 * proceed to complete context initialisation in non-GuC mode instead.
+	 */
+	if (ret && i915.enable_guc_submission) {
+		i915_handle_guc_error(dev, ret);
+		return ret;
+	}
+
+	ret = i915_gem_context_enable(dev_priv);
+	if (ret == 0)
+		dev_priv->contexts_ready = true;
+	return ret;
+}
+
 int i915_gem_context_open(struct drm_device *dev, struct drm_file *file)
 {
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct intel_context *ctx;
+	int ret = 0;
 
 	idr_init(&file_priv->context_idr);
 
 	mutex_lock(&dev->struct_mutex);
-	ctx = i915_gem_create_context(dev, file_priv);
+
+	if (!dev_priv->contexts_ready)
+		ret = i915_gem_context_first_open(dev);
+
+	if (ret == 0) {
+		ctx = i915_gem_create_context(dev, file_priv);
+		if (IS_ERR(ctx))
+			ret = PTR_ERR(ctx);
+	}
+
 	mutex_unlock(&dev->struct_mutex);
 
-	if (IS_ERR(ctx)) {
+	if (ret)
 		idr_destroy(&file_priv->context_idr);
-		return PTR_ERR(ctx);
-	}
 
-	return 0;
+	return ret;
 }
 
 void i915_gem_context_close(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 56db9e74..f7dcf8d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2665,6 +2665,54 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
 	i915_reset_and_wakeup(dev);
 }
 
+/**
+ * i915_handle_guc_error - handle a GuC error
+ * @dev: drm device
+ *
+ * If the GuC can't be (re-)initialised, disable GuC submission and
+ * then reset and reinitialise the rest of the GPU, so that we can
+ * fall back to operating in ELSP mode. Don't bother capturing error
+ * state, because it probably isn't relevant here.
+ *
+ * Unlike i915_handle_error() above, this is called with the global
+ * struct_mutex held, so we need to release it after setting the
+ * reset-in-progress bit so that other threads can make progress,
+ * and reacquire it after the reset is complete.
+ */
+void i915_handle_guc_error(struct drm_device *dev, int err)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	DRM_ERROR("GuC failure %d, disabling GuC submission\n", err);
+	i915.enable_guc_submission = false;
+
+	i915_report_and_clear_eir(dev);	/* unlikely? */
+
+	atomic_set_mask(I915_RESET_IN_PROGRESS_FLAG,
+			&dev_priv->gpu_error.reset_counter);
+
+	mutex_unlock(&dev->struct_mutex);
+
+	/*
+	 * Wakeup waiting processes so that the reset function
+	 * i915_reset_and_wakeup doesn't deadlock trying to grab
+	 * various locks. By bumping the reset counter first, the woken
+	 * processes will see a reset in progress and back off,
+	 * releasing their locks and then wait for the reset completion.
+	 * We must do this for _all_ gpu waiters that might hold locks
+	 * that the reset work needs to acquire.
+	 *
+	 * Note: The wake_up serves as the required memory barrier to
+	 * ensure that the waiters see the updated value of the reset
+	 * counter atomic_t.
+	 */
+	i915_error_wake_up(dev_priv, false);
+
+	i915_reset_and_wakeup(dev);
+
+	mutex_lock(&dev->struct_mutex);
+}
+
 /* Called from drm generic code, passed 'crtc' which
  * we use as a pipe index
  */
-- 
1.7.9.5


* [PATCH 08/15] drm/i915: Move execlists defines from .c to .h
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (6 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-16  9:37   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 09/15] drm/i915: GuC submission setup, phase 1 Dave Gordon
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx; +Cc: Michael H. Nguyen

From: "Michael H. Nguyen" <michael.h.nguyen@intel.com>

Move defines from intel_lrc.c to i915_reg.h so they are accessible
to the GuC submission code; and expose a previously static function
in the execlist code which will also be required for GuC submission.

Issue: VIZ-4884
Signed-off-by: Michael H. Nguyen <michael.h.nguyen@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |   77 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c |   80 +-------------------------------------
 drivers/gpu/drm/i915/intel_lrc.h |    2 +
 3 files changed, 81 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0e4589e..56f81de 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7833,4 +7833,81 @@ enum skl_disp_power_wells {
 #define _PALETTE_A (dev_priv->info.display_mmio_offset + 0xa000)
 #define _PALETTE_B (dev_priv->info.display_mmio_offset + 0xa800)
 
+/* Exec Lists */
+#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
+#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
+#define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
+
+#define RING_EXECLIST_QFULL		(1 << 0x2)
+#define RING_EXECLIST1_VALID		(1 << 0x3)
+#define RING_EXECLIST0_VALID		(1 << 0x4)
+#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
+#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
+#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
+
+#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
+#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
+#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
+#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
+#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
+#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
+
+#define CTX_LRI_HEADER_0		0x01
+#define CTX_CONTEXT_CONTROL		0x02
+#define CTX_RING_HEAD			0x04
+#define CTX_RING_TAIL			0x06
+#define CTX_RING_BUFFER_START		0x08
+#define CTX_RING_BUFFER_CONTROL		0x0a
+#define CTX_BB_HEAD_U			0x0c
+#define CTX_BB_HEAD_L			0x0e
+#define CTX_BB_STATE			0x10
+#define CTX_SECOND_BB_HEAD_U		0x12
+#define CTX_SECOND_BB_HEAD_L		0x14
+#define CTX_SECOND_BB_STATE		0x16
+#define CTX_BB_PER_CTX_PTR		0x18
+#define CTX_RCS_INDIRECT_CTX		0x1a
+#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
+#define CTX_LRI_HEADER_1		0x21
+#define CTX_CTX_TIMESTAMP		0x22
+#define CTX_PDP3_UDW			0x24
+#define CTX_PDP3_LDW			0x26
+#define CTX_PDP2_UDW			0x28
+#define CTX_PDP2_LDW			0x2a
+#define CTX_PDP1_UDW			0x2c
+#define CTX_PDP1_LDW			0x2e
+#define CTX_PDP0_UDW			0x30
+#define CTX_PDP0_LDW			0x32
+#define CTX_LRI_HEADER_2		0x41
+#define CTX_R_PWR_CLK_STATE		0x42
+#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
+
+#define GEN8_CTX_VALID (1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
+#define GEN8_CTX_FORCE_RESTORE (1<<2)
+#define GEN8_CTX_L3LLC_COHERENT (1<<5)
+#define GEN8_CTX_PRIVILEGE (1<<8)
+
+#define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
+	const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
+		ppgtt->pdp.page_directory[n]->daddr : \
+		ppgtt->scratch_pd->daddr; \
+	reg_state[CTX_PDP ## n ## _UDW+1] = upper_32_bits(_addr); \
+	reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
+}
+
+enum {
+	ADVANCED_CONTEXT = 0,
+	LEGACY_CONTEXT,
+	ADVANCED_AD_CONTEXT,
+	LEGACY_64B_CONTEXT
+};
+#define GEN8_CTX_MODE_SHIFT 3
+enum {
+	FAULT_AND_HANG = 0,
+	FAULT_AND_HALT, /* Debug only */
+	FAULT_AND_STREAM,
+	FAULT_AND_CONTINUE /* Unsupported */
+};
+#define GEN8_CTX_ID_SHIFT 32
+
 #endif /* _I915_REG_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d5cfab3..4fd1941 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -136,82 +136,6 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
-#define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
-#define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
-#define GEN8_LR_CONTEXT_OTHER_SIZE (2 * PAGE_SIZE)
-
-#define RING_EXECLIST_QFULL		(1 << 0x2)
-#define RING_EXECLIST1_VALID		(1 << 0x3)
-#define RING_EXECLIST0_VALID		(1 << 0x4)
-#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
-#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
-#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
-
-#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
-#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
-#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
-#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
-#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
-#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
-
-#define CTX_LRI_HEADER_0		0x01
-#define CTX_CONTEXT_CONTROL		0x02
-#define CTX_RING_HEAD			0x04
-#define CTX_RING_TAIL			0x06
-#define CTX_RING_BUFFER_START		0x08
-#define CTX_RING_BUFFER_CONTROL		0x0a
-#define CTX_BB_HEAD_U			0x0c
-#define CTX_BB_HEAD_L			0x0e
-#define CTX_BB_STATE			0x10
-#define CTX_SECOND_BB_HEAD_U		0x12
-#define CTX_SECOND_BB_HEAD_L		0x14
-#define CTX_SECOND_BB_STATE		0x16
-#define CTX_BB_PER_CTX_PTR		0x18
-#define CTX_RCS_INDIRECT_CTX		0x1a
-#define CTX_RCS_INDIRECT_CTX_OFFSET	0x1c
-#define CTX_LRI_HEADER_1		0x21
-#define CTX_CTX_TIMESTAMP		0x22
-#define CTX_PDP3_UDW			0x24
-#define CTX_PDP3_LDW			0x26
-#define CTX_PDP2_UDW			0x28
-#define CTX_PDP2_LDW			0x2a
-#define CTX_PDP1_UDW			0x2c
-#define CTX_PDP1_LDW			0x2e
-#define CTX_PDP0_UDW			0x30
-#define CTX_PDP0_LDW			0x32
-#define CTX_LRI_HEADER_2		0x41
-#define CTX_R_PWR_CLK_STATE		0x42
-#define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
-
-#define GEN8_CTX_VALID (1<<0)
-#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
-#define GEN8_CTX_FORCE_RESTORE (1<<2)
-#define GEN8_CTX_L3LLC_COHERENT (1<<5)
-#define GEN8_CTX_PRIVILEGE (1<<8)
-
-#define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
-	const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
-		ppgtt->pdp.page_directory[n]->daddr : \
-		ppgtt->scratch_pd->daddr; \
-	reg_state[CTX_PDP ## n ## _UDW+1] = upper_32_bits(_addr); \
-	reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
-}
-
-enum {
-	ADVANCED_CONTEXT = 0,
-	LEGACY_CONTEXT,
-	ADVANCED_AD_CONTEXT,
-	LEGACY_64B_CONTEXT
-};
-#define GEN8_CTX_MODE_SHIFT 3
-enum {
-	FAULT_AND_HANG = 0,
-	FAULT_AND_HALT, /* Debug only */
-	FAULT_AND_STREAM,
-	FAULT_AND_CONTINUE /* Unsupported */
-};
-#define GEN8_CTX_ID_SHIFT 32
-
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
 
@@ -263,8 +187,8 @@ u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 	return lrca >> 12;
 }
 
-static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
-					 struct drm_i915_gem_object *ctx_obj)
+uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
+				  struct drm_i915_gem_object *ctx_obj)
 {
 	struct drm_device *dev = ring->dev;
 	uint64_t desc;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 04d3a6d..19c9a02 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -85,6 +85,8 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 dispatch_flags);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
+uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
+				  struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [PATCH 09/15] drm/i915: GuC submission setup, phase 1
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (7 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 08/15] drm/i915: Move execlists defines from .c to .h Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 21:32   ` Chris Wilson
  2015-06-16 11:44   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 10/15] drm/i915: Enable GuC firmware log Dave Gordon
                   ` (8 subsequent siblings)
  17 siblings, 2 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

This adds the first of the data structures used to communicate with the
GuC (the pool of guc_context structures).

We create a GuC-specific wrapper round the GEM object allocator as all
GEM objects shared with the GuC must be pinned into GGTT space at an
address that is NOT in the range [0..WOPCM_SIZE), as that range of GGTT
addresses is not accessible to the GuC (from the GuC's point of view,
it's permanently reserved for other objects such as the BootROM & SRAM).

Later, we will need to allocate additional GuC-sharable objects for the
submission client(s) and the GuC's debug log.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
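
Note: the two constraints described above (a page-rounded pool object, and
a GGTT address outside [0..WOPCM_SIZE)) can be illustrated with the
user-space sketch below. Every EX_* value is hypothetical and picked only
so the arithmetic is visible; none of them is the real WOPCM size, context
count or descriptor size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All values here are hypothetical, chosen only to make the sums visible. */
#define EX_PAGE_SIZE		4096u
#define EX_WOPCM_SIZE		(512u * 1024u)
#define EX_MAX_CONTEXTS		1000u
#define EX_CTX_DESC_SIZE	120u

/* Round n up to a multiple of align (align must be a power of two). */
static size_t ex_round_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Bytes needed for the context pool, rounded up to whole pages. */
static size_t ex_pool_bytes(void)
{
	return ex_round_up((size_t)EX_MAX_CONTEXTS * EX_CTX_DESC_SIZE,
			   EX_PAGE_SIZE);
}

/* In this model the GuC cannot see GGTT offsets in [0, EX_WOPCM_SIZE). */
static bool ex_guc_can_access(uint32_t ggtt_offset)
{
	return ggtt_offset >= EX_WOPCM_SIZE;
}
```

With these numbers the pool needs 120000 bytes, which rounds up to 30 pages
(122880 bytes), and any offset below 512 KiB is rejected.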
 drivers/gpu/drm/i915/Makefile              |    3 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  122 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc.h           |    4 +
 drivers/gpu/drm/i915/intel_guc_loader.c    |   21 +++++
 4 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_guc_submission.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 15818df..4dbd6b5 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -42,7 +42,8 @@ i915-y += i915_cmd_parser.o \
 i915-y += intel_uc_loader.o
 
 # general-purpose microcontroller (GuC) support
-i915-y += intel_guc_loader.o
+i915-y += intel_guc_loader.o \
+	  i915_guc_submission.o
 
 # autogenerated null render state
 i915-y += intel_renderstate_gen6.o \
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
new file mode 100644
index 0000000..273cf38
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+#include <linux/firmware.h>
+#include <linux/circ_buf.h>
+#include "i915_drv.h"
+#include "intel_guc.h"
+
+/**
+ * gem_allocate_guc_obj() - Allocate gem object for GuC usage
+ * @dev:	drm device
+ * @size:	size of object
+ *
+ * This is a wrapper to create a gem obj. For the GuC to use the object, it
+ * must remain pinned for its whole lifetime, and it must be pinned in GGTT
+ * space outside [0, GUC_WOPCM_SIZE), as that range is reserved inside the GuC.
+ *
+ * Return:	A drm_i915_gem_object if successful, otherwise NULL.
+ */
+static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
+							u32 size)
+{
+	struct drm_i915_gem_object *obj;
+
+	obj = i915_gem_alloc_object(dev, size);
+	if (!obj)
+		return NULL;
+
+	if (i915_gem_object_get_pages(obj)) {
+		drm_gem_object_unreference(&obj->base);
+		return NULL;
+	}
+
+	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
+			PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE)) {
+		drm_gem_object_unreference(&obj->base);
+		return NULL;
+	}
+
+	return obj;
+}
+
+/**
+ * gem_release_guc_obj() - Release gem object allocated for GuC usage
+ * @obj:	gem obj to be released
+ */
+static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
+{
+	if (!obj)
+		return;
+
+	if (i915_gem_obj_is_pinned(obj))
+		i915_gem_object_ggtt_unpin(obj);
+
+	drm_gem_object_unreference(&obj->base);
+}
+
+/*
+ * Set up the memory resources to be shared with the GuC.  At this point,
+ * we require just one object that can be mapped through the GGTT.
+ */
+int i915_guc_submission_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	const size_t ctxsize = sizeof(struct guc_context_desc);
+	const size_t poolsize = MAX_GUC_GPU_CONTEXTS * ctxsize;
+	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
+	struct intel_guc *guc = &dev_priv->guc;
+
+	if (!i915.enable_guc_submission)
+		return 0; /* not enabled  */
+
+	if (guc->ctx_pool_obj)
+		return 0; /* already allocated */
+
+	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
+	if (!guc->ctx_pool_obj)
+		return -ENOMEM;
+
+	spin_lock_init(&dev_priv->guc.host2guc_lock);
+
+	ida_init(&guc->ctx_ids);
+
+	memset(guc->doorbell_bitmap, 0, sizeof(guc->doorbell_bitmap));
+	guc->db_cacheline = 0;
+
+	return 0;
+}
+
+void i915_guc_submission_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+
+	gem_release_guc_obj(dev_priv->guc.log_obj);
+	guc->log_obj = NULL;
+
+	if (guc->ctx_pool_obj)
+		ida_destroy(&guc->ctx_ids);
+	gem_release_guc_obj(guc->ctx_pool_obj);
+	guc->ctx_pool_obj = NULL;
+}
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 0b44265..06b68c2 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -171,4 +171,8 @@ extern void intel_guc_ucode_init(struct drm_device *dev);
 extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
 extern void intel_guc_ucode_fini(struct drm_device *dev);
 
+/* i915_guc_submission.c */
+int i915_guc_submission_init(struct drm_device *dev);
+void i915_guc_submission_fini(struct drm_device *dev);
+
 #endif
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 16eef4c..0f74876 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -111,6 +111,21 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
 			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
 	}
 
+	/* If GuC scheduling is enabled, set up params here. */
+	if (i915.enable_guc_submission) {
+		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
+		u32 ctx_in_16 = MAX_GUC_GPU_CONTEXTS / 16;
+
+		pgs >>= PAGE_SHIFT;
+		params[GUC_CTL_CTXINFO] = (pgs << GUC_CTL_BASE_ADDR_SHIFT) |
+			(ctx_in_16 << GUC_CTL_CTXNUM_IN16_SHIFT);
+
+		params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;
+
+		/* Unmask this bit to enable GuC scheduler */
+		params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;
+	}
+
 	I915_WRITE(SOFT_SCRATCH(0), 0);
 
 	for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
@@ -387,6 +402,10 @@ int intel_guc_ucode_load(struct drm_device *dev, bool wait)
 	if (err)
 		goto fail;
 
+	err = i915_guc_submission_init(dev);
+	if (err)
+		goto fail;
+
 	err = guc_ucode_xfer(dev);
 	if (err)
 		goto fail;
@@ -412,5 +431,7 @@ void intel_guc_ucode_fini(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
 
+	i915_guc_submission_fini(dev);
+
 	intel_uc_fw_fini(guc_fw);
 }
-- 
1.7.9.5


* [PATCH 10/15] drm/i915: Enable GuC firmware log
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (8 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 09/15] drm/i915: GuC submission setup, phase 1 Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 21:40   ` Chris Wilson
  2015-06-16  9:26   ` Tvrtko Ursulin
  2015-06-15 18:36 ` [PATCH 11/15] drm/i915: Implementation of GuC client Dave Gordon
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

Allocate a GEM object to hold GuC log data. A debugfs interface
(i915_guc_log_dump) is provided to print out the log content.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
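
Note: the log object sizing and the packing of its page offset with the
layout flags can be sketched in user space as follows. The EX_* page counts
and bit positions are hypothetical, not the real GUC_LOG_* values, and only
the VALID flag is modelled.

```c
#include <stdint.h>

/* Hypothetical page counts and bit positions, NOT the real GUC_LOG_* values. */
#define EX_PAGE_SHIFT		12
#define EX_LOG_DPC_PAGES	3u
#define EX_LOG_ISR_PAGES	3u
#define EX_LOG_CRASH_PAGES	1u
#define EX_LOG_VALID		(1u << 0)
#define EX_LOG_CRASH_SHIFT	4
#define EX_LOG_DPC_SHIFT	6
#define EX_LOG_ISR_SHIFT	9
#define EX_LOG_BUF_ADDR_SHIFT	12

/* One state page, plus each region with one spare page. */
static uint32_t ex_log_bytes(void)
{
	return (1 + EX_LOG_DPC_PAGES + 1 +
		EX_LOG_ISR_PAGES + 1 +
		EX_LOG_CRASH_PAGES + 1) << EX_PAGE_SHIFT;
}

/* Pack the buffer's GGTT offset (in pages) together with the layout flags. */
static uint32_t ex_log_flags(uint32_t ggtt_offset)
{
	uint32_t flags = EX_LOG_VALID |
		(EX_LOG_DPC_PAGES << EX_LOG_DPC_SHIFT) |
		(EX_LOG_ISR_PAGES << EX_LOG_ISR_SHIFT) |
		(EX_LOG_CRASH_PAGES << EX_LOG_CRASH_SHIFT);
	uint32_t page = ggtt_offset >> EX_PAGE_SHIFT;

	return (page << EX_LOG_BUF_ADDR_SHIFT) | flags;
}
```

With these numbers the object spans 11 pages, and a buffer at GGTT offset
0x80000 packs to 0x806D1.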
 drivers/gpu/drm/i915/i915_debugfs.c        |   29 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_guc_submission.c |   43 ++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c52a745..b0aa4af 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2388,6 +2388,34 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_guc_log_dump(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
+	u32 *log;
+	int i = 0, pg;
+
+	if (!log_obj)
+		return 0;
+
+	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
+		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
+
+		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
+			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
+				   *(log + i), *(log + i + 1),
+				   *(log + i + 2), *(log + i + 3));
+
+		kunmap_atomic(log);
+	}
+
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
 static int i915_edp_psr_status(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -5083,6 +5111,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_hws_vebox", i915_hws_info, 0, (void *)VECS},
 	{"i915_gem_batch_pool", i915_gem_batch_pool_info, 0},
 	{"i915_guc_load_status", i915_guc_load_status_info, 0},
+	{"i915_guc_log_dump", i915_guc_log_dump, 0},
 	{"i915_frequency_info", i915_frequency_info, 0},
 	{"i915_hangcheck_info", i915_hangcheck_info, 0},
 	{"i915_drpc_info", i915_drpc_info, 0},
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 273cf38..4efb73a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -75,6 +75,47 @@ static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
 	drm_gem_object_unreference(&obj->base);
 }
 
+static void guc_create_log(struct intel_guc *guc)
+{
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	struct drm_i915_gem_object *obj;
+	unsigned long offset;
+	uint32_t size, flags;
+
+	if (i915.guc_log_level < GUC_LOG_VERBOSITY_MIN)
+		return;
+
+	if (i915.guc_log_level > GUC_LOG_VERBOSITY_MAX)
+		i915.guc_log_level = GUC_LOG_VERBOSITY_MAX;
+
+	/* The first page holds the log buffer state. Allocate one extra
+	 * page for each of the other regions in case of overlap */
+	size = (1 + GUC_LOG_DPC_PAGES + 1 +
+		GUC_LOG_ISR_PAGES + 1 +
+		GUC_LOG_CRASH_PAGES + 1) << PAGE_SHIFT;
+
+	obj = guc->log_obj;
+	if (!obj) {
+		obj = gem_allocate_guc_obj(dev_priv->dev, size);
+		if (!obj) {
+			/* logging will be off */
+			i915.guc_log_level = -1;
+			return;
+		}
+
+		guc->log_obj = obj;
+	}
+
+	/* each allocated unit is a page */
+	flags = GUC_LOG_VALID | GUC_LOG_NOTIFY_ON_HALF_FULL |
+		(GUC_LOG_DPC_PAGES << GUC_LOG_DPC_SHIFT) |
+		(GUC_LOG_ISR_PAGES << GUC_LOG_ISR_SHIFT) |
+		(GUC_LOG_CRASH_PAGES << GUC_LOG_CRASH_SHIFT);
+
+	offset = i915_gem_obj_ggtt_offset(obj) >> PAGE_SHIFT; /* in pages */
+	guc->log_flags = (offset << GUC_LOG_BUF_ADDR_SHIFT) | flags;
+}
+
 /*
  * Set up the memory resources to be shared with the GuC.  At this point,
  * we require just one object that can be mapped through the GGTT.
@@ -104,6 +145,8 @@ int i915_guc_submission_init(struct drm_device *dev)
 	memset(guc->doorbell_bitmap, 0, sizeof(guc->doorbell_bitmap));
 	guc->db_cacheline = 0;
 
+	guc_create_log(guc);
+
 	return 0;
 }
 
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 11/15] drm/i915: Implementation of GuC client
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (9 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 10/15] drm/i915: Enable GuC firmware log Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 21:55   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 12/15] drm/i915: Interrupt routing for GuC submission Dave Gordon
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

A GuC client has its own doorbell and workqueue. It maintains the
doorbell cache line, process description object and work queue item.

A default guc_client is created for the i915 driver to use for
normal-priority in-order submission.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c |  668 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc.h           |    5 +
 drivers/gpu/drm/i915/intel_guc_loader.c    |   10 +
 3 files changed, 683 insertions(+)
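
The doorbell protocol described in the commit message above can be sketched
in user space as follows. This is only an illustration, not the patch's code:
it uses C11 atomics in place of atomic64_cmpxchg(), the names db_pack(),
next_cookie() and ring_doorbell() are hypothetical, and the layout (status in
the low dword, cookie in the high dword) is an assumption about the doorbell
quadword.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DB_DISABLED 0u
#define DB_ENABLED  1u

/* Assumed layout: status in the low dword, cookie in the high dword. */
static uint64_t db_pack(uint32_t status, uint32_t cookie)
{
	return ((uint64_t)cookie << 32) | status;
}

/* The cookie skips zero when it wraps, as in the patch. */
static uint32_t next_cookie(uint32_t cookie)
{
	uint32_t next = cookie + 1;

	return next ? next : 1;
}

/*
 * Ring the doorbell by swapping in an incremented cookie. On a cookie
 * mismatch, adopt the cookie found in the doorbell and retry once.
 * Returns 0 on success, -1 if the doorbell is disabled or both attempts
 * fail (hypothetical return values, chosen for this sketch).
 */
static int ring_doorbell(_Atomic uint64_t *db, uint32_t *cookie)
{
	uint32_t cur;
	int attempt = 2;

	assert(db && cookie);
	cur = *cookie;

	while (attempt--) {
		uint64_t expected = db_pack(DB_ENABLED, cur);
		uint64_t desired = db_pack(DB_ENABLED, next_cookie(cur));

		if (atomic_compare_exchange_strong(db, &expected, desired)) {
			*cookie = next_cookie(cur);
			return 0;
		}

		/* 'expected' now holds the value actually found. */
		if ((uint32_t)expected == DB_DISABLED)
			return -1;

		cur = (uint32_t)(expected >> 32);
	}

	return -1;
}
```

The caller keeps its own copy of the cookie, so a stale copy costs one extra
compare-and-exchange rather than a lost doorbell ring.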

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 4efb73a..487f295 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -27,6 +27,536 @@
 #include "intel_guc.h"
 
 /**
+ * DOC: GuC Client
+ *
+ * i915_guc_client:
+ * We use the term client to avoid confusion with contexts. An i915_guc_client
+ * is equivalent to the GuC object guc_context_desc. This context descriptor is
+ * allocated from a pool of 1024 entries. The kernel driver allocates a
+ * doorbell and a workqueue for it, along with the process descriptor
+ * (guc_process_desc), which is mapped to client space so that the client can
+ * write a Work Item and then ring the doorbell.
+ *
+ * To simplify the implementation, we allocate one gem object that contains all
+ * pages for doorbell, process descriptor and workqueue.
+ *
+ * The Scratch registers:
+ * There are 16 MMIO-based registers starting at 0xC180. The kernel driver
+ * writes a value to the action register (SOFT_SCRATCH_0) along with any data,
+ * then triggers an interrupt on the GuC via another register write (0xC4C8).
+ * The firmware writes a success/fail code back to the action register after
+ * it processes the request. The kernel driver polls for this update and then
+ * proceeds.
+ * See host2guc_action()
+ *
+ * Doorbells:
+ * Doorbells are interrupts to uKernel. A doorbell is a single cache line (QW)
+ * mapped into process space.
+ *
+ * Work Items:
+ * There are several types of work items that the host may place into a
+ * workqueue, each with its own requirements and limitations. Currently only
+ * WQ_TYPE_INORDER, which represents an in-order queue, is needed to support
+ * legacy submission via GuC. The kernel driver packs the ring tail pointer and
+ * an ELSP context descriptor dword into each Work Item.
+ * See guc_add_workqueue_item()
+ *
+ */
+
+/*
+ * Read GuC command/status register (SOFT_SCRATCH_0)
+ * Return true if it contains a response rather than a command
+ */
+static inline bool host2guc_get_response(struct drm_i915_private *dev_priv,
+					 u32 *status)
+{
+	u32 val = I915_READ(SOFT_SCRATCH(0));
+	*status = val;
+	return GUC2HOST_IS_RESPONSE(val);
+}
+
+static int host2guc_action(struct intel_guc *guc, u32 *data, u32 len)
+{
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	u32 status;
+	int i;
+	int ret;
+
+	if (WARN_ON(len < 1 || len > 15))
+		return -EINVAL;
+
+	spin_lock(&dev_priv->guc.host2guc_lock);
+
+	dev_priv->guc.action_count += 1;
+	dev_priv->guc.action_cmd = data[0];
+
+	for (i = 0; i < len; i++)
+		I915_WRITE(SOFT_SCRATCH(i), data[i]);
+
+	POSTING_READ(SOFT_SCRATCH(i - 1));
+
+	I915_WRITE(HOST2GUC_INTERRUPT, HOST2GUC_TRIGGER);
+
+	ret = wait_for_atomic(host2guc_get_response(dev_priv, &status), 10);
+	if (status != GUC2HOST_STATUS_SUCCESS) {
+		/* either the GuC doesn't respond, which is a TIMEOUT,
+		 * or a failure code is returned. */
+		if (ret != -ETIMEDOUT)
+			ret = -EIO;
+
+		DRM_ERROR("GUC: host2guc action 0x%X failed. ret=%d "
+				"status=0x%08X response=0x%08X\n",
+				data[0], ret, status,
+				I915_READ(SOFT_SCRATCH(15)));
+
+		dev_priv->guc.action_fail += 1;
+		dev_priv->guc.action_err = ret;
+	}
+	dev_priv->guc.action_status = status;
+
+	spin_unlock(&dev_priv->guc.host2guc_lock);
+
+	return ret;
+}
+
+/*
+ * Tell the GuC to allocate or deallocate a specific doorbell
+ */
+
+static int host2guc_allocate_doorbell(struct intel_guc *guc,
+				      struct i915_guc_client *client)
+{
+	u32 data[2];
+
+	data[0] = HOST2GUC_ACTION_ALLOCATE_DOORBELL;
+	data[1] = client->ctx_index;
+
+	return host2guc_action(guc, data, 2);
+}
+
+static int host2guc_release_doorbell(struct intel_guc *guc,
+				     struct i915_guc_client *client)
+{
+	u32 data[2];
+
+	data[0] = HOST2GUC_ACTION_DEALLOCATE_DOORBELL;
+	data[1] = client->ctx_index;
+
+	return host2guc_action(guc, data, 2);
+}
+
+/*
+ * Initialise, update, or clear doorbell data shared with the GuC
+ *
+ * These functions modify shared data and so need access to the mapped
+ * client object which contains the page being used for the doorbell
+ */
+
+static void guc_init_doorbell(struct intel_guc *guc,
+			      struct i915_guc_client *client)
+{
+	struct guc_doorbell_info *doorbell;
+	void *base;
+
+	base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+	doorbell = base + client->doorbell_offset;
+
+	doorbell->db_status = 1;
+	doorbell->cookie = 0;
+
+	kunmap_atomic(base);
+}
+
+static int guc_ring_doorbell(struct i915_guc_client *gc)
+{
+	struct guc_process_desc *desc;
+	union guc_doorbell_qw db_cmp, db_exc, db_ret;
+	union guc_doorbell_qw *db;
+	void *base;
+	int attempt = 2, ret = -EAGAIN;
+
+	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
+	desc = base + gc->proc_desc_offset;
+
+	/* Update the tail so it is visible to GuC */
+	desc->tail = gc->wq_tail;
+
+	/* current cookie */
+	db_cmp.db_status = GUC_DOORBELL_ENABLED;
+	db_cmp.cookie = gc->cookie;
+
+	/* cookie to be updated */
+	db_exc.db_status = GUC_DOORBELL_ENABLED;
+	db_exc.cookie = gc->cookie + 1;
+	if (db_exc.cookie == 0)
+		db_exc.cookie = 1;
+
+	/* pointer of current doorbell cacheline */
+	db = base + gc->doorbell_offset;
+
+	while (attempt--) {
+		/* lets ring the doorbell */
+		db_ret.value_qw = atomic64_cmpxchg((atomic64_t *)db,
+			db_cmp.value_qw, db_exc.value_qw);
+
+		/* if the exchange was successfully executed */
+		if (db_ret.value_qw == db_cmp.value_qw) {
+			/* db was successfully rung */
+			gc->cookie = db_exc.cookie;
+			ret = 0;
+			break;
+		}
+
+		/* XXX: doorbell was lost and need to acquire it again */
+		if (db_ret.db_status == GUC_DOORBELL_DISABLED)
+			break;
+
+		DRM_ERROR("Cookie mismatch. Expected %d, returned %d\n",
+			  db_cmp.cookie, db_ret.cookie);
+
+		/* update the cookie to newly read cookie from GuC */
+		db_cmp.cookie = db_ret.cookie;
+		db_exc.cookie = db_ret.cookie + 1;
+		if (db_exc.cookie == 0)
+			db_exc.cookie = 1;
+	}
+
+	kunmap_atomic(base);
+	return ret;
+}
+
+static void guc_disable_doorbell(struct intel_guc *guc,
+				 struct i915_guc_client *client)
+{
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	struct guc_doorbell_info *doorbell;
+	void *base;
+	int drbreg = GEN8_DRBREGL(client->doorbell_id);
+	int value;
+
+	base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+	doorbell = base + client->doorbell_offset;
+
+	doorbell->db_status = 0;
+
+	kunmap_atomic(base);
+
+	I915_WRITE(drbreg, I915_READ(drbreg) & ~GEN8_DRB_VALID);
+
+	value = I915_READ(drbreg);
+	WARN_ON((value & GEN8_DRB_VALID) != 0);
+
+	I915_WRITE(GEN8_DRBREGU(client->doorbell_id), 0);
+	I915_WRITE(drbreg, 0);
+
+	/* XXX: wait for any interrupts */
+	/* XXX: wait for workqueue to drain */
+}
+
+/*
+ * Select, assign and release doorbell cachelines
+ *
+ * These functions track which doorbell cachelines are in use.
+ * The data they manipulate is protected by the host2guc lock.
+ */
+
+static off_t select_doorbell_cacheline(struct intel_guc *guc)
+{
+	const int cacheline_size = boot_cpu_data.x86_clflush_size;
+	const int cacheline_per_page = PAGE_SIZE / cacheline_size;
+	off_t offset;
+
+	spin_lock(&guc->host2guc_lock);
+
+	/* Doorbell uses single cache line */
+	offset = cacheline_size * guc->db_cacheline;
+
+	/* Moving to next cache line to reduce contention */
+	guc->db_cacheline = (guc->db_cacheline + 1) % cacheline_per_page;
+
+	spin_unlock(&guc->host2guc_lock);
+
+	return offset;
+}
+
+static uint16_t assign_doorbell(struct intel_guc *guc, u32 priority)
+{
+	const uint16_t size = I915_MAX_DOORBELLS;
+	uint16_t id;
+
+	spin_lock(&guc->host2guc_lock);
+
+	/* The bitmap is split into two halves - high and normal priority. */
+	if (priority <= GUC_CTX_PRIORITY_HIGH) {
+		id = find_next_zero_bit(guc->doorbell_bitmap, size, size / 2);
+		if (id == size)
+			id = INVALID_DOORBELL_ID;
+	} else {
+		id = find_next_zero_bit(guc->doorbell_bitmap, size / 2, 0);
+		if (id == size / 2)
+			id = INVALID_DOORBELL_ID;
+	}
+
+	if (id != INVALID_DOORBELL_ID)
+		bitmap_set(guc->doorbell_bitmap, id, 1);
+
+	spin_unlock(&guc->host2guc_lock);
+
+	return id;
+}
+
+static void release_doorbell(struct intel_guc *guc, uint16_t id)
+{
+	spin_lock(&guc->host2guc_lock);
+	bitmap_clear(guc->doorbell_bitmap, id, 1);
+	spin_unlock(&guc->host2guc_lock);
+}
+
+/*
+ * Initialise the process descriptor shared with the GuC firmware.
+ */
+static void guc_init_proc_desc(struct intel_guc *guc,
+			       struct i915_guc_client *client)
+{
+	struct guc_process_desc *desc;
+	void *base;
+
+	base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
+	desc = base + client->proc_desc_offset;
+
+	memset(desc, 0, sizeof(*desc));
+
+	/*
+	 * XXX: pDoorbell and WQVBaseAddress are pointers in process address
+	 * space for ring3 clients (set them as in mmap_ioctl) or kernel
+	 * space for kernel clients (map on demand instead? May make debug
+	 * easier to have it mapped).
+	 */
+	desc->wq_base_addr = 0;
+	desc->db_base_addr = 0;
+
+	desc->context_id = client->ctx_index;
+	desc->wq_size_bytes = client->wq_size;
+	desc->wq_status = WQ_STATUS_ACTIVE;
+	desc->priority = client->priority;
+
+	kunmap_atomic(base);
+}
+
+/*
+ * Initialise/clear the context descriptor shared with the GuC firmware.
+ *
+ * This descriptor tells the GuC where (in GGTT space) to find the important
+ * data structures relating to this client (doorbell, process descriptor,
+ * write queue, etc).
+ */
+
+static void guc_init_ctx_desc(struct intel_guc *guc,
+			      struct i915_guc_client *client)
+{
+	struct guc_context_desc desc;
+	struct sg_table *sg;
+
+	memset(&desc, 0, sizeof(desc));
+
+	desc.attribute = GUC_CTX_DESC_ATTR_ACTIVE | GUC_CTX_DESC_ATTR_KERNEL;
+	desc.context_id = client->ctx_index;
+	desc.priority = client->priority;
+	desc.engines_used = (1 << RCS) | (1 << VCS) | (1 << BCS) |
+			    (1 << VECS) | (1 << VCS2); /* all engines */
+	desc.db_id = client->doorbell_id;
+
+	/*
+	 * The CPU address is only needed at certain points, so kmap_atomic on
+	 * demand instead of storing it in the ctx descriptor.
+	 * XXX: May make debug easier to have it mapped
+	 */
+	desc.db_trigger_cpu = 0;
+	desc.db_trigger_uk = client->doorbell_offset +
+		i915_gem_obj_ggtt_offset(client->client_obj);
+	desc.db_trigger_phy = client->doorbell_offset +
+		sg_dma_address(client->client_obj->pages->sgl);
+
+	desc.process_desc = client->proc_desc_offset +
+		i915_gem_obj_ggtt_offset(client->client_obj);
+
+	desc.wq_addr = client->wq_offset +
+		i915_gem_obj_ggtt_offset(client->client_obj);
+
+	desc.wq_size = client->wq_size;
+
+	/*
+	 * XXX: Take LRCs from an existing intel_context if this is not an
+	 * IsKMDCreatedContext client
+	 */
+	desc.desc_private = (uintptr_t)client;
+
+	/* Pool context is pinned already */
+	sg = guc->ctx_pool_obj->pages;
+	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
+			     sizeof(desc) * client->ctx_index);
+}
+
+static void guc_fini_ctx_desc(struct intel_guc *guc,
+			      struct i915_guc_client *client)
+{
+	struct guc_context_desc desc;
+	struct sg_table *sg;
+
+	memset(&desc, 0, sizeof(desc));
+
+	sg = guc->ctx_pool_obj->pages;
+	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
+			     sizeof(desc) * client->ctx_index);
+}
+
+/* Reserve space for one workqueue item; its offset is returned in *offset */
+static int guc_get_workqueue_space(struct i915_guc_client *gc, u32 *offset)
+{
+	struct guc_process_desc *desc;
+	void *base;
+	u32 size = sizeof(struct guc_wq_item);
+	int ret = 0, timeout_counter = 200;
+	unsigned long flags;
+
+	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
+	desc = base + gc->proc_desc_offset;
+
+	while (timeout_counter-- > 0) {
+		spin_lock_irqsave(&gc->wq_lock, flags);
+
+		ret = wait_for_atomic(CIRC_SPACE(gc->wq_tail, desc->head,
+				gc->wq_size) >= size, 1);
+
+		if (!ret) {
+			*offset = gc->wq_tail;
+
+			/* advance the tail for next workqueue item */
+			gc->wq_tail += size;
+			gc->wq_tail &= gc->wq_size - 1;
+
+			/* this will break the loop */
+			timeout_counter = 0;
+		}
+
+		spin_unlock_irqrestore(&gc->wq_lock, flags);
+	}
+
+	kunmap_atomic(base);
+
+	return ret;
+}
+
+static int guc_add_workqueue_item(struct i915_guc_client *gc,
+				  struct intel_context *ctx,
+				  struct intel_engine_cs *ring)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct guc_wq_item *wqi;
+	void *base;
+	u32 tail, wq_len, wq_off = 0;
+	int ret;
+
+	ret = guc_get_workqueue_space(gc, &wq_off);
+	if (ret)
+		return ret;
+
+	/* For now a workqueue item is 4 DWs and the workqueue buffer is 2
+	 * pages, so a wqi never straddles a page boundary or wraps round to
+	 * the beginning. This simplifies the implementation below.
+	 *
+	 * XXX: if that changes, we need to save the data to a temp wqi and
+	 * copy it to the workqueue buffer dw by dw.
+	 */
+	WARN_ON(sizeof(struct guc_wq_item) != 16);
+	WARN_ON(wq_off & 3);
+
+	/* wq starts from the page after doorbell / process_desc */
+	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
+			(wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
+	wq_off &= PAGE_SIZE - 1;
+	wqi = (struct guc_wq_item *)((char *)base + wq_off);
+
+	/* len does not include the header */
+	wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
+	wqi->header = WQ_TYPE_INORDER |
+			(wq_len << WQ_LEN_SHIFT) |
+			(ring->id << WQ_TARGET_SHIFT) |
+			WQ_NO_WCFLUSH_WAIT;
+
+	wqi->context_desc = (u32)execlists_ctx_descriptor(ring, ctx_obj);
+
+	/* The GuC firmware wants the tail index in qw */
+	tail = ringbuf->tail >> 3;
+	wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
+	wqi->fence_id = 0; /* XXX: what fence should go here? */
+
+	kunmap_atomic(base);
+
+	return 0;
+}
+
+static void update_context_image(struct intel_context *ctx,
+				 struct intel_engine_cs *ring)
+{
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	unsigned long ringaddr = i915_gem_obj_ggtt_offset(ringbuf->obj);
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct page *page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
+	uint32_t *reg_state;
+
+	reg_state = kmap_atomic(page);
+	reg_state[CTX_RING_BUFFER_START+1] = ringaddr;
+	kunmap_atomic(reg_state);
+}
+
+/**
+ * i915_guc_submit() - Submit commands through GuC
+ * @client:	the GuC client through which commands are submitted
+ * @ctx:	LRC where commands come from
+ * @ring:	HW engine that will execute the commands
+ *
+ * Return:	0 on success
+ */
+int i915_guc_submit(struct i915_guc_client *client,
+		    struct intel_context *ctx,
+		    struct intel_engine_cs *ring)
+{
+	int q_ret, b_ret;
+	unsigned long flags;
+
+	/* Need this because of the deferred pin ctx and ring */
+	/* Shall we move this right after ring is pinned? */
+	update_context_image(ctx, ring);
+
+	q_ret = guc_add_workqueue_item(client, ctx, ring);
+	if (q_ret == 0)
+		b_ret = guc_ring_doorbell(client);
+
+	spin_lock_irqsave(&client->wq_lock, flags);
+	client->submissions += 1;
+	if (q_ret) {
+		client->q_fail += 1;
+		client->retcode = q_ret;
+	} else if (b_ret) {
+		client->b_fail += 1;
+		client->retcode = q_ret = b_ret;
+	} else {
+		client->retcode = 0;
+	}
+	spin_unlock_irqrestore(&client->wq_lock, flags);
+
+	return q_ret;
+}
+
+/*
+ * Everything below here is concerned with setup & teardown, and is
+ * therefore not part of the somewhat time-critical batch-submission
+ * path of i915_guc_submit() above.
+ */
+
+/**
  * gem_allocate_guc_obj() - Allocate gem object for GuC usage
  * @dev:	drm device
  * @size:	size of object
@@ -75,6 +605,118 @@ static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
 	drm_gem_object_unreference(&obj->base);
 }
 
+static void guc_client_free(struct drm_device *dev,
+			    struct i915_guc_client *client)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+
+	if (!client)
+		return;
+
+	if (client->doorbell_id != INVALID_DOORBELL_ID) {
+		/*
+		 * First disable the doorbell, then tell the GuC we've
+		 * finished with it, finally deallocate it in our bitmap
+		 */
+		guc_disable_doorbell(guc, client);
+		host2guc_release_doorbell(guc, client);
+		release_doorbell(guc, client->doorbell_id);
+	}
+
+	/*
+	 * XXX: wait for any outstanding submissions before freeing memory.
+	 * Be sure to drop any locks
+	 */
+
+	gem_release_guc_obj(client->client_obj);
+
+	if (client->ctx_index != INVALID_CTX_ID) {
+		guc_fini_ctx_desc(guc, client);
+		ida_simple_remove(&guc->ctx_ids, client->ctx_index);
+	}
+
+	kfree(client);
+}
+
+/**
+ * guc_client_alloc() - Allocate an i915_guc_client
+ * @dev:	drm device
+ * @priority:	one of four priority levels: _CRITICAL, _HIGH, _NORMAL or _LOW
+ * 		The kernel client that replaces ExecList submission is created
+ * 		with NORMAL priority. A client for the scheduler can use HIGH,
+ * 		while a preemption context can use CRITICAL.
+ *
+ * Return:	An i915_guc_client object on success, NULL otherwise.
+ */
+static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
+						u32 priority)
+{
+	struct i915_guc_client *client;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct drm_i915_gem_object *obj;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return NULL;
+
+	client->doorbell_id = INVALID_DOORBELL_ID;
+	client->priority = priority;
+
+	client->ctx_index = (uint32_t)ida_simple_get(&guc->ctx_ids, 0,
+			MAX_GUC_GPU_CONTEXTS, GFP_KERNEL);
+	if (client->ctx_index >= MAX_GUC_GPU_CONTEXTS) {
+		client->ctx_index = INVALID_CTX_ID;
+		goto err;
+	}
+
+	/* First page is doorbell/proc_desc; the next two pages are the wq. */
+	obj = gem_allocate_guc_obj(dev, GUC_DB_SIZE + GUC_WQ_SIZE);
+	if (!obj)
+		goto err;
+
+	client->client_obj = obj;
+	client->wq_offset = GUC_DB_SIZE;
+	client->wq_size = GUC_WQ_SIZE;
+	spin_lock_init(&client->wq_lock);
+
+	client->doorbell_offset = select_doorbell_cacheline(guc);
+
+	/*
+	 * Since the doorbell only requires a single cacheline, we can save
+	 * space by putting the application process descriptor in the same
+	 * page. Use the half of the page that doesn't include the doorbell.
+	 */
+	if (client->doorbell_offset >= (GUC_DB_SIZE / 2))
+		client->proc_desc_offset = 0;
+	else
+		client->proc_desc_offset = (GUC_DB_SIZE / 2);
+
+	client->doorbell_id = assign_doorbell(guc, client->priority);
+	if (client->doorbell_id == INVALID_DOORBELL_ID)
+		/* XXX: evict a doorbell instead */
+		goto err;
+
+	guc_init_proc_desc(guc, client);
+	guc_init_ctx_desc(guc, client);
+	guc_init_doorbell(guc, client);
+
+	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
+	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+
+	/* XXX: Any cache flushes needed? General domain mgmt calls? */
+
+	if (host2guc_allocate_doorbell(guc, client))
+		goto err;
+
+	return client;
+
+err:
+	guc_client_free(dev, client);
+	return NULL;
+}
+
 static void guc_create_log(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -150,6 +792,32 @@ int i915_guc_submission_init(struct drm_device *dev)
 	return 0;
 }
 
+int i915_guc_submission_enable(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct i915_guc_client *client;
+
+	/* client for execbuf submission */
+	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_NORMAL);
+	if (!client) {
+		DRM_ERROR("Failed to create execbuf guc_client\n");
+		return -ENOMEM;
+	}
+
+	guc->execbuf_client = client;
+	return 0;
+}
+
+void i915_guc_submission_disable(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+
+	guc_client_free(dev, guc->execbuf_client);
+	guc->execbuf_client = NULL;
+}
+
 void i915_guc_submission_fini(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 06b68c2..147d288 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -173,6 +173,11 @@ extern void intel_guc_ucode_fini(struct drm_device *dev);
 
 /* i915_guc_submission.c */
 int i915_guc_submission_init(struct drm_device *dev);
+int i915_guc_submission_enable(struct drm_device *dev);
+int i915_guc_submit(struct i915_guc_client *client,
+		    struct intel_context *ctx,
+		    struct intel_engine_cs *ring);
+void i915_guc_submission_disable(struct drm_device *dev);
 void i915_guc_submission_fini(struct drm_device *dev);
 
 #endif
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 0f74876..6e4667d 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -390,6 +390,8 @@ int intel_guc_ucode_load(struct drm_device *dev, bool wait)
 	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING && !wait)
 		return -EAGAIN;
 
+	i915_guc_submission_disable(dev);
+
 	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_NONE)
 		return 0;
 
@@ -412,12 +414,20 @@ int intel_guc_ucode_load(struct drm_device *dev, bool wait)
 
 	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_SUCCESS;
 
+	if (i915.enable_guc_submission) {
+		err = i915_guc_submission_enable(dev);
+		if (err)
+			goto fail;
+	}
+
 	return 0;
 
 fail:
 	if (guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_PENDING)
 		guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_FAIL;
 
+	i915_guc_submission_disable(dev);
+
 	DRM_ERROR("Failed to initialize GuC, error %d\n", err);
 
 	return err;
-- 
1.7.9.5


* [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (10 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 11/15] drm/i915: Implementation of GuC client Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-16  9:24   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 13/15] drm/i915: Integrate GuC-based command submission Dave Gordon
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

Turn on interrupt steering to route necessary interrupts to GuC.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h         |   11 +++++--
 drivers/gpu/drm/i915/intel_guc_loader.c |   51 +++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 2 deletions(-)
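
The RING_MODE_GEN7 writes in this patch rely on the masked-register
convention, in which the upper 16 bits of the written value select which of
the lower 16 bits are updated. The sketch below models that convention in
user space. The bit values are copied from the i915_reg.h hunk of this patch;
the helper names and apply_masked_write() are hypothetical stand-ins for the
driver's macros and for the hardware's behaviour, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from the i915_reg.h hunk in this patch. */
#define GFX_INTERRUPT_STEERING     (1u << 14)
#define GFX_FORWARD_VBLANK_MASK    (3u << 5)
#define GFX_FORWARD_VBLANK_NEVER   (0u << 5)
#define GFX_FORWARD_VBLANK_ALWAYS  (1u << 5)

/* Upper 16 bits: write-enable mask. Lower 16 bits: new field value. */
static uint32_t masked_field(uint32_t mask, uint32_t value)
{
	assert((mask & 0xffff0000u) == 0);
	assert((value & ~mask) == 0);
	return (mask << 16) | value;
}

static uint32_t masked_bit_enable(uint32_t bit)
{
	return masked_field(bit, bit);
}

static uint32_t masked_bit_disable(uint32_t bit)
{
	return masked_field(bit, 0);
}

/* Value written to each ring's mode register to steer interrupts to GuC. */
static uint32_t ring_mode_for_guc(void)
{
	return masked_field(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS) |
	       masked_bit_enable(GFX_INTERRUPT_STEERING);
}

/* Value written to each ring's mode register to steer interrupts to host. */
static uint32_t ring_mode_for_host(void)
{
	return masked_field(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_NEVER) |
	       masked_bit_disable(GFX_INTERRUPT_STEERING);
}

/* Model of the register update: only bits selected by the mask change. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (reg & ~mask) | (write & mask);
}
```

Because unselected bits are left alone, the same write can be issued to every
ring without first reading back the current mode register contents.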

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 56f81de..e255253 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1638,12 +1638,18 @@ enum skl_disp_power_wells {
 #define GFX_MODE_GEN7	0x0229c
 #define RING_MODE_GEN7(ring)	((ring)->mmio_base+0x29c)
 #define   GFX_RUN_LIST_ENABLE		(1<<15)
+#define   GFX_INTERRUPT_STEERING	(1<<14)
 #define   GFX_TLB_INVALIDATE_EXPLICIT	(1<<13)
 #define   GFX_SURFACE_FAULT_ENABLE	(1<<12)
 #define   GFX_REPLAY_MODE		(1<<11)
 #define   GFX_PSMI_GRANULARITY		(1<<10)
 #define   GFX_PPGTT_ENABLE		(1<<9)
 
+#define   GFX_FORWARD_VBLANK_MASK	(3<<5)
+#define   GFX_FORWARD_VBLANK_NEVER	(0<<5)
+#define   GFX_FORWARD_VBLANK_ALWAYS	(1<<5)
+#define   GFX_FORWARD_VBLANK_COND	(2<<5)
+
 #define VLV_DISPLAY_BASE 0x180000
 #define VLV_MIPI_BASE VLV_DISPLAY_BASE
 
@@ -5627,11 +5633,12 @@ enum skl_disp_power_wells {
 #define GEN8_GT_IIR(which) (0x44308 + (0x10 * (which)))
 #define GEN8_GT_IER(which) (0x4430c + (0x10 * (which)))
 
-#define GEN8_BCS_IRQ_SHIFT 16
 #define GEN8_RCS_IRQ_SHIFT 0
-#define GEN8_VCS2_IRQ_SHIFT 16
+#define GEN8_BCS_IRQ_SHIFT 16
 #define GEN8_VCS1_IRQ_SHIFT 0
+#define GEN8_VCS2_IRQ_SHIFT 16
 #define GEN8_VECS_IRQ_SHIFT 0
+#define GEN8_WD_IRQ_SHIFT 16
 
 #define GEN8_DE_PIPE_ISR(pipe) (0x44400 + (0x10 * (pipe)))
 #define GEN8_DE_PIPE_IMR(pipe) (0x44404 + (0x10 * (pipe)))
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 6e4667d..204777b 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -62,6 +62,53 @@
 #define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
 MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
 
+static void direct_interrupts_to_host(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *ring;
+	int i, irqs;
+
+	/* tell all command streamers NOT to forward interrupts and vblank to GuC */
+	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_NEVER);
+	irqs |= _MASKED_BIT_DISABLE(GFX_INTERRUPT_STEERING);
+	for_each_ring(ring, dev_priv, i)
+		I915_WRITE(RING_MODE_GEN7(ring), irqs);
+
+	/* tell DE to send nothing to GuC */
+	I915_WRITE(DE_GUCRMR, ~0);
+
+	/* route all GT interrupts to the host */
+	I915_WRITE(GUC_BCS_RCS_IER, 0);
+	I915_WRITE(GUC_VCS2_VCS1_IER, 0);
+	I915_WRITE(GUC_WD_VECS_IER, 0);
+}
+
+static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *ring;
+	int i, irqs;
+
+	/* tell all command streamers to forward interrupts and vblank to GuC */
+	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
+	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
+	for_each_ring(ring, dev_priv, i)
+		I915_WRITE(RING_MODE_GEN7(ring), irqs);
+
+	/* tell DE to send (all) flip_done to GuC */
+	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
+	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
+	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
+	/* Unmasked bits will cause GuC response message to be sent */
+	I915_WRITE(DE_GUCRMR, ~irqs);
+
+	/* route USER_INTERRUPT to Host, all others are sent to GuC. */
+	irqs = GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
+	       GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+	/* These three registers have the same bit definitions */
+	I915_WRITE(GUC_BCS_RCS_IER, ~irqs);
+	I915_WRITE(GUC_VCS2_VCS1_IER, ~irqs);
+	I915_WRITE(GUC_WD_VECS_IER, ~irqs);
+}
+
 static u32 get_gttype(struct drm_device *dev)
 {
 	/* XXX: GT type based on PCI device ID? field seems unused by fw */
@@ -390,6 +437,7 @@ int intel_guc_ucode_load(struct drm_device *dev, bool wait)
 	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING && !wait)
 		return -EAGAIN;
 
+	direct_interrupts_to_host(dev_priv);
 	i915_guc_submission_disable(dev);
 
 	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_NONE)
@@ -418,6 +466,7 @@ int intel_guc_ucode_load(struct drm_device *dev, bool wait)
 		err = i915_guc_submission_enable(dev);
 		if (err)
 			goto fail;
+		direct_interrupts_to_guc(dev_priv);
 	}
 
 	return 0;
@@ -426,6 +475,7 @@ fail:
 	if (guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_PENDING)
 		guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_FAIL;
 
+	direct_interrupts_to_host(dev_priv);
 	i915_guc_submission_disable(dev);
 
 	DRM_ERROR("Failed to initialize GuC, error %d\n", err);
@@ -441,6 +491,7 @@ void intel_guc_ucode_fini(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
 
+	direct_interrupts_to_host(dev_priv);
 	i915_guc_submission_fini(dev);
 
 	intel_uc_fw_fini(guc_fw);
-- 
1.7.9.5


* [PATCH 13/15] drm/i915: Integrate GuC-based command submission
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (11 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 12/15] drm/i915: Interrupt routing for GuC submission Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-16  9:22   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics Dave Gordon
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

GuC-based submission is mostly the same as execlist mode, up to
intel_logical_ring_advance_and_submit(), where the context being
dispatched would be added to the execlist queue; at this point
we submit the context to the GuC backend instead.

There are, however, a few other changes also required, notably:
1.  Contexts must be pinned at GGTT addresses accessible by the GuC
    i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
    PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.

2.  The GuC's TLB must be invalidated after a context is pinned at
    a new GGTT address.

3.  The GuC firmware uses the page before the Ring Context as shared
    data. Therefore, whenever the driver needs the base address of the
    LRC, it must skip over that page. LRC_PPHWSP_PN is defined as the
    page number of the LRCA.

4.  In the work queue used to pass requests to the GuC, the GuC
    firmware requires the ring-tail-offset to be represented as an
    11-bit value, expressed in QWords. Therefore, the ringbuffer
    size must be reduced to the representable range (4 pages).

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |    2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |   46 ++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_guc.h           |    1 +
 drivers/gpu/drm/i915/intel_lrc.c           |   48 ++++++++++++++++++++--------
 drivers/gpu/drm/i915/intel_lrc.h           |    6 ++++
 5 files changed, 86 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b0aa4af..c6e2582 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1975,7 +1975,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 		return;
 	}
 
-	page = i915_gem_object_get_page(ctx_obj, 1);
+	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 	if (!WARN_ON(page == NULL)) {
 		reg_state = kmap_atomic(page);
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 487f295..b423faf 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -356,16 +356,52 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 {
 	struct guc_context_desc desc;
 	struct sg_table *sg;
+	int i;
 
 	memset(&desc, 0, sizeof(desc));
 
 	desc.attribute = GUC_CTX_DESC_ATTR_ACTIVE | GUC_CTX_DESC_ATTR_KERNEL;
 	desc.context_id = client->ctx_index;
 	desc.priority = client->priority;
-	desc.engines_used = (1 << RCS) | (1 << VCS) | (1 << BCS) |
-			    (1 << VECS) | (1 << VCS2); /* all engines */
 	desc.db_id = client->doorbell_id;
 
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct guc_execlist_context *lrc = &desc.lrc[i];
+		struct intel_engine_cs *ring;
+		struct drm_i915_gem_object *obj;
+
+		/* TODO: We have a design issue to be solved here. Only when we
+		 * receive the first batch do we know which engine is used by
+		 * the user. But here the GuC expects the lrc and ring to be
+		 * pinned. This is not an issue for the default context, which
+		 * is currently the only one that owns a GuC client. Any future
+		 * owner of a GuC client must ensure its lrc is pinned first.
+		 */
+		obj = client->owner->engine[i].state;
+		if (!obj)
+			break;
+
+		ring = client->owner->engine[i].ringbuf->ring;
+
+		lrc->context_desc = execlists_ctx_descriptor(ring, obj);
+		/* The state page is after PPHWSP */
+		lrc->ring_lcra = i915_gem_obj_ggtt_offset(obj) +
+				LRC_STATE_PN * PAGE_SIZE;
+		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
+				(ring->id << GUC_ELC_ENGINE_OFFSET);
+
+		obj = client->owner->engine[i].ringbuf->obj;
+
+		lrc->ring_begin = i915_gem_obj_ggtt_offset(obj);
+		lrc->ring_end = lrc->ring_begin + obj->base.size - 1;
+		lrc->ring_next_free_location = lrc->ring_begin;
+		lrc->ring_current_tail_pointer_value = 0;
+
+		desc.engines_used |= (1 << ring->id);
+	}
+
+	WARN_ON(desc.engines_used == 0);
+
 	/*
 	 * The CPU address is only needed at certain points, so kmap_atomic on
 	 * demand instead of storing it in the ctx descriptor.
@@ -650,6 +686,7 @@ static void guc_client_free(struct drm_device *dev,
  * Return:	An i915_guc_client object if success.
  */
 static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
+						struct intel_context *ctx,
 						u32 priority)
 {
 	struct i915_guc_client *client;
@@ -698,6 +735,8 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 		/* XXX: evict a doorbell instead */
 		goto err;
 
+	client->owner = ctx;
+
 	guc_init_proc_desc(guc, client);
 	guc_init_ctx_desc(guc, client);
 	guc_init_doorbell(guc, client);
@@ -796,10 +835,11 @@ int i915_guc_submission_enable(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_guc *guc = &dev_priv->guc;
+	struct intel_context *ctx = dev_priv->ring[RCS].default_context;
 	struct i915_guc_client *client;
 
 	/* client for execbuf submission */
-	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_NORMAL);
+	client = guc_client_alloc(dev, ctx, GUC_CTX_PRIORITY_NORMAL);
 	if (!client) {
 		DRM_ERROR("Failed to create execbuf guc_client\n");
 		return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 147d288..4f7d55f 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -33,6 +33,7 @@
 struct i915_guc_client {
 	spinlock_t wq_lock;
 	struct drm_i915_gem_object *client_obj;
+	struct intel_context *owner;
 	u32 priority;
 	off_t doorbell_offset;
 	off_t proc_desc_offset;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4fd1941..2801fe2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -180,7 +180,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
  */
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 {
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
+			LRC_PPHWSP_PN * PAGE_SIZE;
 
 	/* LRCA is required to be 4K aligned so the more significant 20 bits
 	 * are globally unique */
@@ -192,7 +193,8 @@ uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 {
 	struct drm_device *dev = ring->dev;
 	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
+			LRC_PPHWSP_PN * PAGE_SIZE;
 
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
@@ -262,7 +264,7 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	struct page *page;
 	uint32_t *reg_state;
 
-	page = i915_gem_object_get_page(ctx_obj, 1);
+	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = tail;
@@ -644,13 +646,17 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 				      struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
 	intel_logical_ring_advance(ringbuf);
 
 	if (intel_ring_stopped(ring))
 		return;
 
-	execlists_context_queue(ring, ctx, ringbuf->tail, request);
+	if (dev_priv->guc.execbuf_client)
+		i915_guc_submit(dev_priv->guc.execbuf_client, ctx, ring);
+	else
+		execlists_context_queue(ring, ctx, ringbuf->tail, request);
 }
 
 static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
@@ -914,18 +920,23 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 {
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret = 0;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	if (ctx->engine[ring->id].pin_count++ == 0) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj,
-				GEN8_LR_CONTEXT_ALIGN, 0);
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
+				PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE);
 		if (ret)
 			goto reset_pin_count;
 
 		ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 		if (ret)
 			goto unpin_ctx_obj;
+
+		/* Invalidate GuC TLB. */
+		if (i915.enable_guc_submission)
+			I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
 	}
 
 	return ret;
@@ -1337,8 +1348,13 @@ out:
 static int gen8_init_rcs_context(struct intel_engine_cs *ring,
 		       struct intel_context *ctx)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret;
 
+	/* Invalidate GuC TLB. */
+	if (i915.enable_guc_submission)
+		I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+
 	ret = intel_logical_ring_workarounds_emit(ring, ctx);
 	if (ret)
 		return ret;
@@ -1677,7 +1693,7 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
-	page = i915_gem_object_get_page(ctx_obj, 1);
+	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
@@ -1823,12 +1839,14 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *default_ctx_obj)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct page *page;
 
 	/* The status page is offset 0 from the default context object
 	 * in LRC mode. */
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj);
-	ring->status_page.page_addr =
-			kmap(sg_page(default_ctx_obj->pages->sgl));
+	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj)
+			+ LRC_PPHWSP_PN * PAGE_SIZE;
+	page = i915_gem_object_get_page(default_ctx_obj, LRC_PPHWSP_PN);
+	ring->status_page.page_addr = kmap(page);
 	ring->status_page.obj = default_ctx_obj;
 
 	I915_WRITE(RING_HWS_PGA(ring->mmio_base),
@@ -1864,6 +1882,9 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
+	/* One extra page for data shared between the driver and the GuC */
+	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
+
 	ctx_obj = i915_gem_alloc_object(dev, context_size);
 	if (!ctx_obj) {
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
@@ -1871,7 +1892,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	}
 
 	if (is_global_default_ctx) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
+				PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE);
 		if (ret) {
 			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
 					ret);
@@ -1890,7 +1912,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	ringbuf->ring = ring;
 
-	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->size = 4 * PAGE_SIZE;
 	ringbuf->effective_size = ringbuf->size;
 	ringbuf->head = 0;
 	ringbuf->tail = 0;
@@ -1981,7 +2003,7 @@ void intel_lr_context_reset(struct drm_device *dev,
 			WARN(1, "Failed get_pages for context obj\n");
 			continue;
 		}
-		page = i915_gem_object_get_page(ctx_obj, 1);
+		page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 		reg_state = kmap_atomic(page);
 
 		reg_state[CTX_RING_HEAD+1] = 0;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 19c9a02..fd5d791 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -67,6 +67,12 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 }
 
 /* Logical Ring Contexts */
+
+/* One extra page is added before LRC for GuC as shared data */
+#define LRC_GUCSHR_PN	(0)
+#define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + 1)
+#define LRC_STATE_PN	(LRC_PPHWSP_PN + 1)
+
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (12 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 13/15] drm/i915: Integrate GuC-based command submission Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-16  9:28   ` Chris Wilson
  2015-06-15 18:36 ` [PATCH 15/15] Documentation/drm: kerneldoc for GuC Dave Gordon
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

This provides a means of reading status and counts relating
to GuC actions and submissions.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   41 +++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c6e2582..e699b38 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2388,6 +2388,46 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_guc_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc guc;
+	struct i915_guc_client client = { .client_obj = 0 };
+
+	if (!HAS_GUC_SCHED(dev_priv->dev))
+		return 0;
+
+	/* Take a local copy of the GuC data, so we can dump it at leisure */
+	spin_lock(&dev_priv->guc.host2guc_lock);
+	guc = dev_priv->guc;
+	if (guc.execbuf_client) {
+		spin_lock(&guc.execbuf_client->wq_lock);
+		client = *guc.execbuf_client;
+		spin_unlock(&guc.execbuf_client->wq_lock);
+	}
+	spin_unlock(&dev_priv->guc.host2guc_lock);
+
+	seq_printf(m, "GuC total action count: %llu\n", guc.action_count);
+	seq_printf(m, "GuC last action command: 0x%x\n", guc.action_cmd);
+	seq_printf(m, "GuC last action status: 0x%x\n", guc.action_status);
+
+	seq_printf(m, "GuC action failure count: %u\n", guc.action_fail);
+	seq_printf(m, "GuC last action error code: %d\n", guc.action_err);
+
+	seq_printf(m, "\nGuC execbuf client @ %p:\n", guc.execbuf_client);
+	seq_printf(m, "\tTotal submissions: %llu\n", client.submissions);
+	seq_printf(m, "\tFailed to queue: %u\n", client.q_fail);
+	seq_printf(m, "\tFailed doorbell: %u\n", client.b_fail);
+	seq_printf(m, "\tLast submission result: %d\n", client.retcode);
+
+	/* Add more as required ... */
+	seq_puts(m, "\n");
+
+	return 0;
+}
+
 static int i915_guc_log_dump(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -5110,6 +5150,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_hws_bsd", i915_hws_info, 0, (void *)VCS},
 	{"i915_gem_hws_vebox", i915_hws_info, 0, (void *)VECS},
 	{"i915_gem_batch_pool", i915_gem_batch_pool_info, 0},
+	{"i915_guc_info", i915_guc_info, 0},
 	{"i915_guc_load_status", i915_guc_load_status_info, 0},
 	{"i915_guc_log_dump", i915_guc_log_dump, 0},
 	{"i915_frequency_info", i915_frequency_info, 0},
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 15/15] Documentation/drm: kerneldoc for GuC
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (13 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-15 18:36 ` [PATCH 16/15] drm/i915: Enable GuC submission, where supported Dave Gordon
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

Add a design overview of the GuC, plus some key points about
the implementation.

Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 Documentation/DocBook/drm.tmpl |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index c0312cb..88b53ee 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -4218,6 +4218,25 @@ int num_ioctls;</synopsis>
       </sect2>
     </sect1>
     <sect1>
+      <title>GuC-based Command Submission</title>
+      <sect2>
+        <title>Microcontroller (uC) firmware loading support</title>
+!Pdrivers/gpu/drm/i915/intel_uc_loader.c Microcontroller (uC) firmware loading support
+!Idrivers/gpu/drm/i915/intel_uc_loader.c
+      </sect2>
+      <sect2>
+        <title>GuC</title>
+!Pdrivers/gpu/drm/i915/intel_guc_loader.c GuC-specific firmware loader
+!Idrivers/gpu/drm/i915/intel_guc_loader.c
+      </sect2>
+      <sect2>
+        <title>GuC Client</title>
+!Pdrivers/gpu/drm/i915/intel_guc_submission.c GuC-based command submission
+!Idrivers/gpu/drm/i915/intel_guc_submission.c
+      </sect2>
+    </sect1>
+
+    <sect1>
       <title> Tracing </title>
       <para>
     This sections covers all things related to the tracepoints implemented in
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 16/15] drm/i915: Enable GuC submission, where supported
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (14 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 15/15] Documentation/drm: kerneldoc for GuC Dave Gordon
@ 2015-06-15 18:36 ` Dave Gordon
  2015-06-17 12:43 ` [PATCH 00/15] Batch submission via GuC Daniel Vetter
  2015-06-24 12:16 ` Daniel Vetter
  17 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-15 18:36 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_params.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 5134095..926a6df 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -54,7 +54,7 @@ struct i915_params i915 __read_mostly = {
 	.verbose_state_checks = 1,
 	.nuclear_pageflip = 0,
 	.edp_vswing = 0,
-	.enable_guc_submission = false,
+	.enable_guc_submission = true,
 	.guc_log_level = -1,
 };
 
@@ -196,7 +196,7 @@ MODULE_PARM_DESC(edp_vswing,
 		 "2=default swing(400mV))");
 
 module_param_named(enable_guc_submission, i915.enable_guc_submission, bool, 0400);
-MODULE_PARM_DESC(enable_guc_submission, "Enable GuC submission (default:false)");
+MODULE_PARM_DESC(enable_guc_submission, "Enable GuC submission (default:true)");
 
 module_param_named(guc_log_level, i915.guc_log_level, int, 0400);
 MODULE_PARM_DESC(guc_log_level,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-15 18:36 ` [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c Dave Gordon
@ 2015-06-15 20:09   ` Chris Wilson
  2015-06-17  7:23     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-15 20:09 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> i915_gem_object_write() is a generic function to copy data from a plain
> linear buffer to a paged gem object.
> 
> We will need this for the microcontroller firmware loading support code.
> 
> Issue: VIZ-4884
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 611fbd8..9094c06 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
>  			 const struct drm_i915_gem_object_ops *ops);
> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> +			  const void *data, size_t size);
>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>  						  size_t size);
>  void i915_init_vm(struct drm_i915_private *dev_priv,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index be35f04..75d63c2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
>  	return false;
>  }
>  
> +/* Fill the @obj with the @size amount of @data */
> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> +			const void *data, size_t size)
> +{
> +	struct sg_table *sg;
> +	size_t bytes;
> +	int ret;
> +
> +	ret = i915_gem_object_get_pages(obj);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_object_pin_pages(obj);

You don't set the object into the CPU domain, or instead manually handle
the domain flushing. You don't handle objects that cannot be written
directly by the CPU, nor do you handle objects whose representation in
memory is not linear.
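
For reference, the "linear buffer to paged object" copy that the patch
describes can be sketched in user space as below. This is only an
illustration of the page-by-page mechanics: it does not address the
domain tracking, flushing or CPU-accessibility points raised above, and
paged_write_sketch(), SK_PAGE_SIZE and the page-pointer array are all
hypothetical stand-ins rather than i915 interfaces.

```c
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Copy size bytes from a linear buffer into an array of page pointers,
 * one page-sized chunk at a time. The pages need not be contiguous.
 * Returns 0 on success, -1 if the data does not fit.
 */
static inline int paged_write_sketch(void **pages, size_t npages,
				     const void *data, size_t size)
{
	const unsigned char *src = data;
	size_t i;

	if (size > npages * SK_PAGE_SIZE)
		return -1;

	for (i = 0; size > 0; i++) {
		size_t chunk = size < SK_PAGE_SIZE ? size : SK_PAGE_SIZE;

		memcpy(pages[i], src, chunk);
		src += chunk;
		size -= chunk;
	}

	return 0;
}
```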
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 04/15] drm/i915: Add GuC-related header files
  2015-06-15 18:36 ` [PATCH 04/15] drm/i915: Add GuC-related header files Dave Gordon
@ 2015-06-15 20:20   ` Chris Wilson
  2015-06-17 15:01     ` Dave Gordon
  2015-06-24  7:41     ` Dave Gordon
  0 siblings, 2 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-15 20:20 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:22PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> intel_guc_api.h contains the subset of the GuC interface that we
> will need for submission of commands through the GuC. These MUST
> be kept in sync with the definitions used by the GuC firmware.

intel_guc_hw.h or intel_guc_abi.h then. Calling it API doesn't make it
clear whose API you are talking about.
 
> intel_guc.h defines structures and parameters relevant to loading
> the GuC firmware and setting it running. Some of these also need
> to be kept in sync with the firmware.

intel_guc.h should be developed organically as features are added in the
series so that it is possible to track against implementation. Certainly
not in a patch that adds the entirety of the firmware ABI.

> +struct i915_guc_client {
> +	spinlock_t wq_lock;
> +	struct drm_i915_gem_object *client_obj;
> +	u32 priority;
> +	off_t doorbell_offset;
> +	off_t proc_desc_offset;
> +	off_t wq_offset;
> +	uint16_t doorbell_id;
> +	uint32_t ctx_index;
> +	uint32_t wq_size;
> +	uint32_t wq_tail;
> +	uint32_t cookie;
> +
> +	/* GuC submission statistics & status */
> +	uint64_t submissions;
> +	uint32_t q_fail;
> +	uint32_t b_fail;
> +	int retcode;

Mixture of classic kernel types and stdint types. And off_t! What size
exactly do you mean there?
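
One way to act on this, sketched under assumptions: pick a single
family of fixed-width types and make the offset width explicit instead
of relying on off_t, whose size depends on the platform. The example
uses <stdint.h> names so that it compiles in user space (in the kernel
these would be u16/u32). The struct name and the choice of 32-bit
offsets are illustrative only, not what the series settled on.

```c
#include <stdint.h>

/* Hypothetical reworking of the client fields with one type family. */
struct guc_client_sketch {
	uint32_t priority;
	uint32_t doorbell_offset;   /* byte offset within client_obj */
	uint32_t proc_desc_offset;  /* byte offset within client_obj */
	uint32_t wq_offset;         /* byte offset within client_obj */
	uint32_t ctx_index;
	uint32_t wq_size;
	uint32_t wq_tail;
	uint32_t cookie;
	uint16_t doorbell_id;
};
```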

> +};
> +
> +#define I915_MAX_DOORBELLS	256
> +#define INVALID_DOORBELL_ID	I915_MAX_DOORBELLS
> +
> +#define INVALID_CTX_ID		(MAX_GUC_GPU_CONTEXTS+1)
> +
> +struct intel_guc {
> +	/* Generic uC firmware management */
> +	struct intel_uc_fw guc_fw;

Haven't checked for size, but I guess this is going to be an init only
structure that we could discard.

> +	/* GuC-specific additions */
> +	uint32_t fw_ver_major;
> +	uint32_t fw_ver_minor;

I have no idea why you would want to keep these around.

> +	spinlock_t host2guc_lock;

Seems overly specific, no comment as to what it locks and lack of
implementation to be able to confirm.
> +
> +	struct drm_i915_gem_object *ctx_pool_obj;
> +	struct drm_i915_gem_object *log_obj;
> +	struct i915_guc_client *execbuf_client;

I expect these will want modification based on patches to be reviewed.

> +	struct ida ctx_ids;
> +	uint32_t log_flags;
> +	int db_cacheline;
> +	DECLARE_BITMAP(doorbell_bitmap, I915_MAX_DOORBELLS);
> +
> +	/* Action status & statistics */
> +	uint64_t action_count;		/* Total commands issued	*/
> +	uint32_t action_cmd;		/* Last command word		*/
> +	uint32_t action_status;		/* Last return status		*/
> +	uint32_t action_fail;		/* Total number of failures	*/
> +	int32_t action_err;		/* Last error code		*/

Any group of prefix_ immediately raises the question of "why isn't this
a struct?"
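
A minimal sketch of what that grouping might look like, using
<stdint.h> types so that it builds outside the kernel. The struct name,
the helper and its update rules are assumptions made for illustration;
the quoted patch does not show how the counters are maintained.

```c
#include <stdint.h>

/* The action_* fields gathered into one hypothetical struct. */
struct guc_action_stats_sketch {
	uint64_t count;   /* total commands issued */
	uint32_t cmd;     /* last command word */
	uint32_t status;  /* last return status */
	uint32_t fail;    /* total number of failures */
	int32_t err;      /* last error code */
};

/* Hypothetical helper: record the outcome of one host-to-GuC action. */
static inline void guc_action_record(struct guc_action_stats_sketch *s,
				     uint32_t cmd, uint32_t status,
				     int32_t err)
{
	s->count++;
	s->cmd = cmd;
	s->status = status;
	if (err) {
		s->fail++;
		s->err = err;
	}
}
```

With the fields grouped like this, struct intel_guc would carry a
single member and the debugfs dump could take a snapshot of just that
member.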
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-06-15 18:36 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
@ 2015-06-15 20:30   ` Chris Wilson
  2015-06-18 17:53     ` Yu Dai
  2015-06-18 18:54     ` Dave Gordon
  0 siblings, 2 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-15 20:30 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:
> +	/* We can't enable contexts until all firmware is loaded */
> +	ret = intel_guc_ucode_load(dev, false);

Pardon. I know context initialisation is broken, but adding to that
breakage is not pleasant.

>  	ret = i915_gem_context_enable(dev_priv);
>  	if (ret && ret != -EIO) {
>  		DRM_ERROR("Context enable failed %d\n", ret);

> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 82367c9..0b44265 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -166,4 +166,9 @@ struct intel_guc {
>  #define GUC_WD_VECS_IER		0xC558
>  #define GUC_PM_P24C_IER		0xC55C
>  
> +/* intel_guc_loader.c */
> +extern void intel_guc_ucode_init(struct drm_device *dev);
> +extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
> +extern void intel_guc_ucode_fini(struct drm_device *dev);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
> new file mode 100644
> index 0000000..16eef4c
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
> @@ -0,0 +1,416 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Vinit Azad <vinit.azad@intel.com>
> + *    Ben Widawsky <ben@bwidawsk.net>
> + *    Dave Gordon <david.s.gordon@intel.com>
> + *    Alex Dai <yu.dai@intel.com>
> + */
> +#include <linux/firmware.h>
> +#include "i915_drv.h"
> +#include "intel_guc.h"
> +
> +/**
> + * DOC: GuC
> + *
> + * intel_guc:
> + * Top-level structure of the GuC. It handles firmware loading and manages the
> + * client pool and doorbells. intel_guc owns an i915_guc_client to replace the
> + * legacy ExecList submission.
> + *
> + * Firmware versioning:
> + * The firmware build process generates a version header file that defines the
> + * major and minor versions. The versions are built into the CSS header of the
> + * firmware. The i915 kernel driver sets the minimum firmware version required
> + * per platform. The firmware installation package installs (as a symbolic
> + * link) the proper version of the firmware.
> + *
> + * GuC address space:
> + * The GuC does not allow any gfx GGTT address that falls into the range
> + * [0, WOPCM_TOP), which is reserved for Boot ROM, SRAM and WOPCM. Currently
> + * this top address is 512K. In order to exclude the 0-512K address space from
> + * the GGTT, all gfx objects used by the GuC are pinned with PIN_OFFSET_BIAS
> + * along with the size of the WOPCM.
> + *
> + * Firmware log:
> + * The firmware log is enabled by setting i915.guc_log_level to a non-negative
> + * level. Log data is printed by reading debugfs i915_guc_log_dump. Reading
> + * i915_guc_load_status prints the firmware loading status and the values of
> + * the scratch registers.
> + *
> + */
> +
> +#define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
> +MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
> +
> +static u32 get_gttype(struct drm_device *dev)
> +{
> +	/* XXX: GT type based on PCI device ID? field seems unused by fw */
> +	return 0;
> +}
> +
> +static u32 get_core_family(struct drm_device *dev)

For new code we really should be in the habit of passing around the
right pointer, not dev.

> +{
> +	switch (INTEL_INFO(dev)->gen) {
> +	case 8:
> +		return GFXCORE_FAMILY_GEN8;
> +	case 9:
> +		return GFXCORE_FAMILY_GEN9;
> +	default:
> +		DRM_ERROR("GUC: unknown gen for scheduler init\n");
> +		return GFXCORE_FAMILY_FORCE_ULONG;
> +	}
> +}
> +
> +static void set_guc_init_params(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_guc *guc = &dev_priv->guc;
> +	u32 params[GUC_CTL_MAX_DWORDS];
> +	int i;
> +
> +	memset(&params, 0, sizeof(params));
> +
> +	params[GUC_CTL_DEVICE_INFO] |=
> +		(get_gttype(dev_priv->dev) << GUC_CTL_GTTYPE_SHIFT) |
> +		(get_core_family(dev_priv->dev) << GUC_CTL_COREFAMILY_SHIFT);
> +
> +	/* The GuC ARAT increment is 10 ns. The GuC default scheduler quantum is
> +	 * one second. The ARAT value is calculated as:
> +	 * Scheduler-Quantum-in-ns / ARAT-increment-in-ns = 1000000000 / 10
> +	 */
> +	params[GUC_CTL_ARAT_HIGH] = 0;
> +	params[GUC_CTL_ARAT_LOW] = 100000000;
> +
> +	params[GUC_CTL_WA] |= GUC_CTL_WA_UK_BY_DRIVER;
> +
> +	params[GUC_CTL_FEATURE] |= GUC_CTL_DISABLE_SCHEDULER |
> +			GUC_CTL_VCS2_ENABLED;
> +
> +	if (i915.guc_log_level >= 0) {
> +		params[GUC_CTL_LOG_PARAMS] = guc->log_flags;
> +		params[GUC_CTL_DEBUG] =
> +			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
> +	}
> +
> +	I915_WRITE(SOFT_SCRATCH(0), 0);
> +
> +	for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
> +		I915_WRITE(SOFT_SCRATCH(1 + i), params[i]);
> +}
> +
> +/* Read the GuC status register (GUC_STATUS).
> + * Return true if it holds a success code from a normal boot or an RC6 boot.
> + */
> +static inline bool i915_guc_get_status(struct drm_i915_private *dev_priv,
> +					u32 *status)
> +{
> +	*status = I915_READ(GUC_STATUS);
> +	return (((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_READY ||
> +		((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE);

Weird function. Does two things, only one of those is get_status. Maybe
you would like to split this up better and use a switch when you mean a
switch. Or rename it to reflect its use only as a condition.
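
As a rough sketch of the split being suggested: the register read stays
with the caller, and a separate, side-effect-free predicate classifies
the value with a switch. The SK_* encodings below are placeholders, not
the real GUC_STATUS layout, and guc_status_is_ready() is a hypothetical
name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder encodings for illustration, NOT the real register layout. */
#define SK_GS_UKERNEL_SHIFT      8
#define SK_GS_UKERNEL_MASK       (0xffu << SK_GS_UKERNEL_SHIFT)
#define SK_GS_UKERNEL_LAPIC_DONE (0x30u << SK_GS_UKERNEL_SHIFT)
#define SK_GS_UKERNEL_READY      (0xf0u << SK_GS_UKERNEL_SHIFT)

/* Pure predicate: does this status word report a successful boot? */
static inline bool guc_status_is_ready(uint32_t status)
{
	switch (status & SK_GS_UKERNEL_MASK) {
	case SK_GS_UKERNEL_READY:       /* normal boot */
	case SK_GS_UKERNEL_LAPIC_DONE:  /* RC6 boot */
		return true;
	default:
		return false;
	}
}
```

The wait loop would then read GUC_STATUS into a local variable and pass
it to the predicate, which keeps the last value available for error
reporting.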

> +}
> +
> +/* Transfers the firmware image to RAM for execution by the microcontroller.
> + *
> + * GuC Firmware layout:
> + * +-------------------------------+  ----
> + * |          CSS header           |  128B
> + * +-------------------------------+  ----
> + * |             uCode             |
> + * +-------------------------------+  ----
> + * |         RSA signature         |  256B
> + * +-------------------------------+  ----
> + * |         RSA public Key        |  256B
> + * +-------------------------------+  ----
> + * |       Public key modulus      |    4B
> + * +-------------------------------+  ----
> + *
> + * Architecturally, the DMA engine is bidirectional, and it can potentially
> + * even transfer between GTT locations. This functionality is left out of the
> + * API for now as there is no need for it.
> + *
> + * Note that the GuC needs the CSS header plus uKernel code to be copied as
> + * one chunk of data. RSA sig data is loaded via MMIO.
> + */
> +static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
> +	struct drm_i915_gem_object *fw_obj = guc_fw->uc_fw_obj;
> +	unsigned long offset;
> +	struct sg_table *sg = fw_obj->pages;
> +	u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)];
> +	int i, ret = 0;
> +
> +	/* uCode size, also is where RSA signature starts */
> +	offset = ucode_size = guc_fw->uc_fw_size - UOS_CSS_SIGNING_SIZE;
> +
> +	/* Copy RSA signature from the fw image to HW for verification */
> +	sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, UOS_RSA_SIG_SIZE, offset);
> +	for (i = 0; i < UOS_RSA_SIG_SIZE / sizeof(u32); i++)
> +		I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]);
> +
> +	/* Set the source address for the new blob */
> +	offset = i915_gem_obj_ggtt_offset(fw_obj);

Why would it even have a GGTT vma? There's no precondition here to
assert that it should.

> +	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
> +	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
> +
> +	/* Set the destination. Current uCode expects an 8k stack starting from
> +	 * offset 0. */
> +	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
> +
> +	/* XXX: The image is automatically transfered to SRAM after the RSA
> +	 * verification. This is why the address space is chosen as such. */
> +	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> +
> +	I915_WRITE(DMA_COPY_SIZE, ucode_size);
> +
> +	/* Finally start the DMA */
> +	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> +

Just assuming that the writes land and in the order you expect?

> +	/*
> +	 * Spin-wait for the DMA to complete & the GuC to start up.
> +	 * NB: Docs recommend not using the interrupt for completion.
> +	 * FIXME: what's a valid timeout?
> +	 */
> +	ret = wait_for_atomic(i915_guc_get_status(dev_priv, &status), 10);

FIXME, error handling is too hard.

> +	DRM_DEBUG_DRIVER("DMA status = 0x%x, GuC status 0x%x\n",
> +			I915_READ(DMA_CTRL), status);
> +
> +	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> +		DRM_ERROR("%s firmware signature verification failed\n",
> +			guc_fw->uc_name);
> +		ret = -ENOEXEC;
> +	}
> +
> +	DRM_DEBUG_DRIVER("GuC fw load status %s %d\n",
> +			ret ? "FAIL" : "SUCCESS", ret);
> +
> +	return ret;
> +}

I'm guessing the other functions are basically more of the same...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 09/15] drm/i915: GuC submission setup, phase 1
  2015-06-15 18:36 ` [PATCH 09/15] drm/i915: GuC submission setup, phase 1 Dave Gordon
@ 2015-06-15 21:32   ` Chris Wilson
  2015-06-19 17:02     ` Dave Gordon
  2015-06-16 11:44   ` Chris Wilson
  1 sibling, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-15 21:32 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:27PM +0100, Dave Gordon wrote:
> +static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
> +							u32 size)
> +{
> +	struct drm_i915_gem_object *obj;
> +
> +	obj = i915_gem_alloc_object(dev, size);
> +	if (!obj)
> +		return NULL;

Does it need to be a shmemfs object?

> +	if (i915_gem_object_get_pages(obj)) {
> +		drm_gem_object_unreference(&obj->base);
> +		return NULL;
> +	}

This is a random function call.

> +	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
> +			PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE)) {
> +		drm_gem_object_unreference(&obj->base);
> +		return NULL;

How about reporting the right error code?
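
One common way to do that is the kernel's ERR_PTR()/IS_ERR()/PTR_ERR()
convention from linux/err.h. The sketch below re-implements those three
helpers in userspace so it can be compiled on its own; struct guc_obj,
guc_obj_create() and the pin_result parameter are hypothetical stand-ins
for the GEM object, the allocator and the result of the pin call:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace copies of the linux/err.h helpers, for illustration. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the GEM object. */
struct guc_obj {
	size_t size;
};

/*
 * Allocate an object and "pin" it. pin_result models the return value
 * of the pin step: 0 on success, a negative errno on failure. The
 * failure reason is passed up instead of being flattened to NULL.
 */
static struct guc_obj *guc_obj_create(size_t size, int pin_result)
{
	struct guc_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return ERR_PTR(-ENOMEM);

	obj->size = size;

	if (pin_result) {
		free(obj);
		return ERR_PTR(pin_result);
	}

	return obj;
}
```

The caller then checks IS_ERR() and returns PTR_ERR() rather than
assuming every failure is -ENOMEM.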

> +	}
> +
> +	return obj;
> +}
> +
> +/**
> + * gem_release_guc_obj() - Release gem object allocated for GuC usage
> + * @obj:	gem obj to be released
> +  */
> +static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
> +{
> +	if (!obj)
> +		return;
> +
> +	if (i915_gem_obj_is_pinned(obj))
> +		i915_gem_object_ggtt_unpin(obj);

What?

> +	drm_gem_object_unreference(&obj->base);
> +}
> +
> +/*
> + * Set up the memory resources to be shared with the GuC.  At this point,
> + * we require just one object that can be mapped through the GGTT.
> + */
> +int i915_guc_submission_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;

Bleh.

> +	const size_t ctxsize = sizeof(struct guc_context_desc);
> +	const size_t poolsize = MAX_GUC_GPU_CONTEXTS * ctxsize;
> +	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
> +	struct intel_guc *guc = &dev_priv->guc;
> +
> +	if (!i915.enable_guc_submission)
> +		return 0; /* not enabled  */
> +
> +	if (guc->ctx_pool_obj)
> +		return 0; /* already allocated */

Eh? Where have you hooked into... So looking at that, it looks like you
want to move this into the device initialisation rather than guc
firmware load. To me at least they are conceptually separate stages, and
judging by the above combining them has resulted in very clumsy code.

> +	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
> +	if (!guc->ctx_pool_obj)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&dev_priv->guc.host2guc_lock);
> +
> +	ida_init(&guc->ctx_ids);
> +
> +	memset(guc->doorbell_bitmap, 0, sizeof(guc->doorbell_bitmap));
> +	guc->db_cacheline = 0;

Before you relied on guc being zeroed, and now you memset it again.

> +
> +	return 0;
> +}
> +
> +void i915_guc_submission_fini(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_guc *guc = &dev_priv->guc;
> +
> +	gem_release_guc_obj(dev_priv->guc.log_obj);
> +	guc->log_obj = NULL;
> +
> +	if (guc->ctx_pool_obj)
> +		ida_destroy(&guc->ctx_ids);

Interesting guard. Maybe just make the GuC controller a pointer from
i915 and then you can do a more natural if (i915->guc == NULL) return;

> +	gem_release_guc_obj(guc->ctx_pool_obj);
> +	guc->ctx_pool_obj = NULL;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 0b44265..06b68c2 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -171,4 +171,8 @@ extern void intel_guc_ucode_init(struct drm_device *dev);
>  extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
>  extern void intel_guc_ucode_fini(struct drm_device *dev);
>  
> +/* i915_guc_submission.c */
> +int i915_guc_submission_init(struct drm_device *dev);
> +void i915_guc_submission_fini(struct drm_device *dev);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
> index 16eef4c..0f74876 100644
> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
> @@ -111,6 +111,21 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
>  			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
>  	}
>  
> +	/* If GuC scheduling is enabled, setup params here. */
> +	if (i915.enable_guc_submission) {
> +		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
> +		u32 ctx_in_16 = MAX_GUC_GPU_CONTEXTS / 16;

So really you didn't need to pin the ctx_pool_obj until this point?

> +
> +		pgs >>= PAGE_SHIFT;
> +		params[GUC_CTL_CTXINFO] = (pgs << GUC_CTL_BASE_ADDR_SHIFT) |
> +			(ctx_in_16 << GUC_CTL_CTXNUM_IN16_SHIFT);
> +
> +		params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;
> +
> +		/* Unmask this bit to enable GuC scheduler */
> +		params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;

/* Enable multiple context submission through the GuC */
params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;
params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;

Try to keep comments to explain why rather than what. Most of the
comments here fall into the "i++; // postincrement i" category.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 10/15] drm/i915: Enable GuC firmware log
  2015-06-15 18:36 ` [PATCH 10/15] drm/i915: Enable GuC firmware log Dave Gordon
@ 2015-06-15 21:40   ` Chris Wilson
  2015-06-16  9:26   ` Tvrtko Ursulin
  1 sibling, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-15 21:40 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:28PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> Allocate a GEM object to hold GuC log data. A debugfs interface
> (i915_guc_log_dump) is provided to print out the log content.
> 
> Issue: VIZ-4884
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |   29 +++++++++++++++++++
>  drivers/gpu/drm/i915/i915_guc_submission.c |   43 ++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index c52a745..b0aa4af 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2388,6 +2388,34 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
>  	return 0;
>  }
>  
> +static int i915_guc_log_dump(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
> +	u32 *log;
> +	int i = 0, pg;
> +
> +	if (!log_obj)
> +		return 0;
> +
> +	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
> +		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));

Coherency? You don't mention in the changelog how you expect this to be
used. Do you have some parser that polls the debugfs for changes? If
this is likely to become API, use a context parameter instead to return
a handle to the log bo.

> +
> +		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
> +			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> +				   *(log + i), *(log + i + 1),
> +				   *(log + i + 2), *(log + i + 3));
> +
> +		kunmap_atomic(log);
> +	}
> +
> +	seq_putc(m, '\n');

You already have a newline at the end.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 11/15] drm/i915: Implementation of GuC client
  2015-06-15 18:36 ` [PATCH 11/15] drm/i915: Implementation of GuC client Dave Gordon
@ 2015-06-15 21:55   ` Chris Wilson
  2015-06-19 17:55     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-15 21:55 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:29PM +0100, Dave Gordon wrote:
> +/* Get valid workqueue item and return it back to offset */
> +static int guc_get_workqueue_space(struct i915_guc_client *gc, u32 *offset)
> +{
> +	struct guc_process_desc *desc;
> +	void *base;
> +	u32 size = sizeof(struct guc_wq_item);
> +	int ret = 0, timeout_counter = 200;
> +	unsigned long flags;
> +
> +	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
> +	desc = base + gc->proc_desc_offset;
> +
> +	while (timeout_counter-- > 0) {
> +		spin_lock_irqsave(&gc->wq_lock, flags);
> +
> +		ret = wait_for_atomic(CIRC_SPACE(gc->wq_tail, desc->head,
> +				gc->wq_size) >= size, 1);

What is the point of this loop? Drop the spinlock 200 times? You already
have a timeout, the loop extends that by a factor of 200. You merely
allow gazumping, however I haven't looked at the locking to see what
you intend to lock (since it is not described at all).
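
For illustration, a single bounded wait could look roughly like the
userspace model below. CIRC_CNT()/CIRC_SPACE() are copied from
linux/circ_buf.h; struct wq, wq_wait_space(), the poll budget and
fake_consumer() are hypothetical and only stand in for the work queue,
the wait, the timeout and the GuC draining the queue:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* As in linux/circ_buf.h; size must be a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical model of the work queue: the producer owns tail,
 * the consumer (the GuC in the patch) advances head. */
struct wq {
	uint32_t tail;
	volatile uint32_t head;
	uint32_t size;
};

/*
 * Wait for at least 'need' bytes of space, polling at most max_polls
 * times. There is one budget and one exit path, so the effective
 * timeout is exactly what the caller asked for. 'relax' models the
 * delay between polls and may be NULL.
 */
static int wq_wait_space(struct wq *wq, uint32_t need,
			 unsigned int max_polls,
			 void (*relax)(struct wq *wq))
{
	while (max_polls--) {
		if (CIRC_SPACE(wq->tail, wq->head, wq->size) >= need)
			return 0;
		if (relax)
			relax(wq);
	}

	return -ETIMEDOUT;
}

/* Test double: pretend the consumer retired 16 bytes of work. */
static void fake_consumer(struct wq *wq)
{
	wq->head = (wq->head + 16) & (wq->size - 1);
}
```

Whether the lock should be held across the wait at all is a separate
question that this sketch does not try to answer.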
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 13/15] drm/i915: Integrate GuC-based command submission
  2015-06-15 18:36 ` [PATCH 13/15] drm/i915: Integrate GuC-based command submission Dave Gordon
@ 2015-06-16  9:22   ` Chris Wilson
  2015-06-19 18:18     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16  9:22 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> GuC-based submission is mostly the same as execlist mode, up to
> intel_logical_ring_advance_and_submit(), where the context being
> dispatched would be added to the execlist queue; at this point
> we submit the context to the GuC backend instead.
> 
> There are, however, a few other changes also required, notably:
> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC
>     i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>     PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
> 
> 2.  The GuC's TLB must be invalidated after a context is pinned at
>     a new GGTT address.
> 
> 3.  GuC firmware uses the one page before Ring Context as shared data.
>     Therefore, whenever driver wants to get base address of LRC, we
>     will offset one page for it. LRC_PPHWSP_PN is defined as the page
>     number of LRCA.
> 
> 4.  In the work queue used to pass requests to the GuC, the GuC
>     firmware requires the ring-tail-offset to be represented as an
>     11-bit value, expressed in QWords. Therefore, the ringbuffer
>     size must be reduced to the representable range (4 pages).
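
The arithmetic behind item 4 can be checked with a few lines of C. The
11-bit width and the QWord unit come from the commit message above; the
4 KiB page size and all names in the sketch are assumptions made for
illustration:

```c
#include <stdint.h>

#define WQ_RING_TAIL_BITS	11	/* from the commit message */
#define QWORD_BYTES		8
#define PAGE_BYTES		4096	/* assumed 4 KiB pages */

/* Largest ring whose tail offsets are all representable:
 * 2^11 QWords * 8 bytes = 16384 bytes = 4 pages. */
static uint32_t max_ring_bytes(void)
{
	return (1u << WQ_RING_TAIL_BITS) * QWORD_BYTES;
}

/* Convert a byte offset within the ring to the QWord units used
 * in the work queue item. */
static uint32_t tail_to_qwords(uint32_t tail_bytes)
{
	return tail_bytes / QWORD_BYTES;
}
```

The last QWord-aligned offset in a 16 KiB ring is 16376 bytes, which is
2047 QWords, the largest value that fits in 11 bits.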

I don't like how this sabotages the existing execlists implementation
in order for i915_guc_submission (an interesting choice of file name,
since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
i915_guc_submission (HW), not fitting into our, admittedly loose, naming
convention very well) to share a few functions, a couple of which are
already vfuncs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-15 18:36 ` [PATCH 12/15] drm/i915: Interrupt routing for GuC submission Dave Gordon
@ 2015-06-16  9:24   ` Chris Wilson
  2015-06-17  8:20     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16  9:24 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_engine_cs *ring;
> +	int i, irqs;
> +
> +	/* tell all command streamers to forward interrupts and vblank to GuC */
> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> +	for_each_ring(ring, dev_priv, i)
> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
> +
> +	/* tell DE to send (all) flip_done to GuC */
> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
> +	/* Unmasked bits will cause GuC response message to be sent */
> +	I915_WRITE(DE_GUCRMR, ~irqs);

That's scary since userspace depends on a few more DERRMR events
(wait-for-scanline). Where will they end up?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 10/15] drm/i915: Enable GuC firmware log
  2015-06-15 18:36 ` [PATCH 10/15] drm/i915: Enable GuC firmware log Dave Gordon
  2015-06-15 21:40   ` Chris Wilson
@ 2015-06-16  9:26   ` Tvrtko Ursulin
  2015-06-16 11:40     ` Chris Wilson
  1 sibling, 1 reply; 94+ messages in thread
From: Tvrtko Ursulin @ 2015-06-16  9:26 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx


On 06/15/2015 07:36 PM, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
>
> Allocate a GEM object to hold GuC log data. A debugfs interface
> (i915_guc_log_dump) is provided to print out the log content.
>
> Issue: VIZ-4884
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c        |   29 +++++++++++++++++++
>   drivers/gpu/drm/i915/i915_guc_submission.c |   43 ++++++++++++++++++++++++++++
>   2 files changed, 72 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index c52a745..b0aa4af 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2388,6 +2388,34 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
>   	return 0;
>   }
>
> +static int i915_guc_log_dump(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
> +	u32 *log;
> +	int i = 0, pg;
> +
> +	if (!log_obj)
> +		return 0;
> +
> +	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
> +		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
> +
> +		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
> +			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> +				   *(log + i), *(log + i + 1),
> +				   *(log + i + 2), *(log + i + 3));
> +
> +		kunmap_atomic(log);
> +	}

This doesn't look performance critical, but you could also use sg_miter_ 
family of functions/macros to iterate and kmap sg list pages. I did not 
bother figuring out what kind of smarts i915_gem_object_get_page does, 
but it is not likely it can beat sg_miter_ for efficiency.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics
  2015-06-15 18:36 ` [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics Dave Gordon
@ 2015-06-16  9:28   ` Chris Wilson
  2015-06-24  8:27     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16  9:28 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:32PM +0100, Dave Gordon wrote:
> This provides a means of reading status and counts relating
> to GuC actions and submissions.

Anything that tends to ease debugging also tends to ease
postmortem error analysis...

> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   41 +++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index c6e2582..e699b38 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2388,6 +2388,46 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
>  	return 0;
>  }
>  
> +static int i915_guc_info(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_guc guc;
> +	struct i915_guc_client client = { .client_obj = 0 };
> +
> +	if (!HAS_GUC_SCHED(dev_priv->dev))
> +		return 0;
> +
> +	/* Take a local copy of the GuC data, so we can dump it at leisure */
> +	spin_lock(&dev_priv->guc.host2guc_lock);
> +	guc = dev_priv->guc;
> +	if (guc.execbuf_client) {
> +		spin_lock(&guc.execbuf_client->wq_lock);
> +		client = *guc.execbuf_client;
> +		spin_unlock(&guc.execbuf_client->wq_lock);
> +	}
> +	spin_unlock(&dev_priv->guc.host2guc_lock);
> +
> +	seq_printf(m, "GuC total action count: %llu\n", guc.action_count);
> +	seq_printf(m, "GuC last action command: 0x%x\n", guc.action_cmd);
> +	seq_printf(m, "GuC last action status: 0x%x\n", guc.action_status);
> +
> +	seq_printf(m, "GuC action failure count: %u\n", guc.action_fail);
> +	seq_printf(m, "GuC last action error code: %d\n", guc.action_err);

If these had been a struct you could have minimised that copy.
Again, it would have been best if the debug interface had been added all
at once, so we could take the extra infrastructure or leave it out
altogether.
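
A minimal sketch of the struct idea, as a userspace model. The field
names loosely follow the action_* counters printed above; struct
guc_model, its padding member and guc_snapshot_stats() are hypothetical:

```c
#include <stdint.h>

/* Only the counters that the debugfs node prints. */
struct guc_action_stats {
	uint64_t count;
	uint32_t cmd;
	uint32_t status;
	uint32_t fail;
	int32_t err;
};

/* Hypothetical stand-in for the full GuC state. */
struct guc_model {
	uint8_t other_state[1024];	/* locks, objects, bitmaps, ... */
	struct guc_action_stats action;
};

/*
 * Copy out just the statistics. In the driver this would run with the
 * relevant lock held; the point is that the critical section copies a
 * few words instead of the whole structure.
 */
static struct guc_action_stats
guc_snapshot_stats(const struct guc_model *guc)
{
	return guc->action;
}
```

The same grouping would apply to the per-client counters.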

> +	seq_printf(m, "\nGuC execbuf client @ %p:\n", guc.execbuf_client);
> +	seq_printf(m, "\tTotal submissions: %llu\n", client.submissions);
> +	seq_printf(m, "\tFailed to queue: %u\n", client.q_fail);
> +	seq_printf(m, "\tFailed doorbell: %u\n", client.b_fail);
> +	seq_printf(m, "\tLast submission result: %d\n", client.retcode);
> +
> +	/* Add more as required ... */
> +	seq_puts(m, "\n");

Trailing newline, why?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
  2015-06-15 18:36 ` [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open Dave Gordon
@ 2015-06-16  9:35   ` Chris Wilson
  2015-06-19  9:42     ` Dave Gordon
  2015-06-17 12:18   ` Daniel Vetter
  1 sibling, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16  9:35 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
> +static int i915_gem_context_first_open(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	/*
> +	 * We can't enable contexts until all firmware is loaded. This
> +	 * call shouldn't return -EAGAIN because we pass wait=true, but
> +	 * it can still fail with code -EIO if the GuC doesn't respond,
> +	 * or -ENOEXEC if the GuC firmware image is invalid.
> +	 */
> +	ret = intel_guc_ucode_load(dev, true);
> +	WARN_ON(ret == -EAGAIN);
> +
> +	/*
> +	 * If an error occurred and GuC submission has been requested, we can
> +	 * attempt recovery by disabling GuC submission and reinitialising
> +	 * the GPU and driver. We then fail this open() anyway, but the next
> +	 * attempt will find that GuC submission is already disabled, and so
> +	 * proceed to complete context initialisation in non-GuC mode instead.
> +	 */
> +	if (ret && i915.enable_guc_submission) {
> +		i915_handle_guc_error(dev, ret);
> +		return ret;
> +	}

This is still backwards. What we wanted was for the submission process
to start up normally and then once the GuC loading succeeds, we then
start submitting the backlog to the GuC. If the loading fails, we can
then submit the backlog via execlists. It may be interesting to even
start userspace before GuC finishes loading.

So this makes more sense as to why you have the tight integration with
execlists then. I still don't think that justifies changing gen8 without
reason.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 08/15] drm/i915: Move execlists defines from .c to .h
  2015-06-15 18:36 ` [PATCH 08/15] drm/i915: Move execlists defines from .c to .h Dave Gordon
@ 2015-06-16  9:37   ` Chris Wilson
  2015-06-17  7:31     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16  9:37 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, Michael H. Nguyen

On Mon, Jun 15, 2015 at 07:36:26PM +0100, Dave Gordon wrote:
> From: "Michael H. Nguyen" <michael.h.nguyen@intel.com>
> 
> Move defines from intel_lrc.c to i915_reg.h so they are accessible
> to the GuC submission code; and expose a previously static function
> in the execlist code which will also be required for GuC submission.

What would have been better would have been to split the lrc code
from the execlists code so that the sharing is more obvious and the
overloading separate from the common code.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status
  2015-06-15 18:36 ` [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status Dave Gordon
@ 2015-06-16  9:40   ` Chris Wilson
  2015-06-19  7:49     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16  9:40 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:24PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> The new node provides access to the status of the common uC loader
> code and the GuC-specific loader; also the scratch registers used
> for communicatio between the i915 driver and the GuC firmware.
> 
> Issue: VIZ-4884
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   37 +++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 47636f3..c52a745 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2352,6 +2352,42 @@ static int i915_llc(struct seq_file *m, void *data)
>  	return 0;
>  }
>  
> +static void i915_uc_load_status_info(struct seq_file *m, struct intel_uc_fw *uc_fw)
> +{
> +	seq_printf(m, "%s firmware status:\n\tpath: <%s>\n\tfetch: %d\n\tload: %d\n",
> +			uc_fw->uc_name,
> +			uc_fw->uc_fw_path,
> +			uc_fw->uc_fw_fetch_status,
> +			uc_fw->uc_fw_load_status);

If you made this one seq_printf() per line, visualising the resulting
format would have been easier - and easier to modify.

Don't use <%s>, that's just visual noise to make cutting and pasting
harder.

If you can decode numeric status values, do so.
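
Taken together, those three points might look something like this
userspace sketch, with fprintf() standing in for seq_printf(). The enum
values, uc_fw_status_repr() and uc_status_show() are hypothetical names,
not the ones used by the patch:

```c
#include <stdio.h>

/* Hypothetical status values for the fetch and load stages. */
enum uc_fw_status {
	UC_FW_FAIL = -1,
	UC_FW_NONE = 0,
	UC_FW_PENDING,
	UC_FW_SUCCESS,
};

/* Decode a numeric status into something a human can read. */
static const char *uc_fw_status_repr(enum uc_fw_status status)
{
	switch (status) {
	case UC_FW_FAIL:
		return "FAIL";
	case UC_FW_NONE:
		return "NONE";
	case UC_FW_PENDING:
		return "PENDING";
	case UC_FW_SUCCESS:
		return "SUCCESS";
	default:
		return "UNKNOWN";
	}
}

/* One call per output line, no <> around the path.
 * Returns the number of characters written. */
static int uc_status_show(FILE *out, const char *name, const char *path,
			  enum uc_fw_status fetch, enum uc_fw_status load)
{
	int n = 0;

	n += fprintf(out, "%s firmware status:\n", name);
	n += fprintf(out, "\tpath: %s\n", path);
	n += fprintf(out, "\tfetch: %s\n", uc_fw_status_repr(fetch));
	n += fprintf(out, "\tload: %s\n", uc_fw_status_repr(load));

	return n;
}
```

Adding or reordering a field is then a one-line change, and the output
layout can be read straight off the source.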

> +}
> +
> +static int i915_guc_load_status_info(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = m->private;
> +	struct drm_i915_private *dev_priv = node->minor->dev->dev_private;
> +	u32 tmp, i;
> +
> +	if (!HAS_GUC_UCODE(dev_priv->dev))

Here and elsewhere it should be return -ENODEV;

> +		return 0;
> +
> +	i915_uc_load_status_info(m, &dev_priv->guc.guc_fw);
> +
> +	tmp = I915_READ(GUC_STATUS);
> +
> +	seq_printf(m, "\nGuC status 0x%08x:\n", tmp);
> +	seq_printf(m, "\tBootrom status = 0x%x\n",
> +		(tmp & GS_BOOTROM_MASK) >> GS_BOOTROM_SHIFT);
> +	seq_printf(m, "\tuKernel status = 0x%x\n",
> +		(tmp & GS_UKERNEL_MASK) >> GS_UKERNEL_SHIFT);
> +	seq_printf(m, "\tMIA Core status = 0x%x\n",
> +		(tmp & GS_MIA_MASK) >> GS_MIA_SHIFT);
> +	seq_puts(m, "\nScratch registers value:\n");
> +	for (i = 0; i < 16; i++)
> +		seq_printf(m, "\t%2d: \t0x%x\n", i, I915_READ(SOFT_SCRATCH(i)));

I have a feeling these probably don't want to be upstreamed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 10/15] drm/i915: Enable GuC firmware log
  2015-06-16  9:26   ` Tvrtko Ursulin
@ 2015-06-16 11:40     ` Chris Wilson
  2015-06-16 12:29       ` Tvrtko Ursulin
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-16 11:40 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Jun 16, 2015 at 10:26:40AM +0100, Tvrtko Ursulin wrote:
> 
> On 06/15/2015 07:36 PM, Dave Gordon wrote:
> >From: Alex Dai <yu.dai@intel.com>
> >
> >Allocate a GEM object to hold GuC log data. A debugfs interface
> >(i915_guc_log_dump) is provided to print out the log content.
> >
> >Issue: VIZ-4884
> >Signed-off-by: Alex Dai <yu.dai@intel.com>
> >Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_debugfs.c        |   29 +++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_guc_submission.c |   43 ++++++++++++++++++++++++++++
> >  2 files changed, 72 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> >index c52a745..b0aa4af 100644
> >--- a/drivers/gpu/drm/i915/i915_debugfs.c
> >+++ b/drivers/gpu/drm/i915/i915_debugfs.c
> >@@ -2388,6 +2388,34 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
> >  	return 0;
> >  }
> >
> >+static int i915_guc_log_dump(struct seq_file *m, void *data)
> >+{
> >+	struct drm_info_node *node = m->private;
> >+	struct drm_device *dev = node->minor->dev;
> >+	struct drm_i915_private *dev_priv = dev->dev_private;
> >+	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
> >+	u32 *log;
> >+	int i = 0, pg;
> >+
> >+	if (!log_obj)
> >+		return 0;
> >+
> >+	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
> >+		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
> >+
> >+		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
> >+			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> >+				   *(log + i), *(log + i + 1),
> >+				   *(log + i + 2), *(log + i + 3));
> >+
> >+		kunmap_atomic(log);
> >+	}
> 
> This doesn't look performance critical, but you could also use
> sg_miter_ family of functions/macros to iterate and kmap sg list
> pages. I did not bother figuring out what kind of smarts
> i915_gem_object_get_page does, but it is not likely it can beat
> sg_miter_ for efficiency.

It does. I have patches to replace more uses of sg_page_iter because it
is the slow point in many functions.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 09/15] drm/i915: GuC submission setup, phase 1
  2015-06-15 18:36 ` [PATCH 09/15] drm/i915: GuC submission setup, phase 1 Dave Gordon
  2015-06-15 21:32   ` Chris Wilson
@ 2015-06-16 11:44   ` Chris Wilson
  1 sibling, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-16 11:44 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:27PM +0100, Dave Gordon wrote:
> +	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
> +			PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE)) {
> +		drm_gem_object_unreference(&obj->base);
> +		return NULL;
> +	}

Another question is should this take up mappable aperture space at all?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 10/15] drm/i915: Enable GuC firmware log
  2015-06-16 11:40     ` Chris Wilson
@ 2015-06-16 12:29       ` Tvrtko Ursulin
  0 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2015-06-16 12:29 UTC (permalink / raw)
  To: Chris Wilson, Dave Gordon, intel-gfx


On 06/16/2015 12:40 PM, Chris Wilson wrote:
> On Tue, Jun 16, 2015 at 10:26:40AM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/15/2015 07:36 PM, Dave Gordon wrote:
>>> From: Alex Dai <yu.dai@intel.com>
>>>
>>> Allocate a GEM object to hold GuC log data. A debugfs interface
>>> (i915_guc_log_dump) is provided to print out the log content.
>>>
>>> Issue: VIZ-4884
>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_debugfs.c        |   29 +++++++++++++++++++
>>>   drivers/gpu/drm/i915/i915_guc_submission.c |   43 ++++++++++++++++++++++++++++
>>>   2 files changed, 72 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index c52a745..b0aa4af 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -2388,6 +2388,34 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
>>>   	return 0;
>>>   }
>>>
>>> +static int i915_guc_log_dump(struct seq_file *m, void *data)
>>> +{
>>> +	struct drm_info_node *node = m->private;
>>> +	struct drm_device *dev = node->minor->dev;
>>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>>> +	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
>>> +	u32 *log;
>>> +	int i = 0, pg;
>>> +
>>> +	if (!log_obj)
>>> +		return 0;
>>> +
>>> +	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
>>> +		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
>>> +
>>> +		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
>>> +			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
>>> +				   *(log + i), *(log + i + 1),
>>> +				   *(log + i + 2), *(log + i + 3));
>>> +
>>> +		kunmap_atomic(log);
>>> +	}
>>
>> This doesn't look performance critical, but you could also use
>> sg_miter_ family of functions/macros to iterate and kmap sg list
>> pages. I did not bother figuring out what kind of smarts
>> i915_gem_object_get_page does, but it is not likely it can beat
>> sg_miter_ for efficiency.
>
> It does. I have patches to replace more uses of sg_page_iter because it
> is the slow point in many functions.

Oh wow, so a "3rd party" random access iterator, used in sequential mode 
over a naturally sequential data structure, beats the native sequential 
iterator for performance? Amazing. :)
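
For readers following the argument: one way a random-access page lookup can keep pace with a sequential iterator is to remember where the previous lookup stopped. The sketch below is a toy userspace model of that idea only. Whether i915_gem_object_get_page() works this way is not stated in this thread, and struct ex_seg, struct ex_obj and ex_lookup_page() are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of pages, chained like scatterlist entries. */
struct ex_seg {
	size_t npages;
	struct ex_seg *next;
};

/* Hypothetical object: a segment chain plus a cached lookup cursor. */
struct ex_obj {
	struct ex_seg *first;
	struct ex_seg *cur;	/* segment found by the previous lookup */
	size_t cur_base;	/* index of the first page in 'cur' */
	size_t hops;		/* segment steps taken, for illustration */
};

/* Return the segment holding page n, or NULL if n is out of range. */
static struct ex_seg *ex_lookup_page(struct ex_obj *o, size_t n)
{
	assert(o != NULL);

	/* Restart from the head if there is no cursor or we moved backwards. */
	if (!o->cur || n < o->cur_base) {
		o->cur = o->first;
		o->cur_base = 0;
	}

	/* Walk forward until the current segment covers page n. */
	while (o->cur && n >= o->cur_base + o->cur->npages) {
		o->cur_base += o->cur->npages;
		o->cur = o->cur->next;
		o->hops++;
	}

	return o->cur;
}
```

Under these assumptions, visiting pages 0..N-1 in order costs one hop per segment in total, rather than a fresh walk from the head for every page.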

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-15 20:09   ` Chris Wilson
@ 2015-06-17  7:23     ` Dave Gordon
  2015-06-17 12:02       ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-17  7:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 15/06/15 21:09, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> i915_gem_object_write() is a generic function to copy data from a plain
>> linear buffer to a paged gem object.
>>
>> We will need this for the microcontroller firmware loading support code.
>>
>> Issue: VIZ-4884
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
>>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 611fbd8..9094c06 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
>>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
>>  			 const struct drm_i915_gem_object_ops *ops);
>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>> +			  const void *data, size_t size);
>>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>>  						  size_t size);
>>  void i915_init_vm(struct drm_i915_private *dev_priv,
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index be35f04..75d63c2 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
>>  	return false;
>>  }
>>  
>> +/* Fill the @obj with the @size amount of @data */
>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>> +			const void *data, size_t size)
>> +{
>> +	struct sg_table *sg;
>> +	size_t bytes;
>> +	int ret;
>> +
>> +	ret = i915_gem_object_get_pages(obj);
>> +	if (ret)
>> +		return ret;
>> +
>> +	i915_gem_object_pin_pages(obj);
> 
> You don't set the object into the CPU domain, or instead manually handle
> the domain flushing. You don't handle objects that cannot be written
> directly by the CPU, nor do you handle objects whose representation in
> memory is not linear.
> -Chris

No, we don't handle just any random gem object, but we do return an error
code for any types not supported. However, as we don't really need the
full generality of writing into a gem object of any type, I will replace
this function with one that combines the allocation of a new object
(which will therefore definitely be of the correct type, in the correct
domain, etc) and filling it with the data to be preserved.
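
As a rough picture of the "allocate and fill in one step" shape, here is a userspace sketch. It is not the eventual i915 function: struct ex_object, EX_PAGE_SIZE, ex_object_create_from_data() and ex_object_free() are hypothetical names, malloc/calloc stand in for GEM allocation, and cache-domain handling is deliberately left out of the model.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical stand-in for a freshly created, CPU-writable object. */
struct ex_object {
	size_t size;		/* always a whole number of pages */
	unsigned char *pages;
};

/* Allocate a page-rounded object and copy 'size' bytes of 'data' into it. */
static struct ex_object *ex_object_create_from_data(const void *data,
						    size_t size)
{
	struct ex_object *obj;
	size_t rounded;

	if (!data || size == 0 || size > SIZE_MAX - (EX_PAGE_SIZE - 1))
		return NULL;

	rounded = (size + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE * EX_PAGE_SIZE;

	obj = malloc(sizeof(*obj));
	if (!obj)
		return NULL;

	/* Zero-filled, so the tail beyond 'size' is well defined. */
	obj->pages = calloc(1, rounded);
	if (!obj->pages) {
		free(obj);
		return NULL;
	}

	memcpy(obj->pages, data, size);
	obj->size = rounded;
	assert(obj->size % EX_PAGE_SIZE == 0);
	return obj;
}

static void ex_object_free(struct ex_object *obj)
{
	if (obj) {
		free(obj->pages);
		free(obj);
	}
}
```

Because the caller never sees a half-initialised object, every failure path collapses to a NULL return and there is no separate "write" step to get wrong.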

Bikeshedding over the name of such a function welcome :)

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 08/15] drm/i915: Move execlists defines from .c to .h
  2015-06-16  9:37   ` Chris Wilson
@ 2015-06-17  7:31     ` Dave Gordon
  2015-06-17  7:54       ` Chris Wilson
  2015-06-17  7:59       ` Chris Wilson
  0 siblings, 2 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-17  7:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Michael H. Nguyen

On 16/06/15 10:37, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:26PM +0100, Dave Gordon wrote:
>> From: "Michael H. Nguyen" <michael.h.nguyen@intel.com>
>>
>> Move defines from intel_lrc.c to i915_reg.h so they are accessible
>> to the GuC submission code; and expose a previously static function
>> in the execlist code which will also be required for GuC submission.
> 
> What would have been better would have been to split the lrc code
> from the execlists code so that the sharing is more obvious and the
> overloading separate from the common code.
> -Chris

What would have been better is not to have put these fairly generic
details about the hardware into a C file in the first place. And not to
have split execlist and ringbuffer modes into two entirely different
paths. And various other historical decisions. But we can only fix the
code as it stands, not as it ought to have been.

Anyway, this is just a bulk cut-n-paste, so I'm not inclined to do any
restructuring on it during this process. But someone working on
execlists could certainly tidy it up later, perhaps as part of a general
drive towards deduplicating the code paths and partitioning (context vs
ringbuffer vs engine) functionality in a more coherent way.

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 08/15] drm/i915: Move execlists defines from .c to .h
  2015-06-17  7:31     ` Dave Gordon
@ 2015-06-17  7:54       ` Chris Wilson
  2015-06-17  7:59       ` Chris Wilson
  1 sibling, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-17  7:54 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, Michael H. Nguyen

On Wed, Jun 17, 2015 at 08:31:59AM +0100, Dave Gordon wrote:
> On 16/06/15 10:37, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 07:36:26PM +0100, Dave Gordon wrote:
> >> From: "Michael H. Nguyen" <michael.h.nguyen@intel.com>
> >>
> >> Move defines from intel_lrc.c to i915_reg.h so they are accessible
> >> to the GuC submission code; and expose a previously static function
> >> in the execlist code which will also be required for GuC submission.
> > 
> > > What would have been better would have been to split the lrc code
> > from the execlists code so that the sharing is more obvious and the
> > overloading separate from the common code.
> > -Chris
> 
> What would have been better is not to have put these fairly generic
> details about the hardware into a C file in the first place. And not to
> have split execlist and ringbuffer modes into two entirely different
> paths. And various other historical decisions. But we can only fix the
> code as it stands, not as it ought to have been.

You know I sent patches...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 08/15] drm/i915: Move execlists defines from .c to .h
  2015-06-17  7:31     ` Dave Gordon
  2015-06-17  7:54       ` Chris Wilson
@ 2015-06-17  7:59       ` Chris Wilson
  2015-06-22 13:05         ` Dave Gordon
  1 sibling, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-17  7:59 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, Michael H. Nguyen

On Wed, Jun 17, 2015 at 08:31:59AM +0100, Dave Gordon wrote:
> On 16/06/15 10:37, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 07:36:26PM +0100, Dave Gordon wrote:
> >> From: "Michael H. Nguyen" <michael.h.nguyen@intel.com>
> >>
> >> Move defines from intel_lrc.c to i915_reg.h so they are accessible
> >> to the GuC submission code; and expose a previously static function
> >> in the execlist code which will also be required for GuC submission.
> > 
> > > What would have been better would have been to split the lrc code
> > from the execlists code so that the sharing is more obvious and the
> > overloading separate from the common code.
> > -Chris
> 
> What would have been better is not to have put these fairly generic
> details about the hardware into a C file in the first place. And not to
> have split execlist and ringbuffer modes into two entirely different
> paths. And various other historical decisions. But we can only fix the
> code as it stands, not as it ought to have been.
> 
> Anyway, this is just a bulk cut-n-paste, so I'm not inclined to do any
> restructuring on it during this process. But someone working on
> execlists could certainly tidy it up later, perhaps as part of a general
> drive towards deduplicating the code paths and partitioning (context vs
> ringbuffer vs engine) functionality in a more coherent way.

More to the point, you are increasing the technical debt of the code
rather than reducing it. Code will just become less and less
maintainable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-16  9:24   ` Chris Wilson
@ 2015-06-17  8:20     ` Dave Gordon
  2015-06-17 12:22       ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-17  8:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 16/06/15 10:24, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
>> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
>> +{
>> +	struct intel_engine_cs *ring;
>> +	int i, irqs;
>> +
>> +	/* tell all command streamers to forward interrupts and vblank to GuC */
>> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
>> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
>> +	for_each_ring(ring, dev_priv, i)
>> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
>> +
>> +	/* tell DE to send (all) flip_done to GuC */
>> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
>> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
>> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
>> +	/* Unmasked bits will cause GuC response message to be sent */
>> +	I915_WRITE(DE_GUCRMR, ~irqs);
> 
> That's scary since userspace depends on a few more DERRMR events
> (wait-for-scanline). Where will they end up?
> -Chris

This doesn't change any bits in DE_RRMR, or set the VBLANK or SCANLINE
bits in the DE_GUCRMR, so those events should be unaffected. The GuC
isn't interested in those, only in flip done.
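
The mask polarity is easy to misread, so here is a small userspace model of it. The bit positions are illustrative placeholders, not the real DERRMR_* values from i915_reg.h, and ex_gucrmr_value() and ex_event_reaches_guc() are hypothetical helpers. The only thing taken from the patch is the rule that unmasked (clear) bits cause a GuC message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; see i915_reg.h for the real DERRMR_* bits. */
#define EX_PIPEA_SCANLINE	(1u << 0)
#define EX_PIPEA_PRI_FLIP_DONE	(1u << 1)
#define EX_PIPEA_SPR_FLIP_DONE	(1u << 2)
#define EX_PIPEA_VBLANK		(1u << 3)

/* A set bit masks the event; a clear bit lets it generate a GuC message. */
static uint32_t ex_gucrmr_value(uint32_t forwarded_events)
{
	return ~forwarded_events;
}

/* Does a single event bit get through with this register value? */
static int ex_event_reaches_guc(uint32_t reg, uint32_t event_bit)
{
	/* Precondition: exactly one bit set. */
	assert(event_bit != 0 && (event_bit & (event_bit - 1)) == 0);
	return (reg & event_bit) == 0;
}
```

With only the two flip-done bits passed in, the scanline and vblank bits stay set in the written value, so in this model they are never forwarded to the GuC.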

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-17  7:23     ` Dave Gordon
@ 2015-06-17 12:02       ` Daniel Vetter
  2015-06-18 11:49         ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:02 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote:
> On 15/06/15 21:09, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
> >> From: Alex Dai <yu.dai@intel.com>
> >>
> >> i915_gem_object_write() is a generic function to copy data from a plain
> >> linear buffer to a paged gem object.
> >>
> >> We will need this for the microcontroller firmware loading support code.
> >>
> >> Issue: VIZ-4884
> >> Signed-off-by: Alex Dai <yu.dai@intel.com>
> >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
> >>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
> >>  2 files changed, 30 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >> index 611fbd8..9094c06 100644
> >> --- a/drivers/gpu/drm/i915/i915_drv.h
> >> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
> >>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
> >>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
> >>  			 const struct drm_i915_gem_object_ops *ops);
> >> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> >> +			  const void *data, size_t size);
> >>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
> >>  						  size_t size);
> >>  void i915_init_vm(struct drm_i915_private *dev_priv,
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index be35f04..75d63c2 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
> >>  	return false;
> >>  }
> >>  
> >> +/* Fill the @obj with the @size amount of @data */
> >> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> >> +			const void *data, size_t size)
> >> +{
> >> +	struct sg_table *sg;
> >> +	size_t bytes;
> >> +	int ret;
> >> +
> >> +	ret = i915_gem_object_get_pages(obj);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	i915_gem_object_pin_pages(obj);
> > 
> > You don't set the object into the CPU domain, or instead manually handle
> > the domain flushing. You don't handle objects that cannot be written
> > directly by the CPU, nor do you handle objects whose representation in
> > memory is not linear.
> > -Chris
> 
> No, we don't handle just any random gem object, but we do return an error
> code for any types not supported. However, as we don't really need the
> full generality of writing into a gem object of any type, I will replace
> this function with one that combines the allocation of a new object
> (which will therefore definitely be of the correct type, in the correct
> domain, etc) and filling it with the data to be preserved.

Domain handling is required for all gem objects, and the resulting bugs if
you skip it for one-off objects are absolutely no fun to track down.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-15 18:36 ` [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support Dave Gordon
@ 2015-06-17 12:05   ` Daniel Vetter
  2015-06-18 12:11     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:05 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
> Current devices may contain one or more programmable microcontrollers
> that need to have a firmware image (aka "binary blob") loaded from an
> external medium and transferred to the device's memory.
> 
> This file provides generic support functions for doing this; they can
> then be used by each uC-specific loader, thus reducing code duplication
> and testing effort.
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Signed-off-by: Alex Dai <yu.dai@intel.com>

Given that I'm just shredding the synchronization used by the dmc loader
I'm not convinced this is a good idea. Abstraction has cost, and a bit of
copy-paste for similar sounding but slightly different things doesn't
sound awful to me. And the critical bit in all the firmware loading I've
seen thus far is in synchronizing the loading with other operations;
hiding that isn't a good idea. Worse if we enforce stuff like requiring
dev->struct_mutex.
-Daniel
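
For orientation when reading the quoted patch, its fetch-status handling can be modelled in a few lines of userspace C. This is a sketch, not the patch itself: the enum values and the ex_* names are hypothetical (the real INTEL_UC_FIRMWARE_* values are defined in intel_uc_loader.h), and the asynchronous fetch and the wait on the completion, which is the synchronization point in question, are reduced to a plain function call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical values; only the four state names come from the patch. */
enum ex_fw_status {
	EX_FW_FAIL = -1,
	EX_FW_NONE = 0,
	EX_FW_PENDING,
	EX_FW_SUCCESS
};

/* Mirrors the fw_path convention documented for intel_uc_fw_init(). */
static enum ex_fw_status ex_fw_initial_status(const char *fw_path)
{
	if (fw_path == NULL)
		return EX_FW_NONE;	/* no firmware expected */
	if (*fw_path == '\0')
		return EX_FW_FAIL;	/* firmware needed but unknown */
	return EX_FW_PENDING;		/* fetch has been started */
}

/* Stand-in for "wait for the fetch, then validate and save the image". */
static enum ex_fw_status ex_fw_fetch_done(enum ex_fw_status s, int ok)
{
	assert(s == EX_FW_PENDING);	/* the patch uses WARN_ON() here */
	return ok ? EX_FW_SUCCESS : EX_FW_FAIL;
}

/* Mirrors the return codes in the switch in intel_uc_fw_check(). */
static int ex_fw_check_result(enum ex_fw_status s)
{
	switch (s) {
	case EX_FW_FAIL:
		return -EIO;
	case EX_FW_NONE:
	case EX_FW_SUCCESS:
		return 0;
	case EX_FW_PENDING:
	default:
		return -ENXIO;		/* "can't happen" after the wait */
	}
}
```

The model makes one property visible: PENDING is the only state from which a wait is legal, so every caller of the check has to agree on when that wait may block.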


> ---
>  drivers/gpu/drm/i915/Makefile          |    3 +
>  drivers/gpu/drm/i915/intel_uc_loader.c |  312 ++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_uc_loader.h |   82 +++++++++
>  3 files changed, 397 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index b7ddf48..607fa2a 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -38,6 +38,9 @@ i915-y += i915_cmd_parser.o \
>  	  intel_ringbuffer.o \
>  	  intel_uncore.o
>  
> +# generic ancillary microcontroller support
> +i915-y += intel_uc_loader.o
> +
>  # autogenerated null render state
>  i915-y += intel_renderstate_gen6.o \
>  	  intel_renderstate_gen7.o \
> diff --git a/drivers/gpu/drm/i915/intel_uc_loader.c b/drivers/gpu/drm/i915/intel_uc_loader.c
> new file mode 100644
> index 0000000..26f0fbe
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_uc_loader.c
> @@ -0,0 +1,312 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Author:
> + *	Dave Gordon <david.s.gordon@intel.com>
> + */
> +#include <linux/firmware.h>
> +#include "i915_drv.h"
> +#include "intel_uc_loader.h"
> +
> +/**
> + * DOC: Generic embedded microcontroller (uC) firmware loading support
> + *
> + * The functions in this file provide a generic way to load the firmware that
> + * may be required by an embedded microcontroller (uC).
> + *
> + * The function intel_uc_fw_init() should be called early, and will initiate
> + * an asynchronous request to fetch the firmware image (aka "binary blob").
> + * When the image has been fetched into memory, the kernel will call back to
> + * uc_fw_fetch_callback() whose function is simply to record the completion
> + * status, and stash the firmware blob for later.
> + *
> + * At some convenient point after GEM initialisation, the driver should call
> + * intel_uc_fw_check(); this will check whether the asynchronous thread has
> + * completed and wait for it if not, check whether the image was successfully
> + * fetched; and then allow the callback() function (if provided) to validate
> + * the image and/or save the data in a GEM object.
> + *
> + * Thereafter the uC-specific code can transfer the data in the GEM object
> + * to the uC's memory (in some uC-specific way, not handled here).
> + *
> + * During driver shutdown, or if driver load is aborted, intel_uc_fw_fini()
> + * should be called to release any remaining resources.
> + */
> +
> +
> +/*
> + * Called once per uC, late in driver initialisation. GEM is now ready, and so
> + * we can now create a GEM object to hold the uC firmware. But first, we must
> + * synchronise with the firmware-fetching thread that was initiated during
> + * early driver load, in intel_uc_fw_init(), and see whether it successfully
> + * fetched the firmware blob.
> + */
> +static void
> +uc_fw_fetch_wait(struct intel_uc_fw *uc_fw,
> +		 bool callback(struct intel_uc_fw *))
> +{
> +	struct drm_device *dev = uc_fw->uc_dev;
> +	struct drm_i915_gem_object *obj;
> +	const struct firmware *fw;
> +
> +	DRM_DEBUG_DRIVER("before waiting: %s fw fetch status %d, fw %p\n",
> +		uc_fw->uc_name, uc_fw->uc_fw_fetch_status, uc_fw->uc_fw_blob);
> +
> +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> +	WARN_ON(uc_fw->uc_fw_fetch_status != INTEL_UC_FIRMWARE_PENDING);
> +
> +	wait_for_completion(&uc_fw->uc_fw_fetched);
> +
> +	DRM_DEBUG_DRIVER("after waiting: %s fw fetch status %d, fw %p\n",
> +		uc_fw->uc_name, uc_fw->uc_fw_fetch_status, uc_fw->uc_fw_blob);
> +
> +	fw = uc_fw->uc_fw_blob;
> +	if (!fw) {
> +		/* no firmware found; try again in case FS was not mounted */
> +		DRM_DEBUG_DRIVER("retry fetching %s fw from <%s>\n",
> +			uc_fw->uc_name, uc_fw->uc_fw_path);
> +		if (request_firmware(&fw, uc_fw->uc_fw_path, &dev->pdev->dev))
> +			goto fail;
> +		if (!fw)
> +			goto fail;
> +		DRM_DEBUG_DRIVER("fetch %s fw from <%s> succeeded, fw %p\n",
> +			uc_fw->uc_name, uc_fw->uc_fw_path, fw);
> +		uc_fw->uc_fw_blob = fw;
> +	}
> +
> +	/* Callback to the optional uC-specific function, if supplied */
> +	if (callback && !callback(uc_fw))
> +		goto fail;
> +
> +	/* Callback may have done the object allocation & write itself */
> +	obj = uc_fw->uc_fw_obj;
> +	if (!obj) {
> +		size_t pages = round_up(fw->size, PAGE_SIZE);
> +		obj = i915_gem_alloc_object(dev, pages);
> +		if (!obj)
> +			goto fail;
> +
> +		uc_fw->uc_fw_obj = obj;
> +		uc_fw->uc_fw_size = fw->size;
> +		if (i915_gem_object_write(obj, fw->data, fw->size))
> +			goto fail;
> +	}
> +
> +	DRM_DEBUG_DRIVER("%s fw fetch status SUCCESS\n", uc_fw->uc_name);
> +	release_firmware(fw);
> +	uc_fw->uc_fw_blob = NULL;
> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_SUCCESS;
> +	return;
> +
> +fail:
> +	DRM_DEBUG_DRIVER("%s fw fetch status FAIL; fw %p, obj %p\n",
> +		uc_fw->uc_name, fw, uc_fw->uc_fw_obj);
> +	DRM_ERROR("Failed to fetch %s firmware from <%s>\n",
> +		  uc_fw->uc_name, uc_fw->uc_fw_path);
> +
> +	obj = uc_fw->uc_fw_obj;
> +	if (obj)
> +		drm_gem_object_unreference(&obj->base);
> +	uc_fw->uc_fw_obj = NULL;
> +
> +	release_firmware(fw);		/* OK even if fw is NULL */
> +	uc_fw->uc_fw_blob = NULL;
> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
> +}
> +
> +/**
> + * intel_uc_fw_check() - check the status of the firmware fetching process
> + * @uc_fw:	intel_uc_fw structure
> + * @callback:	optional callback function to validate and/or save the image
> + *
> + * If the fetch is still PENDING, wait for completion first, then check and
> + * return the outcome. Subsequent calls will just return the same outcome
> + * based on the recorded fetch status, without triggering another fetch
> + * and without calling @callback().
> + *
> + * After this call, @uc_fw->uc_fw_fetch_status will show whether the firmware
> + * image was successfully fetched and transferred to a GEM object. If it is
> + * INTEL_UC_FIRMWARE_SUCCESS, @uc_fw->uc_fw_obj will point to the GEM
> + * object, and the size of the image will be in @uc_fw->uc_fw_size.  For any
> + * other status value, these members are undefined.
> + *
> + * The @callback() parameter allows the uC-specific code to validate the
> + * image before it is saved, and also to override the default save mechanism
> + * if required. When it is called, @uc_fw->uc_fw_blob refers to the fetched
> + * firmware image, and @uc_fw->uc_fw_obj is NULL.
> + *
> + * If @callback() returns FALSE, the fetched image is considered invalid.
> + * The fetch status will be set to FAIL, and this function will return -EIO.
> + *
> + * If @callback() returns TRUE but doesn't set @uc_fw->uc_fw_obj, the image
> + * is considered good; it will be saved in a GEM object as described above.
> + * This is the default if no @callback() is supplied.
> + *
> + * If @callback() returns TRUE after setting @uc_fw->uc_fw_obj, this means
> + * that the image has already been saved by @callback() itself. This allows
> + * @callback() to customise the format of the data in the GEM object, for
> + * example if it needs to save only a portion of the loaded image.
> + *
> + * In all cases the firmware blob is released before this function returns.
> + *
> + * Return:	non-zero code on error
> + */
> +int
> +intel_uc_fw_check(struct intel_uc_fw *uc_fw,
> +		  bool callback(struct intel_uc_fw *))
> +{
> +	WARN_ON(!mutex_is_locked(&uc_fw->uc_dev->struct_mutex));
> +
> +	if (uc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING) {
> +		/* We only come here once */
> +		uc_fw_fetch_wait(uc_fw, callback);
> +		/* state must now be FAIL or SUCCESS */
> +	}
> +
> +	DRM_DEBUG_DRIVER("%s fw fetch status %d\n",
> +		uc_fw->uc_name, uc_fw->uc_fw_fetch_status);
> +
> +	switch (uc_fw->uc_fw_fetch_status) {
> +	case INTEL_UC_FIRMWARE_FAIL:
> +		/* something went wrong :( */
> +		return -EIO;
> +
> +	case INTEL_UC_FIRMWARE_NONE:
> +		/* no firmware, nothing to do (not an error) */
> +		return 0;
> +
> +	case INTEL_UC_FIRMWARE_PENDING:
> +	default:
> +		/* "can't happen" */
> +		WARN_ONCE(1, "%s fw <%s> invalid uc_fw_fetch_status %d!\n",
> +			uc_fw->uc_name, uc_fw->uc_fw_path,
> +			uc_fw->uc_fw_fetch_status);
> +		return -ENXIO;
> +
> +	case INTEL_UC_FIRMWARE_SUCCESS:
> +		return 0;
> +	}
> +}
> +
> +/*
> + * Callback from the kernel's asynchronous firmware-fetching subsystem.
> + * All we have to do here is stash the blob and signal completion.
> + * Error checking (e.g. no firmware found) is left to mainline code.
> + * We don't have (and don't want or need to acquire) the struct_mutex here.
> + */
> +static void
> +uc_fw_fetch_callback(const struct firmware *fw, void *context)
> +{
> +	struct intel_uc_fw *uc_fw = context;
> +
> +	WARN_ON(uc_fw->uc_fw_fetch_status != INTEL_UC_FIRMWARE_PENDING);
> +	DRM_DEBUG_DRIVER("%s firmware fetch from <%s> status %d, fw %p\n",
> +			uc_fw->uc_name, uc_fw->uc_fw_path,
> +			uc_fw->uc_fw_fetch_status, fw);
> +
> +	uc_fw->uc_fw_blob = fw;
> +	complete(&uc_fw->uc_fw_fetched);
> +}
> +
> +/**
> + * intel_uc_fw_init() - initiate the fetching of firmware
> + * @dev:	drm device
> + * @uc_fw:	intel_uc_fw structure
> + * @name:	human-readable device name (e.g. "GuC") for messages
> + * @fw_path:	(trailing parts of) path to firmware (e.g. "i915/guc_fw.bin")
> + * 		@fw_path == NULL means "no firmware expected" (not an error),
> + * 		@fw_path == "" (empty string) means "firmware unknown" i.e.
> + * 		the uC requires firmware, but the driver doesn't know where
> + * 		to find the proper version. This will be logged as an error.
> + *
> + * This is called just once per uC, during driver loading. It is therefore
> + * automatically single-threaded and does not need to acquire any mutexes
> + * or spinlocks. OTOH, GEM is not yet fully initialised, so we can't do
> + * very much here.
> + *
> + * The main task here is to initiate the fetching of the uC firmware into
> + * memory, using the standard kernel firmware fetching support.  The actual
> + * fetching will then proceed asynchronously and in parallel with the rest
> + * of driver initialisation; later in the loading process we will synchronise
> + * with the firmware-fetching thread before transferring the firmware image
> + * firstly into a GEM object and then into the uC's memory.
> + */
> +void
> +intel_uc_fw_init(struct drm_device *dev, struct intel_uc_fw *uc_fw,
> +		 const char *name, const char *fw_path)
> +{
> +	uc_fw->uc_dev = dev;
> +	uc_fw->uc_name = name;
> +	uc_fw->uc_fw_path = fw_path;
> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_NONE;
> +	uc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_NONE;
> +	init_completion(&uc_fw->uc_fw_fetched);
> +
> +	if (fw_path == NULL)
> +		return;
> +
> +	if (*fw_path == '\0') {
> +		DRM_ERROR("No %s firmware known for this platform\n", name);
> +		uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
> +		return;
> +	}
> +
> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_PENDING;
> +
> +	if (request_firmware_nowait(THIS_MODULE, true, fw_path,
> +				    &dev->pdev->dev,
> +				    GFP_KERNEL, uc_fw,
> +				    uc_fw_fetch_callback)) {
> +		DRM_ERROR("Failed to request %s firmware from <%s>\n",
> +			  name, fw_path);
> +		uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
> +		return;
> +	}
> +
> +	/* firmware fetch initiated, callback will signal completion */
> +	DRM_DEBUG_DRIVER("initiated fetching %s firmware from <%s>\n",
> +		name, fw_path);
> +}
> +
> +/**
> + * intel_uc_fw_fini() - clean up all uC firmware-related data
> + * @uc_fw:	intel_uc_fw structure
> + */
> +void
> +intel_uc_fw_fini(struct intel_uc_fw *uc_fw)
> +{
> +	WARN_ON(!mutex_is_locked(&uc_fw->uc_dev->struct_mutex));
> +
> +	/*
> +	 * Generally, the blob should have been released earlier, but
> +	 * if the driver load were aborted after the fetch had been
> +	 * initiated but not completed it might still be around
> +	 */
> +	if (uc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING)
> +		wait_for_completion(&uc_fw->uc_fw_fetched);
> +	release_firmware(uc_fw->uc_fw_blob);	/* OK even if NULL */
> +	uc_fw->uc_fw_blob = NULL;
> +
> +	if (uc_fw->uc_fw_obj)
> +		drm_gem_object_unreference(&uc_fw->uc_fw_obj->base);
> +	uc_fw->uc_fw_obj = NULL;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_uc_loader.h b/drivers/gpu/drm/i915/intel_uc_loader.h
> new file mode 100644
> index 0000000..22502ea
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_uc_loader.h
> @@ -0,0 +1,82 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Author:
> + *	Dave Gordon <david.s.gordon@intel.com>
> + */
> +#ifndef _INTEL_UC_LOADER_H
> +#define _INTEL_UC_LOADER_H
> +
> +/*
> + * Microcontroller (uC) firmware loading support
> + */
> +
> +/*
> + * These values are used to track the stages of getting the required firmware
> + * into an onboard microcontroller. The common code tracks the phases of
> + * fetching the firmware (aka "binary blob") from an external file into a GEM
> + * object in the 'uc_fw_fetch_status' field below; the uC-specific DMA code
> + * uses the 'uc_fw_load_status' field to track the transfer from GEM object
> + * to uC memory.
> + *
> + * For the first (fetch) stage, the interpretation of the values is:
> + * NONE - no firmware is being fetched e.g. because there is no uC
> + * PENDING - firmware fetch initiated; callback will complete 'uc_fw_fetched'
> + * SUCCESS - uC firmware fetched into a GEM object and ready for use
> + * FAIL - something went wrong; uC firmware is not available
> + *
> + * The second (load) stage is simpler as there is no asynchronous handoff:
> + * NONE - no firmware is being loaded e.g. because there is no uC
> + * PENDING - firmware DMA load in progress
> + * SUCCESS - uC firmware loaded into uC memory and ready for use
> + * FAIL - something went wrong; uC firmware is not available
> + */
> +enum intel_uc_fw_status {
> +	INTEL_UC_FIRMWARE_FAIL = -1,
> +	INTEL_UC_FIRMWARE_NONE = 0,
> +	INTEL_UC_FIRMWARE_PENDING,
> +	INTEL_UC_FIRMWARE_SUCCESS
> +};
> +
> +/*
> + * This structure encapsulates all the data needed during the process of
> + * fetching, caching, and loading the firmware image into the uC.
> + */
> +struct intel_uc_fw {
> +	struct drm_device *		uc_dev;
> +	const char *			uc_name;
> +	const char *			uc_fw_path;
> +	const struct firmware *		uc_fw_blob;
> +	struct completion		uc_fw_fetched;
> +	size_t				uc_fw_size;
> +	struct drm_i915_gem_object *	uc_fw_obj;
> +	enum intel_uc_fw_status		uc_fw_fetch_status;
> +	enum intel_uc_fw_status		uc_fw_load_status;
> +};
> +
> +void intel_uc_fw_init(struct drm_device *dev, struct intel_uc_fw *uc_fw,
> +		const char *uc_name, const char *fw_path);
> +int intel_uc_fw_check(struct intel_uc_fw *uc_fw,
> +		bool callback(struct intel_uc_fw *));
> +void intel_uc_fw_fini(struct intel_uc_fw *uc_fw);
> +
> +#endif
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
  2015-06-15 18:36 ` [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open Dave Gordon
  2015-06-16  9:35   ` Chris Wilson
@ 2015-06-17 12:18   ` Daniel Vetter
  2015-06-19  9:19     ` Dave Gordon
  1 sibling, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:18 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
> In order to fully initialise the default contexts, we have to execute
> batchbuffer commands on the GPU engines. But in the case of GuC-based
> batch submission, we can't do that until any required firmware has
> been loaded, which may not be possible during driver load, because the
> filesystem(s) containing the firmware may not be mounted until later.
> 
> Therefore, we now allow the first call to the firmware-loading code to
> return -EAGAIN to indicate that it's not yet ready, and that it should
> be retried when the device is first opened from user code, by which
> time we expect that all required filesystems will have been mounted.
> The late-retry code will then re-attempt to load the firmware if the
> early attempt failed.
> 
> If the late retry fails, the current open-in-progress will fail, but
> the recovery code will disable GuC submission and reset the GPU and
> driver. The next open will therefore be in non-GuC mode, and will be
> allowed to complete even if the GuC cannot be loaded or used.
> 
> Issue: VIZ-4884
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Signed-off-by: Alex Dai <yu.dai@intel.com>
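
The two-stage scheme described in the commit message above can be modelled in userspace as a rough sketch. This is not the driver code: the struct and the model_* functions are hypothetical names invented for illustration, and the "wait" case simply pretends that the fetch has completed. Only the -EAGAIN and -ENOEXEC return codes and the contexts_ready flag are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not from the patch. */
struct fw_model {
	bool fetch_pending;	/* firmware file has not been read yet */
	bool image_valid;	/* would the image pass validation? */
	bool loaded;
	bool contexts_ready;
};

/* Models the load call. With wait=false it may give up with -EAGAIN. */
static int model_fw_load(struct fw_model *m, bool wait)
{
	assert(m != NULL);
	if (m->loaded)
		return 0;
	if (m->fetch_pending) {
		if (!wait)
			return -EAGAIN;		/* too early, retry later */
		m->fetch_pending = false;	/* pretend we blocked until done */
	}
	if (!m->image_valid)
		return -ENOEXEC;
	m->loaded = true;
	return 0;
}

/* Driver-load path: -EAGAIN is not an error, it only defers context setup. */
static int model_init_hw(struct fw_model *m)
{
	int ret = model_fw_load(m, false);

	if (ret == -EAGAIN)
		return 0;
	if (ret)
		return ret;
	m->contexts_ready = true;
	return 0;
}

/* First-open path: complete whatever the driver-load path had to skip. */
static int model_first_open(struct fw_model *m)
{
	int ret;

	if (m->contexts_ready)
		return 0;
	ret = model_fw_load(m, true);
	if (ret)
		return ret;
	m->contexts_ready = true;
	return 0;
}
```

In this model a pending fetch lets model_init_hw() succeed without marking contexts ready, and the first model_first_open() either finishes the job or reports the load error to the caller.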

I'm not really sold on this super-flexible fallback scheme implemented
here, because such fallback schemes mean more code to test (which no one
will likely do) or just even bigger fireworks when we actually hit them in
reality when something goes wrong. Imo if anything goes wrong in the setup
we just throw in the towel and fail the driver loading.

There's only one exception: If something fails with GT init we declare the
gpu wedged but proceed with all the modeset setup. This makes sense
because we need all the code to handle a wedged gpu anyway, dead-on-boot
gpus happen occasionally and it's really not nice to greet the user with a
black screen. But more fallbacks are imo just headache.

Hence when the guc fails we imo really shouldn't bother with fallbacks,
but instead just declare the thing wedged and carry on.

That should also allow us to simplify the firmware loading: We can do that
in an async worker and if the blob isn't there in time then we just move
on.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h         |    2 ++
>  drivers/gpu/drm/i915/i915_gem.c         |    9 +++++-
>  drivers/gpu/drm/i915/i915_gem_context.c |   52 ++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   48 ++++++++++++++++++++++++++++
>  4 files changed, 105 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f47cde7..a1fc278 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1837,6 +1837,7 @@ struct drm_i915_private {
>  	/* hda/i915 audio component */
>  	bool audio_component_registered;
>  
> +	bool contexts_ready;
>  	uint32_t hw_context_size;
>  	struct list_head context_list;
>  
> @@ -2614,6 +2615,7 @@ void i915_queue_hangcheck(struct drm_device *dev);
>  __printf(3, 4)
>  void i915_handle_error(struct drm_device *dev, bool wedged,
>  		       const char *fmt, ...);
> +void i915_handle_guc_error(struct drm_device *dev, int err);
>  
>  extern void intel_irq_init(struct drm_i915_private *dev_priv);
>  extern void intel_hpd_init(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index cd4a865..d1a8862 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5025,8 +5025,15 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>  	/* We can't enable contexts until all firmware is loaded */
>  	ret = intel_guc_ucode_load(dev, false);
> +	if (ret == -EAGAIN) {
> +		ret = 0;
> +		goto out;		/* too early */
> +	}
> +
>  	ret = i915_gem_context_enable(dev_priv);
> -	if (ret && ret != -EIO) {
> +	if (ret == 0) {
> +		dev_priv->contexts_ready = true;
> +	} else if (ret && ret != -EIO) {
>  		DRM_ERROR("Context enable failed %d\n", ret);
>  		i915_gem_cleanup_ringbuffer(dev);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 133afcf..debbfc9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -447,23 +447,65 @@ static int context_idr_cleanup(int id, void *p, void *data)
>  	return 0;
>  }
>  
> +/* Complete any late initialisation here */
> +static int i915_gem_context_first_open(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	/*
> +	 * We can't enable contexts until all firmware is loaded. This
> +	 * call shouldn't return -EAGAIN because we pass wait=true, but
> +	 * it can still fail with code -EIO if the GuC doesn't respond,
> +	 * or -ENOEXEC if the GuC firmware image is invalid.
> +	 */
> +	ret = intel_guc_ucode_load(dev, true);
> +	WARN_ON(ret == -EAGAIN);
> +
> +	/*
> +	 * If an error occurred and GuC submission has been requested, we can
> +	 * attempt recovery by disabling GuC submission and reinitialising
> +	 * the GPU and driver. We then fail this open() anyway, but the next
> +	 * attempt will find that GuC submission is already disabled, and so
> +	 * proceed to complete context initialisation in non-GuC mode instead.
> +	 */
> +	if (ret && i915.enable_guc_submission) {
> +		i915_handle_guc_error(dev, ret);
> +		return ret;
> +	}
> +
> +	ret = i915_gem_context_enable(dev_priv);
> +	if (ret == 0)
> +		dev_priv->contexts_ready = true;
> +	return ret;
> +}
> +
>  int i915_gem_context_open(struct drm_device *dev, struct drm_file *file)
>  {
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	struct intel_context *ctx;
> +	int ret = 0;
>  
>  	idr_init(&file_priv->context_idr);
>  
>  	mutex_lock(&dev->struct_mutex);
> -	ctx = i915_gem_create_context(dev, file_priv);
> +
> +	if (!dev_priv->contexts_ready)
> +		ret = i915_gem_context_first_open(dev);
> +
> +	if (ret == 0) {
> +		ctx = i915_gem_create_context(dev, file_priv);
> +		if (IS_ERR(ctx))
> +			ret = PTR_ERR(ctx);
> +	}
> +
>  	mutex_unlock(&dev->struct_mutex);
>  
> -	if (IS_ERR(ctx)) {
> +	if (ret)
>  		idr_destroy(&file_priv->context_idr);
> -		return PTR_ERR(ctx);
> -	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  void i915_gem_context_close(struct drm_device *dev, struct drm_file *file)
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 56db9e74..f7dcf8d 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2665,6 +2665,54 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
>  	i915_reset_and_wakeup(dev);
>  }
>  
> +/**
> + * i915_handle_guc_error - handle a GuC error
> + * @dev: drm device
> + *
> + * If the GuC can't be (re-)initialised, disable GuC submission and
> + * then reset and reinitialise the rest of the GPU, so that we can
> + * fall back to operating in ELSP mode. Don't bother capturing error
> + * state, because it probably isn't relevant here.
> + *
> + * Unlike i915_handle_error() above, this is called with the global
> + * struct_mutex held, so we need to release it after setting the
> + * reset-in-progress bit so that other threads can make progress,
> + * and reacquire it after the reset is complete.
> + */
> +void i915_handle_guc_error(struct drm_device *dev, int err)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	DRM_ERROR("GuC failure %d, disabling GuC submission\n", err);
> +	i915.enable_guc_submission = false;
> +
> +	i915_report_and_clear_eir(dev);	/* unlikely? */
> +
> +	atomic_set_mask(I915_RESET_IN_PROGRESS_FLAG,
> +			&dev_priv->gpu_error.reset_counter);
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	/*
> +	 * Wakeup waiting processes so that the reset function
> +	 * i915_reset_and_wakeup doesn't deadlock trying to grab
> +	 * various locks. By bumping the reset counter first, the woken
> +	 * processes will see a reset in progress and back off,
> +	 * releasing their locks and then wait for the reset completion.
> +	 * We must do this for _all_ gpu waiters that might hold locks
> +	 * that the reset work needs to acquire.
> +	 *
> +	 * Note: The wake_up serves as the required memory barrier to
> +	 * ensure that the waiters see the updated value of the reset
> +	 * counter atomic_t.
> +	 */
> +	i915_error_wake_up(dev_priv, false);
> +
> +	i915_reset_and_wakeup(dev);
> +
> +	mutex_lock(&dev->struct_mutex);
> +}
> +
>  /* Called from drm generic code, passed 'crtc' which
>   * we use as a pipe index
>   */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-17  8:20     ` Dave Gordon
@ 2015-06-17 12:22       ` Daniel Vetter
  2015-06-17 12:41         ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:22 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Wed, Jun 17, 2015 at 09:20:44AM +0100, Dave Gordon wrote:
> On 16/06/15 10:24, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
> >> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
> >> +{
> >> +	struct intel_engine_cs *ring;
> >> +	int i, irqs;
> >> +
> >> +	/* tell all command streamers to forward interrupts and vblank to GuC */
> >> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
> >> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> >> +	for_each_ring(ring, dev_priv, i)
> >> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
> >> +
> >> +	/* tell DE to send (all) flip_done to GuC */
> >> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
> >> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
> >> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
> >> +	/* Unmasked bits will cause GuC response message to be sent */
> >> +	I915_WRITE(DE_GUCRMR, ~irqs);
> > 
> > That's scary since userspace depends on a few more DERRMR events
> > (wait-for-scanline). Where will they end up?
> > -Chris
> 
> This doesn't change any bits in DE_RRMR, or set the VBLANK or SCANLINE
> bits in the DE_GUCRMR, so those events should be unaffected. The GuC
> isn't interested in those, only in flip done.
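
The mask polarity under discussion (a set bit masks an event, a clear bit lets it through to the GuC) can be checked with a small sketch. The bit positions and names below are hypothetical and chosen only for illustration; they are not the real DE_GUCRMR or DERRMR layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define EV_SCANLINE		(1u << 0)
#define EV_VBLANK		(1u << 1)
#define EV_PRI_FLIP_DONE	(1u << 2)
#define EV_SPR_FLIP_DONE	(1u << 3)

/* Value for a mask register: only the wanted events are left unmasked. */
static uint32_t response_mask_for(uint32_t wanted_events)
{
	return ~wanted_events;
}

/* An event is forwarded only if its bit is clear in the mask register. */
static bool event_forwarded(uint32_t mask_reg, uint32_t event)
{
	assert(event != 0);
	return (mask_reg & event) == 0;
}
```

With only the flip-done bits passed to response_mask_for(), the vblank and scanline bits stay set in the result, so under this assumed polarity those events are not forwarded.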

Why does the guc care about flip_done? With atomic it'll get exactly none
of those, ever ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-17 12:22       ` Daniel Vetter
@ 2015-06-17 12:41         ` Daniel Vetter
  2015-06-23 11:33           ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:41 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Wed, Jun 17, 2015 at 02:22:19PM +0200, Daniel Vetter wrote:
> On Wed, Jun 17, 2015 at 09:20:44AM +0100, Dave Gordon wrote:
> > On 16/06/15 10:24, Chris Wilson wrote:
> > > On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
> > >> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
> > >> +{
> > >> +	struct intel_engine_cs *ring;
> > >> +	int i, irqs;
> > >> +
> > >> +	/* tell all command streamers to forward interrupts and vblank to GuC */
> > >> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
> > >> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> > >> +	for_each_ring(ring, dev_priv, i)
> > >> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
> > >> +
> > >> +	/* tell DE to send (all) flip_done to GuC */
> > >> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
> > >> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
> > >> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
> > >> +	/* Unmasked bits will cause GuC response message to be sent */
> > >> +	I915_WRITE(DE_GUCRMR, ~irqs);
> > > 
> > > That's scary since userspace depends on a few more DERRMR events
> > > (wait-for-scanline). Where will they end up?
> > > -Chris
> > 
> > This doesn't change any bits in DE_RRMR, or set the VBLANK or SCANLINE
> > bits in the DE_GUCRMR, so those events should be unaffected. The GuC
> > isn't interested in those, only in flip done.
> 
> Why does the guc care about flip_done? With atomic it'll get exactly none
> of those, ever ...

Well I forgot that mmio writes also generate interrupts. Still strange
that GuC is interested in this. Would be really interesting to know what
GuC is up to here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/15] Batch submission via GuC
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (15 preceding siblings ...)
  2015-06-15 18:36 ` [PATCH 16/15] drm/i915: Enable GuC submission, where supported Dave Gordon
@ 2015-06-17 12:43 ` Daniel Vetter
  2015-06-25  7:23   ` Dave Gordon
  2015-06-24 12:16 ` Daniel Vetter
  17 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:43 UTC (permalink / raw)
  To: Dave Gordon; +Cc: Vinit Azad, intel-gfx, Ben Widawsky

On Mon, Jun 15, 2015 at 07:36:18PM +0100, Dave Gordon wrote:
> This patch series enables command submission via the GuC. In this mode,
> instead of the host CPU driving the execlist port directly, it hands
> over work items to the GuC, using a doorbell mechanism to tell the GuC
> that new items have been added to its work queue. The GuC then dispatches
> contexts to the various GPU engines, and manages the resulting context-
> switch interrupts. Completion of a batch is however still signalled to
> the CPU; the GuC is not involved in handling user interrupts.
> 
> There are three subsequences within the patch series:
> 
>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>   drm/i915: Embedded microcontroller (uC) firmware loading support
> 
> These first two patches provide a generic framework for fetching the
> firmware that may be required by any embedded microcontroller from a
> file, using an asynchronous thread so that driver initialisation can
> continue while the firmware is being fetched. It is hoped that this
> framework is sufficiently general that it can be used for all current
> and future microcontrollers.
> 
>   drm/i915: Add GuC-related module parameters
>   drm/i915: Add GuC-related header files
>   drm/i915: GuC-specific firmware loader
>   drm/i915: Debugfs interface to read GuC load status

Does that include all the nifty power management stuff GuC does?

> These four patches complete the GuC loader. At this point in the sequence
> we can load and activate the GuC firmware, but not submit any batches
> through it. (This is nonetheless a potentially useful state, as the GuC
> can do other useful work even when not handling batch submissions).
> 
>   drm/i915: Defer default hardware context initialisation until first
>   drm/i915: Move execlists defines from .c to .h
>   drm/i915: GuC submission setup, phase 1
>   drm/i915: Enable GuC firmware log
>   drm/i915: Implementation of GuC client
>   drm/i915: Interrupt routing for GuC submission
>   drm/i915: Integrate GuC-based command submission
>   drm/i915: Debugfs interface for GuC submission statistics
>   Documentation/drm: kerneldoc for GuC
>   drm/i915: Enable GuC submission, where supported
> 
> In the final section, we implement the GuC submission mechanism, link
> it into the (execlist-based) submission path, and finally enable it
> (on supported platforms). On platforms where there is no GuC, or if
> the GuC firmware cannot be found or is invalid, batch submission will
> revert to using the execlist mechanism directly.

I thought we had some perf data showing that GuC is now faster than
execbuf ... Where's that?

> The GuC firmware itself is not included in this patchset; it is or will
> be available for download from https://01.org/linuxgraphics/downloads/
> This driver works with and requires GuC firmware revision 3.x. It will
> not work with any firmware version 1.x, as the GuC protocol in those
> revisions was incompatible and is no longer supported.
> 
> Prerequisites: GuC submission will expose existing inadequacies in
> some of the existing codepaths unless certain other patches are applied.
> In particular we will require some version of Michel Thierry's patch
>   drm/i915/lrc: Update PDPx registers with lri commands
> (because the GuC supports light-restore, which execlist mode doesn't),
> and my own 
>   drm/i915: Allocate OLR more safely (workaround until OLR goes away)
> because otherwise the changed timing means that there is an increased

s/timing/much reduced ring space I presume?

> risk of writing to a ringbuffer that is not currently pinned & mapped,
> causing a kernel OOPS.

Cheers, Daniel

> 
> Alex Dai (10):
>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>   drm/i915: Add GuC-related module parameters
>   drm/i915: Add GuC-related header files
>   drm/i915: GuC-specific firmware loader
>   drm/i915: Debugfs interface to read GuC load status
>   drm/i915: GuC submission setup, phase 1
>   drm/i915: Enable GuC firmware log
>   drm/i915: Implementation of GuC client
>   drm/i915: Integrate GuC-based command submission
>   Documentation/drm: kerneldoc for GuC
> 
> Dave Gordon (5):
>   drm/i915: Embedded microcontroller (uC) firmware loading support
>   drm/i915: Defer default hardware context initialisation until first
>   drm/i915: Interrupt routing for GuC submission
>   drm/i915: Debugfs interface for GuC submission statistics
>   drm/i915: Enable GuC submission, where supported
> 
> Michael H. Nguyen (1):
>   drm/i915: Move execlists defines from .c to .h
> 
> Ben Widawsky
> Vinit Azad
>   created the original versions on which some of these patches are based.
> 
>  Documentation/DocBook/drm.tmpl             |   19 +
>  drivers/gpu/drm/i915/Makefile              |    7 +
>  drivers/gpu/drm/i915/i915_debugfs.c        |  109 +++-
>  drivers/gpu/drm/i915/i915_dma.c            |    4 +
>  drivers/gpu/drm/i915/i915_drv.h            |   17 +
>  drivers/gpu/drm/i915/i915_gem.c            |   39 +-
>  drivers/gpu/drm/i915/i915_gem_context.c    |   52 +-
>  drivers/gpu/drm/i915/i915_guc_submission.c |  873 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_irq.c            |   48 ++
>  drivers/gpu/drm/i915/i915_params.c         |    9 +
>  drivers/gpu/drm/i915/i915_reg.h            |   92 ++-
>  drivers/gpu/drm/i915/intel_guc.h           |  184 ++++++
>  drivers/gpu/drm/i915/intel_guc_api.h       |  227 ++++++++
>  drivers/gpu/drm/i915/intel_guc_loader.c    |  498 ++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c           |  128 ++--
>  drivers/gpu/drm/i915/intel_lrc.h           |    8 +
>  drivers/gpu/drm/i915/intel_uc_loader.c     |  312 ++++++++++
>  drivers/gpu/drm/i915/intel_uc_loader.h     |   82 +++
>  18 files changed, 2607 insertions(+), 101 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_guc_submission.c
>  create mode 100644 drivers/gpu/drm/i915/intel_guc.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_api.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h
> 
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 04/15] drm/i915: Add GuC-related header files
  2015-06-15 20:20   ` Chris Wilson
@ 2015-06-17 15:01     ` Dave Gordon
  2015-06-23 18:10       ` Dave Gordon
  2015-06-24  7:41     ` Dave Gordon
  1 sibling, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-17 15:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 15/06/15 21:20, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:22PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> intel_guc_api.h contains the subset of the GuC interface that we
>> will need for submission of commands through the GuC. These MUST
>> be kept in sync with the definitions used by the GuC firmware.
> 
> intel_guc_hw.h or intel_guc_abi.h then. Calling it API doesn't make it
> clear whose API you are talking about.

It's not 'hw' -- the hw register definitions are elsewhere, because they
don't depend on the firmware. What it defines is a set of interfaces
between the GuC firmware and the KMD, so I'll rename it to reflect that
("intel_guc_fwif.h", for FirmWareInterFace).

>> intel_guc.h defines structures and parameters relevant to loading
>> the GuC firmware and setting it running. Some of these also need
>> to be kept in sync with the firmware.
> 
> intel_guc.h should be developed organically as features are added in the
> series so that it is possible to track against implementation.

> Certainly not in a patch that adds the entirety of the firmware ABI.

What may not be obvious is that intel_guc_api.h (or intel_guc_fwif.h, as
I'm now going to call it) is autogenerated from the non-Linux-friendly
version actually used in building the GuC firmware. (Or at least, that's
the PoR; in practice Alex has hacked^W hand-tuned this version.) So it
makes no sense to break it into parts.

We /could/ do that with the purely KMD-defined structures in intel_guc.h
such as intel_guc, and /maybe/ i915_guc_client. OTOH when it's a new
file, containing a new structure, it's easier to see that the layout is
sensible when it's all added in one go, rather than repeatedly adding
bits here and there, especially if the logical order of fields in a
structure isn't going to be the same as the order of addition of the
code that uses them.

I'll see how it looks ...

.Dave.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-17 12:02       ` Daniel Vetter
@ 2015-06-18 11:49         ` Dave Gordon
  2015-06-18 12:10           ` Chris Wilson
  2015-06-18 14:31           ` Daniel Vetter
  0 siblings, 2 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-18 11:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 17/06/15 13:02, Daniel Vetter wrote:
> On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote:
>> On 15/06/15 21:09, Chris Wilson wrote:
>>> On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
>>>> From: Alex Dai <yu.dai@intel.com>
>>>>
>>>> i915_gem_object_write() is a generic function to copy data from a plain
>>>> linear buffer to a paged gem object.
>>>>
>>>> We will need this for the microcontroller firmware loading support code.
>>>>
>>>> Issue: VIZ-4884
>>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>> ---
>>>>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
>>>>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
>>>>  2 files changed, 30 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 611fbd8..9094c06 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
>>>>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>>>>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
>>>>  			 const struct drm_i915_gem_object_ops *ops);
>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>>>> +			  const void *data, size_t size);
>>>>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>>>>  						  size_t size);
>>>>  void i915_init_vm(struct drm_i915_private *dev_priv,
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index be35f04..75d63c2 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
>>>>  	return false;
>>>>  }
>>>>  
>>>> +/* Fill the @obj with the @size amount of @data */
>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>>>> +			const void *data, size_t size)
>>>> +{
>>>> +	struct sg_table *sg;
>>>> +	size_t bytes;
>>>> +	int ret;
>>>> +
>>>> +	ret = i915_gem_object_get_pages(obj);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	i915_gem_object_pin_pages(obj);
>>>
>>> You don't set the object into the CPU domain, or instead manually handle
>>> the domain flushing. You don't handle objects that cannot be written
>>> directly by the CPU, nor do you handle objects whose representation in
>>> memory is not linear.
>>> -Chris
>>
>> No we don't handle just any random gem object, but we do return an error
>> code for any types not supported. However, as we don't really need the
>> full generality of writing into a gem object of any type, I will replace
>> this function with one that combines the allocation of a new object
>> (which will therefore definitely be of the correct type, in the correct
>> domain, etc) and filling it with the data to be preserved.

The usage pattern for the particular case is going to be:
	Once-only:
		Allocate
		Fill
	Then each time GuC is (re-)initialised:
		Map to GTT
		DMA-read from buffer into GuC private memory
		Unmap
	Only on unload:
		Dispose

So our object is write-once by the CPU (and that's always the first
operation), thereafter read-occasionally by the GuC's DMA engine.
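
As a rough userspace illustration of that lifecycle (not the proposed kernel function), the sketch below uses a heap buffer in place of the GEM object and memcpy() in place of the GuC's DMA read. The fw_buf type and all fw_buf_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the firmware-holding object. */
struct fw_buf {
	unsigned char *data;
	size_t size;
};

/* Once only: allocate and fill in a single step. */
static struct fw_buf *fw_buf_create_from_data(const void *data, size_t size)
{
	struct fw_buf *buf;

	if (!data || !size)
		return NULL;
	buf = malloc(sizeof(*buf));
	if (!buf)
		return NULL;
	buf->data = malloc(size);
	if (!buf->data) {
		free(buf);
		return NULL;
	}
	memcpy(buf->data, data, size);
	buf->size = size;
	return buf;
}

/* Each (re-)initialisation: copy the cached image into uC memory. */
static int fw_buf_load(const struct fw_buf *buf,
		       unsigned char *uc_mem, size_t uc_mem_size)
{
	assert(buf != NULL && uc_mem != NULL);
	if (buf->size > uc_mem_size)
		return -1;	/* image does not fit */
	memcpy(uc_mem, buf->data, buf->size);
	return 0;
}

/* Only on unload: release the cached image. */
static void fw_buf_dispose(struct fw_buf *buf)
{
	if (!buf)
		return;
	free(buf->data);
	free(buf);
}
```

The point of the combined create-and-fill step is that the buffer is never visible in a half-written state: after creation it is only ever read.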

> Domain handling is required for all gem objects, and the resulting bugs if
> you don't for one-off objects are absolutely no fun to track down.
> -Daniel

Is it not the case that the new object returned by
i915_gem_alloc_object() is
(a) of a type that can be mapped into the GTT, and
(b) initially in the CPU domain for both reading and writing?

So AFAICS the allocate-and-fill function I'm describing (to appear in
next patch series respin) doesn't need any further domain handling.

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 11:49         ` Dave Gordon
@ 2015-06-18 12:10           ` Chris Wilson
  2015-06-18 18:07             ` Dave Gordon
  2015-06-18 14:31           ` Daniel Vetter
  1 sibling, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-18 12:10 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
> On 17/06/15 13:02, Daniel Vetter wrote:
> > Domain handling is required for all gem objects, and the resulting bugs if
> > you don't for one-off objects are absolutely no fun to track down.
> 
> Is it not the case that the new object returned by
> i915_gem_alloc_object() is
> (a) of a type that can be mapped into the GTT, and
> (b) initially in the CPU domain for both reading and writing?
> 
> So AFAICS the allocate-and-fill function I'm describing (to appear in
> next patch series respin) doesn't need any further domain handling.

An i915_gem_object_create_from_data() is a reasonable addition, and I
suspect it will make the code a bit more succinct.

Whilst your statement is true today, calling set_domain is then a no-op,
and helps document how you use the object and so reduces the likelihood
of us introducing bugs in the future.
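
As a rough illustration of that point, here is a userspace model, not i915
code. The names `fake_obj`, `fake_set_domain()` and
`fake_create_from_data()` are hypothetical stand-ins, and the assumption
that a fresh object starts in the CPU domain is taken from the discussion
above. The sketch shows that an explicit set-domain call on a freshly
allocated object costs nothing but records the intent:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GEM object: a buffer plus its current domain. */
enum fake_domain { FAKE_DOMAIN_CPU, FAKE_DOMAIN_GTT };

struct fake_obj {
	void *pages;
	size_t size;
	enum fake_domain domain;
	unsigned int flushes;	/* counts real domain transitions */
};

/* Move the object into @domain; a no-op if it is already there. */
static void fake_set_domain(struct fake_obj *obj, enum fake_domain domain)
{
	if (obj->domain == domain)
		return;
	obj->flushes++;		/* a real driver would flush caches here */
	obj->domain = domain;
}

/* Allocate a new object and fill it with @size bytes of @data. */
static struct fake_obj *fake_create_from_data(const void *data, size_t size)
{
	struct fake_obj *obj;

	assert(data || !size);
	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return NULL;
	obj->pages = malloc(size ? size : 1);
	if (!obj->pages) {
		free(obj);
		return NULL;
	}
	obj->size = size;
	obj->domain = FAKE_DOMAIN_CPU;	/* assumed state of a new object */

	/* Explicit even though it is a no-op today: documents the usage. */
	fake_set_domain(obj, FAKE_DOMAIN_CPU);
	if (size)
		memcpy(obj->pages, data, size);
	return obj;
}

/* Release the object and its backing buffer; NULL is accepted. */
static void fake_free(struct fake_obj *obj)
{
	if (!obj)
		return;
	free(obj->pages);
	free(obj);
}
```

In this model the only call that bumps `flushes` is a later move to
`FAKE_DOMAIN_GTT`; the call inside the constructor never does.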
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-17 12:05   ` Daniel Vetter
@ 2015-06-18 12:11     ` Dave Gordon
  2015-06-18 14:49       ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-18 12:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 17/06/15 13:05, Daniel Vetter wrote:
> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
>> Current devices may contain one or more programmable microcontrollers
>> that need to have a firmware image (aka "binary blob") loaded from an
>> external medium and transferred to the device's memory.
>>
>> This file provides generic support functions for doing this; they can
>> then be used by each uC-specific loader, thus reducing code duplication
>> and testing effort.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
> 
> Given that I'm just shredding the synchronization used by the dmc loader
> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
> copy-paste for similar sounding but slightly different things doesn't
> sound awful to me. And the critical bit in all the firmware loading I've
> seen thus far is in synchronizing the loading with other operations,
> hiding that isn't a good idea. Worse if we enforce stuff like requiring
> dev->struct_mutex.
> -Daniel

It's precisely because it's in some sense "trivial-but-tricky" that we
should write it once, get it right, and use it everywhere. Copypaste
/does/ sound awful; I've seen how the code this was derived from had
already been cloned into three flavours, all different and all wrong.

It's a very simple abstraction: one early call to kick things off as
early as possible, no locking required. One late call with the
struct_mutex held to complete the synchronisation and actually do the
work, thus guaranteeing that the transfer to the target uC is done in a
controlled fashion, at a time of the caller's choice, and by the
driver's mainline thread, NOT by an asynchronous thread racing with
other activity (which was one of the things wrong with the original
version).

We should convert the DMC loader to use this too, so there need be only
one bit of code in the whole driver that needs to understand how to use
completions to get correct handover from a free-running no-locks-held
thread to the properly disciplined environment of driver mainline for
purposes of programming the h/w.
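
To make the shape of that abstraction concrete, here is a single-threaded
userspace sketch of the fetch-status state machine. It is only a model:
all `fake_*` names are hypothetical, the `fetched` flag stands in for the
struct completion, and no real thread or firmware API is involved. It
shows the early call, the asynchronous callback that only stashes the
blob, and the late call that consumes the result exactly once:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum fake_fw_status {
	FAKE_FW_FAIL = -1,
	FAKE_FW_NONE = 0,
	FAKE_FW_PENDING,
	FAKE_FW_SUCCESS
};

struct fake_uc_fw {
	const char *path;
	const void *blob;	/* stashed by the fetch callback */
	bool fetched;		/* stand-in for struct completion */
	enum fake_fw_status fetch_status;
};

/* Early call: no locks needed, just record the request. */
static void fake_fw_init(struct fake_uc_fw *fw, const char *path)
{
	fw->path = path;
	fw->blob = NULL;
	fw->fetched = false;
	fw->fetch_status = path ? FAKE_FW_PENDING : FAKE_FW_NONE;
}

/* Asynchronous side: stash the blob and signal completion, nothing more. */
static void fake_fw_fetch_done(struct fake_uc_fw *fw, const void *blob)
{
	fw->blob = blob;
	fw->fetched = true;
}

/* Example validator that counts how often it is invoked. */
static int fake_validate_calls;

static bool fake_count_validate(struct fake_uc_fw *fw)
{
	fake_validate_calls++;
	return fw->blob != NULL;
}

/* Late call from the mainline path: consume the result exactly once. */
static int fake_fw_check(struct fake_uc_fw *fw,
			 bool (*validate)(struct fake_uc_fw *))
{
	if (fw->fetch_status == FAKE_FW_PENDING) {
		assert(fw->fetched);	/* a real driver would block here */
		if (fw->blob && (!validate || validate(fw)))
			fw->fetch_status = FAKE_FW_SUCCESS;
		else
			fw->fetch_status = FAKE_FW_FAIL;
		fw->blob = NULL;	/* blob is dropped either way */
	}
	return fw->fetch_status == FAKE_FW_FAIL ? -EIO : 0;
}
```

A second `fake_fw_check()` call returns the recorded outcome without
running the validator again, which mirrors the "we only come here once"
behaviour described for intel_uc_fw_check() in the patch below.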

.Dave.

>> ---
>>  drivers/gpu/drm/i915/Makefile          |    3 +
>>  drivers/gpu/drm/i915/intel_uc_loader.c |  312 ++++++++++++++++++++++++++++++++
>>  drivers/gpu/drm/i915/intel_uc_loader.h |   82 +++++++++
>>  3 files changed, 397 insertions(+)
>>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
>>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>> index b7ddf48..607fa2a 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -38,6 +38,9 @@ i915-y += i915_cmd_parser.o \
>>  	  intel_ringbuffer.o \
>>  	  intel_uncore.o
>>  
>> +# generic ancillary microcontroller support
>> +i915-y += intel_uc_loader.o
>> +
>>  # autogenerated null render state
>>  i915-y += intel_renderstate_gen6.o \
>>  	  intel_renderstate_gen7.o \
>> diff --git a/drivers/gpu/drm/i915/intel_uc_loader.c b/drivers/gpu/drm/i915/intel_uc_loader.c
>> new file mode 100644
>> index 0000000..26f0fbe
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/intel_uc_loader.c
>> @@ -0,0 +1,312 @@
>> +/*
>> + * Copyright © 2014 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + * Author:
>> + *	Dave Gordon <david.s.gordon@intel.com>
>> + */
>> +#include <linux/firmware.h>
>> +#include "i915_drv.h"
>> +#include "intel_uc_loader.h"
>> +
>> +/**
>> + * DOC: Generic embedded microcontroller (uC) firmware loading support
>> + *
>> + * The functions in this file provide a generic way to load the firmware that
>> + * may be required by an embedded microcontroller (uC).
>> + *
>> + * The function intel_uc_fw_init() should be called early, and will initiate
>> + * an asynchronous request to fetch the firmware image (aka "binary blob").
>> + * When the image has been fetched into memory, the kernel will call back to
>> + * uc_fw_fetch_callback() whose function is simply to record the completion
>> + * status, and stash the firmware blob for later.
>> + *
>> + * At some convenient point after GEM initialisation, the driver should call
>> + * intel_uc_fw_check(); this will check whether the asynchronous thread has
>> + * completed and wait for it if not, check whether the image was successfully
>> + * fetched; and then allow the callback() function (if provided) to validate
>> + * the image and/or save the data in a GEM object.
>> + *
>> + * Thereafter the uC-specific code can transfer the data in the GEM object
>> + * to the uC's memory (in some uC-specific way, not handled here).
>> + *
>> + * During driver shutdown, or if driver load is aborted, intel_uc_fw_fini()
>> + * should be called to release any remaining resources.
>> + */
>> +
>> +
>> +/*
>> + * Called once per uC, late in driver initialisation. GEM is now ready, and so
>> + * we can now create a GEM object to hold the uC firmware. But first, we must
>> + * synchronise with the firmware-fetching thread that was initiated during
>> + * early driver load, in intel_uc_fw_init(), and see whether it successfully
>> + * fetched the firmware blob.
>> + */
>> +static void
>> +uc_fw_fetch_wait(struct intel_uc_fw *uc_fw,
>> +		 bool callback(struct intel_uc_fw *))
>> +{
>> +	struct drm_device *dev = uc_fw->uc_dev;
>> +	struct drm_i915_gem_object *obj;
>> +	const struct firmware *fw;
>> +
>> +	DRM_DEBUG_DRIVER("before waiting: %s fw fetch status %d, fw %p\n",
>> +		uc_fw->uc_name, uc_fw->uc_fw_fetch_status, uc_fw->uc_fw_blob);
>> +
>> +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
>> +	WARN_ON(uc_fw->uc_fw_fetch_status != INTEL_UC_FIRMWARE_PENDING);
>> +
>> +	wait_for_completion(&uc_fw->uc_fw_fetched);
>> +
>> +	DRM_DEBUG_DRIVER("after waiting: %s fw fetch status %d, fw %p\n",
>> +		uc_fw->uc_name, uc_fw->uc_fw_fetch_status, uc_fw->uc_fw_blob);
>> +
>> +	fw = uc_fw->uc_fw_blob;
>> +	if (!fw) {
>> +		/* no firmware found; try again in case FS was not mounted */
>> +		DRM_DEBUG_DRIVER("retry fetching %s fw from <%s>\n",
>> +			uc_fw->uc_name, uc_fw->uc_fw_path);
>> +		if (request_firmware(&fw, uc_fw->uc_fw_path, &dev->pdev->dev))
>> +			goto fail;
>> +		if (!fw)
>> +			goto fail;
>> +		DRM_DEBUG_DRIVER("fetch %s fw from <%s> succeeded, fw %p\n",
>> +			uc_fw->uc_name, uc_fw->uc_fw_path, fw);
>> +		uc_fw->uc_fw_blob = fw;
>> +	}
>> +
>> +	/* Callback to the optional uC-specific function, if supplied */
>> +	if (callback && !callback(uc_fw))
>> +		goto fail;
>> +
>> +	/* Callback may have done the object allocation & write itself */
>> +	obj = uc_fw->uc_fw_obj;
>> +	if (!obj) {
>> +		size_t pages = round_up(fw->size, PAGE_SIZE);
>> +		obj = i915_gem_alloc_object(dev, pages);
>> +		if (!obj)
>> +			goto fail;
>> +
>> +		uc_fw->uc_fw_obj = obj;
>> +		uc_fw->uc_fw_size = fw->size;
>> +		if (i915_gem_object_write(obj, fw->data, fw->size))
>> +			goto fail;
>> +	}
>> +
>> +	DRM_DEBUG_DRIVER("%s fw fetch status SUCCESS\n", uc_fw->uc_name);
>> +	release_firmware(fw);
>> +	uc_fw->uc_fw_blob = NULL;
>> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_SUCCESS;
>> +	return;
>> +
>> +fail:
>> +	DRM_DEBUG_DRIVER("%s fw fetch status FAIL; fw %p, obj %p\n",
>> +		uc_fw->uc_name, fw, uc_fw->uc_fw_obj);
>> +	DRM_ERROR("Failed to fetch %s firmware from <%s>\n",
>> +		  uc_fw->uc_name, uc_fw->uc_fw_path);
>> +
>> +	obj = uc_fw->uc_fw_obj;
>> +	if (obj)
>> +		drm_gem_object_unreference(&obj->base);
>> +	uc_fw->uc_fw_obj = NULL;
>> +
>> +	release_firmware(fw);		/* OK even if fw is NULL */
>> +	uc_fw->uc_fw_blob = NULL;
>> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
>> +}
>> +
>> +/**
>> + * intel_uc_fw_check() - check the status of the firmware fetching process
>> + * @uc_fw:	intel_uc_fw structure
>> + * @callback:	optional callback function to validate and/or save the image
>> + *
>> + * If the fetch is still PENDING, wait for completion first, then check and
>> + * return the outcome. Subsequent calls will just return the same outcome
>> + * based on the recorded fetch status, without triggering another fetch
>> + * and without calling @callback().
>> + *
>> + * After this call, @uc_fw->uc_fw_fetch_status will show whether the firmware
>> + * image was successfully fetched and transferred to a GEM object. If it is
>> + * INTEL_UC_FIRMWARE_SUCCESS, @uc_fw->uc_fw_obj will point to the GEM
>> + * object, and the size of the image will be in @uc_fw->uc_fw_size.  For any
>> + * other status value, these members are undefined.
>> + *
>> + * The @callback() parameter allows the uC-specific code to validate the
>> + * image before it is saved, and also to override the default save mechanism
>> + * if required. When it is called, @uc_fw->uc_fw_blob refers to the fetched
>> + * firmware image, and @uc_fw->uc_fw_obj is NULL.
>> + *
>> + * If @callback() returns FALSE, the fetched image is considered invalid.
>> + * The fetch status will be set to FAIL, and this function will return -EIO.
>> + *
>> + * If @callback() returns TRUE but doesn't set @uc_fw->uc_fw_obj, the image
>> + * is considered good; it will be saved in a GEM object as described above.
>> + * This is the default if no @callback() is supplied.
>> + *
>> + * If @callback() returns TRUE after setting @uc_fw->uc_fw_obj, this means
>> + * that the image has already been saved by @callback() itself. This allows
>> + * @callback() to customise the format of the data in the GEM object, for
>> + * example if it needs to save only a portion of the loaded image.
>> + *
>> + * In all cases the firmware blob is released before this function returns.
>> + *
>> + * Return:	non-zero code on error
>> + */
>> +int
>> +intel_uc_fw_check(struct intel_uc_fw *uc_fw,
>> +		  bool callback(struct intel_uc_fw *))
>> +{
>> +	WARN_ON(!mutex_is_locked(&uc_fw->uc_dev->struct_mutex));
>> +
>> +	if (uc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING) {
>> +		/* We only come here once */
>> +		uc_fw_fetch_wait(uc_fw, callback);
>> +		/* state must now be FAIL or SUCCESS */
>> +	}
>> +
>> +	DRM_DEBUG_DRIVER("%s fw fetch status %d\n",
>> +		uc_fw->uc_name, uc_fw->uc_fw_fetch_status);
>> +
>> +	switch (uc_fw->uc_fw_fetch_status) {
>> +	case INTEL_UC_FIRMWARE_FAIL:
>> +		/* something went wrong :( */
>> +		return -EIO;
>> +
>> +	case INTEL_UC_FIRMWARE_NONE:
>> +		/* no firmware, nothing to do (not an error) */
>> +		return 0;
>> +
>> +	case INTEL_UC_FIRMWARE_PENDING:
>> +	default:
>> +		/* "can't happen" */
>> +		WARN_ONCE(1, "%s fw <%s> invalid uc_fw_fetch_status %d!\n",
>> +			uc_fw->uc_name, uc_fw->uc_fw_path,
>> +			uc_fw->uc_fw_fetch_status);
>> +		return -ENXIO;
>> +
>> +	case INTEL_UC_FIRMWARE_SUCCESS:
>> +		return 0;
>> +	}
>> +}
>> +
>> +/*
>> + * Callback from the kernel's asynchronous firmware-fetching subsystem.
>> + * All we have to do here is stash the blob and signal completion.
>> + * Error checking (e.g. no firmware found) is left to mainline code.
>> + * We don't have (and don't want or need to acquire) the struct_mutex here.
>> + */
>> +static void
>> +uc_fw_fetch_callback(const struct firmware *fw, void *context)
>> +{
>> +	struct intel_uc_fw *uc_fw = context;
>> +
>> +	WARN_ON(uc_fw->uc_fw_fetch_status != INTEL_UC_FIRMWARE_PENDING);
>> +	DRM_DEBUG_DRIVER("%s firmware fetch from <%s> status %d, fw %p\n",
>> +			uc_fw->uc_name, uc_fw->uc_fw_path,
>> +			uc_fw->uc_fw_fetch_status, fw);
>> +
>> +	uc_fw->uc_fw_blob = fw;
>> +	complete(&uc_fw->uc_fw_fetched);
>> +}
>> +
>> +/**
>> + * intel_uc_fw_init() - initiate the fetching of firmware
>> + * @dev:	drm device
>> + * @uc_fw:	intel_uc_fw structure
>> + * @name:	human-readable device name (e.g. "GuC") for messages
>> + * @fw_path:	(trailing parts of) path to firmware (e.g. "i915/guc_fw.bin")
>> + * 		@fw_path == NULL means "no firmware expected" (not an error),
>> + * 		@fw_path == "" (empty string) means "firmware unknown" i.e.
>> + * 		the uC requires firmware, but the driver doesn't know where
>> + * 		to find the proper version. This will be logged as an error.
>> + *
>> + * This is called just once per uC, during driver loading. It is therefore
>> + * automatically single-threaded and does not need to acquire any mutexes
>> + * or spinlocks. OTOH, GEM is not yet fully initialised, so we can't do
>> + * very much here.
>> + *
>> + * The main task here is to initiate the fetching of the uC firmware into
>> + * memory, using the standard kernel firmware fetching support.  The actual
>> + * fetching will then proceed asynchronously and in parallel with the rest
>> + * of driver initialisation; later in the loading process we will synchronise
>> + * with the firmware-fetching thread before transferring the firmware image
>> + * firstly into a GEM object and then into the uC's memory.
>> + */
>> +void
>> +intel_uc_fw_init(struct drm_device *dev, struct intel_uc_fw *uc_fw,
>> +		 const char *name, const char *fw_path)
>> +{
>> +	uc_fw->uc_dev = dev;
>> +	uc_fw->uc_name = name;
>> +	uc_fw->uc_fw_path = fw_path;
>> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_NONE;
>> +	uc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_NONE;
>> +	init_completion(&uc_fw->uc_fw_fetched);
>> +
>> +	if (fw_path == NULL)
>> +		return;
>> +
>> +	if (*fw_path == '\0') {
>> +		DRM_ERROR("No %s firmware known for this platform\n", name);
>> +		uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
>> +		return;
>> +	}
>> +
>> +	uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_PENDING;
>> +
>> +	if (request_firmware_nowait(THIS_MODULE, true, fw_path,
>> +				    &dev->pdev->dev,
>> +				    GFP_KERNEL, uc_fw,
>> +				    uc_fw_fetch_callback)) {
>> +		DRM_ERROR("Failed to request %s firmware from <%s>\n",
>> +			  name, fw_path);
>> +		uc_fw->uc_fw_fetch_status = INTEL_UC_FIRMWARE_FAIL;
>> +		return;
>> +	}
>> +
>> +	/* firmware fetch initiated, callback will signal completion */
>> +	DRM_DEBUG_DRIVER("initiated fetching %s firmware from <%s>\n",
>> +		name, fw_path);
>> +}
>> +
>> +/**
>> + * intel_uc_fw_fini() - clean up all uC firmware-related data
>> + * @uc_fw:	intel_uc_fw structure
>> + */
>> +void
>> +intel_uc_fw_fini(struct intel_uc_fw *uc_fw)
>> +{
>> +	WARN_ON(!mutex_is_locked(&uc_fw->uc_dev->struct_mutex));
>> +
>> +	/*
>> +	 * Generally, the blob should have been released earlier, but
>> +	 * if the driver load were aborted after the fetch had been
>> +	 * initiated but not completed it might still be around
>> +	 */
>> +	if (uc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_PENDING)
>> +		wait_for_completion(&uc_fw->uc_fw_fetched);
>> +	release_firmware(uc_fw->uc_fw_blob);	/* OK even if NULL */
>> +	uc_fw->uc_fw_blob = NULL;
>> +
>> +	if (uc_fw->uc_fw_obj)
>> +		drm_gem_object_unreference(&uc_fw->uc_fw_obj->base);
>> +	uc_fw->uc_fw_obj = NULL;
>> +}
>> diff --git a/drivers/gpu/drm/i915/intel_uc_loader.h b/drivers/gpu/drm/i915/intel_uc_loader.h
>> new file mode 100644
>> index 0000000..22502ea
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/intel_uc_loader.h
>> @@ -0,0 +1,82 @@
>> +/*
>> + * Copyright © 2014 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + * Author:
>> + *	Dave Gordon <david.s.gordon@intel.com>
>> + */
>> +#ifndef _INTEL_UC_LOADER_H
>> +#define _INTEL_UC_LOADER_H
>> +
>> +/*
>> + * Microcontroller (uC) firmware loading support
>> + */
>> +
>> +/*
>> + * These values are used to track the stages of getting the required firmware
>> + * into an onboard microcontroller. The common code tracks the phases of
>> + * fetching the firmware (aka "binary blob") from an external file into a GEM
>> + * object in the 'uc_fw_fetch_status' field below; the uC-specific DMA code
>> + * uses the 'uc_fw_load_status' field to track the transfer from GEM object
>> + * to uC memory.
>> + *
>> + * For the first (fetch) stage, the interpretation of the values is:
>> + * NONE - no firmware is being fetched e.g. because there is no uC
>> + * PENDING - firmware fetch initiated; callback will complete 'uc_fw_fetched'
>> + * SUCCESS - uC firmware fetched into a GEM object and ready for use
>> + * FAIL - something went wrong; uC firmware is not available
>> + *
>> + * The second (load) stage is simpler as there is no asynchronous handoff:
>> + * NONE - no firmware is being loaded e.g. because there is no uC
>> + * PENDING - firmware DMA load in progress
>> + * SUCCESS - uC firmware loaded into uC memory and ready for use
>> + * FAIL - something went wrong; uC firmware is not available
>> + */
>> +enum intel_uc_fw_status {
>> +	INTEL_UC_FIRMWARE_FAIL = -1,
>> +	INTEL_UC_FIRMWARE_NONE = 0,
>> +	INTEL_UC_FIRMWARE_PENDING,
>> +	INTEL_UC_FIRMWARE_SUCCESS
>> +};
>> +
>> +/*
>> + * This structure encapsulates all the data needed during the process of
>> + * fetching, caching, and loading the firmware image into the uC.
>> + */
>> +struct intel_uc_fw {
>> +	struct drm_device *		uc_dev;
>> +	const char *			uc_name;
>> +	const char *			uc_fw_path;
>> +	const struct firmware *		uc_fw_blob;
>> +	struct completion		uc_fw_fetched;
>> +	size_t				uc_fw_size;
>> +	struct drm_i915_gem_object *	uc_fw_obj;
>> +	enum intel_uc_fw_status		uc_fw_fetch_status;
>> +	enum intel_uc_fw_status		uc_fw_load_status;
>> +};
>> +
>> +void intel_uc_fw_init(struct drm_device *dev, struct intel_uc_fw *uc_fw,
>> +		const char *uc_name, const char *fw_path);
>> +int intel_uc_fw_check(struct intel_uc_fw *uc_fw,
>> +		bool callback(struct intel_uc_fw *));
>> +void intel_uc_fw_fini(struct intel_uc_fw *uc_fw);
>> +
>> +#endif
>> -- 
>> 1.7.9.5
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 11:49         ` Dave Gordon
  2015-06-18 12:10           ` Chris Wilson
@ 2015-06-18 14:31           ` Daniel Vetter
  2015-06-18 18:28             ` Dave Gordon
  1 sibling, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-18 14:31 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
> On 17/06/15 13:02, Daniel Vetter wrote:
> > On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote:
> >> On 15/06/15 21:09, Chris Wilson wrote:
> >>> On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
> >>>> From: Alex Dai <yu.dai@intel.com>
> >>>>
> >>>> i915_gem_object_write() is a generic function to copy data from a plain
> >>>> linear buffer to a paged gem object.
> >>>>
> >>>> We will need this for the microcontroller firmware loading support code.
> >>>>
> >>>> Issue: VIZ-4884
> >>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
> >>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>>> ---
> >>>>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
> >>>>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
> >>>>  2 files changed, 30 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>> index 611fbd8..9094c06 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_drv.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
> >>>>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
> >>>>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
> >>>>  			 const struct drm_i915_gem_object_ops *ops);
> >>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> >>>> +			  const void *data, size_t size);
> >>>>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
> >>>>  						  size_t size);
> >>>>  void i915_init_vm(struct drm_i915_private *dev_priv,
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>> index be35f04..75d63c2 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
> >>>>  	return false;
> >>>>  }
> >>>>  
> >>>> +/* Fill the @obj with the @size amount of @data */
> >>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> >>>> +			const void *data, size_t size)
> >>>> +{
> >>>> +	struct sg_table *sg;
> >>>> +	size_t bytes;
> >>>> +	int ret;
> >>>> +
> >>>> +	ret = i915_gem_object_get_pages(obj);
> >>>> +	if (ret)
> >>>> +		return ret;
> >>>> +
> >>>> +	i915_gem_object_pin_pages(obj);
> >>>
> >>> You don't set the object into the CPU domain, or instead manually handle
> >>> the domain flushing. You don't handle objects that cannot be written
> >>> directly by the CPU, nor do you handle objects whose representation in
> >>> memory is not linear.
> >>> -Chris
> >>
> >> No we don't handle just any random gem object, but we do return an error
> >> code for any types not supported. However, as we don't really need the
> >> full generality of writing into a gem object of any type, I will replace
> >> this function with one that combines the allocation of a new object
> >> (which will therefore definitely be of the correct type, in the correct
> >> domain, etc) and filling it with the data to be preserved.
> 
> The usage pattern for the particular case is going to be:
> 	Once-only:
> 		Allocate
> 		Fill
> 	Then each time GuC is (re-)initialised:
> 		Map to GTT
> 		DMA-read from buffer into GuC private memory
> 		Unmap
> 	Only on unload:
> 		Dispose
> 
> So our object is write-once by the CPU (and that's always the first
> operation), thereafter read-occasionally by the GuC's DMA engine.

Yup. The problem is more that on atom platforms the objects aren't
coherent by default and generally you need to do something. Hence we
either have
- an explicit set_caching call to document that this is a gpu object which
  is always coherent (so also on chv/bxt), even when that's a no-op on big
  core
- or wrap everything in set_domain calls, even when those are no-ops too.

If either of those is lacking, reviewers tend to freak out preemptively and
the reptile brain takes over ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-18 12:11     ` Dave Gordon
@ 2015-06-18 14:49       ` Daniel Vetter
  2015-06-18 15:27         ` Chris Wilson
  2015-06-19  8:43         ` Dave Gordon
  0 siblings, 2 replies; 94+ messages in thread
From: Daniel Vetter @ 2015-06-18 14:49 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
> On 17/06/15 13:05, Daniel Vetter wrote:
> > On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
> >> Current devices may contain one or more programmable microcontrollers
> >> that need to have a firmware image (aka "binary blob") loaded from an
> >> external medium and transferred to the device's memory.
> >>
> >> This file provides generic support functions for doing this; they can
> >> then be used by each uC-specific loader, thus reducing code duplication
> >> and testing effort.
> >>
> >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >> Signed-off-by: Alex Dai <yu.dai@intel.com>
> > 
> > Given that I'm just shredding the synchronization used by the dmc loader
> > I'm not convinced this is a good idea. Abstraction has cost, and a bit of
> > copy-paste for similar sounding but slightly different things doesn't
> > sound awful to me. And the critical bit in all the firmware loading I've
> > seen thus far is in synchronizing the loading with other operations,
> > hiding that isn't a good idea. Worse if we enforce stuff like requiring
> > dev->struct_mutex.
> > -Daniel
> 
> It's precisely because it's in some sense "trivial-but-tricky" that we
> should write it once, get it right, and use it everywhere. Copypaste
> /does/ sound awful; I've seen how the code this was derived from had
> already been cloned into three flavours, all different and all wrong.
> 
> It's a very simple abstraction: one early call to kick things off as
> early as possible, no locking required. One late call with the
> struct_mutex held to complete the synchronisation and actually do the
> work, thus guaranteeing that the transfer to the target uC is done in a
> controlled fashion, at a time of the caller's choice, and by the
> driver's mainline thread, NOT by an asynchronous thread racing with
> other activity (which was one of the things wrong with the original
> version).

Yeah I've seen the origins of this in the display code, and that code gets
the syncing wrong. The only thing that one has to do is grab a runtime pm
reference for the appropriate power well to prevent dc5 entry, and release
it when the firmware is loaded and initialized.
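
A toy userspace model of that rule, with hypothetical names throughout
(`fake_power_well`, `fake_power_get()`, `fake_power_put()` and the
`dc5_allowed` flag are illustrative and are not the i915 power-domain
API): the loader takes a reference when the fetch starts and drops it once
the firmware is loaded, and no other lock is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical power well: the low-power state is allowed only at refcount 0. */
struct fake_power_well {
	int refcount;
	bool dc5_allowed;
};

/* Take a reference; while any reference is held, dc5 entry is blocked. */
static void fake_power_get(struct fake_power_well *pw)
{
	pw->refcount++;
	pw->dc5_allowed = false;
}

/* Drop a reference; the last put re-enables dc5 entry. */
static void fake_power_put(struct fake_power_well *pw)
{
	assert(pw->refcount > 0);
	if (--pw->refcount == 0)
		pw->dc5_allowed = true;
}

/* Called when the asynchronous firmware fetch is kicked off. */
static void fake_fw_load_start(struct fake_power_well *pw)
{
	fake_power_get(pw);
}

/* Called once the firmware has been loaded and initialised. */
static void fake_fw_load_done(struct fake_power_well *pw)
{
	fake_power_put(pw);
}
```

Other users can take and drop their own references in the meantime; the
low-power state stays blocked until the loader's reference is released.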

Which means any kind of firmware loader which requires/uses
dev->struct_mutex gets stuff wrong and is not appropriate everywhere.

> We should convert the DMC loader to use this too, so there need be only
> one bit of code in the whole driver that needs to understand how to use
> completions to get correct handover from a free-running no-locks-held
> thread to the properly disciplined environment of driver mainline for
> purposes of programming the h/w.

Nack on using this for dmc, since I want them to convert it to the above
synchronization, since that's how all the other async power initialization
is done.

Guc is different since we really must have it ready for execbuf, and for
that usecase a completion at drm_open time sounds like the right thing.

As a rule of thumb for refactoring and shared infrastructure we use the
following recipe in drm:
- first driver implements things as straightforward as possible
- 2nd user copypastes
- 3rd one has the duty to figure out whether some refactoring is in order
  or not.

Imo that approach leads to a really good balance between avoiding
overengineering and having maintainable code.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-18 14:49       ` Daniel Vetter
@ 2015-06-18 15:27         ` Chris Wilson
  2015-06-18 15:35           ` Daniel Vetter
  2015-06-19  8:43         ` Dave Gordon
  1 sibling, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-18 15:27 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 04:49:49PM +0200, Daniel Vetter wrote:
> Guc is different since we really must have it ready for execbuf, and for
> that usecase a completion at drm_open time sounds like the right thing.

But do we? It would be nice if we had a definite answer that the hw was
ready before we started using it in anger, but I don't see any reason
why we would have to delay userspace for a slow microcode update...

(This presupposes that userspace batches are unaffected by GuC/execlist
setup, which for userspace sanity I hope they are - or at least using
predicate registers and conditional execution.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-18 15:27         ` Chris Wilson
@ 2015-06-18 15:35           ` Daniel Vetter
  2015-06-18 15:49             ` Chris Wilson
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-18 15:35 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Dave Gordon, intel-gfx

On Thu, Jun 18, 2015 at 04:27:52PM +0100, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 04:49:49PM +0200, Daniel Vetter wrote:
> > Guc is different since we really must have it ready for execbuf, and for
> > that usecase a completion at drm_open time sounds like the right thing.
> 
> But do we? It would be nice if we had a definite answer that the hw was
> ready before we started using it in anger, but I don't see any reason
> why we would have to delay userspace for a slow microcode update...
> 
> (This presupposes that userspace batches are unaffected by GuC/execlist
> setup, which for userspace sanity I hope they are - or at least using
> predicate registers and conditional execution.)

Well I figured a wait_completion or flush_work unconditionally in execbuf
is not to your liking, and it's better to keep that in open. But I think
we should be able to get away with this at execbuf time. Might even be
better since this wouldn't block sw-rendered boot splashes.

But either way should be suitable I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-18 15:35           ` Daniel Vetter
@ 2015-06-18 15:49             ` Chris Wilson
  0 siblings, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-18 15:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 05:35:29PM +0200, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 04:27:52PM +0100, Chris Wilson wrote:
> > On Thu, Jun 18, 2015 at 04:49:49PM +0200, Daniel Vetter wrote:
> > > Guc is different since we really must have it ready for execbuf, and for
> > > that usecase a completion at drm_open time sounds like the right thing.
> > 
> > But do we? It would be nice if we had a definite answer that the hw was
> > ready before we started using it in anger, but I don't see any reason
> > why we would have to delay userspace for a slow microcode update...
> > 
> > (This presupposes that userspace batches are unaffected by GuC/execlist
> > setup, which for userspace sanity I hope they are - or at least using
> > predicate registers and conditional execution.)
> 
> Well I figured a wait_completion or flush_work unconditionally in execbuf
> is not to your liking, and it's better to keep that in open. But I think
> we should be able to get away with this at execbuf time. Might even be
> better since this wouldn't block sw-rendered boot splashes.
> 
> But either way should be suitable I think.

I am optimistic that we can make the request interface robust enough to be
able to queue up not only the ring initialisation and ppgtt initialisation
requests, but also userspace requests. If it all works out, we only need
to truly worry about microcode completion in hangcheck.
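
To illustrate the idea of queueing requests ahead of microcode completion,
here is a minimal user-space sketch. It is a toy model only: every name in
it (toy_submitter, toy_submit, toy_firmware_loaded) is hypothetical and
nothing here is i915 code. Requests that arrive before the firmware is ready
are held in order and flushed once the load completes, so callers are never
blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING   8
#define MAX_SUBMITTED 32

/* Toy model: all names are hypothetical, not taken from the driver. */
struct toy_submitter {
	bool fw_ready;                 /* has the firmware load completed? */
	int pending[MAX_PENDING];      /* requests held back, oldest first */
	size_t num_pending;
	int submitted[MAX_SUBMITTED];  /* requests handed to the "hardware" */
	size_t num_submitted;
};

/* Record a request as having reached the (simulated) hardware. */
static void toy_hw_submit(struct toy_submitter *s, int request_id)
{
	assert(s->num_submitted < MAX_SUBMITTED);
	s->submitted[s->num_submitted++] = request_id;
}

/*
 * Accept a request without blocking. Returns 0 if the request was either
 * submitted or queued, -1 if the pending queue is full.
 */
static int toy_submit(struct toy_submitter *s, int request_id)
{
	if (s->fw_ready) {
		toy_hw_submit(s, request_id);
		return 0;
	}
	if (s->num_pending == MAX_PENDING)
		return -1;
	s->pending[s->num_pending++] = request_id;
	return 0;
}

/* Called once the firmware is up: flush held requests in arrival order. */
static void toy_firmware_loaded(struct toy_submitter *s)
{
	size_t i;

	s->fw_ready = true;
	for (i = 0; i < s->num_pending; i++)
		toy_hw_submit(s, s->pending[i]);
	s->num_pending = 0;
}
```

In this model the only party that has to notice a firmware load that never
finishes is whatever watches for stuck requests, which is the role suggested
for hangcheck above.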
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-06-15 20:30   ` Chris Wilson
@ 2015-06-18 17:53     ` Yu Dai
  2015-06-18 20:12       ` Chris Wilson
  2015-06-18 18:54     ` Dave Gordon
  1 sibling, 1 reply; 94+ messages in thread
From: Yu Dai @ 2015-06-18 17:53 UTC (permalink / raw)
  To: Chris Wilson, Gordon, David S; +Cc: intel-gfx



On 06/15/2015 01:30 PM, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:
> ----snip----
> > + * Return true if get a success code from normal boot or RC6 boot
> > + */
> > +static inline bool i915_guc_get_status(struct drm_i915_private *dev_priv,
> > +					u32 *status)
> > +{
> > +	*status = I915_READ(GUC_STATUS);
> > +	return (((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_READY ||
> > +		((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE);
>
> Weird function. Does two things, only one of those is get_status. Maybe
> you would like to split this up better and use a switch when you mean a
> switch. Or rename it to reflect its use only as a condition.
Yes. It makes sense to change it to something like 
i915_guc_is_ucode_loaded().
> > +}
> > +
> > +/* Transfers the firmware image to RAM for execution by the microcontroller.
> > + *
> > + * GuC Firmware layout:
> > + * +-------------------------------+  ----
> > + * |          CSS header           |  128B
> > + * +-------------------------------+  ----
> > + * |             uCode             |
> > + * +-------------------------------+  ----
> > + * |         RSA signature         |  256B
> > + * +-------------------------------+  ----
> > + * |         RSA public Key        |  256B
> > + * +-------------------------------+  ----
> > + * |       Public key modulus      |    4B
> > + * +-------------------------------+  ----
> > + *
> > + * Architecturally, the DMA engine is bidirectional, and it can potentially
> > + * even transfer between GTT locations. This functionality is left out of the
> > + * API for now as there is no need for it.
> > + *
> > + * Note that the GuC needs the CSS header plus uKernel code to be copied as one
> > + * chunk of data. RSA sig data is loaded via MMIO.
> > + */
> > +static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
> > +{
> > +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
> > +	struct drm_i915_gem_object *fw_obj = guc_fw->uc_fw_obj;
> > +	unsigned long offset;
> > +	struct sg_table *sg = fw_obj->pages;
> > +	u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)];
> > +	int i, ret = 0;
> > +
> > +	/* uCode size, also is where RSA signature starts */
> > +	offset = ucode_size = guc_fw->uc_fw_size - UOS_CSS_SIGNING_SIZE;
> > +
> > +	/* Copy RSA signature from the fw image to HW for verification */
> > +	sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, UOS_RSA_SIG_SIZE, offset);
> > +	for (i = 0; i < UOS_RSA_SIG_SIZE / sizeof(u32); i++)
> > +		I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]);
> > +
> > +	/* Set the source address for the new blob */
> > +	offset = i915_gem_obj_ggtt_offset(fw_obj);
>
> Why would it even have a GGTT vma? There's no precondition here to
> assert that it should.
It is pinned into GGTT inside gem_allocate_guc_obj.
> > +	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
> > +	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
> > +
> > +	/* Set the destination. Current uCode expects an 8k stack starting from
> > +	 * offset 0. */
> > +	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
> > +
> > +	/* XXX: The image is automatically transferred to SRAM after the RSA
> > +	 * verification. This is why the address space is chosen as such. */
> > +	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> > +
> > +	I915_WRITE(DMA_COPY_SIZE, ucode_size);
> > +
> > +	/* Finally start the DMA */
> > +	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> > +
>
> Just assuming that the writes land and in the order you expect?
Would a POSTING_READ of DMA_COPY_SIZE before issuing the DMA be enough here?
Or should there be a POSTING_READ after each of those writes?

-Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 12:10           ` Chris Wilson
@ 2015-06-18 18:07             ` Dave Gordon
  2015-06-19  8:44               ` Chris Wilson
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-18 18:07 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On 18/06/15 13:10, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
>> On 17/06/15 13:02, Daniel Vetter wrote:
>>> Domain handling is required for all gem objects, and the resulting bugs if
>>> you don't for one-off objects are absolutely no fun to track down.
>>
>> Is it not the case that the new object returned by
>> i915_gem_alloc_object() is
>> (a) of a type that can be mapped into the GTT, and
>> (b) initially in the CPU domain for both reading and writing?
>>
>> So AFAICS the allocate-and-fill function I'm describing (to appear in
>> next patch series respin) doesn't need any further domain handling.
> 
> A i915_gem_object_create_from_data() is a reasonable addition, and I
> suspect it will make the code a bit more succinct.

I shall adopt this name for it :)

> Whilst your statement is true today, calling set_domain is then a no-op,
> and helps document how you use the object and so reduces the likelihood
> of us introducing bugs in the future.
> -Chris

So here's the new function ... where should the set-to-cpu-domain go?
After the pin_pages and before the sg_copy_from_buffer?

/* Allocate a new GEM object and fill it with the supplied data */
struct drm_i915_gem_object *
i915_gem_object_create_from_data(struct drm_device *dev,
                                 const void *data, size_t size)
{
        struct drm_i915_gem_object *obj;
        struct sg_table *sg;
        size_t bytes;
        int ret;

        obj = i915_gem_alloc_object(dev, round_up(size, PAGE_SIZE));
        if (!obj)
                return NULL;

        ret = i915_gem_object_get_pages(obj);
        if (ret)
                goto fail;

        i915_gem_object_pin_pages(obj);
        sg = obj->pages;
        bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
        i915_gem_object_unpin_pages(obj);

        if (WARN_ON(bytes != size)) {
                DRM_ERROR("Incomplete copy, wrote %zu of %zu\n", bytes, size);
                goto fail;
        }

        return obj;

fail:
        drm_gem_object_unreference(&obj->base);
        return NULL;
}

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 14:31           ` Daniel Vetter
@ 2015-06-18 18:28             ` Dave Gordon
  2015-06-24  9:32               ` Daniel Vetter
  2015-06-24  9:40               ` Chris Wilson
  0 siblings, 2 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-18 18:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 18/06/15 15:31, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
>> On 17/06/15 13:02, Daniel Vetter wrote:
>>> On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote:
>>>> On 15/06/15 21:09, Chris Wilson wrote:
>>>>> On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
>>>>>> From: Alex Dai <yu.dai@intel.com>
>>>>>>
>>>>>> i915_gem_object_write() is a generic function to copy data from a plain
>>>>>> linear buffer to a paged gem object.
>>>>>>
>>>>>> We will need this for the microcontroller firmware loading support code.
>>>>>>
>>>>>> Issue: VIZ-4884
>>>>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>>>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
>>>>>>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
>>>>>>  2 files changed, 30 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> index 611fbd8..9094c06 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
>>>>>>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>>>>>>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
>>>>>>  			 const struct drm_i915_gem_object_ops *ops);
>>>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>>>>>> +			  const void *data, size_t size);
>>>>>>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>>>>>>  						  size_t size);
>>>>>>  void i915_init_vm(struct drm_i915_private *dev_priv,
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> index be35f04..75d63c2 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
>>>>>>  	return false;
>>>>>>  }
>>>>>>  
>>>>>> +/* Fill the @obj with the @size amount of @data */
>>>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>>>>>> +			const void *data, size_t size)
>>>>>> +{
>>>>>> +	struct sg_table *sg;
>>>>>> +	size_t bytes;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	ret = i915_gem_object_get_pages(obj);
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>>> +
>>>>>> +	i915_gem_object_pin_pages(obj);
>>>>>
>>>>> You don't set the object into the CPU domain, or instead manually handle
>>>>> the domain flushing. You don't handle objects that cannot be written
>>>>> directly by the CPU, nor do you handle objects whose representation in
>>>>> memory is not linear.
>>>>> -Chris
>>>>
>>>> No we don't handle just any random gem object, but we do return an error
>>>> code for any types not supported. However, as we don't really need the
>>>> full generality of writing into a gem object of any type, I will replace
>>>> this function with one that combines the allocation of a new object
>>>> (which will therefore definitely be of the correct type, in the correct
>>>> domain, etc) and filling it with the data to be preserved.
>>
>> The usage pattern for the particular case is going to be:
>> 	Once-only:
>> 		Allocate
>> 		Fill
>> 	Then each time GuC is (re-)initialised:
>> 		Map to GTT
>> 		DMA-read from buffer into GuC private memory
>> 		Unmap
>> 	Only on unload:
>> 		Dispose
>>
>> So our object is write-once by the CPU (and that's always the first
>> operation), thereafter read-occasionally by the GuC's DMA engine.
> 
> Yup. The problem is more that on atom platforms the objects aren't
> coherent by default and generally you need to do something. Hence we
> either have
> - an explicit set_caching call to document that this is a gpu object which
>   is always coherent (so also on chv/bxt), even when that's a no-op on big
>   core
> - or wrap everything in set_domain calls, even when those are no-ops too.
> 
> If either of those is lacking, reviewers tend to freak out preemptively and the
> reptile brain takes over ;-)
> 
> Cheers, Daniel

We don't need "coherency" as such. The buffer is filled (once only) by
the CPU (so I should put a set-to-cpu-domain between the allocate and
fill stages?) Once it's filled, the CPU need not read or write it ever
again.

Then before the DMA engine accesses it, we call i915_gem_obj_ggtt_pin,
which I'm assuming will take care of any coherency issues (making sure
the data written by the CPU is now visible to the DMA engine) when it
puts the buffer into the GTT-readable domain. Is that not sufficient?

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-06-15 20:30   ` Chris Wilson
  2015-06-18 17:53     ` Yu Dai
@ 2015-06-18 18:54     ` Dave Gordon
  1 sibling, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-18 18:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 15/06/15 21:30, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:
>> +	/* We can't enable contexts until all firmware is loaded */
>> +	ret = intel_guc_ucode_load(dev, false);
> 
> Pardon. I know context initialisation is broken, but adding to that
> breakage is not pleasant.

Sorry, but that's just the way it works. If you want to use the GuC for
batch submission, then you cannot submit any commands to any engine via
the GuC before its firmware is loaded, nor can you submit anything at
all directly to the ELSPs.

However in /this/ patch the 'false' above should have been 'true' to
give synchronous load semantics; and then ignoring the return is
intentional, because either it's worked and we're going to use the GuC,
or it hasn't and we're not (and it's already printed a message). Then
there's a later patch that tries to decouple engine MMIO setup from
engine setup using batches & contexts, at which point we can make use of
the return code.

>>  	ret = i915_gem_context_enable(dev_priv);
>>  	if (ret && ret != -EIO) {
>>  		DRM_ERROR("Context enable failed %d\n", ret);
> 
>> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
>> index 82367c9..0b44265 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>> @@ -166,4 +166,9 @@ struct intel_guc {
>>  #define GUC_WD_VECS_IER		0xC558
>>  #define GUC_PM_P24C_IER		0xC55C
>>  
>> +/* intel_guc_loader.c */
>> +extern void intel_guc_ucode_init(struct drm_device *dev);
>> +extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
>> +extern void intel_guc_ucode_fini(struct drm_device *dev);
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
>> new file mode 100644
>> index 0000000..16eef4c
>> --- /dev/null
>> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
>> @@ -0,0 +1,416 @@
>> +/*
>> + * Copyright © 2014 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + * Authors:
>> + *    Vinit Azad <vinit.azad@intel.com>
>> + *    Ben Widawsky <ben@bwidawsk.net>
>> + *    Dave Gordon <david.s.gordon@intel.com>
>> + *    Alex Dai <yu.dai@intel.com>
>> + */
>> +#include <linux/firmware.h>
>> +#include "i915_drv.h"
>> +#include "intel_guc.h"
>> +
>> +/**
>> + * DOC: GuC
>> + *
>> + * intel_guc:
>> + * Top level structure of guc. It handles firmware loading and manages client
>> + * pool and doorbells. intel_guc owns a i915_guc_client to replace the legacy
>> + * ExecList submission.
>> + *
>> + * Firmware versioning:
>> + * The firmware build process will generate a version header file with major and
>> + * minor version defined. The versions are built into CSS header of firmware.
>> + * i915 kernel driver set the minimal firmware version required per platform.
>> + * The firmware installation package will install (symbolic link) proper version
>> + * of firmware.
>> + *
>> + * GuC address space:
>> + * GuC does not allow any gfx GGTT address that falls into range [0, WOPCM_TOP),
>> + * which is reserved for Boot ROM, SRAM and WOPCM. Currently this top address is
>> + * 512K. In order to exclude 0-512K address space from GGTT, all gfx objects
>> + * used by GuC are pinned with PIN_OFFSET_BIAS along with size of WOPCM.
>> + *
>> + * Firmware log:
>> + * Firmware log is enabled by setting i915.guc_log_level to non-negative level.
>> + * Log data is printed out via reading debugfs i915_guc_log_dump. Reading from
>> + * i915_guc_load_status will print out firmware loading status and scratch
>> + * registers value.
>> + *
>> + */
>> +
>> +#define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
>> +MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
>> +
>> +static u32 get_gttype(struct drm_device *dev)
>> +{
>> +	/* XXX: GT type based on PCI device ID? field seems unused by fw */
>> +	return 0;
>> +}
>> +
>> +static u32 get_core_family(struct drm_device *dev)
> 
> For new code we really should be in the habit of passing around the
> right pointer, not dev.

Good idea :) Especially as the caller actually passes dev_priv->dev!!

>> +{
>> +	switch (INTEL_INFO(dev)->gen) {
>> +	case 8:
>> +		return GFXCORE_FAMILY_GEN8;
>> +	case 9:
>> +		return GFXCORE_FAMILY_GEN9;
>> +	default:
>> +		DRM_ERROR("GUC: unknown gen for scheduler init\n");
>> +		return GFXCORE_FAMILY_FORCE_ULONG;
>> +	}
>> +}
>> +
>> +static void set_guc_init_params(struct drm_i915_private *dev_priv)
>> +{
>> +	struct intel_guc *guc = &dev_priv->guc;
>> +	u32 params[GUC_CTL_MAX_DWORDS];
>> +	int i;
>> +
>> +	memset(&params, 0, sizeof(params));
>> +
>> +	params[GUC_CTL_DEVICE_INFO] |=
>> +		(get_gttype(dev_priv->dev) << GUC_CTL_GTTYPE_SHIFT) |
>> +		(get_core_family(dev_priv->dev) << GUC_CTL_COREFAMILY_SHIFT);
>> +
>> +	/* GuC ARAT increment is 10 ns. GuC default scheduler quantum is one
>> +	 * second. This ARAT is calculated by:
>> +	 * Scheduler-Quantum-in-ns / ARAT-increment-in-ns = 1000000000 / 10
>> +	 */
>> +	params[GUC_CTL_ARAT_HIGH] = 0;
>> +	params[GUC_CTL_ARAT_LOW] = 100000000;
>> +
>> +	params[GUC_CTL_WA] |= GUC_CTL_WA_UK_BY_DRIVER;
>> +
>> +	params[GUC_CTL_FEATURE] |= GUC_CTL_DISABLE_SCHEDULER |
>> +			GUC_CTL_VCS2_ENABLED;
>> +
>> +	if (i915.guc_log_level >= 0) {
>> +		params[GUC_CTL_LOG_PARAMS] = guc->log_flags;
>> +		params[GUC_CTL_DEBUG] =
>> +			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
>> +	}
>> +
>> +	I915_WRITE(SOFT_SCRATCH(0), 0);
>> +
>> +	for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
>> +		I915_WRITE(SOFT_SCRATCH(1 + i), params[i]);
>> +}
>> +
>> +/* Read GuC status register (GUC_STATUS)
>> + * Return true if get a success code from normal boot or RC6 boot
>> + */
>> +static inline bool i915_guc_get_status(struct drm_i915_private *dev_priv,
>> +					u32 *status)
>> +{
>> +	*status = I915_READ(GUC_STATUS);
>> +	return (((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_READY ||
>> +		((*status) & GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE);
> 
> Weird function. Does two things, only one of those is get_status. Maybe
> you would like to split this up better and use a switch when you mean a
> switch. Or rename it to reflect its use only as a condition.

The weirdness is down to the fact that it's passed as an argument to the
MACRO "wait_for_atomic()". The "caller" of wait_for_atomic() also wants
to see the status value that caused the MACRO to exit so it has to save
that indirectly via the pointer. We can't break the "status = READ()"
and "classify the result" stages into two separate functions because we
have to pass a single expression to the MACRO; both have to be inside
the generated loop.

So it may be weird, but at least it's simple; and the comment above does
tell you that it does two things. We could call it
i915_read_guc_status_and_test_whether_ready() if you like, but I think
that'll make the line where it's used more than 80 characters ;-(
Other (shorter) suggestions happily accepted.

Macros that repeatedly evaluate the text of their arguments are ugly :(
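
For readers unfamiliar with the pattern, here is a minimal user-space sketch
of why the predicate has to both read and classify. It is not the kernel's
wait_for_atomic(): POLL_UNTIL is a hypothetical, try-count-based stand-in
(the real macro is time-based), and the status encoding and the fake
register are invented for illustration. The point it shows is that the
condition text is re-evaluated on every pass, so the read must happen inside
it, and the last value read is handed back through the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status encoding, for illustration only. */
#define TOY_STATUS_MASK   0xffu
#define TOY_STATUS_BUSY   0x10u
#define TOY_STATUS_READY  0xf0u

/* Stand-in for the hardware register: reports ready on the third read. */
static unsigned int fake_reads;

static uint32_t fake_read_status(void)
{
	return ++fake_reads >= 3 ? TOY_STATUS_READY : TOY_STATUS_BUSY;
}

/* Reads the status, saves it for the caller, and reports readiness. */
static bool read_status_and_test_ready(uint32_t *status)
{
	*status = fake_read_status();
	return (*status & TOY_STATUS_MASK) == TOY_STATUS_READY;
}

/*
 * Simplified polling macro. 'cond' is pasted into the loop, so it is
 * evaluated afresh on each pass. Sets 'ret' to 0 once 'cond' is true,
 * or to -1 if it never became true within 'max_tries' evaluations.
 */
#define POLL_UNTIL(ret, cond, max_tries)		\
	do {						\
		int tries_left_ = (max_tries);		\
		(ret) = -1;				\
		while (tries_left_-- > 0) {		\
			if (cond) {			\
				(ret) = 0;		\
				break;			\
			}				\
		}					\
	} while (0)

/* The caller sees both the outcome and the status that ended the loop. */
static int wait_for_ready(uint32_t *final_status, int max_tries)
{
	int ret;

	POLL_UNTIL(ret, read_status_and_test_ready(final_status), max_tries);
	return ret;
}
```

Splitting the read from the test would need two expressions inside the loop,
which a single-expression macro argument cannot express without resorting to
the comma operator.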

>> +}
>> +
>> +/* Transfers the firmware image to RAM for execution by the microcontroller.
>> + *
>> + * GuC Firmware layout:
>> + * +-------------------------------+  ----
>> + * |          CSS header           |  128B
>> + * +-------------------------------+  ----
>> + * |             uCode             |
>> + * +-------------------------------+  ----
>> + * |         RSA signature         |  256B
>> + * +-------------------------------+  ----
>> + * |         RSA public Key        |  256B
>> + * +-------------------------------+  ----
>> + * |       Public key modulus      |    4B
>> + * +-------------------------------+  ----
>> + *
>> + * Architecturally, the DMA engine is bidirectional, and it can potentially
>> + * even transfer between GTT locations. This functionality is left out of the
>> + * API for now as there is no need for it.
>> + *
>> + * Note that the GuC needs the CSS header plus uKernel code to be copied as one
>> + * chunk of data. RSA sig data is loaded via MMIO.
>> + */
>> +static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
>> +{
>> +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
>> +	struct drm_i915_gem_object *fw_obj = guc_fw->uc_fw_obj;
>> +	unsigned long offset;
>> +	struct sg_table *sg = fw_obj->pages;
>> +	u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)];
>> +	int i, ret = 0;
>> +
>> +	/* uCode size, also is where RSA signature starts */
>> +	offset = ucode_size = guc_fw->uc_fw_size - UOS_CSS_SIGNING_SIZE;
>> +
>> +	/* Copy RSA signature from the fw image to HW for verification */
>> +	sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, UOS_RSA_SIG_SIZE, offset);
>> +	for (i = 0; i < UOS_RSA_SIG_SIZE / sizeof(u32); i++)
>> +		I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]);
>> +
>> +	/* Set the source address for the new blob */
>> +	offset = i915_gem_obj_ggtt_offset(fw_obj);
> 
> Why would it even have a GGTT vma? There's no precondition here to
> assert that it should.

The (only) caller already did:

        ret = i915_gem_obj_ggtt_pin(guc_fw->uc_fw_obj, 0, 0);

and also deals with unpinning it after use.

>> +	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
>> +	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
>> +
>> +	/* Set the destination. Current uCode expects an 8k stack starting from
>> +	 * offset 0. */
>> +	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
>> +
>> +	/* XXX: The image is automatically transferred to SRAM after the RSA
>> +	 * verification. This is why the address space is chosen as such. */
>> +	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
>> +
>> +	I915_WRITE(DMA_COPY_SIZE, ucode_size);
>> +
>> +	/* Finally start the DMA */
>> +	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
>> +
> 
> Just assuming that the writes land and in the order you expect?

Yes. If they don't then the mapping of the MMIO registers is set up
wrong. No one should ever map h/w registers as writeback or
write-combining; or in fact anything other than uncached and strongly
ordered w.r.t. each other for both reads and writes.

Sometimes we need a POSTING_READ() to ensure that a WRITE has reached
the h/w before touching something other than a device register -- s/w
state or shared memory -- but not between consecutive writes to
registers of the same device.

The next operation is going to be a READ (inside i915_guc_get_status()
above), so that will flush (in order) any of the above writes that
haven't actually reached the h/w yet ...
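
The ordering argument can be pictured with a toy model. This is a sketch
under stated assumptions, not a description of how PCI or the i915 MMIO
accessors are implemented: it assumes writes to one device are posted in
program order and that a read from that device cannot complete until all
earlier writes have landed. All names (toy_device, toy_write, toy_read) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS   8
#define QUEUE_LEN  16

struct posted_write {
	unsigned int reg;
	uint32_t val;
};

/* Toy device: registers plus a FIFO of writes that are still in flight. */
struct toy_device {
	uint32_t regs[NUM_REGS];               /* what the "hardware" sees */
	struct posted_write queue[QUEUE_LEN];  /* posted, not yet applied */
	size_t queued;
};

/* A write is posted: queued in program order, but not applied yet. */
static void toy_write(struct toy_device *dev, unsigned int reg, uint32_t val)
{
	assert(reg < NUM_REGS && dev->queued < QUEUE_LEN);
	dev->queue[dev->queued].reg = reg;
	dev->queue[dev->queued].val = val;
	dev->queued++;
}

/* A read first drains every earlier write, in order, then returns. */
static uint32_t toy_read(struct toy_device *dev, unsigned int reg)
{
	size_t i;

	assert(reg < NUM_REGS);
	for (i = 0; i < dev->queued; i++)
		dev->regs[dev->queue[i].reg] = dev->queue[i].val;
	dev->queued = 0;
	return dev->regs[reg];
}
```

Under those assumptions a read of any register of the device, such as the
status poll that follows the DMA start, is enough to guarantee the preceding
writes have arrived; an extra read between consecutive writes adds nothing.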

>> +	/*
>> +	 * Spin-wait for the DMA to complete & the GuC to start up.
>> +	 * NB: Docs recommend not using the interrupt for completion.
>> +	 * FIXME: what's a valid timeout?
>> +	 */
>> +	ret = wait_for_atomic(i915_guc_get_status(dev_priv, &status), 10);
> 
> FIXME, error handling is too hard.

I got a new timeout value from the MinuteIA team, so I'll update that.
The error code is passed back for the caller to handle.

>> +	DRM_DEBUG_DRIVER("DMA status = 0x%x, GuC status 0x%x\n",
>> +			I915_READ(DMA_CTRL), status);
>> +
>> +	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
>> +		DRM_ERROR("%s firmware signature verification failed\n",
>> +			guc_fw->uc_name);
>> +		ret = -ENOEXEC;
>> +	}
>> +
>> +	DRM_DEBUG_DRIVER("GuC fw load status %s %d\n",
>> +			ret ? "FAIL" : "SUCCESS", ret);
>> +
>> +	return ret;
>> +}
> 
> I'm guessing the other functions are basically more of the same...
> -Chris

?
.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-06-18 17:53     ` Yu Dai
@ 2015-06-18 20:12       ` Chris Wilson
  2015-06-19 14:34         ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-18 20:12 UTC (permalink / raw)
  To: Yu Dai; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 10:53:10AM -0700, Yu Dai wrote:
> 
> 
> On 06/15/2015 01:30 PM, Chris Wilson wrote:
> >On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:
> >> +	/* Set the source address for the new blob */
> >> +	offset = i915_gem_obj_ggtt_offset(fw_obj);
> >
> >Why would it even have a GGTT vma? There's no precondition here to
> >assert that it should.
> It is pinned into GGTT inside gem_allocate_guc_obj.

The basic rules when reviewing pinning are:
- is there a reason for this pin?
- is the lifetime of the pin bound to the hardware access?
- are the pad-to-size/alignment correct?
- is the vma in the wrong location?

Pinning early (and then not even stating in the function preamble that
you expect the object to be pinned) makes it hard to review both the
reason and check the lifetime. An easy solution to avoiding the
assumption of having a pinned object is to pass around the vma instead.
Because you pin too early, it is clear neither why the object is pinned
nor that it stays pinned only for the lifetime of the hardware access, and
you have to scour the code to ensure that the pin isn't randomly dropped
or reused for another access.
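
A minimal sketch of the "pass around the vma" suggestion, in user-space C.
Everything here is hypothetical (toy_object, toy_pin and the functions are
invented for illustration and are not the i915 API): the function that
programs the hardware takes a handle that can only be obtained by pinning,
so the precondition is visible in its signature, and the pin is scoped
tightly around the access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a GEM object; not the real structure. */
struct toy_object {
	int pin_count;
	uint64_t ggtt_offset;
};

/* Hypothetical handle that only exists while the object is pinned. */
struct toy_pin {
	struct toy_object *obj;
	uint64_t offset;
};

/* Pin the object and return a handle carrying its offset. */
static struct toy_pin toy_pin_acquire(struct toy_object *obj)
{
	struct toy_pin pin = { obj, obj->ggtt_offset };

	obj->pin_count++;
	return pin;
}

/* Drop the pin and invalidate the handle so it cannot be reused. */
static void toy_pin_release(struct toy_pin *pin)
{
	assert(pin->obj != NULL && pin->obj->pin_count > 0);
	pin->obj->pin_count--;
	pin->obj = NULL;
}

/* Takes the pin, not the object: the precondition is in the signature. */
static uint64_t toy_program_dma_source(const struct toy_pin *pin)
{
	assert(pin->obj != NULL);
	return pin->offset;
}

/* The pin lives exactly as long as the (simulated) hardware access. */
static uint64_t toy_load_firmware(struct toy_object *fw)
{
	struct toy_pin pin = toy_pin_acquire(fw);
	uint64_t programmed = toy_program_dma_source(&pin);

	toy_pin_release(&pin);
	return programmed;
}
```

With this shape a reviewer can check the reason for the pin and its lifetime
by reading one function, rather than tracing callers.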
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status
  2015-06-16  9:40   ` Chris Wilson
@ 2015-06-19  7:49     ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-19  7:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 16/06/15 10:40, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:24PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> The new node provides access to the status of the common uC loader
>> code and the GuC-specific loader; also the scratch registers used
>> for communication between the i915 driver and the GuC firmware.
>>
>> Issue: VIZ-4884
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_debugfs.c |   37 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 47636f3..c52a745 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -2352,6 +2352,42 @@ static int i915_llc(struct seq_file *m, void *data)
>>  	return 0;
>>  }
>>  
>> +static void i915_uc_load_status_info(struct seq_file *m, struct intel_uc_fw *uc_fw)
>> +{
>> +	seq_printf(m, "%s firmware status:\n\tpath: <%s>\n\tfetch: %d\n\tload: %d\n",
>> +			uc_fw->uc_name,
>> +			uc_fw->uc_fw_path,
>> +			uc_fw->uc_fw_fetch_status,
>> +			uc_fw->uc_fw_load_status);
> 
> If you made this one seq_printf() per line visualising the resulting
> format would have been easier - and easier to modify.

Done.

> Don't use <%s>, that's just visual noise to make cutting and pasting
> harder.

My terminal doesn't include <> in word selections (but DOES include "/"
and ".") so selecting a pathname just works :) But I've removed the
<angle.brackets> anyway.

> If you can decode numeric status values, do so.

Done. I've added a _repr function for decoding the enum and used it
everywhere.
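
To illustrate what such a decoder might look like, here is a minimal
userspace sketch. The enum values and the names uc_fw_status,
uc_fw_status_repr() and print_uc_status() are hypothetical, not taken
from the patch, and fprintf() stands in for seq_printf():

```c
#include <stdio.h>

/* Hypothetical fetch/load states; the real enum in the patch may differ. */
enum uc_fw_status {
	UC_FW_FAIL = -1,
	UC_FW_NONE = 0,
	UC_FW_PENDING,
	UC_FW_SUCCESS,
};

/* Map a status value to a printable name; never returns NULL. */
static const char *uc_fw_status_repr(enum uc_fw_status status)
{
	switch (status) {
	case UC_FW_FAIL:    return "FAIL";
	case UC_FW_NONE:    return "NONE";
	case UC_FW_PENDING: return "PENDING";
	case UC_FW_SUCCESS: return "SUCCESS";
	}
	return "<invalid>";
}

/* One output call per line, so the resulting format is easy to see. */
static void print_uc_status(FILE *out, const char *name, const char *path,
			    enum uc_fw_status fetch, enum uc_fw_status load)
{
	fprintf(out, "%s firmware status:\n", name);
	fprintf(out, "\tpath: %s\n", path);
	fprintf(out, "\tfetch: %s\n", uc_fw_status_repr(fetch));
	fprintf(out, "\tload: %s\n", uc_fw_status_repr(load));
}
```

Keeping the decoder separate from the printing code means the same
helper can be reused in log messages as well as in debugfs.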

>> +}
>> +
>> +static int i915_guc_load_status_info(struct seq_file *m, void *data)
>> +{
>> +	struct drm_info_node *node = m->private;
>> +	struct drm_i915_private *dev_priv = node->minor->dev->dev_private;
>> +	u32 tmp, i;
>> +
>> +	if (!HAS_GUC_UCODE(dev_priv->dev))
> 
> Here and elsewhere it should be return -ENODEV;

There's only one other use of HAS_GUC_UCODE (in intel_guc_ucode_init())
and that one doesn't and mustn't trigger an error if false. I don't see
why it should be an *error* here either; the caller hasn't done anything
wrong, and there's no h/w or s/w failure. An empty result (EOF) is a
nice way of saying that there's nothing to say, without making the user
think something broke.

In fact it may be perfectly meaningful to continue rather than
returning; consider the case of a future GuC that comes with firmware
preloaded, so HAS_GUC() is true but HAS_GUC_UCODE() is FALSE. We could
still read and decode the GUC_STATUS even though we haven't loaded any
firmware.

>> +		return 0;
>> +
>> +	i915_uc_load_status_info(m, &dev_priv->guc.guc_fw);
>> +
>> +	tmp = I915_READ(GUC_STATUS);
>> +
>> +	seq_printf(m, "\nGuC status 0x%08x:\n", tmp);
>> +	seq_printf(m, "\tBootrom status = 0x%x\n",
>> +		(tmp & GS_BOOTROM_MASK) >> GS_BOOTROM_SHIFT);
>> +	seq_printf(m, "\tuKernel status = 0x%x\n",
>> +		(tmp & GS_UKERNEL_MASK) >> GS_UKERNEL_SHIFT);
>> +	seq_printf(m, "\tMIA Core status = 0x%x\n",
>> +		(tmp & GS_MIA_MASK) >> GS_MIA_SHIFT);
>> +	seq_puts(m, "\nScratch registers value:\n");
>> +	for (i = 0; i < 16; i++)
>> +		seq_printf(m, "\t%2d: \t0x%x\n", i, I915_READ(SOFT_SCRATCH(i)));
> 
> I have a feeling these probably don't want to be upstreamed.
> -Chris

It's just a register dump; nothing secret there. You could read them
with IGT's register dumper anyway.

.Dave.

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-18 14:49       ` Daniel Vetter
  2015-06-18 15:27         ` Chris Wilson
@ 2015-06-19  8:43         ` Dave Gordon
  2015-06-24 10:29           ` Daniel Vetter
  1 sibling, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-19  8:43 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 18/06/15 15:49, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
>> On 17/06/15 13:05, Daniel Vetter wrote:
>>> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
>>>> Current devices may contain one or more programmable microcontrollers
>>>> that need to have a firmware image (aka "binary blob") loaded from an
>>>> external medium and transferred to the device's memory.
>>>>
>>>> This file provides generic support functions for doing this; they can
>>>> then be used by each uC-specific loader, thus reducing code duplication
>>>> and testing effort.
>>>>
>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>>>
>>> Given that I'm just shredding the synchronization used by the dmc loader
>>> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
>>> copy-paste for similar sounding but slightly different things doesn't
>>> sound awful to me. And the critical bit in all the firmware loading I've
>>> seen thus far is in synchronizing the loading with other operations,
>>> hiding that isn't a good idea. Worse if we enforce stuff like requiring
>>> dev->struct_mutex.
>>> -Daniel
>>
>> It's precisely because it's in some sense "trivial-but-tricky" that we
>> should write it once, get it right, and use it everywhere. Copypaste
>> /does/ sound awful; I've seen how the code this was derived from had
>> already been cloned into three flavours, all different and all wrong.
>>
>> It's a very simple abstraction: one early call to kick things off as
>> early as possible, no locking required. One late call with the
>> struct_mutex held to complete the synchronisation and actually do the
>> work, thus guaranteeing that the transfer to the target uC is done in a
>> controlled fashion, at a time of the caller's choice, and by the
>> driver's mainline thread, NOT by an asynchronous thread racing with
>> other activity (which was one of the things wrong with the original
>> version).
> 
> Yeah I've seen the origins of this in the display code, and that code gets
> the syncing wrong. The only thing that one has to do is grab a runtime pm
> reference for the appropriate power well to prevent dc5 entry, and release
> it when the firmware is loaded and initialized.

Agreed.

> Which means any kind of firmware loader which requires/uses
> dev->struct_mutex gets stuff wrong and is not appropriate everywhere.

BUT, the loading of the firmware into any uC MUST be done in a
controlled manner i.e. at a time when no other thread is touching the
h/w. Otherwise the f/w load and whatever else is concurrently accessing
the h/w could in some cases interfere disastrously. Examples of
interference might be:

* interleaved accesses to the ELSP (in the case of the GuC)
* incorrect handover of power management (DMC, GuC)
* erroneous management of forcewake state

In general the f/w that is just starting on the uC may have certain
expectations about the initial state of the h/w, which may not be met if
other threads are accessing various bits of h/w while the uC is booting up.

So we absolutely need to guarantee that the f/w load is done by a thread
which has exclusive ownership of any bit of the h/w that the f/w is
going to make assumptions about. With the current locking structure of
the driver, that means holding the struct_mutex (it shouldn't really,
there should be a separate mutex for h/w register access vs.
driver-private data structures, but there isn't).
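
As a rough illustration of the two-phase scheme (early unlocked fetch,
late load with the hardware owned), here is a minimal userspace sketch.
It assumes pthreads: a condition variable plays the role of a kernel
completion and hw_mutex plays the role of struct_mutex. All names are
hypothetical and none of this is the patch's actual code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the driver's firmware bookkeeping. */
struct uc_fw {
	pthread_mutex_t lock;	/* protects 'fetched' */
	pthread_cond_t done;	/* plays the role of a kernel completion */
	bool fetched;
	char blob[16];		/* the "firmware image" */
	char hw_mem[16];	/* the "device memory" it is copied into */
};

/* Stand-in for struct_mutex: exclusive ownership of the hardware. */
static pthread_mutex_t hw_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Async thread: fetch only. It never touches hw_mem. */
static void *uc_fw_fetch_thread(void *arg)
{
	struct uc_fw *fw = arg;

	pthread_mutex_lock(&fw->lock);
	strcpy(fw->blob, "firmware");
	fw->fetched = true;
	pthread_cond_signal(&fw->done);
	pthread_mutex_unlock(&fw->lock);
	return NULL;
}

/* Early call: kick off the fetch; no hardware lock is required. */
static int uc_fw_fetch_start(struct uc_fw *fw, pthread_t *thread)
{
	pthread_mutex_init(&fw->lock, NULL);
	pthread_cond_init(&fw->done, NULL);
	fw->fetched = false;
	return pthread_create(thread, NULL, uc_fw_fetch_thread, fw);
}

/* Late call: caller holds hw_mutex; wait for the fetch, then load. */
static void uc_fw_load_locked(struct uc_fw *fw)
{
	pthread_mutex_lock(&fw->lock);
	while (!fw->fetched)
		pthread_cond_wait(&fw->done, &fw->lock);
	pthread_mutex_unlock(&fw->lock);

	/* Only the mainline thread, with the hardware owned, does this. */
	memcpy(fw->hw_mem, fw->blob, sizeof(fw->hw_mem));
}
```

The caller would take hw_mutex around uc_fw_load_locked(), so the copy
into hw_mem happens at a time of its choosing rather than whenever the
fetch thread happens to finish.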

>> We should convert the DMC loader to use this too, so there need be only
>> one bit of code in the whole driver that needs to understand how to use
>> completions to get correct handover from a free-running no-locks-held
>> thread to the properly disciplined environment of driver mainline for
>> purposes of programming the h/w.
> 
> Nack on using this for dmc, since I want them to convert it to the above
> synchronization, since that's how all the other async power initialization
> is done.
> 
> Guc is different since we really must have it ready for execbuf, and for
> that usecase a completion at drm_open time sounds like the right thing.
> 
> As a rule of thumb for refactoring and shared infrastructure we use the
> following recipe in drm:
> - first driver implements things as straightforward as possible
> - 2nd user copypastes
> - 3rd one has the duty to figure out whether some refactoring is in order
>   or not.
> Imo that approach leads to a really good balance between avoiding
> overengineering and having maintainable code.
> -Daniel

We've already been through these phases; the code has already been
cloned twice (and then changed, but not enough to fix the problems with
the original) and then when I found the issues with the GuC loader and
noticed the hilarious ownership dance it was doing during handover I
realised it was time to fix it in one place rather than several, and
posted a patchset to the internal mailing list on 2015-02-24 with this
commentary:

> The GuC loader uses an asynchronous thread to fetch the firmware image
> (aka "binary blob") from a file and load it into the GuC's memory.
> Unfortunately the GuC loading occurs *after* the internally-generated
> batches used to initialise contexts have already been submitted using
> direct access to the ELSP.  Also, the firmware ends up being loaded at
> an indeterminate time, with consequent potential for confusion in the
> switchover from ELSP- to GuC-based submission.
> 
> This patch series therefore reorganises the GuC loader to ensure that
> the loading process occurs both early enough and at a well-defined
> point in the sequence of operations during driver initialisation,
> specifically *before* any batches are submitted to hardware.
> 
> [PATCH 1/3] GuC: reorganise source before rewriting this code
> [PATCH 2/3] GuC: load firmware image from main thread
> [PATCH 3/3] GuC: update names & comments ("load" => "fetch")

followed by [PATCH 0/2] unify and tidy firmware loading code
on 2015-03-02.

For the DMC module, the basic conversion process is to separate
intel_csr_load_program() from finish_csr_load(). The latter would remain
as the callback in the async thread loading process that has to validate
the loaded image; the former would then become the callback for the
synchronous post-handover transfer of the image to the h/w.

BTW, the existing DMC loader probably won't work on Android :(

.Dave.

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 18:07             ` Dave Gordon
@ 2015-06-19  8:44               ` Chris Wilson
  2015-06-22 11:59                 ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-19  8:44 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 07:07:46PM +0100, Dave Gordon wrote:
> On 18/06/15 13:10, Chris Wilson wrote:
> > On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
> >> On 17/06/15 13:02, Daniel Vetter wrote:
> >>> Domain handling is required for all gem objects, and the resulting bugs if
> >>> you don't for one-off objects are absolutely no fun to track down.
> >>
> >> Is it not the case that the new object returned by
> >> i915_gem_alloc_object() is
> >> (a) of a type that can be mapped into the GTT, and
> >> (b) initially in the CPU domain for both reading and writing?
> >>
> >> So AFAICS the allocate-and-fill function I'm describing (to appear in
> >> next patch series respin) doesn't need any further domain handling.
> > 
> > A i915_gem_object_create_from_data() is a reasonable addition, and I
> > suspect it will make the code a bit more succinct.
> 
> I shall adopt this name for it :)
> 
> > Whilst your statement is true today, calling set_domain is then a no-op,
> > and helps document how you use the object and so reduces the likelihood
> > of us introducing bugs in the future.
> > -Chris
> 
> So here's the new function ... where should the set-to-cpu-domain go?
> After the pin_pages and before the sg_copy_from_buffer?

Either, since the domain will not change whilst you have the lock,
but if you do it before get_pages() you will have a slightly easier
error path.

Part of the reason why I want a function like this is so that I can
replace it with a stolen object and so need to write the data through a
temporary GGTT mapping. Speak now if you need more flags to the function
to prevent certain classes of objects being created.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
  2015-06-17 12:18   ` Daniel Vetter
@ 2015-06-19  9:19     ` Dave Gordon
  2015-06-24 10:15       ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-19  9:19 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 17/06/15 13:18, Daniel Vetter wrote:
> On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
>> In order to fully initialise the default contexts, we have to execute
>> batchbuffer commands on the GPU engines. But in the case of GuC-based
>> batch submission, we can't do that until any required firmware has
>> been loaded, which may not be possible during driver load, because the
>> filesystem(s) containing the firmware may not be mounted until later.
>>
>> Therefore, we now allow the first call to the firmware-loading code to
>> return -EAGAIN to indicate that it's not yet ready, and that it should
>> be retried when the device is first opened from user code, by which
>> time we expect that all required filesystems will have been mounted.
>> The late-retry code will then re-attempt to load the firmware if the
>> early attempt failed.
>>
>> If the late retry fails, the current open-in-progress will fail, but
>> the recovery code will disable GuC submission and reset the GPU and
>> driver. The next open will therefore be in non-GuC mode, and will be
>> allowed to complete even if the GuC cannot be loaded or used.
>>
>> Issue: VIZ-4884
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
> 
> I'm not really sold on this super-flexible fallback scheme implemented
> here. Because such fallback schemes mean more code to test (which no one
> will likely do) or just even bigger fireworks when we actually hit them in
> reality when something goes wrong. Imo if anything goes wrong in the setup
> we just throw in the towel and fail the driver loading.

Firstly, GuC submission is an OPTION. That means we already have code to
work with or without a GuC. The fallback just allows us to keep going
after finding that although GuC submission has been requested, and we do
have a GuC, nonetheless the request cannot be satisfied. That's no
different from automatically disabling PPGTT or execlist mode if they're
requested on platforms where we don't support them.

> There's only one exception: If something fails with GT init we declare the
> gpu wedged but proceed with all the modeset setup. This makes sense
> because we need all the code to handle a wedged gpu anyway, dead-on-boot
> gpus happen occasionally and it's really not nice to greet the user with a
> black screen. But more fallbacks are imo just headache.
> 
> Hence when the guc fails we imo really shouldn't bother with fallbacks,
> but instead just declare the thing wedged and carry on.

So the strategy here is exactly the same as for GT init; declare the GPU
wedged, but after disabling GuC mode. The recovery will then get us into
the same state as if there were no GuC, or GuC mode had not been
selected in the first place. We can't switch between GuC and execlists
arbitrarily; the only switchover is from GuC to non-GuC, and it can only
happen ONCE.

To test this is easy; just rename your firmware blob so the driver can't
find it and reboot. It should automatically run in execlist mode, with a
log message telling you what went wrong (f/w file not found). Much nicer
than your screen staying blank because you upgraded the driver and not
the firmware, or vice versa.
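
The one-way switchover can be sketched in a few lines of userspace C.
This is only a model under stated assumptions: the struct, its fields
and the function names are hypothetical, and the GPU reset that the real
recovery path performs is left out:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state; names are illustrative only. */
struct gpu_state {
	bool guc_requested;	/* models i915.enable_guc_submission */
	bool fw_available;	/* can the firmware blob be found and loaded? */
	bool contexts_ready;	/* late (context and batch) init completed */
};

/* Pretend firmware load: fails with -ENOENT when the blob is missing. */
static int guc_fw_load(const struct gpu_state *gpu)
{
	return gpu->fw_available ? 0 : -ENOENT;
}

/*
 * First-open path. On a GuC load failure the open itself fails, but GuC
 * submission is switched off so that the next open takes the non-GuC path.
 * The switch is one-way: nothing here sets guc_requested back to true.
 */
static int gpu_first_open(struct gpu_state *gpu)
{
	if (gpu->contexts_ready)
		return 0;

	if (gpu->guc_requested) {
		int ret = guc_fw_load(gpu);

		if (ret) {
			gpu->guc_requested = false; /* fall back, once */
			return ret;
		}
	}

	gpu->contexts_ready = true;
	return 0;
}
```

With fw_available false, the first call returns -ENOENT and clears
guc_requested; the second call then completes in non-GuC mode.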

> That should also allow us to simplify the firmware loading: We can do that
> in an async worker and if the blob isn't there in time then we just move
> on.
> -Daniel

Under no circumstances can you ever load the firmware from an async
worker thread, because Bad Things Will Happen if there is hardware
activity already in progress when the GuC f/w starts up.

.Dave.

* Re: [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
  2015-06-16  9:35   ` Chris Wilson
@ 2015-06-19  9:42     ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-19  9:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 16/06/15 10:35, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
>> +static int i915_gem_context_first_open(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	int ret;
>> +
>> +	/*
>> +	 * We can't enable contexts until all firmware is loaded. This
>> +	 * call shouldn't return -EAGAIN because we pass wait=true, but
>> +	 * it can still fail with code -EIO if the GuC doesn't respond,
>> +	 * or -ENOEXEC if the GuC firmware image is invalid.
>> +	 */
>> +	ret = intel_guc_ucode_load(dev, true);
>> +	WARN_ON(ret == -EAGAIN);
>> +
>> +	/*
>> +	 * If an error occurred and GuC submission has been requested, we can
>> +	 * attempt recovery by disabling GuC submission and reinitialising
>> +	 * the GPU and driver. We then fail this open() anyway, but the next
>> +	 * attempt will find that GuC submission is already disabled, and so
>> +	 * proceed to complete context initialisation in non-GuC mode instead.
>> +	 */
>> +	if (ret && i915.enable_guc_submission) {
>> +		i915_handle_guc_error(dev, ret);
>> +		return ret;
>> +	}
> 
> This is still backwards. What we wanted was for the submission process
> to start up normally and then once the GuC loading succeeds, we then
> start submitting the backlog to the GuC. If the loading fails, we can
> then submit the backlog via execlists. It may be interesting to even
> start userspace before GuC finishes loading.

Absolutely. The latter is what this allows :)

(And it's a requirement for Android, as on those platforms the f/w won't
become available until userspace is running).

But we're not going to keep stuff queued up in the rings. It would add
more complexity to manage the backlog and remember that we can accept
calls to add_request but not actually submit them. Also there would be
issues with any code that sent commands to the engines to program
registers via LRIs and then EITHER assumed that they had taken effect OR
waited for completion (because that would block indefinitely).

Instead, we've split hw_init() into early (MMIO) and late (context and
batch) phases, and deferred all of the latter iff we need GuC f/w to be
loaded before batch submission. When the late phase runs, it can submit
batches, and wait for completion if required, without blocking the
entire system as it would if called from driver_load().

It might even make sense as an overall strategy to defer MORE of the
driver initialisation process, so that the critical single-threaded
driver load during system startup does as little as possible.

> So this makes more sense as to why you have the tight integration with
> execlists then. I still don't think that justifies changing gen8 without
> reason.
> -Chris

It's tightly integrated because GuC submission *IS* execlist submission,
only with the GuC doing the actual poking of the ELSP and fielding the
resulting context-switch interrupts.

Deferred initialisation doesn't apply to Gen8, or anything else that
doesn't have and use GuC submission. The delayed-context-initialisation
codepath is only reachable if there is GuC firmware to be loaded.

.Dave.

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-06-18 20:12       ` Chris Wilson
@ 2015-06-19 14:34         ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-19 14:34 UTC (permalink / raw)
  To: Chris Wilson, Yu Dai, intel-gfx

On 18/06/15 21:12, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 10:53:10AM -0700, Yu Dai wrote:
>>
>>
>> On 06/15/2015 01:30 PM, Chris Wilson wrote:
>>> On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:
>>>> +	/* Set the source address for the new blob */
>>>> +	offset = i915_gem_obj_ggtt_offset(fw_obj);
>>>
>>> Why would it even have a GGTT vma? There's no precondition here to
>>> assert that it should.
>> It is pinned into GGTT inside gem_allocate_guc_obj.

This particular object wasn't allocated with that function; that's only
used for objects that need to be permanently accessible by the GuC
(context pool, GuC logbuffer, per-client structure). As I already
mentioned in another reply, /this/ one was pinned (and will be unpinned)
by the *immediate caller* of this function.

.Dave.

> The basic rules when reviewing pinning are:
> - is there a reason for this pin?
> - is the lifetime of the pin bound to the hardware access?
> - are the pad-to-size/alignment correct?
> - is the vma in the wrong location?
> 
> Pinning early (and then not even stating in the function preamble that
> you expect the object to be pinned) makes it hard to review both the
> reason and check the lifetime. An easy solution to avoiding the
> assumption of having a pinned object is to pass around the vma instead.
> Though because you pin too early it is not clear the reason for the pin
> nor that you only pin it for the lifetime of the hardware access, and
> you have to scour the code to ensure that the pin isn't randomly dropped
> or reused for another access.
> -Chris


* Re: [PATCH 09/15] drm/i915: GuC submission setup, phase 1
  2015-06-15 21:32   ` Chris Wilson
@ 2015-06-19 17:02     ` Dave Gordon
  2015-06-19 17:22       ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-19 17:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 15/06/15 22:32, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:27PM +0100, Dave Gordon wrote:
>> +static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
>> +							u32 size)
>> +{
>> +	struct drm_i915_gem_object *obj;
>> +
>> +	obj = i915_gem_alloc_object(dev, size);
>> +	if (!obj)
>> +		return NULL;
> 
> Does it need to be a shmemfs object?

It needs to be permanently in RAM, so probably not. But I don't see an
allocator that gives you a GEM memory object /without/ backing store. Do
we have one?

>> +	if (i915_gem_object_get_pages(obj)) {
>> +		drm_gem_object_unreference(&obj->base);
>> +		return NULL;
>> +	}
> 
> This is a random function call.

Which is? Unreferencing the newly-allocated object before returning?
Otherwise it will leak :(

Presumably if the object didn't have backing store, the get_pages()
would be unnecessary as they would already be resident? Or would they
not exist until the first get_pages call instantiated them?

>> +	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
>> +			PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE)) {
>> +		drm_gem_object_unreference(&obj->base);
>> +		return NULL;
> 
> How about reporting the right error code?

Doesn't add anything. Allocation failure is going to be fatal anyway.
And i915_gem_alloc_object() just returns NULL for failure, so we'd have
to *make up* an error code for that case :(

Oh, and ERR_PTR/PTR_ERR are ugly.
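
For readers unfamiliar with the convention under discussion, here is a
userspace imitation of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers
applied to an allocator with two failure modes. alloc_pinned() and its
pin_fails argument are hypothetical, and this is a sketch of the
alternative being debated, not what the patch does:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator with two distinct failure modes. */
static void *alloc_pinned(size_t size, int pin_fails)
{
	void *obj = malloc(size);

	if (!obj)
		return ERR_PTR(-ENOMEM); /* a made-up code for the NULL case */
	if (pin_fails) {
		free(obj);
		return ERR_PTR(-ENOSPC); /* the pin's own error is preserved */
	}
	return obj;
}
```

The caller then tests IS_ERR() instead of comparing against NULL, and
can pass PTR_ERR() up the call chain unchanged.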

>> +	}
>> +
>> +	return obj;
>> +}
>> +
>> +/**
>> + * gem_release_guc_obj() - Release gem object allocated for GuC usage
>> + * @obj:	gem obj to be released
>> +  */
>> +static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
>> +{
>> +	if (!obj)
>> +		return;
>> +
>> +	if (i915_gem_obj_is_pinned(obj))
>> +		i915_gem_object_ggtt_unpin(obj);
> 
> What?

The object /should/ be pinned when we arrive here, thanks to the
i915_gem_obj_ggtt_pin() call discussed above. We could just always
unpin, and see whether we trigger this:

        WARN_ON(vma->pin_count == 0);

inside i915_gem_object_ggtt_unpin(). The test is just in case there's an
error path where the object being released wasn't in fact pinned.

>> +	drm_gem_object_unreference(&obj->base);
>> +}
>> +
>> +/*
>> + * Set up the memory resources to be shared with the GuC.  At this point,
>> + * we require just one object that can be mapped through the GGTT.
>> + */
>> +int i915_guc_submission_init(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
> 
> Bleh.

Cross-file interface, nonstatic, hence passes 'dev'; also it needs 'dev'
anyway, so there's no gain in passing dev_priv. And dev_priv
(i.e. struct drm_i915_private) isn't even in scope when this function is
declared in the header file.

>> +	const size_t ctxsize = sizeof(struct guc_context_desc);
>> +	const size_t poolsize = MAX_GUC_GPU_CONTEXTS * ctxsize;
>> +	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
>> +	struct intel_guc *guc = &dev_priv->guc;
>> +
>> +	if (!i915.enable_guc_submission)
>> +		return 0; /* not enabled  */
>> +
>> +	if (guc->ctx_pool_obj)
>> +		return 0; /* already allocated */
> 
> Eh? Where have you hooked into... So looking at that, it looks like you
> want to move this into the device initialisation rather than guc
> firmware load. To me at least they are conceptually separate stages, and
> judging by the above combining them has resulted in very clumsy code.

So ... we don't want to allocate any GuC-related structures unless we're
going to at least try to use the GuC, so this has to come /after/ firmware
fetch and validation. OTOH we can't actually fire up the GuC until after
these structures are allocated, because the GGTT address of the
ctx_pool_obj has to be passed to the GuC firmware as one of its startup
parameters. Thus, the only place to do this allocation is in between
deciding to transfer the f/w to the GuC and actually doing so.

Hence, the GuC loading code calls this each time we're about to squirt
the f/w into the GuC; but, we don't need to allocate this more than once
(OTOH it would be a violation of modularity for the loader to know that;
only the submission code needs to know that little detail). So we end up
with the above do-this-only-once code.
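
The shape of that guard, reduced to a userspace sketch: the struct and
function names below are hypothetical, calloc() stands in for the GEM
allocation, and init_count exists only to make the behaviour visible:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical submission state; zero-initialised by its owner. */
struct guc_submission {
	void *ctx_pool;	/* non-NULL once initialisation has completed */
	int init_count;	/* counts real initialisations, for illustration */
};

/* Safe to call before every firmware transfer; allocates only once. */
static int guc_submission_init(struct guc_submission *s, size_t pool_size)
{
	if (s->ctx_pool)
		return 0; /* already allocated */

	s->ctx_pool = calloc(1, pool_size);
	if (!s->ctx_pool)
		return -ENOMEM;

	s->init_count++;
	return 0;
}

/* Teardown keys off the same pointer, so it is safe after a failed init. */
static void guc_submission_fini(struct guc_submission *s)
{
	free(s->ctx_pool);
	s->ctx_pool = NULL;
}
```

The loader can call guc_submission_init() on every attempt without
knowing whether the pool already exists; only the submission code needs
that detail.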

>> +	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
>> +	if (!guc->ctx_pool_obj)
>> +		return -ENOMEM;
>> +
>> +	spin_lock_init(&dev_priv->guc.host2guc_lock);
>> +
>> +	ida_init(&guc->ctx_ids);
>> +
>> +	memset(guc->doorbell_bitmap, 0, sizeof(guc->doorbell_bitmap));
>> +	guc->db_cacheline = 0;
> 
> Before you relied on guc being zeroed, and now you memset it again.

Hmm ... perhaps we should rezero some of these things /every/ time instead?

/me examines code ...

Nope; it looks like everything is once again zero at the point when this
function takes the early exit.

>> +	return 0;
>> +}
>> +
>> +void i915_guc_submission_fini(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	struct intel_guc *guc = &dev_priv->guc;
>> +
>> +	gem_release_guc_obj(dev_priv->guc.log_obj);
>> +	guc->log_obj = NULL;
>> +
>> +	if (guc->ctx_pool_obj)
>> +		ida_destroy(&guc->ctx_ids);
> 
> Interesting guard. Maybe just make the GuC controller a pointer from
> i915 and then you can do a more natural if (i915->guc == NULL) return;

That test was because there's no way to tell from ctx_ids itself whether
it was initialised (in any case, it's a black box as far as I'm
concerned). But the init code above guarantees that iff the pool was
allocated, then the rest of the initialisation was also done, so we
should call ida_destroy().

I wouldn't object to changing the intel_guc to a separate allocation
rather than embedding it. We'd need to add a backpointer though as we
currently use container_of() inside the guc_to_i915 macro. But you'd
still end up with something like the above, because the allocation of
the ctx_pool is still a separate step that can fail after the intel_guc
structure has been allocated; and you need the f/w-loading-related data
very early. The only saving is a small amount of memory, for the cost of an
extra memory dereference at various points. So probably not worth it.
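
The two layouts can be compared in a small userspace sketch. The
container_of() macro below imitates the kernel's; the struct and
function names are hypothetical simplifications, not the driver's real
types:

```c
#include <stddef.h>

/* Userspace imitation of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct guc {
	int ctx_pool_allocated;
};

/* Embedded layout: the parent is recovered by pointer arithmetic. */
struct i915_embedded {
	int id;
	struct guc guc;
};

static struct i915_embedded *guc_to_i915_embedded(struct guc *guc)
{
	return container_of(guc, struct i915_embedded, guc);
}

/* Separate allocation: an explicit backpointer is needed instead. */
struct i915_separate;

struct guc_separate {
	struct i915_separate *i915;	/* backpointer */
	int ctx_pool_allocated;
};

struct i915_separate {
	int id;
	struct guc_separate *guc;	/* NULL until allocated */
};

static struct i915_separate *guc_to_i915_separate(struct guc_separate *guc)
{
	return guc->i915;
}
```

In the embedded case the lookup costs no memory and no load; in the
separate case every access through i915->guc is one extra dereference,
and the ctx_pool check is still needed in both.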

>> +	gem_release_guc_obj(guc->ctx_pool_obj);
>> +	guc->ctx_pool_obj = NULL;
>> +}
>> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
>> index 0b44265..06b68c2 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>> @@ -171,4 +171,8 @@ extern void intel_guc_ucode_init(struct drm_device *dev);
>>  extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
>>  extern void intel_guc_ucode_fini(struct drm_device *dev);
>>  
>> +/* i915_guc_submission.c */
>> +int i915_guc_submission_init(struct drm_device *dev);
>> +void i915_guc_submission_fini(struct drm_device *dev);
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
>> index 16eef4c..0f74876 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
>> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
>> @@ -111,6 +111,21 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
>>  			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
>>  	}
>>  
>> +	/* If GuC scheduling is enabled, setup params here. */
>> +	if (i915.enable_guc_submission) {
>> +		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
>> +		u32 ctx_in_16 = MAX_GUC_GPU_CONTEXTS / 16;
> 
> So really you didn't need to pin the ctx_pool_obj until this point?

Possibly. But that's not long after the allocation above (it's called
from the next function that the caller of i915_guc_submission_init()
calls after a successful return from that function).

intel_guc_ucode_load()
  i915_guc_submission_init()
    gem_allocate_guc_obj() -- returns pinned object
  guc_ucode_xfer()
    set_guc_init_params() -- needs GGTT address of pinned object

And we really don't want any extra failure paths at this depth. Better
to pin the object early and bail out if there's a problem. It's going to
be pinned for its entire lifetime anyway, so I don't think there's a
problem with pinning it a few microseconds early, especially /during
first open/ when there's no contention for GGTT space.

>> +		pgs >>= PAGE_SHIFT;
>> +		params[GUC_CTL_CTXINFO] = (pgs << GUC_CTL_BASE_ADDR_SHIFT) |
>> +			(ctx_in_16 << GUC_CTL_CTXNUM_IN16_SHIFT);
>> +
>> +		params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;
>> +
>> +		/* Unmask this bit to enable GuC scheduler */
>> +		params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;

This line deserves the comment firstly because we explicitly set this
bit earlier in the function, but have now decided to clear it again (it
was tidier than having unbalanced if-else legs); and secondly to help
people not get confused by the number of negations (&= ~x means clear
something, but what we're clearing has negative semantics "disable"). So
it does convey our intent ("why? to enable the GuC scheduler") as well
as how ("*un*mask this bit").
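
A tiny sketch of the double negation, using made-up bit positions (the
real values live in the GuC headers; the CTL_* names and the helper
functions here are hypothetical):

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define CTL_DISABLE_SCHEDULER	(1u << 4)
#define CTL_KERNEL_SUBMISSIONS	(1u << 9)

/* Feature word as set up earlier: scheduler disabled by default. */
static uint32_t feature_default(void)
{
	return CTL_DISABLE_SCHEDULER;
}

/* Clearing a "disable" bit enables the feature: two negations. */
static uint32_t feature_enable_submission(uint32_t feature)
{
	feature |= CTL_KERNEL_SUBMISSIONS;
	feature &= ~CTL_DISABLE_SCHEDULER;
	return feature;
}

/* The scheduler is on exactly when the disable bit is clear. */
static int scheduler_enabled(uint32_t feature)
{
	return !(feature & CTL_DISABLE_SCHEDULER);
}
```

Wrapping the test in a positively named helper such as
scheduler_enabled() is one way to keep the inversion out of the callers.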

[aside]
At least the GuC parameter semantics are not as ugly as some workarounds
in the BSpec, where I regularly find things such as "this workaround
disables feature A when using option B but need not be applied if
condition C holds unless condition D is false or feature E is disabled.
The workaround must not be applied in mode F." *Bleeuurgh*
[/aside]

> /* Enable multiple context submission through the GuC */
> params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;
> params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;
> 
> Try to keep comments to explain why rather than what. Most of the
> comments here fall into the "i++; // postincrement i" category.
> -Chris

Most of the "what" comments in this file are associated with accesses to
specific h/w registers, which therefore have semantic effect beyond what
is explicit in the code. For example this comment:

/* tell all command streamers to forward interrupts and vblank to GuC */
irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK,GFX_FORWARD_VBLANK_ALWAYS);
irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
for_each_ring(ring, dev_priv, i)
    I915_WRITE(RING_MODE_GEN7(ring), irqs);

helps the reader see what the /effect/ of the writes is intended to be. It's
quite different from:

/* write bitmask to GEN7 ring mode register */
I915_WRITE(RING_MODE_GEN7(ring),MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING));

and means you may not have to dig through the BSpec to find out what the
less helpfully-named bits actually do. And this:

        I915_WRITE(DE_GUCRMR, ~0);

would be incomprehensible without reading the BSpec ... or the comment

	/* tell DE to send nothing to GuC */
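
The mask-register convention behind that write can be sketched in plain C.
This is only an illustration of the "set bit = masked, clear bit =
forwarded" semantics; the event bits and both helper names are
hypothetical, not taken from the BSpec or the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits, for illustration only. */
#define EV_PIPEA_FLIP_DONE	(1u << 0)
#define EV_PIPEB_FLIP_DONE	(1u << 1)
#define EV_VBLANK		(1u << 2)
#define EV_SCANLINE		(1u << 3)

/* Value to write to a mask register so that only 'wanted' events pass. */
static uint32_t mask_for(uint32_t wanted)
{
	return ~wanted;
}

/* An event is forwarded only if its bit is clear (unmasked). */
static int is_forwarded(uint32_t mask_reg, uint32_t event)
{
	assert(event != 0);
	return (mask_reg & event) == 0;
}
```

So mask_for(0) is ~0 ("send nothing"), while
mask_for(EV_PIPEA_FLIP_DONE | EV_PIPEB_FLIP_DONE) lets the flip-done
events through and still blocks vblank and scanline.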

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 09/15] drm/i915: GuC submission setup, phase 1
  2015-06-19 17:02     ` Dave Gordon
@ 2015-06-19 17:22       ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-19 17:22 UTC (permalink / raw)
  To: intel-gfx

On 19/06/15 18:02, Dave Gordon wrote:
> On 15/06/15 22:32, Chris Wilson wrote:

[snip]

>> Try to keep comments to explain why rather than what. Most of the
>> comments here fall into the "i++; // postincrement i" category.
>> -Chris
> 
> Most of the "what" comments in this file are associated with accesses to
> specific h/w registers, which therefore have semantic effect beyond what
> is explicit in the code. For example this comment:
> 
> /* tell all command streamers to forward interrupts and vblank to GuC */
> irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK,GFX_FORWARD_VBLANK_ALWAYS);
> irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> for_each_ring(ring, dev_priv, i)
>     I915_WRITE(RING_MODE_GEN7(ring), irqs);
> 
> helps the reader see what the /effect/ of the writes is intended to be. It's
> quite different from:
> 
> /* write bitmask to GEN7 ring mode register */
> I915_WRITE(RING_MODE_GEN7(ring),MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING));
> 
> and means you may not have to dig through the BSpec to find out what the
> less helpfully-named bits actually do. And this:
> 
>         I915_WRITE(DE_GUCRMR, ~0);
> 
> would be incomprehensible without reading the BSpec ... or the comment
> 
> 	/* tell DE to send nothing to GuC */
> 
> .Dave.

Oops, those comments aren't actually in this patch, they're in a later
one. But they *will* be in this file by the end of the patchset ...

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 11/15] drm/i915: Implementation of GuC client
  2015-06-15 21:55   ` Chris Wilson
@ 2015-06-19 17:55     ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-19 17:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 15/06/15 22:55, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:29PM +0100, Dave Gordon wrote:
>> +/* Get valid workqueue item and return it back to offset */
>> +static int guc_get_workqueue_space(struct i915_guc_client *gc, u32 *offset)
>> +{
>> +	struct guc_process_desc *desc;
>> +	void *base;
>> +	u32 size = sizeof(struct guc_wq_item);
>> +	int ret = 0, timeout_counter = 200;
>> +	unsigned long flags;
>> +
>> +	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
>> +	desc = base + gc->proc_desc_offset;
>> +
>> +	while (timeout_counter-- > 0) {
>> +		spin_lock_irqsave(&gc->wq_lock, flags);
>> +
>> +		ret = wait_for_atomic(CIRC_SPACE(gc->wq_tail, desc->head,
>> +				gc->wq_size) >= size, 1);
> 
> What is the point of this loop? Drop the spinlock 200 times? You already
> have a timeout, the loop extends that by a factor of 200. You merely
> allow gazumping, however I haven't looked at the locking to see what
> you intend to lock (since it is not described at all).
> -Chris

Hmmm .. I didn't write this code, so I'm guessing somewhat; but ...

A 'wq_lock' must lock a 'wq', which from the name of the function is a
workqueue, which is a circular buffer shared between the host and the
GuC, where (like the main ringbuffers) the host (producer) advances the
tail (gc->wq_tail) and the other partner (consumer, in this case the
GuC) advances the head (desc->head).

Presumably the GuC could take many (up to 200) ms to get round to making
space available, in a worst-case scenario where it's busy servicing
interrupts and doing other things.

Now we certainly don't want to spin for up to 200ms with interrupts
disabled, so

    spin_lock_irqsave(&gc->wq_lock, flags);
    ret = wait_for_atomic(CIRC_SPACE(gc->wq_tail, desc->head,
                                     gc->wq_size) >= size, *200*);
    spin_unlock_irqrestore(&gc->wq_lock, flags);

would be a bad idea. OTOH I don't think there's any other lock held by
anyone higher up in the callchain, so we /probably do/ need the spinlock
to protect the updating of wq_tail when the wait_for_atomic succeeds.

So yes, I think up-to-200 iterations of polling for free space for up to
1ms each time is not too unreasonable, given that apparently we have to
poll, at least for now (once the scheduler lands, we will always be able
to predict how much space is available and avoid trying to launch
batches when there isn't enough).
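
For what it's worth, the shape of that loop can be sketched in userspace
as below. This is a rough model under stated assumptions, not the driver
code: ring_space() reimplements the arithmetic of the kernel's
CIRC_SPACE() macro, while struct wq, fake_consumer_poll() and
wq_reserve() are hypothetical names, and the "consumer" is simulated
rather than being the GuC. In the real code each iteration would take the
spinlock, wait briefly, and drop the lock again.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free bytes in a power-of-two ring. One byte is always left unused so
 * that "full" and "empty" can be told apart.
 */
static uint32_t ring_space(uint32_t tail, uint32_t head, uint32_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (head - tail - 1) & (size - 1);
}

struct wq {
	uint32_t tail;	/* producer index, advanced by the host */
	uint32_t head;	/* consumer index, advanced by the other side */
	uint32_t size;	/* in bytes, power of two */
};

/* Simulated consumer: retires up to 'step' bytes per poll. */
static void fake_consumer_poll(struct wq *wq, uint32_t step)
{
	uint32_t used = (wq->tail - wq->head) & (wq->size - 1);

	if (step > used)
		step = used;
	wq->head = (wq->head + step) & (wq->size - 1);
}

/*
 * Bounded polling: up to max_polls short attempts rather than one long
 * wait. Returns 0 and the reserved offset on success, -1 on timeout.
 */
static int wq_reserve(struct wq *wq, uint32_t item, int max_polls,
		      uint32_t consumer_step, uint32_t *offset)
{
	while (max_polls-- > 0) {
		/* real code: lock, short wait for space, unlock */
		if (ring_space(wq->tail, wq->head, wq->size) >= item) {
			*offset = wq->tail;
			wq->tail = (wq->tail + item) & (wq->size - 1);
			return 0;
		}
		fake_consumer_poll(wq, consumer_step);
	}
	return -1;
}
```

The point of the structure is that the lock (and the interrupts-off
window) is held for at most one short wait at a time, while the total
patience is max_polls times that.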

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 13/15] drm/i915: Integrate GuC-based command submission
  2015-06-16  9:22   ` Chris Wilson
@ 2015-06-19 18:18     ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-19 18:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 16/06/15 10:22, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> GuC-based submission is mostly the same as execlist mode, up to
>> intel_logical_ring_advance_and_submit(), where the context being
>> dispatched would be added to the execlist queue; at this point
>> we submit the context to the GuC backend instead.
>>
>> There are, however, a few other changes also required, notably:
>> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC
>>     i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>>     PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
>>
>> 2.  The GuC's TLB must be invalidated after a context is pinned at
>>     a new GGTT address.
>>
>> 3.  GuC firmware uses the one page before Ring Context as shared data.
>>     Therefore, whenever driver wants to get base address of LRC, we
>>     will offset one page for it. LRC_PPHWSP_PN is defined as the page
>>     number of LRCA.
>>
>> 4.  In the work queue used to pass requests to the GuC, the GuC
>>     firmware requires the ring-tail-offset to be represented as an
>>     11-bit value, expressed in QWords. Therefore, the ringbuffer
>>     size must be reduced to the representable range (4 pages).
> 
> I don't like how this sabotages the existing execlists implementation
> in order for i915_guc_submission (an interesting choice of file name,
> since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
> i915_guc_submission (HW), not fitting into our, admittedly loose, naming
> convention very well) to share a few functions. Even a couple of which
> are already vfunc.
> -Chris

Not really "sabotages"; big ringbuffers are a waste of space anyway.
Four pages is enough to have at least 64 batchbuffers queued up for the
engine to process (1 second of video, or 0.00012 of a bitcoin). When the
scheduler lands, it will generally reduce the number of batches in the
h/w rings anyway, mostly to improve responsiveness and fair-sharing
among different applications.
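
The arithmetic behind "4 pages" and "at least 64" can be checked with a
few lines of C. The 11-bit QWord tail comes from the commit message
above; the 4 KiB page size is an assumption, the helper names are
hypothetical, and the per-request figure is only what the "64" claim
implies, not a number taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES		4096u	/* assumed 4 KiB pages */
#define QWORD_BYTES		8u
#define WQ_RING_TAIL_BITS	11u	/* tail offset: 11 bits, in QWords */

/* Largest ring whose tail offset is representable in tail_bits QWords. */
static uint32_t max_ring_bytes(uint32_t tail_bits)
{
	assert(tail_bits < 29);	/* keep the result within 32 bits */
	return (1u << tail_bits) * QWORD_BYTES;
}

static uint32_t max_ring_pages(uint32_t tail_bits)
{
	return max_ring_bytes(tail_bits) / PAGE_SIZE_BYTES;
}

/* Ring bytes available to each request if nreq requests must fit. */
static uint32_t bytes_per_request(uint32_t ring_bytes, uint32_t nreq)
{
	assert(nreq != 0);
	return ring_bytes / nreq;
}
```

That is 2^11 QWords * 8 bytes = 16384 bytes = 4 pages, and 64 queued
batches fit as long as each one's ring footprint is no more than
256 bytes.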

I quite agree that the execlists implementation, which is mostly in a
file called intel_lrc.c, should really be split into LRC-related code
(common to execlists and GuC modes) with the rest moved into
intel_execlist_submission.c. Or is that i915_execlist_submission.c?

If we took contexts as primary, rather than "rings" (engines) we could
also share a lot more between GuC/execlists and legacy ringbuffer mode.
At least we should get some improvement with AntiOLR, so we'll have

    request->context<->ringbuffer->engine->submission mechanism :)

.Dave.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-19  8:44               ` Chris Wilson
@ 2015-06-22 11:59                 ` Dave Gordon
  2015-06-22 12:37                   ` Chris Wilson
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-22 11:59 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On 19/06/15 09:44, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 07:07:46PM +0100, Dave Gordon wrote:
>> On 18/06/15 13:10, Chris Wilson wrote:
>>> On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
>>>> On 17/06/15 13:02, Daniel Vetter wrote:
>>>>> Domain handling is required for all gem objects, and the resulting bugs if
>>>>> you don't for one-off objects are absolutely no fun to track down.
>>>>
>>>> Is it not the case that the new object returned by
>>>> i915_gem_alloc_object() is
>>>> (a) of a type that can be mapped into the GTT, and
>>>> (b) initially in the CPU domain for both reading and writing?
>>>>
>>>> So AFAICS the allocate-and-fill function I'm describing (to appear in
>>>> next patch series respin) doesn't need any further domain handling.
>>>
>>> A i915_gem_object_create_from_data() is a reasonable addition, and I
>>> suspect it will make the code a bit more succinct.
>>
>> I shall adopt this name for it :)
>>
>>> Whilst your statement is true today, calling set_domain is then a no-op,
>>> and helps document how you use the object and so reduces the likelihood
>>> of us introducing bugs in the future.
>>> -Chris
>>
>> So here's the new function ... where should the set-to-cpu-domain go?
>> After the pin_pages and before the sg_copy_from_buffer?
> 
> Either, since the domain will not change whilst you have the lock,
> but if you do it before get_pages() you will have a slightly easier
> error path.

OK, call to i915_gem_object_set_to_cpu_domain(obj, true) added right
after the i915_gem_alloc_object(); also, since we now have multiple
failure paths where the ability to distinguish them might be useful (and
since this function is a public addition to the gem_object repertoire),
I've made it return object-or-error-code, with the incomplete-copy case
returning ERR_PTR(-EIO).

> Part of the reason why I want a function like this is so that I can
> replace it with a stolen object and so need to write the data through a
> temporary GGTT mapping. Speak now if you need more flags to the function
> to prevent certain classes of objects being created.
> -Chris

For the GuC, the firmware image is written once by the CPU and
thereafter read only by the DMA engine via a GGTT address; other uC
devices might have different requirements e.g. the CSR/DMC doesn't have
a DMA engine AFAIK and the f/w is transferred to the h/w via MMIO writes
by the CPU. The primary reason for storing the image in a GEM object
(rather than kmalloc'd space, as the DMC loader does) is to make it
pageable; it's needed multiple times, as we have to reload the f/w after
reset or on exit from low-power states, but not used the rest of the
time. So the existing objects seem a good match.

The GuC's pool and log objects, OTOH, must be permanently resident in
RAM and permanently mapped via the GGTT above GUC_WOPCM_OFFSET. So for
these it would be useful to have an allocator that *didn't* make the
object shmfs-backed.

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-22 11:59                 ` Dave Gordon
@ 2015-06-22 12:37                   ` Chris Wilson
  2015-06-23 16:54                     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Chris Wilson @ 2015-06-22 12:37 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 22, 2015 at 12:59:00PM +0100, Dave Gordon wrote:
> On 19/06/15 09:44, Chris Wilson wrote:
> > On Thu, Jun 18, 2015 at 07:07:46PM +0100, Dave Gordon wrote:
> >> On 18/06/15 13:10, Chris Wilson wrote:
> >>> On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
> >>>> On 17/06/15 13:02, Daniel Vetter wrote:
> >>>>> Domain handling is required for all gem objects, and the resulting bugs if
> >>>>> you don't for one-off objects are absolutely no fun to track down.
> >>>>
> >>>> Is it not the case that the new object returned by
> >>>> i915_gem_alloc_object() is
> >>>> (a) of a type that can be mapped into the GTT, and
> >>>> (b) initially in the CPU domain for both reading and writing?
> >>>>
> >>>> So AFAICS the allocate-and-fill function I'm describing (to appear in
> >>>> next patch series respin) doesn't need any further domain handling.
> >>>
> >>> A i915_gem_object_create_from_data() is a reasonable addition, and I
> >>> suspect it will make the code a bit more succinct.
> >>
> >> I shall adopt this name for it :)
> >>
> >>> Whilst your statement is true today, calling set_domain is then a no-op,
> >>> and helps document how you use the object and so reduces the likelihood
> >>> of us introducing bugs in the future.
> >>> -Chris
> >>
> >> So here's the new function ... where should the set-to-cpu-domain go?
> >> After the pin_pages and before the sg_copy_from_buffer?
> > 
> > Either, since the domain will not change whilst you have the lock,
> > but if you do it before get_pages() you will have a slightly easier
> > error path.
> 
> OK, call to i915_gem_object_set_to_cpu_domain(obj, true) added right
> after the i915_gem_alloc_object(); also, since we now have multiple
> failure paths where the ability to distinguish them might be useful (and
> since this function is a public addition to the gem_object repertoire),
> I've made it return object-or-error-code, with the incomplete-copy case
> returning ERR_PTR(-EIO).

I'd stick to only using EIO when we have a GPU failure. Incomplete copy
is EFAULT.

> > Part of the reason why I want a function like this is so that I can
> > replace it with a stolen object and so need to write the data through a
> > temporary GGTT mapping. Speak now if you need more flags to the function
> > to prevent certain classes of objects being created.
> > -Chris
> 
> For the GuC, the firmware image is written once by the CPU and
> thereafter read only by the DMA engine via a GGTT address; other uC
> devices might have different requirements e.g. the CSR/DMC doesn't have
> a DMA engine AFAIK and the f/w is transferred to the h/w via MMIO writes
> by the CPU. The primary reason for storing the image in a GEM object
> (rather than kmalloc'd space, as the DMC loader does) is to make it
> pageable; it's needed multiple times, as we have to reload the f/w after
> reset or on exit from low-power states, but not used the rest of the
> time. So the existing objects seem a good match.
> 
> The GuC's pool and log objects, OTOH, must be permanently resident in
> RAM and permanently mapped via the GGTT above GUC_WOPCM_OFFSET. So for
> these it would be useful to have an allocator that *didn't* make the
> object shmfs-backed.

http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=3f74a251aa9a3e5c1215226f7857ed53693c563f

I want to use stolen as much as is practically possible; however,
without direct CPU access, stolen is very much an idle fancy.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 08/15] drm/i915: Move execlists defines from .c to .h
  2015-06-17  7:59       ` Chris Wilson
@ 2015-06-22 13:05         ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-22 13:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Michael H. Nguyen

On 17/06/15 08:59, Chris Wilson wrote:
> On Wed, Jun 17, 2015 at 08:31:59AM +0100, Dave Gordon wrote:
>> On 16/06/15 10:37, Chris Wilson wrote:
>>> On Mon, Jun 15, 2015 at 07:36:26PM +0100, Dave Gordon wrote:
>>>> From: "Michael H. Nguyen" <michael.h.nguyen@intel.com>
>>>>
>>>> Move defines from intel_lrc.c to i915_reg.h so they are accessible
>>>> to the GuC submission code; and expose a previously static function
>>>> in the execlist code which will also be required for GuC submission.
>>>
>>> What would have been better would have been to split the lrc code
>>> from the execlists code so that the sharing is more obvious and the
>>> overloading separate from the common code.
>>> -Chris
>>
>> What would have been better is not to have put these fairly generic
>> details about the hardware into a C file in the first place. And not to
>> have split execlist and ringbuffer modes into two entirely different
>> paths. And various other historical decisions. But we can only fix the
>> code as it stands, not as it ought to have been.
>>
>> Anyway, this is just a bulk cut-n-paste, so I'm not inclined to do any
>> restructuring on it during this process. But someone working on
>> execlists could certainly tidy it up later, perhaps as part of a general
>> drive towards deduplicating the code paths and partitioning (context vs
>> ringbuffer vs engine) functionality in a more coherent way.
> 
> More to the point, you are increasing the technical debt of the code
> rather than reducing it. Code will just become less and less
> maintainable.
> -Chris

OK, I have abolished the bulk cut'n'paste :)

Turns out we really only needed a bit of it, and then I spotted a way to
reuse some of the "execlists" code (which is really LRC code) instead of
having a GuC version of same (which is what needed some of these #defines).

So in the end, this patch is replaced by simply renaming-and-exposing
the /two/ functions in intel_lrc.c that we actually need for GuC
submission. Better still, one of them may go away entirely once we eat
some of Chris' low-hanging fruit :)

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-17 12:41         ` Daniel Vetter
@ 2015-06-23 11:33           ` Dave Gordon
  2015-06-23 23:48             ` Yu Dai
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-23 11:33 UTC (permalink / raw)
  To: Daniel Vetter, Dai, Yu; +Cc: intel-gfx

On 17/06/15 13:41, Daniel Vetter wrote:
> On Wed, Jun 17, 2015 at 02:22:19PM +0200, Daniel Vetter wrote:
>> On Wed, Jun 17, 2015 at 09:20:44AM +0100, Dave Gordon wrote:
>>> On 16/06/15 10:24, Chris Wilson wrote:
>>>> On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
>>>>> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
>>>>> +{
>>>>> +	struct intel_engine_cs *ring;
>>>>> +	int i, irqs;
>>>>> +
>>>>> +	/* tell all command streamers to forward interrupts and vblank to GuC */
>>>>> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
>>>>> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
>>>>> +	for_each_ring(ring, dev_priv, i)
>>>>> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
>>>>> +
>>>>> +	/* tell DE to send (all) flip_done to GuC */
>>>>> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
>>>>> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
>>>>> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
>>>>> +	/* Unmasked bits will cause GuC response message to be sent */
>>>>> +	I915_WRITE(DE_GUCRMR, ~irqs);
>>>>
>>>> That's scary since userspace depends on a few more DERRMR events
>>>> (wait-for-scanline). Where will they end up?
>>>> -Chris
>>>
>>> This doesn't change any bits in DE_RRMR, or set the VBLANK or SCANLINE
>>> bits in the DE_GUCRMR, so those events should be unaffected. The GuC
>>> isn't interested in those, only in flip done.
>>
>> Why does the guc care about flip_done? With atomic it'll get exactly none
>> of those, ever ...
> 
> Well I forgot that mmio writes also generate interrupts. Still strange
> that GuC is interested in this. Would be really interesting to know what
> GuC is up to here.
> -Daniel

Maybe Alex knows ... otherwise we can ask the GuC f/w team ...

.Dave.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-22 12:37                   ` Chris Wilson
@ 2015-06-23 16:54                     ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-23 16:54 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On 22/06/15 13:37, Chris Wilson wrote:
> On Mon, Jun 22, 2015 at 12:59:00PM +0100, Dave Gordon wrote:
>> On 19/06/15 09:44, Chris Wilson wrote:
>>> On Thu, Jun 18, 2015 at 07:07:46PM +0100, Dave Gordon wrote:
>>>> On 18/06/15 13:10, Chris Wilson wrote:
>>>>> On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
>>>>>> On 17/06/15 13:02, Daniel Vetter wrote:
>>>>>>> Domain handling is required for all gem objects, and the resulting bugs if
>>>>>>> you don't for one-off objects are absolutely no fun to track down.
>>>>>>
>>>>>> Is it not the case that the new object returned by
>>>>>> i915_gem_alloc_object() is
>>>>>> (a) of a type that can be mapped into the GTT, and
>>>>>> (b) initially in the CPU domain for both reading and writing?
>>>>>>
>>>>>> So AFAICS the allocate-and-fill function I'm describing (to appear in
>>>>>> next patch series respin) doesn't need any further domain handling.
>>>>>
>>>>> A i915_gem_object_create_from_data() is a reasonable addition, and I
>>>>> suspect it will make the code a bit more succinct.
>>>>
>>>> I shall adopt this name for it :)
>>>>
>>>>> Whilst your statement is true today, calling set_domain is then a no-op,
>>>>> and helps document how you use the object and so reduces the likelihood
>>>>> of us introducing bugs in the future.
>>>>> -Chris
>>>>
>>>> So here's the new function ... where should the set-to-cpu-domain go?
>>>> After the pin_pages and before the sg_copy_from_buffer?
>>>
>>> Either, since the domain will not change whilst you have the lock,
>>> but if you do it before get_pages() you will have a slightly easier
>>> error path.
>>
>> OK, call to i915_gem_object_set_to_cpu_domain(obj, true) added right
>> after the i915_gem_alloc_object(); also, since we now have multiple
>> failure paths where the ability to distinguish them might be useful (and
>> since this function is a public addition to the gem_object repertoire),
>> I've made it return object-or-error-code, with the incomplete-copy case
>> returning ERR_PTR(-EIO).
> 
> I'd stick to only using EIO when we have a GPU failure. Incomplete copy
> is EFAULT.

Done, it now returns -EFAULT on incomplete copy.
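
For readers unfamiliar with the object-or-error-code convention, here is
a small userspace sketch of it. ERR_PTR(), PTR_ERR() and IS_ERR() are
reimplemented here to mirror the kernel helpers of the same names;
struct blob, blob_create_from_data() and the two copy helpers are
hypothetical stand-ins, not the i915 function, and the -EINVAL check is
my own assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO	4095

/* Userspace copies of the kernel's pointer-encoded error helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct blob {
	size_t size;
	unsigned char data[];
};

/* A copy routine that reports how many bytes it actually copied. */
typedef size_t (*copy_fn)(void *dst, const void *src, size_t n);

static size_t copy_all(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return n;
}

/* Simulates an incomplete copy: only half the bytes arrive. */
static size_t copy_short(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n / 2);
	return n / 2;
}

/* Allocate-and-fill; each failure path returns a distinct error code. */
static struct blob *blob_create_from_data(const void *data, size_t size,
					  copy_fn copy)
{
	struct blob *b;

	assert(copy != NULL);
	if (!data || !size)
		return ERR_PTR(-EINVAL);

	b = malloc(sizeof(*b) + size);
	if (!b)
		return ERR_PTR(-ENOMEM);

	b->size = size;
	if (copy(b->data, data, size) != size) {
		free(b);
		return ERR_PTR(-EFAULT);	/* incomplete copy */
	}
	return b;
}
```

A caller checks IS_ERR() on the result and uses PTR_ERR() to recover the
code, so an allocation failure and a short copy can be told apart.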

>>> Part of the reason why I want a function like this is so that I can
>>> replace it with a stolen object and so need to write the data through a
>>> temporary GGTT mapping. Speak now if you need more flags to the function
>>> to prevent certain classes of objects being created.
>>> -Chris
>>
>> For the GuC, the firmware image is written once by the CPU and
>> thereafter read only by the DMA engine via a GGTT address; other uC
>> devices might have different requirements e.g. the CSR/DMC doesn't have
>> a DMA engine AFAIK and the f/w is transferred to the h/w via MMIO writes
>> by the CPU. The primary reason for storing the image in a GEM object
>> (rather than kmalloc'd space, as the DMC loader does) is to make it
>> pageable; it's needed multiple times, as we have to reload the f/w after
>> reset or on exit from low-power states, but not used the rest of the
>> time. So the existing objects seem a good match.
>>
>> The GuC's pool and log objects, OTOH, must be permanently resident in
>> RAM and permanently mapped via the GGTT above GUC_WOPCM_OFFSET. So for
>> these it would be useful to have an allocator that *didn't* make the
>> object shmfs-backed.
> 
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=3f74a251aa9a3e5c1215226f7857ed53693c563f
> 
> Though I want to use stolen as much as is practically possible, however
> without direct CPU access, stolen is very much an idle fancy.
> -Chris

Yes, that looks useful ...

.Dave.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 04/15] drm/i915: Add GuC-related header files
  2015-06-17 15:01     ` Dave Gordon
@ 2015-06-23 18:10       ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-23 18:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 17/06/15 16:01, Dave Gordon wrote:
> On 15/06/15 21:20, Chris Wilson wrote:
>> On Mon, Jun 15, 2015 at 07:36:22PM +0100, Dave Gordon wrote:
>>> From: Alex Dai <yu.dai@intel.com>
>>>
>>> intel_guc_api.h contains the subset of the GuC interface that we
>>> will need for submission of commands through the GuC. These MUST
>>> be kept in sync with the definitions used by the GuC firmware.
>>
>> intel_guc_hw.h or intel_guc_abi.h then. Calling it API doesn't make it
>> clear whose API you are talking about.
> 
> It's not 'hw' -- the hw register definitions are elsewhere, because they
> don't depend on the firmware. What it defines is a set of interfaces
> between the GuC firmware and the KMD, so I'll rename it to reflect that
> ("intel_guc_fwif.h", for FirmWareInterFace).
> 
>>> intel_guc.h defines structures and parameters relevant to loading
>>> the GuC firmware and setting it running. Some of these also need
>>> to be kept in sync with the firmware.
>>
>> intel_guc.h should be developed organically as features are added in the
>> series so that it is possible to track against implementation.
> 
>> Certainly not in a patch that adds the entirety of the firmware ABI.
> 
> What may not be obvious is that intel_guc_api.h (or intel_guc_fwif.h, as
> I'm now going to call it) is autogenerated from the non-Linux-friendly
> version actually used in building the GuC firmware. (Or at least, that's
> the PoR; in practice Alex has hacked^W hand-tuned this version.) So it
> makes no sense to break it into parts.
> 
> We /could/ do that with the purely KMD-defined structures in intel_guc.h
> such as intel_guc, and /maybe/ i915_guc_client. OTOH when it's a new
> file, containing a new structure, it's easier to see that the layout is
> sensible when it's all added in one go, rather than repeatedly adding
> bits here and there, especially if the logical order of fields in a
> structure isn't going to be the same as the order of addition of the
> code that uses them.
> 
> I'll see how it looks ...
> 
> .Dave.

So ... I've renamed the sort-of-autogenerated header containing f/w i/f
definitions, to intel_guc_fwif.h, and put all the invariant h/w-related
defines into i915_guc_reg.h (which could be merged into i915_reg.h if
required, but for now it's easier to keep them separate). There is
probably unused stuff in both of these but there's really no point in
removing it, as we may just have to add it back someday.

This leaves only the driver-defined data structures & related stuff in
intel_guc.h, which is now delivered in three tranches (loader, ctx pool
setup, client).

.Dave.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-23 11:33           ` Dave Gordon
@ 2015-06-23 23:48             ` Yu Dai
  2015-06-24 10:02               ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Yu Dai @ 2015-06-23 23:48 UTC (permalink / raw)
  To: Dave Gordon, Daniel Vetter; +Cc: intel-gfx



On 06/23/2015 04:33 AM, Dave Gordon wrote:
> On 17/06/15 13:41, Daniel Vetter wrote:
> > On Wed, Jun 17, 2015 at 02:22:19PM +0200, Daniel Vetter wrote:
> >> On Wed, Jun 17, 2015 at 09:20:44AM +0100, Dave Gordon wrote:
> >>> On 16/06/15 10:24, Chris Wilson wrote:
> >>>> On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
> >>>>> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
> >>>>> +{
> >>>>> +	struct intel_engine_cs *ring;
> >>>>> +	int i, irqs;
> >>>>> +
> >>>>> +	/* tell all command streamers to forward interrupts and vblank to GuC */
> >>>>> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
> >>>>> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> >>>>> +	for_each_ring(ring, dev_priv, i)
> >>>>> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
> >>>>> +
> >>>>> +	/* tell DE to send (all) flip_done to GuC */
> >>>>> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
> >>>>> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
> >>>>> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
> >>>>> +	/* Unmasked bits will cause GuC response message to be sent */
> >>>>> +	I915_WRITE(DE_GUCRMR, ~irqs);
> >>>>
> >>>> That's scary since userspace depends on a few more DERRMR events
> >>>> (wait-for-scanline). Where will they end up?
> >>>> -Chris
> >>>
> >>> This doesn't change any bits in DE_RRMR, or set the VBLANK or SCANLINE
> >>> bits in the DE_GUCRMR, so those events should be unaffected. The GuC
> >>> isn't interested in those, only in flip done.
> >>
> >> Why does the guc care about flip_done? With atomic it'll get exactly none
> >> of those, ever ...
> >
> > Well I forgot that mmio writes also generate interrupts. Still strange
> > that GuC is interested in this. Would be really interesting to know what
> > GuC is up to here.
> > -Daniel
>
> Maybe Alex knows ... otherwise we can ask the GuC f/w team ...
>
The SLPC (Single Loop Power Control) within GuC needs these. However,
whether to enable it has yet to be determined, because the architecture
review is not done.

Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 04/15] drm/i915: Add GuC-related header files
  2015-06-15 20:20   ` Chris Wilson
  2015-06-17 15:01     ` Dave Gordon
@ 2015-06-24  7:41     ` Dave Gordon
  2015-06-24  9:37       ` Daniel Vetter
  1 sibling, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-24  7:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 15/06/15 21:20, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:22PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> intel_guc_api.h contains the subset of the GuC interface that we
>> will need for submission of commands through the GuC. These MUST
>> be kept in sync with the definitions used by the GuC firmware.
> 
> intel_guc_hw.h or intel_guc_abi.h then. Calling it API doesn't make it
> clear whose API you are talking about.
>  
>> intel_guc.h defines structures and parameters relevant to loading
>> the GuC firmware and setting it running. Some of these also need
>> to be kept in sync with the firmware.
> 
> intel_guc.h should be developed organically as features are added in the
> series so that it is possible to track against implementation. Certainly
> not in a patch that adds the entirety of the firmware ABI.

Done.

>> +struct i915_guc_client {
>> +	spinlock_t wq_lock;
>> +	struct drm_i915_gem_object *client_obj;
>> +	u32 priority;
>> +	off_t doorbell_offset;
>> +	off_t proc_desc_offset;
>> +	off_t wq_offset;
>> +	uint16_t doorbell_id;
>> +	uint32_t ctx_index;
>> +	uint32_t wq_size;
>> +	uint32_t wq_tail;
>> +	uint32_t cookie;
>> +
>> +	/* GuC submission statistics & status */
>> +	uint64_t submissions;
>> +	uint32_t q_fail;
>> +	uint32_t b_fail;
>> +	int retcode;
> 
> Mixture of classic kernel types and stdint types. And off_t! What size
> exactly do you mean there?

All converted to stdint now.

>> +};
>> +
>> +#define I915_MAX_DOORBELLS	256
>> +#define INVALID_DOORBELL_ID	I915_MAX_DOORBELLS
>> +
>> +#define INVALID_CTX_ID		(MAX_GUC_GPU_CONTEXTS+1)
>> +
>> +struct intel_guc {
>> +	/* Generic uC firmware management */
>> +	struct intel_uc_fw guc_fw;
> 
> Haven't checked for size, but I guess this is going to be an init only
> structure that we could discard.

No, we need it for reloading. It's not very big anyway.

>> +	/* GuC-specific additions */
>> +	uint32_t fw_ver_major;
>> +	uint32_t fw_ver_minor;
> 
> I have no idea why you would want to keep these around.

Firstly, they're used to pass the designation for the version of the f/w
interface that this revision of the driver can work with.

Secondly we want to print them in messages, so I've split these into
(major,minor) that we wanted and (major,minor) that we found. These were
already printed in a NOTICE, but not in the debugfs output.

>> +	spinlock_t host2guc_lock;
> 
> Seems overly specific, no comment as to what it locks and lack of
> implementation to be able to confirm.

Now reorganised so it's adjacent to the protected data :)

>> +	struct drm_i915_gem_object *ctx_pool_obj;
>> +	struct drm_i915_gem_object *log_obj;
>> +	struct i915_guc_client *execbuf_client;
> 
> I expect these will want modification based on patches to be reviewed.

Now added in separate patches.

>> +	struct ida ctx_ids;
>> +	uint32_t log_flags;
>> +	int db_cacheline;
>> +	DECLARE_BITMAP(doorbell_bitmap, I915_MAX_DOORBELLS);
>> +
>> +	/* Action status & statistics */
>> +	uint64_t action_count;		/* Total commands issued	*/
>> +	uint32_t action_cmd;		/* Last command word		*/
>> +	uint32_t action_status;		/* Last return status		*/
>> +	uint32_t action_fail;		/* Total number of failures	*/
>> +	int32_t action_err;		/* Last error code		*/
> 
> Any group of prefix_ immediately raises the question of "why isn't this
> a struct?"
> -Chris

Not really worth making and naming a struct. There's only one instance
of this whole thing; the code that updates these touches them
individually, and the debugfs code that prints them can't really make
use of them collectively either.

.Dave.

* Re: [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics
  2015-06-16  9:28   ` Chris Wilson
@ 2015-06-24  8:27     ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-24  8:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 16/06/15 10:28, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:32PM +0100, Dave Gordon wrote:
>> This provides a means of reading status and counts relating
>> to GuC actions and submissions.
> 
> Anything that tends to ease debugging also tends to ease
> postmortem error analysis...

So maybe someday we'll add GuC info to an error dump, though I haven't
yet seen any cases where it would have helped. We'll file this under
"future enhancements".

The GuC debugfs files remain accessible even when the GPU is hung, so
one can already capture the GuC statistics alongside the error dump.

>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_debugfs.c |   41 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 41 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index c6e2582..e699b38 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -2388,6 +2388,46 @@ static int i915_guc_load_status_info(struct seq_file *m, void *data)
>>  	return 0;
>>  }
>>  
>> +static int i915_guc_info(struct seq_file *m, void *data)
>> +{
>> +	struct drm_info_node *node = m->private;
>> +	struct drm_device *dev = node->minor->dev;
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	struct intel_guc guc;
>> +	struct i915_guc_client client = { .client_obj = 0 };
>> +
>> +	if (!HAS_GUC_SCHED(dev_priv->dev))
>> +		return 0;
>> +
>> +	/* Take a local copy of the GuC data, so we can dump it at leisure */
>> +	spin_lock(&dev_priv->guc.host2guc_lock);
>> +	guc = dev_priv->guc;
>> +	if (guc.execbuf_client) {
>> +		spin_lock(&guc.execbuf_client->wq_lock);
>> +		client = *guc.execbuf_client;
>> +		spin_unlock(&guc.execbuf_client->wq_lock);
>> +	}
>> +	spin_unlock(&dev_priv->guc.host2guc_lock);
>> +
>> +	seq_printf(m, "GuC total action count: %llu\n", guc.action_count);
>> +	seq_printf(m, "GuC last action command: 0x%x\n", guc.action_cmd);
>> +	seq_printf(m, "GuC last action status: 0x%x\n", guc.action_status);
>> +
>> +	seq_printf(m, "GuC action failure count: %u\n", guc.action_fail);
>> +	seq_printf(m, "GuC last action error code: %d\n", guc.action_err);
> 
> If these had been a struct you could have minimised that copy.

We needed to copy some other parts of the "struct intel_guc" anyway, in
particular the execbuf_client pointer. And anything else we might choose
to print in future, such as the ctx or doorbell bitmaps.

> Again, it would have been best if the debug interface had been added all
> at once, so we could take the extra infrastructure or leave it out
> altogether.

Well, no. You argued for stuff to be added to the structs in the header
files incrementally, so debugfs dumping has to be added in parallel. So
there are /two/ debugfs interfaces, each added in a separate patch. The
first relates only to the loading process; the second to /use/ of the
GuC. You can still leave both out if you choose.

>> +	seq_printf(m, "\nGuC execbuf client @ %p:\n", guc.execbuf_client);
>> +	seq_printf(m, "\tTotal submissions: %llu\n", client.submissions);
>> +	seq_printf(m, "\tFailed to queue: %u\n", client.q_fail);
>> +	seq_printf(m, "\tFailed doorbell: %u\n", client.b_fail);
>> +	seq_printf(m, "\tLast submission result: %d\n", client.retcode);
>> +
>> +	/* Add more as required ... */
>> +	seq_puts(m, "\n");
> 
> Trailing newline, why?
> -Chris

Looks prettier when I cat i915_guc* in the debugfs directory. Also so
it's ready for "adding more as required" :) But I've taken it away again
for now ...

.Dave.

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 18:28             ` Dave Gordon
@ 2015-06-24  9:32               ` Daniel Vetter
  2015-06-25 12:28                 ` Dave Gordon
  2015-06-24  9:40               ` Chris Wilson
  1 sibling, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-24  9:32 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 07:28:26PM +0100, Dave Gordon wrote:
> On 18/06/15 15:31, Daniel Vetter wrote:
> > On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
> >> On 17/06/15 13:02, Daniel Vetter wrote:
> >>> On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote:
> >>>> On 15/06/15 21:09, Chris Wilson wrote:
> >>>>> On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
> >>>>>> From: Alex Dai <yu.dai@intel.com>
> >>>>>>
> >>>>>> i915_gem_object_write() is a generic function to copy data from a plain
> >>>>>> linear buffer to a paged gem object.
> >>>>>>
> >>>>>> We will need this for the microcontroller firmware loading support code.
> >>>>>>
> >>>>>> Issue: VIZ-4884
> >>>>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
> >>>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
> >>>>>>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
> >>>>>>  2 files changed, 30 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>> index 611fbd8..9094c06 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
> >>>>>>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
> >>>>>>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
> >>>>>>  			 const struct drm_i915_gem_object_ops *ops);
> >>>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> >>>>>> +			  const void *data, size_t size);
> >>>>>>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
> >>>>>>  						  size_t size);
> >>>>>>  void i915_init_vm(struct drm_i915_private *dev_priv,
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>>>> index be35f04..75d63c2 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>>>> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
> >>>>>>  	return false;
> >>>>>>  }
> >>>>>>  
> >>>>>> +/* Fill the @obj with the @size amount of @data */
> >>>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
> >>>>>> +			const void *data, size_t size)
> >>>>>> +{
> >>>>>> +	struct sg_table *sg;
> >>>>>> +	size_t bytes;
> >>>>>> +	int ret;
> >>>>>> +
> >>>>>> +	ret = i915_gem_object_get_pages(obj);
> >>>>>> +	if (ret)
> >>>>>> +		return ret;
> >>>>>> +
> >>>>>> +	i915_gem_object_pin_pages(obj);
> >>>>>
> >>>>> You don't set the object into the CPU domain, or instead manually handle
> >>>>> the domain flushing. You don't handle objects that cannot be written
> >>>>> directly by the CPU, nor do you handle objects whose representation in
> >>>>> memory is not linear.
> >>>>> -Chris
> >>>>
> >>>> No we don't handle just any random gem object, but we do return an error
> >>>> code for any types not supported. However, as we don't really need the
> >>>> full generality of writing into a gem object of any type, I will replace
> >>>> this function with one that combines the allocation of a new object
> >>>> (which will therefore definitely be of the correct type, in the correct
> >>>> domain, etc) and filling it with the data to be preserved.
> >>
> >> The usage pattern for the particular case is going to be:
> >> 	Once-only:
> >> 		Allocate
> >> 		Fill
> >> 	Then each time GuC is (re-)initialised:
> >> 		Map to GTT
> >> 		DMA-read from buffer into GuC private memory
> >> 		Unmap
> >> 	Only on unload:
> >> 		Dispose
> >>
> >> So our object is write-once by the CPU (and that's always the first
> >> operation), thereafter read-occasionally by the GuC's DMA engine.
> > 
> > Yup. The problem is more that on atom platforms the objects aren't
> > coherent by default and generally you need to do something. Hence we
> > either have
> > - an explicit set_caching call to document that this is a gpu object which
> >   is always coherent (so also on chv/bxt), even when that's a no-op on big
> >   core
> > - or wrap everything in set_domain calls, even when those are no-ops too.
> > 
> > If either of those is lacking, reviewers tend to freak out preemptively
> > and the reptile brain takes over ;-)
> > 
> > Cheers, Daniel
> 
> We don't need "coherency" as such. The buffer is filled (once only) by
> the CPU (so I should put a set-to-cpu-domain between the allocate and
> fill stages?) Once it's filled, the CPU need not read or write it ever
> again.
> 
> Then before the DMA engine accesses it, we call i915_gem_obj_ggtt_pin,
> which I'm assuming will take care of any coherency issues (making sure
> the data written by the CPU is now visible to the DMA engine) when it
> puts the buffer into the GTT-readable domain. Is that not sufficient?

Pinning is orthogonal to coherency, it'll just make sure the backing
storage is there. set_domain(CPU) before writing and set_domain(GTT)
before each upload to the guc using the hw copy thing would be prudent.
The coherency tracking should no-op out any calls which aren't needed for
you.
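
The bookkeeping described above can be sketched roughly as follows. This
is a toy model only: toy_object, toy_set_domain() and toy_cpu_write() are
hypothetical names for illustration, not the i915 functions, and the
"flush" is just a counter. It shows why redundant set_domain calls cost
nothing while the CPU-to-GTT transition does the real work.

```c
#include <stdbool.h>

/* Toy model: names mirror the discussion, not the real i915 API. */
enum toy_domain { TOY_DOMAIN_NONE, TOY_DOMAIN_CPU, TOY_DOMAIN_GTT };

struct toy_object {
	enum toy_domain domain;	/* who may access the backing pages now */
	bool cpu_dirty;		/* CPU wrote data that is not yet flushed */
	unsigned int flushes;	/* number of cache flushes performed */
};

/* Move the object to @domain; returns true if any work was needed. */
static bool toy_set_domain(struct toy_object *obj, enum toy_domain domain)
{
	if (obj->domain == domain)
		return false;		/* already there: no-op */

	/* Leaving the CPU domain with dirty data requires a flush. */
	if (obj->domain == TOY_DOMAIN_CPU && obj->cpu_dirty) {
		obj->flushes++;
		obj->cpu_dirty = false;
	}
	obj->domain = domain;
	return true;
}

/* CPU fill: only allowed once the object is in the CPU domain. */
static bool toy_cpu_write(struct toy_object *obj)
{
	if (obj->domain != TOY_DOMAIN_CPU)
		return false;
	obj->cpu_dirty = true;
	return true;
}
```

In this model the firmware object would see set_domain(CPU), one fill,
then set_domain(GTT) before every upload; only the first of those GTT
calls flushes anything.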
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 04/15] drm/i915: Add GuC-related header files
  2015-06-24  7:41     ` Dave Gordon
@ 2015-06-24  9:37       ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2015-06-24  9:37 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Wed, Jun 24, 2015 at 08:41:02AM +0100, Dave Gordon wrote:
> On 15/06/15 21:20, Chris Wilson wrote:
> >> +	struct ida ctx_ids;
> >> +	uint32_t log_flags;
> >> +	int db_cacheline;
> >> +	DECLARE_BITMAP(doorbell_bitmap, I915_MAX_DOORBELLS);
> >> +
> >> +	/* Action status & statistics */
> >> +	uint64_t action_count;		/* Total commands issued	*/
> >> +	uint32_t action_cmd;		/* Last command word		*/
> >> +	uint32_t action_status;		/* Last return status		*/
> >> +	uint32_t action_fail;		/* Total number of failures	*/
> >> +	int32_t action_err;		/* Last error code		*/
> > 
> > Any group of prefix_ immediately raises the question of "why isn't this
> > a struct?"
> > -Chris
> 
> Not really worth making and naming a struct. There's only one instance
> of this whole thing; the code that updates these touches them
> individually, and the debugfs code that prints them can't really make
> use of them collectively either.

We have a lot of single-instance structs all over the place to group
related data around. It imo does help a lot, but yeah might be on the
fence here.
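
For illustration, the grouping under discussion might look something like
the sketch below. The names guc_action_stats, guc_sketch and the two
helpers are hypothetical, not taken from the patch; the point is only
that a single-instance nested struct lets a reader copy the statistics
as a unit.

```c
#include <stdint.h>

/* Hypothetical grouping of the action_* fields; names are illustrative. */
struct guc_action_stats {
	uint64_t count;		/* Total commands issued	*/
	uint32_t cmd;		/* Last command word		*/
	uint32_t status;	/* Last return status		*/
	uint32_t fail;		/* Total number of failures	*/
	int32_t err;		/* Last error code		*/
};

/* Cut-down stand-in for struct intel_guc, for illustration only. */
struct guc_sketch {
	uint32_t log_flags;
	struct guc_action_stats action;	/* single instance, grouped */
};

/* Record the outcome of one action; err == 0 means success. */
static void guc_record_action(struct guc_sketch *guc, uint32_t cmd,
			      uint32_t status, int32_t err)
{
	guc->action.count++;
	guc->action.cmd = cmd;
	guc->action.status = status;
	if (err) {
		guc->action.fail++;
		guc->action.err = err;
	}
}

/* A reader can snapshot just the statistics with one struct assignment. */
static struct guc_action_stats guc_snapshot_stats(const struct guc_sketch *guc)
{
	return guc->action;
}
```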
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-18 18:28             ` Dave Gordon
  2015-06-24  9:32               ` Daniel Vetter
@ 2015-06-24  9:40               ` Chris Wilson
  1 sibling, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-24  9:40 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 07:28:26PM +0100, Dave Gordon wrote:
> We don't need "coherency" as such. The buffer is filled (once only) by
> the CPU (so I should put a set-to-cpu-domain between the allocate and
> fill stages?) Once it's filled, the CPU need not read or write it ever
> again.
> 
> Then before the DMA engine accesses it, we call i915_gem_obj_ggtt_pin,
> which I'm assuming will take care of any coherency issues (making sure
> the data written by the CPU is now visible to the DMA engine) when it
> puts the buffer into the GTT-readable domain. Is that not sufficient?

No. pin just ensures that there is a binding for the object in the
appropriate VM and then increments the vma's pin_count to make sure it
cannot be relinquished until we say so. That is, we often do want
multiple mappings of an object in different VMs, and direct access from
the CPU (i.e. in the CPU domain whilst bound to the GPU).

To ensure that it is ready for access by the GPU, you need to set it to
the appropriate domain prior to that access.
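
A toy model of that distinction, with hypothetical names (toy_vma,
toy_pin() and friends are not the real i915 structures or calls): pinning
only binds and counts, and never touches the domain field.

```c
#include <stdbool.h>

/* Toy model of a VMA binding; not the real i915 structures. */
struct toy_vma {
	bool bound;		/* has an address in the VM */
	unsigned int pin_count;	/* outstanding pins */
	int domain;		/* opaque domain tag, untouched by pinning */
};

/* Bind if necessary, then take a pin. The domain is left alone. */
static void toy_pin(struct toy_vma *vma)
{
	if (!vma->bound)
		vma->bound = true;
	vma->pin_count++;
}

/* Drop one pin, if any are held. */
static void toy_unpin(struct toy_vma *vma)
{
	if (vma->pin_count)
		vma->pin_count--;
}

/* The binding can only be given up when nobody holds a pin. */
static bool toy_unbind(struct toy_vma *vma)
{
	if (vma->pin_count)
		return false;
	vma->bound = false;
	return true;
}
```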
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 12/15] drm/i915: Interrupt routing for GuC submission
  2015-06-23 23:48             ` Yu Dai
@ 2015-06-24 10:02               ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2015-06-24 10:02 UTC (permalink / raw)
  To: Yu Dai; +Cc: intel-gfx

On Tue, Jun 23, 2015 at 04:48:11PM -0700, Yu Dai wrote:
> 
> 
> On 06/23/2015 04:33 AM, Dave Gordon wrote:
> >On 17/06/15 13:41, Daniel Vetter wrote:
> >> On Wed, Jun 17, 2015 at 02:22:19PM +0200, Daniel Vetter wrote:
> >>> On Wed, Jun 17, 2015 at 09:20:44AM +0100, Dave Gordon wrote:
> >>>> On 16/06/15 10:24, Chris Wilson wrote:
> >>>>> On Mon, Jun 15, 2015 at 07:36:30PM +0100, Dave Gordon wrote:
> >>>>>> +static void direct_interrupts_to_guc(struct drm_i915_private *dev_priv)
> >>>>>> +{
> >>>>>> +	struct intel_engine_cs *ring;
> >>>>>> +	int i, irqs;
> >>>>>> +
> >>>>>> +	/* tell all command streamers to forward interrupts and vblank to GuC */
> >>>>>> +	irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK, GFX_FORWARD_VBLANK_ALWAYS);
> >>>>>> +	irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> >>>>>> +	for_each_ring(ring, dev_priv, i)
> >>>>>> +		I915_WRITE(RING_MODE_GEN7(ring), irqs);
> >>>>>> +
> >>>>>> +	/* tell DE to send (all) flip_done to GuC */
> >>>>>> +	irqs = DERRMR_PIPEA_PRI_FLIP_DONE | DERRMR_PIPEA_SPR_FLIP_DONE |
> >>>>>> +	       DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEB_SPR_FLIP_DONE |
> >>>>>> +	       DERRMR_PIPEC_PRI_FLIP_DONE | DERRMR_PIPEC_SPR_FLIP_DONE;
> >>>>>> +	/* Unmasked bits will cause GuC response message to be sent */
> >>>>>> +	I915_WRITE(DE_GUCRMR, ~irqs);
> >>>>>
> >>>>> That's scary since userspace depends on a few more DERRMR events
> >>>>> (wait-for-scanline). Where will they end up?
> >>>>> -Chris
> >>>>
> >>>> This doesn't change any bits in DE_RRMR, or set the VBLANK or SCANLINE
> >>>> bits in the DE_GUCRMR, so those events should be unaffected. The GuC
> >>>> isn't interested in those, only in flip done.
> >>>
> >>> Why does the guc care about flip_done? With atomic it'll get exactly none
> >>> of those, ever ...
> >>
> >> Well I forgot that mmio writes also generate interrupts. Still strange
> >> that GuC is interested in this. Would be really interesting to know what
> >> GuC is up to here.
> >
> >Maybe Alex knows ... otherwise we can ask the GuC f/w team ...
> >
> The SLPC (Single Loop Power Control) within GuC needs these. However,
> whether to enable it is not yet decided, because the architecture review
> is not done.

Well if guc needs to know about display activity, the kernel can tell it
directly. And it could even tell the guc when we've missed a frame, which
is something the guc has absolutely no idea about since with atomic that's
all implemented on the kernel side and never goes through the rings.

Sounds like a "not engineered for linux kernel" feature :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open
  2015-06-19  9:19     ` Dave Gordon
@ 2015-06-24 10:15       ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2015-06-24 10:15 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jun 19, 2015 at 10:19:04AM +0100, Dave Gordon wrote:
> On 17/06/15 13:18, Daniel Vetter wrote:
> > On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
> >> In order to fully initialise the default contexts, we have to execute
> >> batchbuffer commands on the GPU engines. But in the case of GuC-based
> >> batch submission, we can't do that until any required firmware has
> >> been loaded, which may not be possible during driver load, because the
> >> filesystem(s) containing the firmware may not be mounted until later.
> >>
> >> Therefore, we now allow the first call to the firmware-loading code to
> >> return -EAGAIN to indicate that it's not yet ready, and that it should
> >> be retried when the device is first opened from user code, by which
> >> time we expect that all required filesystems will have been mounted.
> >> The late-retry code will then re-attempt to load the firmware if the
> >> early attempt failed.
> >>
> >> If the late retry fails, the current open-in-progress will fail, but
> >> the recovery code will disable GuC submission and reset the GPU and
> >> driver. The next open will therefore be in non-GuC mode, and will be
> >> allowed to complete even if the GuC cannot be loaded or used.
> >>
> >> Issue: VIZ-4884
> >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >> Signed-off-by: Alex Dai <yu.dai@intel.com>
> > 
> > I'm not really sold on this super-flexible fallback scheme implemented
> > here, because such fallback schemes mean more code to test (which no one
> > will likely do) or just even bigger fireworks when we actually hit them in
> > reality when something goes wrong. Imo if anything goes wrong in the setup
> > we just throw in the towel and fail the driver loading.
> 
> Firstly, GuC submission is an OPTION. That means we already have code to
> work with or without a GuC. The fallback just allows us to keep going
> after finding that although GuC submission has been requested, and we do
> have a GuC, nonetheless the request cannot be satisfied. That's no
> different from automatically disabling PPGTT or execlist mode if they're
> requested on platforms where we don't support them.

It is different, since we make the automatic ppgtt/execlist/whatever
disabling decision once at driver load and then stick to it. Well, you can
sometimes change it at runtime and it might work, but it's not something we
test or recommend - it autotaints the kernel even when you just touch these
options.

> > There's only one exception: If something fails with GT init we declare the
> > gpu wedged but proceed with all the modeset setup. This makes sense
> > because we need all the code to handle a wedged gpu anyway, dead-on-boot
> > gpus happen occasionally and it's really not nice to greet the user with a
> > black screen. But more fallbacks are imo just headache.
> > 
> > Hence when the guc fails we imo really shouldn't bother with fallbacks,
> > but instead just declare the thing wedged and carry on.
> 
> So the strategy here is exactly the same as for GT init; declare the GPU
> wedged, but after disabling GuC mode. The recovery will then get us into
> the same state as if there were no GuC, or GuC mode had not been
> selected in the first place. We can't switch between GuC and execlists
> arbitrarily; the only switchover is from GuC to non-GuC, and it can only
> happen ONCE.

The existing wedged logic is a terminal state (except when developers
reset it through debugfs). There's no automatic recovery/fallback ever if
we can't get the gpu up&running in the mode we want it to run in.
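
The two policies being contrasted here can be modelled, very loosely, as
below. Everything in this sketch is hypothetical (toy_gpu, the toy_init_*
helpers and toy_debugfs_reset() are illustrative names, and real driver
init is far more involved); it only shows where the wedged state ends up
under each policy.

```c
#include <stdbool.h>

enum toy_gpu_state { TOY_GPU_OK, TOY_GPU_WEDGED };

struct toy_gpu {
	enum toy_gpu_state state;
	bool use_guc;		/* submission mode in effect */
};

/* Decide once at load; a failed GuC setup leaves the gpu wedged. */
static void toy_init_terminal(struct toy_gpu *gpu, bool want_guc, bool fw_ok)
{
	gpu->use_guc = want_guc;
	gpu->state = (want_guc && !fw_ok) ? TOY_GPU_WEDGED : TOY_GPU_OK;
}

/* Fallback scheme: on failure, disable GuC mode once and recover. */
static void toy_init_fallback(struct toy_gpu *gpu, bool want_guc, bool fw_ok)
{
	gpu->use_guc = want_guc && fw_ok;
	gpu->state = TOY_GPU_OK;	/* reset brings it back in non-GuC mode */
}

/* Only an explicit developer action clears a wedge in the first policy. */
static void toy_debugfs_reset(struct toy_gpu *gpu)
{
	gpu->state = TOY_GPU_OK;
}
```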

> To test this is easy; just rename your firmware blob so the driver can't
> find it and reboot. It should automatically run in execlist mode, with a
> log message telling you what went wrong (f/w file not found). Much nicer
> than your screen staying blank because you upgraded the driver and not
> the firmware, or vice versa.

The screen will not stay blank since we'll still enable the modeset driver
of i915, and at least basic userspace drivers know how to fall back to sw
rendering. The entire point of declaring the gpu wedged if init fails is
to increase the chances that we can get a bug report.

> > That should also allow us to simplify the firmware loading: We can do that
> > in an async worker and if the blob isn't there in time then we just move
> > on.
> > -Daniel
> 
> Under no circumstances can you ever load the firmware from an async
> worker thread, because Bad Things Will Happen if there is hardware
> activity already in progress when the GuC f/w starts up.

Whether you load the firmware through an async work item in a kernel
thread or from a userspace process (in open) doesn't materially change
things at all - it's concurrent and you need to cope with it. And
dev->struct_mutex is a big lock (way too big and one of the most serious
if not the worst piece of technical debt we carry around), but it does not
protect against concurrent access to the hardware for everything.

The upside of doing the init in an explicit async worker is that it's
explicit, looks scary and you don't have any illusions about it ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-19  8:43         ` Dave Gordon
@ 2015-06-24 10:29           ` Daniel Vetter
  2015-07-06 12:44             ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-24 10:29 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jun 19, 2015 at 09:43:11AM +0100, Dave Gordon wrote:
> On 18/06/15 15:49, Daniel Vetter wrote:
> > On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
> >> On 17/06/15 13:05, Daniel Vetter wrote:
> >>> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
> >>>> Current devices may contain one or more programmable microcontrollers
> >>>> that need to have a firmware image (aka "binary blob") loaded from an
> >>>> external medium and transferred to the device's memory.
> >>>>
> >>>> This file provides generic support functions for doing this; they can
> >>>> then be used by each uC-specific loader, thus reducing code duplication
> >>>> and testing effort.
> >>>>
> >>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
> >>>
> >>> Given that I'm just shredding the synchronization used by the dmc loader
> >>> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
> >>> copy-paste for similar sounding but slightly different things doesn't
> >>> sound awful to me. And the critical bit in all the firmware loading I've
> >>> seen thus far is in synchronizing the loading with other operations,
> >>> hiding that isn't a good idea. Worse if we enforce stuff like requiring
> >>> dev->struct_mutex.
> >>> -Daniel
> >>
> >> It's precisely because it's in some sense "trivial-but-tricky" that we
> >> should write it once, get it right, and use it everywhere. Copypaste
> >> /does/ sound awful; I've seen how the code this was derived from had
> >> already been cloned into three flavours, all different and all wrong.
> >>
> >> It's a very simple abstraction: one early call to kick things off as
> >> early as possible, no locking required. One late call with the
> >> struct_mutex held to complete the synchronisation and actually do the
> >> work, thus guaranteeing that the transfer to the target uC is done in a
> >> controlled fashion, at a time of the caller's choice, and by the
> >> driver's mainline thread, NOT by an asynchronous thread racing with
> >> other activity (which was one of the things wrong with the original
> >> version).
> > 
> > Yeah I've seen the origins of this in the display code, and that code gets
> > the syncing wrong. The only thing that one has to do is grab a runtime pm
> > reference for the appropriate power well to prevent dc5 entry, and release
> > it when the firmware is loaded and initialized.
> 
> Agreed.
> 
> > Which means any kind of firmware loader which requires/uses
> > dev->struct_mutex gets stuff wrong and is not appropriate everywhere.
> 
> BUT, the loading of the firmware into any uC MUST be done in a
> controlled manner i.e. at a time when no other thread is touching the
> h/w. Otherwise the f/w load and whatever else is concurrently accessing
> the h/w could in some cases interfere disastrously. Examples of
> interference might be:
> 
> * interleaved accesses to the ELSP (in the case of the GuC)
> * incorrect handover of power management (DMC, GuC)
> * erroneous management of forcewake state
> 
> In general the f/w that is just starting on the uC may have certain
> expectations about the initial state of the h/w, which may not be met if
> other threads are accessing various bits of h/w while the uC is booting up.
> 
> So we absolutely need to guarantee that the f/w load is done by a thread
> which has exclusive ownership of any bit of the h/w that the f/w is
> going to make assumptions about. With the current locking structure of
> the driver, that means holding the struct_mutex (it shouldn't really,
> there should be a separate mutex for h/w register access vs.
> driver-private data structures, but there isn't).

If you really need this guarantee (and I seriously hope not) then the only
option is a synchronous firmware load at driver init _before_ we launch
any of the asynchronous setup code. And there is already a lot of that,
and we're adding more all the time.

What I expect we need is synchronization of just the relevant part with
the firmware loading, which necessarily needs to be somewhat async to be
able to support cros/android requirements. And yes that needs to be done
in a controlled manner, but most likely we need very specific solutions
for the problem at hand. Unconditionally holding dev->struct_mutex isn't
that solution.

The other problem with dev->struct_mutex is that it's a gigantic lock with
ill-defined coverage and semantics. It's imo the biggest piece of
technical debt we carry around in i915.ko, and we pay the price for that
dearly&daily. Which means that for a few years now any kind of code
which extended dev->struct_mutex to anything not clearly core gem data
structures has been rejected.

> >> We should convert the DMC loader to use this too, so there need be only
> >> one bit of code in the whole driver that needs to understand how to use
> >> completions to get correct handover from a free-running no-locks-held
> >> thread to the properly disciplined environment of driver mainline for
> >> purposes of programming the h/w.
> > 
> > Nack on using this for dmc, since I want them to convert it to the above
> > synchronization, since that's how all the other async power initialization
> > is done.
> > 
> > Guc is different since we really must have it ready for execbuf, and for
> > that usecase a completion at drm_open time sounds like the right thing.
> > 
> > As a rule of thumb for refactoring and shared infrastructure we use the
> > following recipe in drm:
> > - first driver implements things as straightforward as possible
> > - 2nd user copypastes
> > - 3rd one has the duty to figure out whether some refactoring is in order
> >   or not.
> > Imo that approach leads to a really good balance between avoiding
> > overengineering and having maintainable code.
> 
> We've already been through these phases; the code has already been
> cloned twice (and then changed, but not enough to fix the problems with
> the original) and then when I found the issues with the GuC loader and
> noticed the hilarious ownership dance it was doing during handover I
> realised it was time to fix it in one place rather than several, and
> posted a patchset to the internal mailing list on 2015-02-24 with this
> commentary:
> 
> > The GuC loader uses an asynchronous thread to fetch the firmware image
> > (aka "binary blob") from a file and load it into the GuC's memory.
> > Unfortunately the GuC loading occurs *after* the internally-generated
> > batches used to initialise contexts have already been submitted using
> > direct access to the ELSP.  Also, the firmware ends up being loaded at
> > an indeterminate time, with consequent potential for confusion in the
> > switchover from ELSP- to GuC-based submission.
> > 
> > This patch series therefore reorganises the GuC loader to ensure that
> > the loading process occurs both early enough and at a well-defined
> > point in the sequence of operations during driver initialisation,
> > specifically *before* any batches are submitted to hardware.
> > 
> > [PATCH 1/3] GuC: reorganise source before rewriting this code
> > [PATCH 2/3] GuC: load firmware image from main thread
> > [PATCH 3/3] GuC: update names & comments ("load" => "fetch")
> 
> followed by [PATCH 0/2] unify and tidy firmware loading code
> on 2015-03-02.
> 
> For the DMC module, the basic conversion process is to separate
> intel_csr_load_program() from finish_csr_load(). The latter would remain
> as the callback in the async thread loading process that has to validate
> the loaded image; the former would then become the callback for the
> synchronous post-handover transfer of the image to the h/w.
> 
> BTW, the existing DMC loader probably won't work on Android :(

Yeah I completely missed out on this fun since I presumed that firmware
loading is easy and simple. And if you look around on other drm drivers it
indeed is, they all use a synchronous request_firmware and if the firmware
isn't there, they just fall over (fully in the case of radeon.ko,
partially in the case of nouveau.ko since they have all the support in
place for handling a kms-only accel-less gpu in userspace anyway, like we
do). But for a bunch of reasons (afaik it's "you can't include a blob in a
gpl-ed kernel image") we need async firmware loading for cros&android.

That leaves us with a situation where we should have done a special design
discussion about asynchronous firmware, but somehow failed to do that.
Which leaves us in a very ugly position.

I talked with a bunch of people over the past few days to figure out how
this is supposed to work and also figure out why it's being done like that
today. I think I have a reasonably good plan for moving forward too. I'll
start a new top-level thread here to discuss this.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/15] Batch submission via GuC
  2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
                   ` (16 preceding siblings ...)
  2015-06-17 12:43 ` [PATCH 00/15] Batch submission via GuC Daniel Vetter
@ 2015-06-24 12:16 ` Daniel Vetter
  2015-06-24 12:57   ` Chris Wilson
  17 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-06-24 12:16 UTC (permalink / raw)
  To: Dave Gordon; +Cc: Vinit Azad, intel-gfx, Ben Widawsky

Hi all,

Ok top post on this firmware loading problem. First to note imo is that
of the over 80 request_firmware calls in drm drivers only one (outside of
i915) uses request_firmware_nowait. And none of the others have async
firmware loading implemented within their own driver. That was kinda my
background for all this and why I missed the entire design implications -
I expected that the entirety of the firmware loading problem would boil
down to one request_firmware() call and that's it.

Obviously things are not quite as simple, so I started to dig around.

Looking at the big drivers (nouveau and radeon/amdgpu) they only ever try
to load the firmware once and if it's not there they either fail driver
init outright (amd) or disable parts of it (nouveau can e.g. disable
acceleration). None of them ever bothers with a 2nd attempt at firmware
loading.

I confirmed with systemd/udev developers that synchronous firmware loading
is indeed how modern systems are expected to work. Which also implies that
if you want to load a driver from the initrd, then the firmware must be in
the initrd (which modern linux distros ensure). And if you build the
driver into the kernel then the firmware also must be built-in. Delayed
loading of firmware is considered broken userspace.

The reason for this design switch is that with the delayed loading done
years ago, random timeouts routinely showed up at boot whenever the
firmware never appeared. And on a general-purpose distro there's no way to
know whether the firmware isn't loaded yet because userspace isn't ready
or because it's simply not there. Hence the switch over to immediate
failure and requiring the firmware to be present when the driver is
initializing.

So from that pov the only thing (and really the only thing) we need for
enabling firmware loading on a normal linux distro is a synchronous
request_firmware call in the driver load function somewhere. This might
need some adjustments to the initrd builders in userspace, but that's all
we need in the kernel.
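
To make the "try once, fail immediately" behaviour concrete, here is a
minimal userspace sketch. It is not kernel code and does not call the real
request_firmware(); fw_image, fw_load_sync() and fw_release() are
hypothetical names, and an ordinary file read stands in for the firmware
lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory firmware image, loosely like struct firmware. */
struct fw_image {
	unsigned char *data;
	size_t size;
};

/*
 * Blocking, single-attempt load: either the file is readable right now or
 * we return an error immediately. No retry, no timeout, no fallback.
 */
static int fw_load_sync(const char *path, struct fw_image *fw)
{
	FILE *f;
	long len;

	fw->data = NULL;
	fw->size = 0;

	f = fopen(path, "rb");
	if (!f)
		return -ENOENT;	/* simplification: any open failure means "not there" */

	if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) <= 0 ||
	    fseek(f, 0, SEEK_SET) != 0) {
		fclose(f);
		return -EINVAL;
	}

	fw->data = malloc((size_t)len);
	if (!fw->data) {
		fclose(f);
		return -ENOMEM;
	}

	if (fread(fw->data, 1, (size_t)len, f) != (size_t)len) {
		fclose(f);
		free(fw->data);
		fw->data = NULL;
		return -EIO;
	}

	fclose(f);
	fw->size = (size_t)len;
	return 0;
}

/* Drop the image once it has been handed to the hardware. */
static void fw_release(struct fw_image *fw)
{
	free(fw->data);
	fw->data = NULL;
	fw->size = 0;
}
```

In this model the driver load function would call fw_load_sync() exactly
once and, on a negative return, either fail init or carry on without the
feature; there is no state in which the image might still turn up later.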

This is also all we need to enable internal testing, since as long as you
don't ship the resulting image anywhere you can always build the firmware
in, which also resolves any "firmware isn't there yet" bugs.

Now if you want delayed firmware loading there's still support for that,
hidden behind the FW_LOADER_USER_HELPER_FALLBACK kernel config. Which
because of the unfixable timeouts is disabled by default, and also no
longer supported on the userspace side in modern distros. But the uevent
handler is only 3 lines of bash, so not a big deal to wire up.

Except that for hysterical raisins (one special snowflake driver needs it
like that, and the patch author decided it's easier to have an inconsistent
interface than fix up that driver) this userspace fallback stuff only
works with the request_firmware interface, and _not_ with
request_firmware_nowait. Despite that both run essentially the same code,
_nowait simply wraps it up in an async work for convenience.

That means to be able to use this we need our own async work item and use
request_firmware in there. Converting the implicit async work to an
explicit one is better anyway since it makes it more likely that patch
authors and reviewers don't forget to correctly synchronize/flush with
that async work. DMC loader is a perfect example of why implicit async
callbacks are evil in kernel code.
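
The difference between an implicit and an explicit async load can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: a pthread stands in for the work item,
fw_loader_start() plays the role of schedule_work() and fw_loader_flush()
the role of flush_work(); all names and the fake image are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's firmware state. */
struct fw_loader {
	pthread_t worker;	/* plays the role of the work item */
	int started;		/* non-zero while the worker has not been flushed */
	int status;		/* 0 on success, negative on failure */
	char blob[32];		/* the "firmware image" */
	size_t size;
};

/* Runs in the worker; stands in for a blocking firmware fetch. */
static void *fw_fetch_worker(void *arg)
{
	struct fw_loader *l = arg;
	static const char image[] = "fake-guc-image";

	memcpy(l->blob, image, sizeof(image));
	l->size = sizeof(image);
	l->status = 0;
	return NULL;
}

/* Analogue of schedule_work(): the launch is visible in the caller. */
static int fw_loader_start(struct fw_loader *l)
{
	memset(l, 0, sizeof(*l));
	l->status = -1;
	if (pthread_create(&l->worker, NULL, fw_fetch_worker, l))
		return -1;
	l->started = 1;
	return 0;
}

/* Analogue of flush_work(): wait for the worker before using the result. */
static int fw_loader_flush(struct fw_loader *l)
{
	if (l->started) {
		pthread_join(l->worker, NULL);
		l->started = 0;
	}
	return l->status;
}
```

Because both the start and the flush are ordinary calls in the driver's
own code, a reviewer can check that every path which touches the image
goes through fw_loader_flush() first. Flushing twice is harmless in this
model, since the second call just returns the stored status.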

The problem now is that doing parts of gem init asynchronously is really
scary, and we never had a proper design discussion about that. And
unsurprisingly a lot of the review comments center around that. Otoh we
don't want to block merging guc-based cmd submission just on that.
Therefore my proposal is to rip out the current firmware loader and the
async and re-loading code, with all its complications, from the
current patch series, and replace it with a simple&synchronous
request_firmware. Also no fallbacks to keep everything simple. The actual
guc enabling itself should carry over unchanged.

With that we can unblock everyone (at least internally) and figure out a
proper solution for async gem init. To kickstart the design discussion
for that a few general and gem-specific things I think are important:

- Locking, async work and coordination should imo all be done explicitly
  in the code by default. Otherwise chances are way too high that critical
  issues get overlooked in the detailed code review. That means an
  explicit schedule_work and for synchronization explicit flush_work. No
  hiding them in frameworks and helper libraries.

- dev->struct_mutex is scary and shouldn't ever be extended. New stuff
  needs new locking. Might not apply here for guc specifically, but just a
  general principle.

- I'm absolutely scared of platform/feature differences in how the
  overall driver load sequence works. Which means if we need async gem
  init for guc, then we will do async gem init for everyone. That means
  improved test coverage for corner-cases and fewer headaches to keep the
  unified driver afloat since there's only one way to do things. Of course
  this only applies to hw features actually shared between platforms, e.g.
  if we do async dmc loading that doesn't mean we need to do async loading
  of some other unrelated display feature. GuC is special here because
  cmd submission is the entire point of gem and hence has a big impact.

- If we do async init at driver load time, then we'll also do the same on
  resume and gpu reset, for the same reasons as the cross-platform
  consistency. For GuC async gem init on gpu reset might not make that
  much sense, since gpu reset is pretty much just gem (re) init and it's
  already an async worker. But async gem init on resume definitely makes
  sense.

- A common confusion I see with concurrent code is mixing up data
  consistency (resolved with locking) and handling synchronization and
  lifetimes (resolved with refcounting, completions and other
  synchronization design patterns). There's grey areas and exceptions, but
  I'm fairly picky on this topic. Also a big no-go is rolling your own
  synchronization - we have two of these in i915 (pageflip completion and
  reset handling) and both are still suspect or outright broken after
  years of banging heads against these walls.

- For async gem init it's tempting to repurpose the gpu reset code with
  all the existing synchronization: We'd only need to set the
  RESET_IN_PROGRESS flag synchronously and then run a gpu reset after the
  firmware is loaded, all from the async worker. The problem with that is
  that we still have issues with rogue -EIO escaping into the modeset code
  while a reset is pending, and that would not be good since we want
  modesets to work right away ofc. Also most modeset paths need to stall
  for gpu resets synchronously, which again isn't what we want.

- We should be able to reuse at least the locking scheme gpu reset relies
  on. And since gpu resets run gem_init_hw we should be able to run that
  asynchronously, with some care.

- Besides synchronizing at open time like you're doing now, or in the guts
  of the lrc ctx submission code like Chris suggested, the now merged
  anti-olr also gives us the possibility to sync at request creation. The
  problem I see with that is locking inversions between flush_work and
  dev->struct_mutex. But we could fix that with the ioctl restart handling
  we already have. But that has the same caveats as the -EIO handling -
  our modeset code doesn't cope well with -EAGAIN.

  For me the exact synchronization point with the async gem init work is
  the really big open question here.
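
To illustrate the split between data consistency and synchronization made
in the list above, here is a small userspace sketch. It assumes a
one-shot completion built from a mutex and a condition variable (roughly
what complete_all() and wait_for_completion() provide in the kernel) and
a separate mutex for the data. Everything here (completion_model,
gem_model, gem_submit() and friends) is hypothetical and only models the
idea of syncing with async init at request creation.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace model of a one-shot completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void completion_model_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Signal that the event has happened and wake every waiter. */
static void completion_model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until the event has happened; returns at once if it already has. */
static void completion_model_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct gem_model {
	struct completion_model init_done;	/* synchronization: is init finished? */
	pthread_mutex_t stats_lock;		/* data consistency: guards nr_requests */
	int hw_ready;
	int nr_requests;
};

/* Async init: pretend the firmware got loaded, then signal completion. */
static void *gem_init_worker(void *arg)
{
	struct gem_model *g = arg;

	g->hw_ready = 1;
	completion_model_complete(&g->init_done);
	return NULL;
}

static int gem_model_start(struct gem_model *g, pthread_t *worker)
{
	completion_model_init(&g->init_done);
	pthread_mutex_init(&g->stats_lock, NULL);
	g->hw_ready = 0;
	g->nr_requests = 0;
	return pthread_create(worker, NULL, gem_init_worker, g);
}

/* Sync point at request creation: wait for init, then update under lock. */
static int gem_submit(struct gem_model *g)
{
	int n;

	completion_model_wait(&g->init_done);
	if (!g->hw_ready)
		return -1;
	pthread_mutex_lock(&g->stats_lock);
	n = ++g->nr_requests;
	pthread_mutex_unlock(&g->stats_lock);
	return n;
}
```

The point of the model is that the wait happens before stats_lock is
taken, so the worker never needs that lock to make progress. Waiting for
the async work while already holding the big lock is where the inversion
mentioned above would come from.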

Anyway that's my thoughts on firmware loading and all that, I hope this is
useful as a basis to get this merged in an efficient way.

Yours, Daniel

On Mon, Jun 15, 2015 at 07:36:18PM +0100, Dave Gordon wrote:
> This patch series enables command submission via the GuC. In this mode,
> instead of the host CPU driving the execlist port directly, it hands
> over work items to the GuC, using a doorbell mechanism to tell the GuC
> that new items have been added to its work queue. The GuC then dispatches
> contexts to the various GPU engines, and manages the resulting context-
> switch interrupts. Completion of a batch is however still signalled to
> the CPU; the GuC is not involved in handling user interrupts.
> 
> There are three subsequences within the patch series:
> 
>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>   drm/i915: Embedded microcontroller (uC) firmware loading support
> 
> These first two patches provide a generic framework for fetching the
> firmware that may be required by any embedded microcontroller from a
> file, using an asynchronous thread so that driver initialisation can
> continue while the firmware is being fetched. It is hoped that this
> framework is sufficiently general that it can be used for all current
> and future microcontrollers.
> 
>   drm/i915: Add GuC-related module parameters
>   drm/i915: Add GuC-related header files
>   drm/i915: GuC-specific firmware loader
>   drm/i915: Debugfs interface to read GuC load status
> 
> These four patches complete the GuC loader. At this point in the sequence
> we can load and activate the GuC firmware, but not submit any batches
> through it. (This is nonetheless a potentially useful state, as the GuC
> can do other useful work even when not handling batch submissions).
> 
>   drm/i915: Defer default hardware context initialisation until first
>   drm/i915: Move execlists defines from .c to .h
>   drm/i915: GuC submission setup, phase 1
>   drm/i915: Enable GuC firmware log
>   drm/i915: Implementation of GuC client
>   drm/i915: Interrupt routing for GuC submission
>   drm/i915: Integrate GuC-based command submission
>   drm/i915: Debugfs interface for GuC submission statistics
>   Documentation/drm: kerneldoc for GuC
>   drm/i915: Enable GuC submission, where supported
> 
> In the final section, we implement the GuC submission mechanism, link
> it into the (execlist-based) submission path, and finally enable it
> (on supported platforms). On platforms where there is no GuC, or if
> the GuC firmware cannot be found or is invalid, batch submission will
> revert to using the execlist mechanism directly.
> 
> The GuC firmware itself is not included in this patchset; it is or will
> be available for download from https://01.org/linuxgraphics/downloads/
> This driver works with and requires GuC firmware revision 3.x. It will
> not work with any firmware version 1.x, as the GuC protocol in those
> revisions was incompatible and is no longer supported.
> 
> Prerequisites: GuC submission will expose existing inadequacies in
> some of the existing codepaths unless certain other patches are applied.
> In particular we will require some version of Michel Thierry's patch
>   drm/i915/lrc: Update PDPx registers with lri commands
> (because the GuC supports light-restore, which execlist mode doesn't),
> and my own 
>   drm/i915: Allocate OLR more safely (workaround until OLR goes away)
> because otherwise the changed timing means that there is an increased
> risk of writing to a ringbuffer that is not currently pinned & mapped,
> causing a kernel OOPS.
> 
> Alex Dai (10):
>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>   drm/i915: Add GuC-related module parameters
>   drm/i915: Add GuC-related header files
>   drm/i915: GuC-specific firmware loader
>   drm/i915: Debugfs interface to read GuC load status
>   drm/i915: GuC submission setup, phase 1
>   drm/i915: Enable GuC firmware log
>   drm/i915: Implementation of GuC client
>   drm/i915: Integrate GuC-based command submission
>   Documentation/drm: kerneldoc for GuC
> 
> Dave Gordon (5):
>   drm/i915: Embedded microcontroller (uC) firmware loading support
>   drm/i915: Defer default hardware context initialisation until first
>   drm/i915: Interrupt routing for GuC submission
>   drm/i915: Debugfs interface for GuC submission statistics
>   drm/i915: Enable GuC submission, where supported
> 
> Michael H. Nguyen (1):
>   drm/i915: Move execlists defines from .c to .h
> 
> Ben Widawsky
> Vinit Azad
>   created the original versions on which some of these patches are based.
> 
>  Documentation/DocBook/drm.tmpl             |   19 +
>  drivers/gpu/drm/i915/Makefile              |    7 +
>  drivers/gpu/drm/i915/i915_debugfs.c        |  109 +++-
>  drivers/gpu/drm/i915/i915_dma.c            |    4 +
>  drivers/gpu/drm/i915/i915_drv.h            |   17 +
>  drivers/gpu/drm/i915/i915_gem.c            |   39 +-
>  drivers/gpu/drm/i915/i915_gem_context.c    |   52 +-
>  drivers/gpu/drm/i915/i915_guc_submission.c |  873 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_irq.c            |   48 ++
>  drivers/gpu/drm/i915/i915_params.c         |    9 +
>  drivers/gpu/drm/i915/i915_reg.h            |   92 ++-
>  drivers/gpu/drm/i915/intel_guc.h           |  184 ++++++
>  drivers/gpu/drm/i915/intel_guc_api.h       |  227 ++++++++
>  drivers/gpu/drm/i915/intel_guc_loader.c    |  498 ++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c           |  128 ++--
>  drivers/gpu/drm/i915/intel_lrc.h           |    8 +
>  drivers/gpu/drm/i915/intel_uc_loader.c     |  312 ++++++++++
>  drivers/gpu/drm/i915/intel_uc_loader.h     |   82 +++
>  18 files changed, 2607 insertions(+), 101 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_guc_submission.c
>  create mode 100644 drivers/gpu/drm/i915/intel_guc.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_api.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c
>  create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h
> 
> -- 
> 1.7.9.5
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/15] Batch submission via GuC
  2015-06-24 12:16 ` Daniel Vetter
@ 2015-06-24 12:57   ` Chris Wilson
  0 siblings, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-24 12:57 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Ben Widawsky, Vinit Azad

On Wed, Jun 24, 2015 at 02:16:05PM +0200, Daniel Vetter wrote:
> - For async gem init it's tempting to repurpose the gpu reset code with
>   all the existing synchronization: We'd only need to set the
>   RESET_IN_PROGRESS flag synchronously and then run a gpu reset after the
>   firmware is loaded, all from the async worker. The problem with that is
>   that we still have issues with rogue -EIO escaping into the modeset code
>   while a reset is pending, and that would not be good since we want
>   modesets to work right away ofc. Also most modeset paths need to stall
>   for gpu resets synchronously, which again isn't what we want.

We already have async GEM init! It is the source of a bug I keep
reporting.

async GEM init is trivial, the memory management code is independent of
request submission. Request submission can be delayed indefinitely until
the submission port is ready. The only effect is that we have to be
careful when doing hangcheck to be sure that the submission port is
running before declaring it hung.
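
A rough single-threaded model of that idea: requests are parked while the
submission port is not ready, flushed once it comes up, and hangcheck
refuses to report a hang for a port that has not started. All names
(port_model, port_submit() and so on) are hypothetical and the progress
test is deliberately simplistic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 8

/* Hypothetical model of a submission port that may not be ready yet. */
struct port_model {
	bool ready;			/* set once the backend is up */
	int pending[MAX_PENDING];	/* requests parked while not ready */
	int nr_pending;
	int submitted;			/* requests handed to the "hardware" */
	int completed;			/* requests the "hardware" has finished */
	int last_seen_completed;	/* hangcheck's previous sample */
};

/* Queue a request; it only reaches the hardware once the port is ready. */
static int port_submit(struct port_model *p, int request)
{
	if (p->ready) {
		(void)request;		/* the model only counts submissions */
		p->submitted++;
		return 0;
	}
	if (p->nr_pending == MAX_PENDING)
		return -1;		/* queue full, caller must retry */
	p->pending[p->nr_pending++] = request;
	return 0;
}

/* Called when the port comes up: flush everything that was deferred. */
static void port_mark_ready(struct port_model *p)
{
	p->ready = true;
	p->submitted += p->nr_pending;
	p->nr_pending = 0;
	p->last_seen_completed = -1;	/* force a fresh first sample */
}

/* Report a hang only if the port is running and made no progress. */
static bool port_hangcheck(struct port_model *p)
{
	bool hung;

	if (!p->ready)
		return false;		/* not started yet, so it cannot be hung */
	hung = p->submitted > p->completed &&
	       p->completed == p->last_seen_completed;
	p->last_seen_completed = p->completed;
	return hung;
}
```

In this model, work queued before port_mark_ready() never trips
hangcheck, and the first check after the port comes up only takes a
sample; a hang is reported on a later check if nothing has completed in
between.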
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/15] Batch submission via GuC
  2015-06-17 12:43 ` [PATCH 00/15] Batch submission via GuC Daniel Vetter
@ 2015-06-25  7:23   ` Dave Gordon
  2015-06-25  8:05     ` Chris Wilson
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-06-25  7:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 17/06/15 13:43, Daniel Vetter wrote:
> On Mon, Jun 15, 2015 at 07:36:18PM +0100, Dave Gordon wrote:
>> This patch series enables command submission via the GuC. In this mode,
>> instead of the host CPU driving the execlist port directly, it hands
>> over work items to the GuC, using a doorbell mechanism to tell the GuC
>> that new items have been added to its work queue. The GuC then dispatches
>> contexts to the various GPU engines, and manages the resulting context-
>> switch interrupts. Completion of a batch is however still signalled to
>> the CPU; the GuC is not involved in handling user interrupts.
>>
>> There are three subsequences within the patch series:
>>
>>   drm/i915: Add i915_gem_object_write() to i915_gem.c
>>   drm/i915: Embedded microcontroller (uC) firmware loading support
>>
>> These first two patches provide a generic framework for fetching the
>> firmware that may be required by any embedded microcontroller from a
>> file, using an asynchronous thread so that driver initialisation can
>> continue while the firmware is being fetched. It is hoped that this
>> framework is sufficiently general that it can be used for all current
>> and future microcontrollers.
>>
>>   drm/i915: Add GuC-related module parameters
>>   drm/i915: Add GuC-related header files
>>   drm/i915: GuC-specific firmware loader
>>   drm/i915: Debugfs interface to read GuC load status
> 
> Does that include all the nifty power management stuff GuC does?

No; the GuC f/w may be doing such things but I don't have any code to
interrogate it about power management. None of that appears in the GuC
submission HLD, so I'd guess we're not presenting that until it has a
stable i/f.

>> These four patches complete the GuC loader. At this point in the sequence
>> we can load and activate the GuC firmware, but not submit any batches
>> through it. (This is nonetheless a potentially useful state, as the GuC
>> can do other useful work even when not handling batch submissions).
>>
>>   drm/i915: Defer default hardware context initialisation until first
>>   drm/i915: Move execlists defines from .c to .h
>>   drm/i915: GuC submission setup, phase 1
>>   drm/i915: Enable GuC firmware log
>>   drm/i915: Implementation of GuC client
>>   drm/i915: Interrupt routing for GuC submission
>>   drm/i915: Integrate GuC-based command submission
>>   drm/i915: Debugfs interface for GuC submission statistics
>>   Documentation/drm: kerneldoc for GuC
>>   drm/i915: Enable GuC submission, where supported
>>
>> In the final section, we implement the GuC submission mechanism, link
>> it into the (execlist-based) submission path, and finally enable it
>> (on supported platforms). On platforms where there is no GuC, or if
>> the GuC firmware cannot be found or is invalid, batch submission will
>> revert to using the execlist mechanism directly.
> 
> I thought we had some perf data showing that GuC is now faster than
> execbuf ... Where's that?

Alex has run some benchmarks, generally showing a small improvement, up
to about 5% depending on workload. OTOH John H knows of one application
that improved by more than 20% :)

>> The GuC firmware itself is not included in this patchset; it is or will
>> be available for download from https://01.org/linuxgraphics/downloads/
>> This driver works with and requires GuC firmware revision 3.x. It will
>> not work with any firmware version 1.x, as the GuC protocol in those
>> revisions was incompatible and is no longer supported.
>>
>> Prerequisites: GuC submission will expose existing inadequacies in
>> some of the existing codepaths unless certain other patches are applied.
>> In particular we will require some version of Michel Thierry's patch
>>   drm/i915/lrc: Update PDPx registers with lri commands
>> (because the GuC supports light-restore, which execlist mode doesn't),
>> and my own 
>>   drm/i915: Allocate OLR more safely (workaround until OLR goes away)
>> because otherwise the changed timing means that there is an increased
> 
> s/timing/much reduced ring space I presume?

I think it's more likely timing, but of course it depends on total
system activity as to what happens to be pinned (and therefore kmapped)
at any particular instant.

>> risk of writing to a ringbuffer that is not currently pinned & mapped,
>> causing a kernel OOPS.
> 
> Cheers, Daniel

New version incorporating all feedback should appear later today. It
would probably have been yesterday were there not conflicts between
"drm/i915: Defer default hardware context initialisation until first
open" and one of the AntiOLR patches which also splits init_hw() :(

.Dave.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/15] Batch submission via GuC
  2015-06-25  7:23   ` Dave Gordon
@ 2015-06-25  8:05     ` Chris Wilson
  0 siblings, 0 replies; 94+ messages in thread
From: Chris Wilson @ 2015-06-25  8:05 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Jun 25, 2015 at 08:23:08AM +0100, Dave Gordon wrote:
> On 17/06/15 13:43, Daniel Vetter wrote:
> > On Mon, Jun 15, 2015 at 07:36:18PM +0100, Dave Gordon wrote:
> >> This patch series enables command submission via the GuC. In this mode,
> >> instead of the host CPU driving the execlist port directly, it hands
> >> over work items to the GuC, using a doorbell mechanism to tell the GuC
> >> that new items have been added to its work queue. The GuC then dispatches
> >> contexts to the various GPU engines, and manages the resulting context-
> >> switch interrupts. Completion of a batch is however still signalled to
> >> the CPU; the GuC is not involved in handling user interrupts.
> >>
> >> There are three subsequences within the patch series:
> >>
> >>   drm/i915: Add i915_gem_object_write() to i915_gem.c
> >>   drm/i915: Embedded microcontroller (uC) firmware loading support
> >>
> >> These first two patches provide a generic framework for fetching the
> >> firmware that may be required by any embedded microcontroller from a
> >> file, using an asynchronous thread so that driver initialisation can
> >> continue while the firmware is being fetched. It is hoped that this
> >> framework is sufficiently general that it can be used for all current
> >> and future microcontrollers.
> >>
> >>   drm/i915: Add GuC-related module parameters
> >>   drm/i915: Add GuC-related header files
> >>   drm/i915: GuC-specific firmware loader
> >>   drm/i915: Debugfs interface to read GuC load status
> > 
> > Does that include all the nifty power management stuff GuC does?
> 
> No; the GuC f/w may be doing such things but I don't have any code to
> interrogate it about power management. None of that appears in the GuC
> submission HLD, so I'd guess we're not presenting that until it has a
> stable i/f.
> 
> >> These four patches complete the GuC loader. At this point in the sequence
> >> we can load and activate the GuC firmware, but not submit any batches
> >> through it. (This is nonetheless a potentially useful state, as the GuC
> >> can do other useful work even when not handling batch submissions).
> >>
> >>   drm/i915: Defer default hardware context initialisation until first
> >>   drm/i915: Move execlists defines from .c to .h
> >>   drm/i915: GuC submission setup, phase 1
> >>   drm/i915: Enable GuC firmware log
> >>   drm/i915: Implementation of GuC client
> >>   drm/i915: Interrupt routing for GuC submission
> >>   drm/i915: Integrate GuC-based command submission
> >>   drm/i915: Debugfs interface for GuC submission statistics
> >>   Documentation/drm: kerneldoc for GuC
> >>   drm/i915: Enable GuC submission, where supported
> >>
> >> In the final section, we implement the GuC submission mechanism, link
> >> it into the (execlist-based) submission path, and finally enable it
> >> (on supported platforms). On platforms where there is no GuC, or if
> >> the GuC firmware cannot be found or is invalid, batch submission will
> >> revert to using the execlist mechanism directly.
> > 
> > I thought we had some perf data showing that GuC is now faster than
> > execbuf ... Where's that?
> 
> Alex has run some benchmarks, generally showing a small improvement, up
> to about 5% depending on workload. OTOH John H knows of one application
> that improved by more than 20% :)

Is this compared to execlists? Big deal, we know that ELSP is very slow
and need to tackle the regressions introduced by the switch to
execlists in current gen.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
  2015-06-24  9:32               ` Daniel Vetter
@ 2015-06-25 12:28                 ` Dave Gordon
  0 siblings, 0 replies; 94+ messages in thread
From: Dave Gordon @ 2015-06-25 12:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 24/06/15 10:32, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 07:28:26PM +0100, Dave Gordon wrote:
>> On 18/06/15 15:31, Daniel Vetter wrote:
>>> On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
>>>> On 17/06/15 13:02, Daniel Vetter wrote:
>>>>> On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote:
>>>>>> On 15/06/15 21:09, Chris Wilson wrote:
>>>>>>> On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote:
>>>>>>>> From: Alex Dai <yu.dai@intel.com>
>>>>>>>>
>>>>>>>> i915_gem_object_write() is a generic function to copy data from a plain
>>>>>>>> linear buffer to a paged gem object.
>>>>>>>>
>>>>>>>> We will need this for the microcontroller firmware loading support code.
>>>>>>>>
>>>>>>>> Issue: VIZ-4884
>>>>>>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>>>>>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>>>>>> ---
>>>>>>>>  drivers/gpu/drm/i915/i915_drv.h |    2 ++
>>>>>>>>  drivers/gpu/drm/i915/i915_gem.c |   28 ++++++++++++++++++++++++++++
>>>>>>>>  2 files changed, 30 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>>>>> index 611fbd8..9094c06 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>>>> @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev);
>>>>>>>>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>>>>>>>>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
>>>>>>>>  			 const struct drm_i915_gem_object_ops *ops);
>>>>>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>>>>>>>> +			  const void *data, size_t size);
>>>>>>>>  struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>>>>>>>>  						  size_t size);
>>>>>>>>  void i915_init_vm(struct drm_i915_private *dev_priv,
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>> index be35f04..75d63c2 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>> @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
>>>>>>>>  	return false;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +/* Fill the @obj with the @size amount of @data */
>>>>>>>> +int i915_gem_object_write(struct drm_i915_gem_object *obj,
>>>>>>>> +			const void *data, size_t size)
>>>>>>>> +{
>>>>>>>> +	struct sg_table *sg;
>>>>>>>> +	size_t bytes;
>>>>>>>> +	int ret;
>>>>>>>> +
>>>>>>>> +	ret = i915_gem_object_get_pages(obj);
>>>>>>>> +	if (ret)
>>>>>>>> +		return ret;
>>>>>>>> +
>>>>>>>> +	i915_gem_object_pin_pages(obj);
>>>>>>>
>>>>>>> You don't set the object into the CPU domain, or instead manually handle
>>>>>>> the domain flushing. You don't handle objects that cannot be written
>>>>>>> directly by the CPU, nor do you handle objects whose representation in
>>>>>>> memory is not linear.
>>>>>>> -Chris
>>>>>>
>>>>>> No we don't handle just any random gem object, but we do return an error
>>>>>> code for any types not supported. However, as we don't really need the
>>>>>> full generality of writing into a gem object of any type, I will replace
>>>>>> this function with one that combines the allocation of a new object
>>>>>> (which will therefore definitely be of the correct type, in the correct
>>>>>> domain, etc) and filling it with the data to be preserved.
>>>>
>>>> The usage pattern for the particular case is going to be:
>>>> 	Once-only:
>>>> 		Allocate
>>>> 		Fill
>>>> 	Then each time GuC is (re-)initialised:
>>>> 		Map to GTT
>>>> 		DMA-read from buffer into GuC private memory
>>>> 		Unmap
>>>> 	Only on unload:
>>>> 		Dispose
>>>>
>>>> So our object is write-once by the CPU (and that's always the first
>>>> operation), thereafter read-occasionally by the GuC's DMA engine.
>>>
>>> Yup. The problem is more that on atom platforms the objects aren't
>>> coherent by default and generally you need to do something. Hence we
>>> either have
>>> - an explicit set_caching call to document that this is a gpu object which
>>>   is always coherent (so also on chv/bxt), even when that's a no-op on big
>>>   core
>>> - or wrap everything in set_domain calls, even when those are no-ops too.
>>>
>>> If either of those is lacking, reviewers tend to freak out preemptively
>>> and the reptile brain takes over ;-)
>>>
>>> Cheers, Daniel
>>
>> We don't need "coherency" as such. The buffer is filled (once only) by
>> the CPU (so I should put a set-to-cpu-domain between the allocate and
>> fill stages?) Once it's filled, the CPU need not read or write it ever
>> again.
>>
>> Then before the DMA engine accesses it, we call i915_gem_obj_ggtt_pin,
>> which I'm assuming will take care of any coherency issues (making sure
>> the data written by the CPU is now visible to the DMA engine) when it
>> puts the buffer into the GTT-readable domain. Is that not sufficient?
> 
> Pinning is orthogonal to coherency, it'll just make sure the backing
> storage is there. set_domain(CPU) before writing and set_domain(GTT)
> before each upload to the guc using the hw copy thing would be prudent.
> The coherency tracking should no-op out any calls which aren't needed for
> you.
> -Daniel

OK, done; I had already added set_domain(CPU) in the allocate-and-fill
code, now I've just added i915_gem_object_set_to_gtt_domain(obj, false)
in the dma-to-h/w code.
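
For illustration only, the write-once / read-many pattern agreed above can
be modelled in plain userspace C. This is a sketch under stated assumptions,
not the i915 code: struct fw_obj, set_domain(), fw_obj_fill() and
fw_obj_upload() are hypothetical stand-ins for the GEM object and the
set-to-domain calls, and the second memcpy() stands in for the DMA engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the GEM domains discussed in this thread. */
enum domain { DOMAIN_NONE, DOMAIN_CPU, DOMAIN_GTT };

struct fw_obj {
	unsigned char data[64];
	size_t size;
	enum domain domain;	/* which side may currently touch the pages */
	int flushes;		/* counts real domain transitions */
};

/* Move the object to @to; asking for the current domain is a no-op. */
static void set_domain(struct fw_obj *obj, enum domain to)
{
	if (obj->domain == to)
		return;
	obj->flushes++;		/* a real driver would flush caches here */
	obj->domain = to;
}

/* Once only: take the CPU domain, then copy the firmware image in. */
static int fw_obj_fill(struct fw_obj *obj, const void *src, size_t size)
{
	if (size > sizeof(obj->data))
		return -1;
	set_domain(obj, DOMAIN_CPU);
	memcpy(obj->data, src, size);
	obj->size = size;
	return 0;
}

/* Each (re-)initialisation: take the GTT domain before the "DMA" read. */
static size_t fw_obj_upload(struct fw_obj *obj, unsigned char *dst)
{
	assert(obj->size <= sizeof(obj->data));
	set_domain(obj, DOMAIN_GTT);
	memcpy(dst, obj->data, obj->size);	/* stands in for the DMA copy */
	return obj->size;
}
```

In this model the first upload after the fill costs one transition and
every later upload costs none, which is the "coherency tracking should
no-op out any calls which aren't needed" behaviour described above.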

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-06-24 10:29           ` Daniel Vetter
@ 2015-07-06 12:44             ` Dave Gordon
  2015-07-06 13:24               ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-07-06 12:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 24/06/15 11:29, Daniel Vetter wrote:
> On Fri, Jun 19, 2015 at 09:43:11AM +0100, Dave Gordon wrote:
>> On 18/06/15 15:49, Daniel Vetter wrote:
>>> On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
>>>> On 17/06/15 13:05, Daniel Vetter wrote:
>>>>> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
>>>>>> Current devices may contain one or more programmable microcontrollers
>>>>>> that need to have a firmware image (aka "binary blob") loaded from an
>>>>>> external medium and transferred to the device's memory.
>>>>>>
>>>>>> This file provides generic support functions for doing this; they can
>>>>>> then be used by each uC-specific loader, thus reducing code duplication
>>>>>> and testing effort.
>>>>>>
>>>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>>>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>>>>>
>>>>> Given that I'm just shredding the synchronization used by the dmc loader
>>>>> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
>>>>> copy-paste for similar sounding but slightly different things doesn't
>>>>> sound awful to me. And the critical bit in all the firmware loading I've
>>>>> seen thus far is in synchronizing the loading with other operations,
>>>>> hiding that isn't a good idea. Worse if we enforce stuff like requiring
>>>>> dev->struct_mutex.
>>>>> -Daniel
>>>>
>>>> It's precisely because it's in some sense "trivial-but-tricky" that we
>>>> should write it once, get it right, and use it everywhere. Copypaste
>>>> /does/ sound awful; I've seen how the code this was derived from had
>>>> already been cloned into three flavours, all different and all wrong.
>>>>
>>>> It's a very simple abstraction: one early call to kick things off as
>>>> early as possible, no locking required. One late call with the
>>>> struct_mutex held to complete the synchronisation and actually do the
>>>> work, thus guaranteeing that the transfer to the target uC is done in a
>>>> controlled fashion, at a time of the caller's choice, and by the
>>>> driver's mainline thread, NOT by an asynchronous thread racing with
>>>> other activity (which was one of the things wrong with the original
>>>> version).
>>>
>>> Yeah I've seen the origins of this in the display code, and that code gets
>>> the syncing wrong. The only thing that one has to do is grab a runtime pm
>>> reference for the appropriate power well to prevent dc5 entry, and release
>>> it when the firmware is loaded and initialized.
>>
>> Agreed.
>>
>>> Which means any kind of firmware loader which requires/uses
>>> dev->struct_mutex gets stuff wrong and is not appropriate everywhere.
>>
>> BUT, the loading of the firmware into any uC MUST be done in a
>> controlled manner i.e. at a time when no other thread is touching the
>> h/w. Otherwise the f/w load and whatever else is concurrently accessing
>> the h/w could in some cases interfere disastrously. Examples of
>> interference might be:
>>
>> * interleaved accesses to the ELSP (in the case of the GuC)
>> * incorrect handover of power management (DMC, GuC)
>> * erroneous management of forcewake state
>>
>> In general the f/w that is just starting on the uC may have certain
>> expectations about the initial state of the h/w, which may not be met if
>> other threads are accessing various bits of h/w while the uC is booting up.
>>
>> So we absolutely need to guarantee that the f/w load is done by a thread
>> which has exclusive ownership of any bit of the h/w that the f/w is
>> going to make assumptions about. With the current locking structure of
>> the driver, that means holding the struct_mutex (it shouldn't really,
>> there should be a separate mutex for h/w register access vs.
>> driver-private data structures, but there isn't).
>
> If you really need this guarantee (and I seriously hope not) then the only
> option is a synchronous firmware load at driver init _before_ we launch
> any of the asynchronous setup code. And there is already a lot of that,
> and we're adding more all the time.
>
> What I expect we need is synchronization of just the relevant part with
> the firmware loading, which necessarily needs to be somewhat async to be
> able to support cros/android requirements. And yes that needs to be done
> in a controlled manner, but most likely we need very specific solutions
> for the problem at hand. Unconditionally holding dev->struct_mutex isn't
> that solution.
>
> The other problem with dev->struct_mutex is that it's a gigantic lock with
> ill defined coverage and semantics. It's imo the biggest piece of
> technical debt we carry around in i915.ko, and we pay the price for that
> dearly&daily. Which means that for a few years now any kind of code
> which extended dev->struct_mutex to anything not clearly core gem data
> structures has been rejected.

Oh, I quite agree that the struct_mutex is an abomination and would 
certainly like to eliminate it. But at the moment it's the only 
sufficiently large-scale synchronisation operation available to ensure 
that (for example) we don't try to load the f/w at the same time that 
another thread is trying to reset the h/w.

None of this loader code really needs the struct_mutex specifically; the 
WARN_ON macros were just there to help callers know what degree of 
synchronisation they need to organise before calling these functions.

[snip]

>> BTW, the existing DMC loader probably won't work on Android :(
>
> Yeah I completely missed out on this fun since I presumed that firmware
> loading is easy and simple. And if you look around on other drm drivers it
> indeed is, they all use a synchronous request_firmware and if the firmware
> isn't there, they just fall over (fully in the case of radeon.ko,
> partially in the case of nouveau.ko since they have all the support in
> place for handling a kms-only accel-less gpu in userspace anyway, like we
> do). But for a bunch of reasons (afaik it's "you can't include a blob in a
> gpl-ed kernel image") we need async firmware loading for cros&android.
>
> That leaves us with a situation where we should have done a special design
> discussion about asynchronous firmware, but somehow failed to do that.
> Which leaves us in a very ugly position.
>
> I talked with a bunch of people over the past few days to figure out how
> this is supposed to work and also figure out why it's being done like that
> today. I think I have a reasonably good plan for moving forward too. I'll
> start a new top-level thread here to discuss this.
>
> Thanks, Daniel

It really isn't "asynchronous", it's just "deferred" -- but implying 
that everything that relies on having the firmware available also has to 
be deferred. For the DMC, that means we can't have full PM; and for the 
GuC, we can't submit any batches at all to any engine until the f/w load 
is done.

Really, it would be simpler if we didn't support automatic firmware 
loading in the kernel at all, and had a userland startup process whose 
job was to locate and transfer the required firmware before the device 
could be used. But that would just give us the opposite problem, because 
the display device is one of the things that /should/ be usable even 
before the rootfs is mounted.

Anyway, I've posted a simplified v3 which only supports synchronous 
fetch, without the prefetch capability. See separate thread ...

.Dave.

* Re: [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
  2015-07-06 12:44             ` Dave Gordon
@ 2015-07-06 13:24               ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2015-07-06 13:24 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jul 06, 2015 at 01:44:10PM +0100, Dave Gordon wrote:
> On 24/06/15 11:29, Daniel Vetter wrote:
> >On Fri, Jun 19, 2015 at 09:43:11AM +0100, Dave Gordon wrote:
> >>On 18/06/15 15:49, Daniel Vetter wrote:
> >>>On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
> >>>>On 17/06/15 13:05, Daniel Vetter wrote:
> >>>>>On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
> >>>>>>Current devices may contain one or more programmable microcontrollers
> >>>>>>that need to have a firmware image (aka "binary blob") loaded from an
> >>>>>>external medium and transferred to the device's memory.
> >>>>>>
> >>>>>>This file provides generic support functions for doing this; they can
> >>>>>>then be used by each uC-specific loader, thus reducing code duplication
> >>>>>>and testing effort.
> >>>>>>
> >>>>>>Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>>>>>Signed-off-by: Alex Dai <yu.dai@intel.com>
> >>>>>
> >>>>>Given that I'm just shredding the synchronization used by the dmc loader
> >>>>>I'm not convinced this is a good idea. Abstraction has cost, and a bit of
> >>>>>copy-paste for similar sounding but slightly different things doesn't
> >>>>>sound awful to me. And the critical bit in all the firmware loading I've
> >>>>>seen thus far is in synchronizing the loading with other operations,
> >>>>>hiding that isn't a good idea. Worse if we enforce stuff like requiring
> >>>>>dev->struct_mutex.
> >>>>>-Daniel
> >>>>
> >>>>It's precisely because it's in some sense "trivial-but-tricky" that we
> >>>>should write it once, get it right, and use it everywhere. Copypaste
> >>>>/does/ sound awful; I've seen how the code this was derived from had
> >>>>already been cloned into three flavours, all different and all wrong.
> >>>>
> >>>>It's a very simple abstraction: one early call to kick things off as
> >>>>early as possible, no locking required. One late call with the
> >>>>struct_mutex held to complete the synchronisation and actually do the
> >>>>work, thus guaranteeing that the transfer to the target uC is done in a
> >>>>controlled fashion, at a time of the caller's choice, and by the
> >>>>driver's mainline thread, NOT by an asynchronous thread racing with
> >>>>other activity (which was one of the things wrong with the original
> >>>>version).
> >>>
> >>>Yeah I've seen the origins of this in the display code, and that code gets
> >>>the syncing wrong. The only thing that one has to do is grab a runtime pm
> >>>reference for the appropriate power well to prevent dc5 entry, and release
> >>>it when the firmware is loaded and initialized.
> >>
> >>Agreed.
> >>
> >>>Which means any kind of firmware loader which requires/uses
> >>>dev->struct_mutex gets stuff wrong and is not appropriate everywhere.
> >>
> >>BUT, the loading of the firmware into any uC MUST be done in a
> >>controlled manner i.e. at a time when no other thread is touching the
> >>h/w. Otherwise the f/w load and whatever else is concurrently accessing
> >>the h/w could in some cases interfere disastrously. Examples of
> >>interference might be:
> >>
> >>* interleaved accesses to the ELSP (in the case of the GuC)
> >>* incorrect handover of power management (DMC, GuC)
> >>* erroneous management of forcewake state
> >>
> >>In general the f/w that is just starting on the uC may have certain
> >>expectations about the initial state of the h/w, which may not be met if
> >>other threads are accessing various bits of h/w while the uC is booting up.
> >>
> >>So we absolutely need to guarantee that the f/w load is done by a thread
> >>which has exclusive ownership of any bit of the h/w that the f/w is
> >>going to make assumptions about. With the current locking structure of
> >>the driver, that means holding the struct_mutex (it shouldn't really,
> >>there should be a separate mutex for h/w register access vs.
> >>driver-private data structures, but there isn't).
> >
> >If you really need this guarantee (and I seriously hope not) then the only
> >option is a synchronous firmware load at driver init _before_ we launch
> >any of the asynchronous setup code. And there is already a lot of that,
> >and we're adding more all the time.
> >
> >What I expect we need is synchronization of just the relevant part with
> >the firmware loading, which necessarily needs to be somewhat async to be
> >able to support cros/android requirements. And yes that needs to be done
> >in a controlled manner, but most likely we need very specific solutions
> >for the problem at hand. Unconditionally holding dev->struct_mutex isn't
> >that solution.
> >
> >The other problem with dev->struct_mutex is that it's a gigantic lock with
> >ill defined coverage and semantics. It's imo the biggest piece of
> >technical debt we carry around in i915.ko, and we pay the price for that
> >dearly&daily. Which means that for a few years now any kind of code
> >which extended dev->struct_mutex to anything not clearly core gem data
> >structures has been rejected.
> 
> Oh, I quite agree that the struct_mutex is an abomination and would
> certainly like to eliminate it. But at the moment it's the only sufficiently
> large-scale synchronisation operation available to ensure that (for example)
> we don't try to load the f/w at the same time that another thread is trying
> to reset the h/w.

I guess this is the crux here - for me one of the big problems around
dev->struct_mutex is that it doesn't just protect data structures to
ensure they're consistent, it also intermingles a lot of lifetime rules
(e.g. holding struct_mutex synchronizes against any final gem bo unref
too). And my experience from fixing up a few horribly misguided locking
designs in the drm subsystem is that in general using locks to do
synchronization leads to serious maintenance headaches a few years down
the road. So if you state that only struct_mutex is a big enough lock to
synchronize async guc init then that kills your design for me already. Yes
there's a grey area like always in design topics, but given how massive
(and massively undocumented) struct_mutex is I'm against going into that
grey area with extreme prejudice.

I guess I need to write a whitepaper or something about this (it's
commonly accepted wisdom in the drm subsystem), but the summary is
relatively simple: Don't use locks to ensure ordering of concurrent
operations or solve lifetime problems. For ordering use waitqueues,
completions or the work/timer specific things to order stuff. For lifetime
problems use refcounting.
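
As a hedged illustration of the "use refcounting for lifetime" half of that
rule, here is a minimal userspace sketch in the spirit of a kernel kref. It
is not kernel code: struct fw_blob and the fw_blob_*() helpers are
hypothetical names, and the released flag exists only so the effect can be
observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical firmware blob whose lifetime is governed by a refcount. */
struct fw_blob {
	atomic_int refcount;
	int *released;	/* set to 1 by the release path, for demonstration */
};

/* The creator holds the initial reference. */
static struct fw_blob *fw_blob_create(int *released)
{
	struct fw_blob *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	atomic_init(&b->refcount, 1);
	b->released = released;
	return b;
}

/* Each additional user takes its own reference. */
static void fw_blob_get(struct fw_blob *b)
{
	atomic_fetch_add(&b->refcount, 1);
}

/* Whoever drops the last reference frees the blob; no lock is involved. */
static void fw_blob_put(struct fw_blob *b)
{
	assert(atomic_load(&b->refcount) > 0);
	if (atomic_fetch_sub(&b->refcount, 1) == 1) {
		*b->released = 1;
		free(b);
	}
}
```

The point of the sketch is that nobody has to hold a big lock just to keep
the object alive: a user that needs it takes a reference and the object
goes away when the last reference is dropped.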

Ofc because struct_mutex is such a sprawling beast that's a bit tricky,
but the simple ordering problem should be solved with a flush_work of the
async guc loader or similar. You still need to grab the struct_mutex for
the async loader (because that touches shared gem state, i.e. it's about
data consistency), but not because of ordering issues.
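
To make the ordering point concrete, here is a single-threaded toy model
of "flush the loader work before using the firmware". It is a sketch under
loud assumptions: the toy_*() names are hypothetical, there are no real
threads, and toy_flush_work() simply runs the pending item in place, which
models only the guarantee that the work has finished when the flush
returns. The real kernel flush_work() waits for a worker thread instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy work item: a callback plus a "queued but not yet run" flag. */
struct toy_work {
	void (*fn)(struct toy_work *w);
	bool pending;
};

struct toy_guc {
	struct toy_work load_work;	/* deferred firmware load */
	bool fw_loaded;
	int submitted;
};

static void toy_schedule_work(struct toy_work *w)
{
	w->pending = true;
}

/* Model of the flush guarantee: on return the work item has completed. */
static void toy_flush_work(struct toy_work *w)
{
	assert(w->fn != NULL);
	if (w->pending) {
		w->pending = false;
		w->fn(w);
	}
}

/* The deferred loader; recovers its container like container_of(). */
static void toy_guc_load(struct toy_work *w)
{
	struct toy_guc *guc = (struct toy_guc *)
		((char *)w - offsetof(struct toy_guc, load_work));

	guc->fw_loaded = true;
}

/* Driver load: queue the firmware load and return without waiting. */
static void toy_guc_init(struct toy_guc *guc)
{
	guc->load_work.fn = toy_guc_load;
	guc->load_work.pending = false;
	guc->fw_loaded = false;
	guc->submitted = 0;
	toy_schedule_work(&guc->load_work);
}

/* First user of the firmware orders itself after the load by flushing. */
static int toy_guc_submit(struct toy_guc *guc)
{
	toy_flush_work(&guc->load_work);
	if (!guc->fw_loaded)
		return -1;
	return ++guc->submitted;
}
```

Only the path that actually needs the firmware waits for it; everything
else in driver load carries on, and no lock is used to express the
ordering.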

> None of this loader code really needs the struct_mutex specifically; the
> WARN_ON macros were just there to help callers know what degree of
> synchronisation they need to organise before calling these functions.
> 
> [snip]
> 
> >>BTW, the existing DMC loader probably won't work on Android :(
> >
> >Yeah I completely missed out on this fun since I presumed that firmware
> >loading is easy and simple. And if you look around on other drm drivers it
> >indeed is, they all use a synchronous request_firmware and if the firmware
> >isn't there, they just fall over (fully in the case of radeon.ko,
> >partially in the case of nouveau.ko since they have all the support in
> >place for handling a kms-only accel-less gpu in userspace anyway, like we
> >do). But for a bunch of reasons (afaik it's "you can't include a blob in a
> >gpl-ed kernel image") we need async firmware loading for cros&android.
> >
> >That leaves us with a situation where we should have done a special design
> >discussion about asynchronous firmware, but somehow failed to do that.
> >Which leaves us in a very ugly position.
> >
> >I talked with a bunch of people over the past few days to figure out how
> >this is supposed to work and also figure out why it's being done like that
> >today. I think I have a reasonably good plan for moving forward too. I'll
> >start a new top-level thread here to discuss this.
> >
> >Thanks, Daniel
> 
> It really isn't "asynchronous", it's just "deferred" -- but implying that
> everything that relies on having the firmware available also has to be
> deferred. For the DMC, that means we can't have full PM; and for the GuC, we
> can't submit any batches at all to any engine until the f/w load is done.

Yeah I'm ok with just calling it "deferred", as long as we agree on the
meaning of "not run synchronously from driver load". And since there's no
reason to wait for userspace to do anything we might as well run anything
we defer in an async worker (and insert any necessary waits there), which
is why I just call all deferred driver load work "async".

> Really, it would be simpler if we didn't support automatic firmware loading
> in the kernel at all, and had a userland startup process whose job was to
> locate and transfer the required firmware before the device could be used.
> But that would just give us the opposite problem, because the display device
> is one of the things that /should/ be usable even before the rootfs is
> mounted.

request_firmware supports this userland process you're describing above,
if you enable CONFIG_FW_LOADER_USER_HELPER_FALLBACK. Which is what I
suggested Android should use (cros might run into a nack from google
upstream). Unfortunately (special snowflake driver that no one wanted to
convert) request_firmware_nowait does _not_ support this. That's why we
need to use request_firmware + our own worker (otherwise request_firmware
would just block everyone).

Wrt using kms apis before we have all the firmware loaded: That's
definitely a use case we should try to support, unfortunately the current
proposal of blocking for guc loading in gem_open will break that - there's
only one device node for both kms and rendering. Some of the other
proposals for async^Wdeferred gem init (both stalling in add_request or
reusing the reset-in-progress logic) would solve that, but have their own
sets of downsides. These kinds of problems are exactly why I want a proper
design discussion about async^Wdeferred gem init: We need a clear set of
requirements, then weigh the different options against each other and
decide upon one of the 3-4 proposed thus far.

> Anyway, I've posted a simplified v3 which only supports synchronous fetch,
> without the prefetch capability. See separate thread ...

Will take a look, thanks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-07-06 16:37     ` Dave Gordon
@ 2015-07-06 18:12       ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2015-07-06 18:12 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jul 06, 2015 at 05:37:57PM +0100, Dave Gordon wrote:
> On 06/07/15 15:28, Daniel Vetter wrote:
> >On Fri, Jul 03, 2015 at 01:30:27PM +0100, Dave Gordon wrote:
> >>From: Alex Dai <yu.dai@intel.com>
> >>
> >>This uses the common firmware loader to fetch the firmware image,
> >>then loads it into the GuC's memory via a dedicated DMA engine.
> >>
> >>This patch is derived from GuC loading work originally done by
> >>Vinit Azad and Ben Widawsky. It has been reconstructed to accord
> >>with the common firmware loading mechanism by Dave Gordon as well
> >>as new firmware layout etc.
> >>
> >>v2:
> >>     Various improvements per review comments by Chris Wilson
> >>
> >>v3:
> >>     Removed 'wait' parameter to intel_guc_ucode_load() as prefetch
> >>         is no longer supported in the common firmware loader, per
> >>	Daniel Vetter's request.
> >>     F/w checker callback fn now returns errno rather than bool.
> >>
> >>Issue: VIZ-4884
> >>Signed-off-by: Alex Dai <yu.dai@intel.com>
> >>Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>---
> >>  drivers/gpu/drm/i915/Makefile           |   3 +
> >>  drivers/gpu/drm/i915/i915_dma.c         |   4 +
> >>  drivers/gpu/drm/i915/i915_drv.h         |  11 +
> >>  drivers/gpu/drm/i915/i915_gem.c         |   8 +
> >>  drivers/gpu/drm/i915/i915_reg.h         |   4 +-
> >>  drivers/gpu/drm/i915/intel_guc.h        |  49 ++++
> >>  drivers/gpu/drm/i915/intel_guc_loader.c | 448 ++++++++++++++++++++++++++++++++
> >>  7 files changed, 526 insertions(+), 1 deletion(-)
> >>  create mode 100644 drivers/gpu/drm/i915/intel_guc.h
> >>  create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
> >>
> >>diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> >>index f1f80fc..62a8c83 100644
> >>--- a/drivers/gpu/drm/i915/Makefile
> >>+++ b/drivers/gpu/drm/i915/Makefile
> >>@@ -42,6 +42,9 @@ i915-y += i915_cmd_parser.o \
> >>  # generic ancilliary microcontroller support
> >>  i915-y += intel_uc_loader.o
> >>
> >>+# general-purpose microcontroller (GuC) support
> >>+i915-y += intel_guc_loader.o
> >>+
> >>  # autogenerated null render state
> >>  i915-y += intel_renderstate_gen6.o \
> >>  	  intel_renderstate_gen7.o \
> >>diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> >>index c5349fa..730d91b 100644
> >>--- a/drivers/gpu/drm/i915/i915_dma.c
> >>+++ b/drivers/gpu/drm/i915/i915_dma.c
> >>@@ -469,6 +469,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
> >>
> >>  cleanup_gem:
> >>  	mutex_lock(&dev->struct_mutex);
> >>+	intel_guc_ucode_fini(dev);
> >>  	i915_gem_cleanup_ringbuffer(dev);
> >>  	i915_gem_context_fini(dev);
> >>  	mutex_unlock(&dev->struct_mutex);
> >>@@ -866,6 +867,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
> >>
> >>  	intel_uncore_init(dev);
> >>
> >>+	intel_guc_ucode_init(dev);
> >>+
> >>  	/* Load CSR Firmware for SKL */
> >>  	intel_csr_ucode_init(dev);
> >>
> >>@@ -1117,6 +1120,7 @@ int i915_driver_unload(struct drm_device *dev)
> >>  	flush_workqueue(dev_priv->wq);
> >>
> >>  	mutex_lock(&dev->struct_mutex);
> >>+	intel_guc_ucode_fini(dev);
> >>  	i915_gem_cleanup_ringbuffer(dev);
> >>  	i915_gem_context_fini(dev);
> >>  	mutex_unlock(&dev->struct_mutex);
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>index 9618f57..a7ccac5 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -50,6 +50,7 @@
> >>  #include <linux/intel-iommu.h>
> >>  #include <linux/kref.h>
> >>  #include <linux/pm_qos.h>
> >>+#include "intel_guc.h"
> >>
> >>  /* General customization:
> >>   */
> >>@@ -1687,6 +1688,8 @@ struct drm_i915_private {
> >>
> >>  	struct i915_virtual_gpu vgpu;
> >>
> >>+	struct intel_guc guc;
> >>+
> >>  	struct intel_csr csr;
> >>
> >>  	/* Display CSR-related protection */
> >>@@ -1931,6 +1934,11 @@ static inline struct drm_i915_private *dev_to_i915(struct device *dev)
> >>  	return to_i915(dev_get_drvdata(dev));
> >>  }
> >>
> >>+static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
> >>+{
> >>+	return container_of(guc, struct drm_i915_private, guc);
> >>+}
> >>+
> >>  /* Iterate over initialised rings */
> >>  #define for_each_ring(ring__, dev_priv__, i__) \
> >>  	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
> >>@@ -2539,6 +2547,9 @@ struct drm_i915_cmd_table {
> >>
> >>  #define HAS_CSR(dev)	(IS_SKYLAKE(dev))
> >>
> >>+#define HAS_GUC_UCODE(dev)	(IS_GEN9(dev))
> >>+#define HAS_GUC_SCHED(dev)	(IS_GEN9(dev))
> >>+
> >>  #define INTEL_PCH_DEVICE_ID_MASK		0xff00
> >>  #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
> >>  #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
> >>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>index aa8f4c3..80d7890 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>@@ -5076,6 +5076,14 @@ i915_gem_init_hw(struct drm_device *dev)
> >>  			goto out;
> >>  	}
> >>
> >>+	/*
> >>+	 * We can't enable contexts until all firmware is loaded; if this
> >>+	 * fails, disable GuC submissions and fall back to execlist mode
> >>+	 */
> >>+	ret = intel_guc_ucode_load(dev);
> >>+	if (ret)
> >>+		i915.enable_guc_submission = false;
> >
> >I want an -EIO or similar here since runtime fallbacks to other modes
> >really aren't great from a maintenance perspective, see my comments on
> >the irq routing code.
> >
> >Yes we can make this work, but given our stellar track record with keeping
> >disabled features working it won't work for long. And it will impact us
> >with additional constraints until we give up and rip it out again. Not
> >worth it imo - if we decide to use the guc on a given platform we should
> >imo require it and stick to that decision for at least as long as the
> >driver is loaded. Developers can still change the option when reloading the
> >driver, users won't have a chance to cause trouble.
> >-Daniel
> 
> Again, this isn't really "runtime" -- we're still in the driver loading
> stage here. This is analogous to the various "sanitize" functions where we
> cross-check what options have been set and decide which to override, except
> that here we can't determine whether we're going to respect the default or
> user-specified request for GuC submission mode until we know whether we have
> valid firmware for the GuC.

The problem is that a pile of code has run already (specifically irq
setup). Not a problem from the correctness pov, since it can all be made
to work, but from a longer-term maintenance pov. It'll just bitrot until
it's ripped out. But until that happens everyone has to keep it in mind
and try not to break this fallback. Too much trouble imo for no real
benefit.
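
The two policies being argued over can be put side by side in a small
userspace sketch. This is an illustration only, not the patch:
struct toy_params, init_hw_fallback() and init_hw_strict() are hypothetical
names, load_ret stands in for the return value of the firmware load, and
checking the option before acting on the failure is an assumption made for
the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the module parameter under discussion. */
struct toy_params {
	bool enable_guc_submission;
};

/*
 * Policy in the posted patch: a failed firmware load silently clears
 * the option and init carries on in execlist mode.
 */
static int init_hw_fallback(struct toy_params *p, int load_ret)
{
	if (p->enable_guc_submission && load_ret)
		p->enable_guc_submission = false;
	return 0;
}

/*
 * Policy requested in review: if GuC submission was asked for and the
 * firmware load fails, fail init with -EIO and leave the option alone.
 */
static int init_hw_strict(const struct toy_params *p, int load_ret)
{
	if (p->enable_guc_submission && load_ret)
		return -EIO;
	return 0;
}
```

With the strict variant the option never changes behind the caller's back,
so no code that ran earlier in driver load has to cope with a mode switch.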

> At this point, we haven't submitted any batches, so the main point of use of
> this flag -- in the submission path, to switch between execlist and GuC
> modes -- has never yet been executed. So there should be no problem with
> changing the value before it's first used.
> 
> And this is a one-way switch; you (or the default config) asked for GuC
> submission, but we can't support it so we disable the option. There's no way
> to switch it back on without reloading the driver. So this /is/ the point at
> which we decide to use the GuC on a given platform and then stick to that
> decision for at least as long as the driver is loaded.
> 
> We have to support execlist mode for the foreseeable future anyway, so using
> it on a machine which (we think) ought to be GuC-capable doesn't add /any/
> extra maintenance overhead at all.
> 
> Why break the user's machine unnecessarily? With real "end-users",
> especially those who have never used Linux before, you only get one chance.
> Sometimes I've installed Linux on a (Windows-using) friend's machine, and it
> hasn't worked first time. Then I switch to another VT, type some magic
> incantations, and 10 minutes later we have a usable login screen. Will they
> adopt Linux? Unlikely :( No matter how good it looks thereafter, if the
> machine's hardware doesn't work with the distro straight out of the box,
> they're just not going to believe it's something they can use. So it's very
> important that everything essential to the first-time experience works even
> when misconfigured -- and nothing is more essential than the display driver
> (networking and wi-fi are the next things that will put the user off if they
> don't work -- and they're also drivers that commonly rely on firmware
> blobs).

Those end-users install fedora or ubuntu which get the firmware blob
loading just right. None of the other big drivers bother with falling back
to some other mode if the firmware they need isn't there: radeon just
outright bails out, nouveau just disables accel or runtime pm if the
firmware blob is only needed for these features.

The only users I can see are people allergic to firmware blobs (they can
use execlist and we don't care) or developers (can change mod options
too). Misconfigured systems from newbies are not a target market, neither
for intel nor for upstream (yes we WONTFIX bugs of people with funky
configs if the problem doesn't exist on a properly configured distro).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-07-06 14:28   ` Daniel Vetter
@ 2015-07-06 16:37     ` Dave Gordon
  2015-07-06 18:12       ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-07-06 16:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 06/07/15 15:28, Daniel Vetter wrote:
> On Fri, Jul 03, 2015 at 01:30:27PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> This uses the common firmware loader to fetch the firmware image,
>> then loads it into the GuC's memory via a dedicated DMA engine.
>>
>> This patch is derived from GuC loading work originally done by
>> Vinit Azad and Ben Widawsky. It has been reconstructed to accord
>> with the common firmware loading mechanism by Dave Gordon as well
>> as new firmware layout etc.
>>
>> v2:
>>      Various improvements per review comments by Chris Wilson
>>
>> v3:
>>      Removed 'wait' parameter to intel_guc_ucode_load() as prefetch
>>          is no longer supported in the common firmware loader, per
>> 	Daniel Vetter's request.
>>      F/w checker callback fn now returns errno rather than bool.
>>
>> Issue: VIZ-4884
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> ---
>>   drivers/gpu/drm/i915/Makefile           |   3 +
>>   drivers/gpu/drm/i915/i915_dma.c         |   4 +
>>   drivers/gpu/drm/i915/i915_drv.h         |  11 +
>>   drivers/gpu/drm/i915/i915_gem.c         |   8 +
>>   drivers/gpu/drm/i915/i915_reg.h         |   4 +-
>>   drivers/gpu/drm/i915/intel_guc.h        |  49 ++++
>>   drivers/gpu/drm/i915/intel_guc_loader.c | 448 ++++++++++++++++++++++++++++++++
>>   7 files changed, 526 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/drm/i915/intel_guc.h
>>   create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>> index f1f80fc..62a8c83 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -42,6 +42,9 @@ i915-y += i915_cmd_parser.o \
>>   # generic ancilliary microcontroller support
>>   i915-y += intel_uc_loader.o
>>
>> +# general-purpose microcontroller (GuC) support
>> +i915-y += intel_guc_loader.o
>> +
>>   # autogenerated null render state
>>   i915-y += intel_renderstate_gen6.o \
>>   	  intel_renderstate_gen7.o \
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index c5349fa..730d91b 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -469,6 +469,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
>>
>>   cleanup_gem:
>>   	mutex_lock(&dev->struct_mutex);
>> +	intel_guc_ucode_fini(dev);
>>   	i915_gem_cleanup_ringbuffer(dev);
>>   	i915_gem_context_fini(dev);
>>   	mutex_unlock(&dev->struct_mutex);
>> @@ -866,6 +867,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
>>
>>   	intel_uncore_init(dev);
>>
>> +	intel_guc_ucode_init(dev);
>> +
>>   	/* Load CSR Firmware for SKL */
>>   	intel_csr_ucode_init(dev);
>>
>> @@ -1117,6 +1120,7 @@ int i915_driver_unload(struct drm_device *dev)
>>   	flush_workqueue(dev_priv->wq);
>>
>>   	mutex_lock(&dev->struct_mutex);
>> +	intel_guc_ucode_fini(dev);
>>   	i915_gem_cleanup_ringbuffer(dev);
>>   	i915_gem_context_fini(dev);
>>   	mutex_unlock(&dev->struct_mutex);
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 9618f57..a7ccac5 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -50,6 +50,7 @@
>>   #include <linux/intel-iommu.h>
>>   #include <linux/kref.h>
>>   #include <linux/pm_qos.h>
>> +#include "intel_guc.h"
>>
>>   /* General customization:
>>    */
>> @@ -1687,6 +1688,8 @@ struct drm_i915_private {
>>
>>   	struct i915_virtual_gpu vgpu;
>>
>> +	struct intel_guc guc;
>> +
>>   	struct intel_csr csr;
>>
>>   	/* Display CSR-related protection */
>> @@ -1931,6 +1934,11 @@ static inline struct drm_i915_private *dev_to_i915(struct device *dev)
>>   	return to_i915(dev_get_drvdata(dev));
>>   }
>>
>> +static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
>> +{
>> +	return container_of(guc, struct drm_i915_private, guc);
>> +}
>> +
>>   /* Iterate over initialised rings */
>>   #define for_each_ring(ring__, dev_priv__, i__) \
>>   	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
>> @@ -2539,6 +2547,9 @@ struct drm_i915_cmd_table {
>>
>>   #define HAS_CSR(dev)	(IS_SKYLAKE(dev))
>>
>> +#define HAS_GUC_UCODE(dev)	(IS_GEN9(dev))
>> +#define HAS_GUC_SCHED(dev)	(IS_GEN9(dev))
>> +
>>   #define INTEL_PCH_DEVICE_ID_MASK		0xff00
>>   #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
>>   #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index aa8f4c3..80d7890 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -5076,6 +5076,14 @@ i915_gem_init_hw(struct drm_device *dev)
>>   			goto out;
>>   	}
>>
>> +	/*
>> +	 * We can't enable contexts until all firmware is loaded; if this
>> +	 * fails, disable GuC submissions and fall back to execlist mode
>> +	 */
>> +	ret = intel_guc_ucode_load(dev);
>> +	if (ret)
>> +		i915.enable_guc_submission = false;
>
> I want an -EIO or similar here since runtime fallbacks to other modes
> really aren't great from a maintenance perspective, see my comments on
> the irq routing code.
>
> Yes we can make this work, but given our stellar track record with keeping
> disabled features working it won't work for long. And it will impact us
> with additional constraints until we give up and rip it out again. Not
> worth it imo - if we decide to use the guc on a given platform we should
> imo require it and stick to that decision for at least as long as the
> driver is loaded. Developers can still change the option when reloading the
> driver, users won't have a chance to cause trouble.
> -Daniel

Again, this isn't really "runtime" -- we're still in the driver loading 
stage here. This is analogous to the various "sanitize" functions where 
we cross-check what options have been set and decide which to override, 
except that here we can't determine whether we're going to respect the 
default or user-specified request for GuC submission mode until we know 
whether we have valid firmware for the GuC.

At this point, we haven't submitted any batches, so the main point of 
use of this flag -- in the submission path, to switch between execlist 
and GuC modes -- has never yet been executed. So there should be no 
problem with changing the value before it's first used.

And this is a one-way switch; you (or the default config) asked for GuC 
submission, but we can't support it so we disable the option. There's no 
way to switch it back on without reloading the driver. So this /is/ the 
point at which we decide to use the GuC on a given platform and then 
stick to that decision for at least as long as the driver is loaded.
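
For illustration only (this is not code from the patch): the load-time
decision described above can be modelled as a one-way downgrade that is
applied once, before the flag is first consulted by the submission path.
The names submit_mode, driver_opts and sanitize_submission_mode() are
hypothetical, and fw_load_result merely stands in for the return value of
intel_guc_ucode_load().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two submission back ends. */
enum submit_mode { SUBMIT_EXECLIST, SUBMIT_GUC };

/* Hypothetical stand-in for the i915.enable_guc_submission module option. */
struct driver_opts {
	bool enable_guc_submission;
};

/*
 * Decide the submission mode at load time. A firmware load failure clears
 * the option; nothing in this function ever sets it again, so the switch
 * is one-way for as long as the (modelled) driver stays loaded.
 */
static enum submit_mode sanitize_submission_mode(struct driver_opts *opts,
						 int fw_load_result)
{
	assert(opts != NULL);

	if (fw_load_result != 0)
		opts->enable_guc_submission = false;

	return opts->enable_guc_submission ? SUBMIT_GUC : SUBMIT_EXECLIST;
}
```

In this model a later successful load (for example after a GPU reset) does
not turn GuC submission back on; only reinitialising the options, i.e.
reloading the driver, would.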

We have to support execlist mode for the foreseeable future anyway, so 
using it on a machine which (we think) ought to be GuC-capable doesn't 
add /any/ extra maintenance overhead at all.

Why break the user's machine unnecessarily? With real "end-users", 
especially those who have never used Linux before, you only get one 
chance. Sometimes I've installed Linux on a (Windows-using) friend's 
machine, and it hasn't worked first time. Then I switch to another VT, 
type some magic incantations, and 10 minutes later we have a usable 
login screen. Will they adopt Linux? Unlikely :( No matter how good it 
looks thereafter, if the machine's hardware doesn't work with the distro 
straight out of the box, they're just not going to believe it's 
something they can use. So it's very important that everything essential 
to the first-time experience works even when misconfigured -- and 
nothing is more essential than the display driver (networking and wi-fi 
are the next things that will put the user off if they don't work -- and 
they're also drivers that commonly rely on firmware blobs).

.Dave.

* Re: [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-07-03 12:30 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
@ 2015-07-06 14:28   ` Daniel Vetter
  2015-07-06 16:37     ` Dave Gordon
  0 siblings, 1 reply; 94+ messages in thread
From: Daniel Vetter @ 2015-07-06 14:28 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jul 03, 2015 at 01:30:27PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> This uses the common firmware loader to fetch the firmware image,
> then loads it into the GuC's memory via a dedicated DMA engine.
> 
> This patch is derived from GuC loading work originally done by
> Vinit Azad and Ben Widawsky. It has been reconstructed to accord
> with the common firmware loading mechanism by Dave Gordon as well
> as new firmware layout etc.
> 
> v2:
>     Various improvements per review comments by Chris Wilson
> 
> v3:
>     Removed 'wait' parameter to intel_guc_ucode_load() as prefetch
>         is no longer supported in the common firmware loader, per
> 	Daniel Vetter's request.
>     F/w checker callback fn now returns errno rather than bool.
> 
> Issue: VIZ-4884
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile           |   3 +
>  drivers/gpu/drm/i915/i915_dma.c         |   4 +
>  drivers/gpu/drm/i915/i915_drv.h         |  11 +
>  drivers/gpu/drm/i915/i915_gem.c         |   8 +
>  drivers/gpu/drm/i915/i915_reg.h         |   4 +-
>  drivers/gpu/drm/i915/intel_guc.h        |  49 ++++
>  drivers/gpu/drm/i915/intel_guc_loader.c | 448 ++++++++++++++++++++++++++++++++
>  7 files changed, 526 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/intel_guc.h
>  create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index f1f80fc..62a8c83 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -42,6 +42,9 @@ i915-y += i915_cmd_parser.o \
>  # generic ancilliary microcontroller support
>  i915-y += intel_uc_loader.o
>  
> +# general-purpose microcontroller (GuC) support
> +i915-y += intel_guc_loader.o
> +
>  # autogenerated null render state
>  i915-y += intel_renderstate_gen6.o \
>  	  intel_renderstate_gen7.o \
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index c5349fa..730d91b 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -469,6 +469,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
>  
>  cleanup_gem:
>  	mutex_lock(&dev->struct_mutex);
> +	intel_guc_ucode_fini(dev);
>  	i915_gem_cleanup_ringbuffer(dev);
>  	i915_gem_context_fini(dev);
>  	mutex_unlock(&dev->struct_mutex);
> @@ -866,6 +867,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
>  
>  	intel_uncore_init(dev);
>  
> +	intel_guc_ucode_init(dev);
> +
>  	/* Load CSR Firmware for SKL */
>  	intel_csr_ucode_init(dev);
>  
> @@ -1117,6 +1120,7 @@ int i915_driver_unload(struct drm_device *dev)
>  	flush_workqueue(dev_priv->wq);
>  
>  	mutex_lock(&dev->struct_mutex);
> +	intel_guc_ucode_fini(dev);
>  	i915_gem_cleanup_ringbuffer(dev);
>  	i915_gem_context_fini(dev);
>  	mutex_unlock(&dev->struct_mutex);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9618f57..a7ccac5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -50,6 +50,7 @@
>  #include <linux/intel-iommu.h>
>  #include <linux/kref.h>
>  #include <linux/pm_qos.h>
> +#include "intel_guc.h"
>  
>  /* General customization:
>   */
> @@ -1687,6 +1688,8 @@ struct drm_i915_private {
>  
>  	struct i915_virtual_gpu vgpu;
>  
> +	struct intel_guc guc;
> +
>  	struct intel_csr csr;
>  
>  	/* Display CSR-related protection */
> @@ -1931,6 +1934,11 @@ static inline struct drm_i915_private *dev_to_i915(struct device *dev)
>  	return to_i915(dev_get_drvdata(dev));
>  }
>  
> +static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
> +{
> +	return container_of(guc, struct drm_i915_private, guc);
> +}
> +
>  /* Iterate over initialised rings */
>  #define for_each_ring(ring__, dev_priv__, i__) \
>  	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
> @@ -2539,6 +2547,9 @@ struct drm_i915_cmd_table {
>  
>  #define HAS_CSR(dev)	(IS_SKYLAKE(dev))
>  
> +#define HAS_GUC_UCODE(dev)	(IS_GEN9(dev))
> +#define HAS_GUC_SCHED(dev)	(IS_GEN9(dev))
> +
>  #define INTEL_PCH_DEVICE_ID_MASK		0xff00
>  #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
>  #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index aa8f4c3..80d7890 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5076,6 +5076,14 @@ i915_gem_init_hw(struct drm_device *dev)
>  			goto out;
>  	}
>  
> +	/*
> +	 * We can't enable contexts until all firmware is loaded; if this
> +	 * fails, disable GuC submissions and fall back to execlist mode
> +	 */
> +	ret = intel_guc_ucode_load(dev);
> +	if (ret)
> +		i915.enable_guc_submission = false;

I want an -EIO or similar here since runtime fallbacks to other modes
really aren't great from a maintenance perspective, see my comments on
the irq routing code.

Yes we can make this work, but given our stellar track record with keeping
disabled features working it won't work for long. And it will impact us
with additional constraints until we give up and rip it out again. Not
worth it imo - if we decide to use the guc on a given platform we should
imo require it and stick to that decision for at least as long as the
driver is loaded. Developers can still change the option when reloading the
driver, users won't have a chance to cause trouble.
-Daniel
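
For illustration only (a sketch, not code from this series): the stricter
policy requested above could look roughly like the following, where a
firmware load failure is reported as -EIO instead of silently switching
modes. init_hw_require_guc() and its parameters are hypothetical;
guc_requested stands in for i915.enable_guc_submission and load_result for
the return value of intel_guc_ucode_load().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the "require it or fail" policy: once GuC
 * submission has been selected, a failed firmware load is an error for
 * the whole hardware init rather than a trigger for a fallback.
 */
static int init_hw_require_guc(bool guc_requested, int load_result)
{
	if (!guc_requested)
		return 0;	/* execlist mode was chosen up front */

	if (load_result != 0)
		return -EIO;	/* no silent fallback to another mode */

	return 0;
}
```

Under this policy the mode is still fixed for the lifetime of the loaded
driver; the difference is that a missing or invalid blob surfaces as a
load error rather than as a quiet change of submission path.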

> +
>  	/* Now it is safe to go back round and do everything else: */
>  	for_each_ring(ring, dev_priv, i) {
>  		struct drm_i915_gem_request *req;
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 313b1f9..eefb847 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -6837,7 +6837,9 @@ enum skl_disp_power_wells {
>  #define   GEN9_PGCTL_SSB_EU311_ACK	(1 << 14)
>  
>  #define GEN7_MISCCPCTL			(0x9424)
> -#define   GEN7_DOP_CLOCK_GATE_ENABLE	(1<<0)
> +#define   GEN7_DOP_CLOCK_GATE_ENABLE		(1<<0)
> +#define   GEN8_DOP_CLOCK_GATE_CFCLK_ENABLE	(1<<2)
> +#define   GEN8_DOP_CLOCK_GATE_GUC_ENABLE	(1<<4)
>  
>  /* IVYBRIDGE DPF */
>  #define GEN7_L3CDERRST1			0xB008 /* L3CD Error Status 1 */
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> new file mode 100644
> index 0000000..b38d2b0
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -0,0 +1,49 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +#ifndef _INTEL_GUC_H_
> +#define _INTEL_GUC_H_
> +
> +#include "intel_uc_loader.h"
> +#include "intel_guc_fwif.h"
> +#include "i915_guc_reg.h"
> +
> +struct intel_guc {
> +	/* Generic uC firmware management */
> +	struct intel_uc_fw guc_fw;
> +
> +	/* GuC-specific additions */
> +	uint16_t fw_major_wanted;
> +	uint16_t fw_minor_wanted;
> +	uint16_t fw_major_found;
> +	uint16_t fw_minor_found;
> +
> +	uint32_t log_flags;
> +};
> +
> +/* intel_guc_loader.c */
> +extern void intel_guc_ucode_init(struct drm_device *dev);
> +extern int intel_guc_ucode_load(struct drm_device *dev);
> +extern void intel_guc_ucode_fini(struct drm_device *dev);
> +
> +#endif
> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
> new file mode 100644
> index 0000000..4929838
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
> @@ -0,0 +1,448 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Vinit Azad <vinit.azad@intel.com>
> + *    Ben Widawsky <ben@bwidawsk.net>
> + *    Dave Gordon <david.s.gordon@intel.com>
> + *    Alex Dai <yu.dai@intel.com>
> + */
> +#include <linux/firmware.h>
> +#include "i915_drv.h"
> +#include "intel_guc.h"
> +
> +/**
> + * DOC: GuC
> + *
> + * intel_guc:
> + * Top level structure of guc. It handles firmware loading and manages client
> + * pool and doorbells. intel_guc owns a i915_guc_client to replace the legacy
> + * ExecList submission.
> + *
> + * Firmware versioning:
> + * The firmware build process will generate a version header file with major and
> + * minor version defined. The versions are built into CSS header of firmware.
> + * The i915 kernel driver sets the minimum firmware version required per
> + * platform. The firmware installation package will install (as a symbolic
> + * link) the proper version of the firmware.
> + *
> + * GuC address space:
> + * GuC does not allow any gfx GGTT address that falls into range [0, WOPCM_TOP),
> + * which is reserved for Boot ROM, SRAM and WOPCM. Currently this top address is
> + * 512K. In order to exclude 0-512K address space from GGTT, all gfx objects
> + * used by GuC are pinned with PIN_OFFSET_BIAS along with the size of WOPCM.
> + *
> + * Firmware log:
> + * Firmware log is enabled by setting i915.guc_log_level to non-negative level.
> + * Log data is printed out via reading debugfs i915_guc_log_dump. Reading from
> + * i915_guc_load_status will print out firmware loading status and scratch
> + * registers value.
> + *
> + */
> +
> +#define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
> +MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
> +
> +static u32 get_gttype(struct drm_i915_private *dev_priv)
> +{
> +	/* XXX: GT type based on PCI device ID? field seems unused by fw */
> +	return 0;
> +}
> +
> +static u32 get_core_family(struct drm_i915_private *dev_priv)
> +{
> +	switch (INTEL_INFO(dev_priv)->gen) {
> +	case 8:
> +		return GFXCORE_FAMILY_GEN8;
> +	case 9:
> +		return GFXCORE_FAMILY_GEN9;
> +	default:
> +		DRM_ERROR("GUC: unknown gen for scheduler init\n");
> +		return GFXCORE_FAMILY_FORCE_ULONG;
> +	}
> +}
> +
> +static void set_guc_init_params(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_guc *guc = &dev_priv->guc;
> +	u32 params[GUC_CTL_MAX_DWORDS];
> +	int i;
> +
> +	memset(&params, 0, sizeof(params));
> +
> +	params[GUC_CTL_DEVICE_INFO] |=
> +		(get_gttype(dev_priv) << GUC_CTL_GTTYPE_SHIFT) |
> +		(get_core_family(dev_priv) << GUC_CTL_COREFAMILY_SHIFT);
> +
> +	/* GuC ARAT increment is 10 ns. GuC default scheduler quantum is one
> +	 * second. This ARAT value is calculated by:
> +	 * Scheduler-Quantum-in-ns / ARAT-increment-in-ns = 1000000000 / 10
> +	 */
> +	params[GUC_CTL_ARAT_HIGH] = 0;
> +	params[GUC_CTL_ARAT_LOW] = 100000000;
> +
> +	params[GUC_CTL_WA] |= GUC_CTL_WA_UK_BY_DRIVER;
> +
> +	params[GUC_CTL_FEATURE] |= GUC_CTL_DISABLE_SCHEDULER |
> +			GUC_CTL_VCS2_ENABLED;
> +
> +	if (i915.guc_log_level >= 0) {
> +		params[GUC_CTL_LOG_PARAMS] = guc->log_flags;
> +		params[GUC_CTL_DEBUG] =
> +			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
> +	}
> +
> +	I915_WRITE(SOFT_SCRATCH(0), 0);
> +
> +	for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
> +		I915_WRITE(SOFT_SCRATCH(1 + i), params[i]);
> +}
> +
> +/*
> + * Read the GuC status register (GUC_STATUS) and store it in the
> + * specified location; then return a boolean indicating whether
> + * the value matches either of two values representing completion
> + * of the GuC boot process.
> + *
> + * This is used for polling the GuC status in a wait_for_atomic()
> + * loop below.
> + */
> +static inline bool guc_ucode_response(struct drm_i915_private *dev_priv,
> +				      u32 *status)
> +{
> +	u32 val = I915_READ(GUC_STATUS);
> +	*status = val;
> +	return ((val & GS_UKERNEL_MASK) == GS_UKERNEL_READY ||
> +		(val & GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE);
> +}
> +
> +/*
> + * Transfer the firmware image to RAM for execution by the microcontroller.
> + *
> + * GuC Firmware layout:
> + * +-------------------------------+  ----
> + * |          CSS header           |  128B
> + * | contains major/minor version  |
> + * +-------------------------------+  ----
> + * |             uCode             |
> + * +-------------------------------+  ----
> + * |         RSA signature         |  256B
> + * +-------------------------------+  ----
> + * |         RSA public Key        |  256B
> + * +-------------------------------+  ----
> + * |       Public key modulus      |    4B
> + * +-------------------------------+  ----
> + *
> + * Architecturally, the DMA engine is bidirectional, and can potentially even
> + * transfer between GTT locations. This functionality is left out of the API
> + * for now as there is no need for it.
> + *
> + * Note that GuC needs the CSS header plus uKernel code to be copied by the
> + * DMA engine in one operation, whereas the RSA signature is loaded via MMIO.
> + */
> +
> +#define UOS_CSS_HEADER_OFFSET		0
> +#define UOS_VER_MINOR_OFFSET		0x44
> +#define UOS_VER_MAJOR_OFFSET		0x46
> +#define UOS_CSS_HEADER_SIZE		0x80
> +#define UOS_RSA_SIG_SIZE		0x100
> +#define UOS_CSS_SIGNING_SIZE		0x204
> +
> +static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
> +	struct drm_i915_gem_object *fw_obj = guc_fw->uc_fw_obj;
> +	unsigned long offset;
> +	struct sg_table *sg = fw_obj->pages;
> +	u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)];
> +	int i, ret = 0;
> +
> +	/* uCode size, also is where RSA signature starts */
> +	offset = ucode_size = guc_fw->uc_fw_size - UOS_CSS_SIGNING_SIZE;
> +
> +	/* Copy RSA signature from the fw image to HW for verification */
> +	sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, UOS_RSA_SIG_SIZE, offset);
> +	for (i = 0; i < UOS_RSA_SIG_SIZE / sizeof(u32); i++)
> +		I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]);
> +
> +	/* Set the source address for the new blob */
> +	offset = i915_gem_obj_ggtt_offset(fw_obj);
> +	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
> +	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
> +
> +	/* Set the destination. Current uCode expects an 8k stack starting from
> +	 * offset 0. */
> +	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
> +
> +	/* XXX: The image is automatically transferred to SRAM after the RSA
> +	 * verification. This is why the address space is chosen as such. */
> +	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> +
> +	I915_WRITE(DMA_COPY_SIZE, ucode_size);
> +
> +	/* Finally start the DMA */
> +	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> +
> +	/*
> +	 * Spin-wait for the DMA to complete & the GuC to start up.
> +	 * NB: Docs recommend not using the interrupt for completion.
> +	 * Measurements indicate this should take no more than 20ms, so a
> +	 * timeout here indicates that the GuC has failed and is unusable.
> +	 * (Higher levels of the driver will attempt to fall back to
> +	 * execlist mode if this happens.)
> +	 */
> +	ret = wait_for_atomic(guc_ucode_response(dev_priv, &status), 100);
> +
> +	DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
> +			I915_READ(DMA_CTRL), status);
> +
> +	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
> +		DRM_ERROR("%s firmware signature verification failed\n",
> +			guc_fw->uc_name);
> +		ret = -ENOEXEC;
> +	}
> +
> +	DRM_DEBUG_DRIVER("returning %d\n", ret);
> +
> +	return ret;
> +}
> +
> +/*
> + * Load the GuC firmware blob into the MinuteIA.
> + */
> +static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
> +	struct drm_device *dev = dev_priv->dev;
> +	int ret;
> +
> +	ret = i915_gem_object_set_to_gtt_domain(guc_fw->uc_fw_obj, false);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("set-domain failed %d\n", ret);
> +		return ret;
> +	}
> +
> +	ret = i915_gem_obj_ggtt_pin(guc_fw->uc_fw_obj, 0, 0);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("pin failed %d\n", ret);
> +		return ret;
> +	}
> +
> +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	/* init WOPCM */
> +	I915_WRITE(GUC_WOPCM_SIZE, GUC_WOPCM_SIZE_VALUE);
> +	I915_WRITE(DMA_GUC_WOPCM_OFFSET, GUC_WOPCM_OFFSET);
> +
> +	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
> +	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
> +
> +	/* Set MMIO/WA for GuC init */
> +	I915_WRITE(DRBMISC1, DOORBELL_ENABLE);
> +
> +	/* Enable MIA caching. GuC clock gating is disabled. */
> +	I915_WRITE(GUC_SHIM_CONTROL, GUC_SHIM_CONTROL_VALUE);
> +
> +	/* WaC6DisallowByGfxPause*/
> +	I915_WRITE(GEN6_GFXPAUSE, 0x30FFF);
> +
> +	if (IS_SKYLAKE(dev))
> +		I915_WRITE(GEN9_GT_PM_CONFIG, GEN8_GT_DOORBELL_ENABLE);
> +	else
> +		I915_WRITE(GEN8_GT_PM_CONFIG, GEN8_GT_DOORBELL_ENABLE);
> +
> +	if (IS_GEN9(dev)) {
> +		/* DOP Clock Gating Enable for GuC clocks */
> +		I915_WRITE(GEN7_MISCCPCTL, (GEN8_DOP_CLOCK_GATE_GUC_ENABLE |
> +					    I915_READ(GEN7_MISCCPCTL)));
> +
> +		/* allows for 5us before GT can go to RC6 */
> +		I915_WRITE(GUC_ARAT_C6DIS, 0x1FF);
> +	}
> +
> +	set_guc_init_params(dev_priv);
> +
> +	ret = guc_ucode_xfer_dma(dev_priv);
> +
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +
> +	/*
> +	 * We keep the object pages for reuse during resume. But we can unpin it
> +	 * now that DMA has completed, so it doesn't continue to take up space.
> +	 */
> +	i915_gem_object_ggtt_unpin(guc_fw->uc_fw_obj);
> +
> +	return ret;
> +}
> +
> +/*
> + * Check the firmware that was found; if it's the wrong size or the wrong
> + * version, return a negative error code. If it's OK, return a positive
> + * status value. Here we can just return INTEL_UC_FW_GOOD; the common loader
> + * code will then save the data for later in a pageable (tmpfs-backed) GEM
> + * object.
> + *
> + * Alternatively (for example if we wanted only part of the image) we could
> + * save the required portion here and then return INTEL_UC_FW_SAVED to tell
> + * the common loader that the data is good, and we've already handled saving
> + * anything we need later.
> + *
> + * The GuC firmware image has the version number embedded at a well-known
> + * offset within the firmware blob; note that major / minor version are
> + * TWO bytes each (i.e. u16), although all pointers and offsets are defined
> + * in terms of bytes (u8).
> + */
> +static int guc_ucode_check(struct intel_uc_fw *guc_fw)
> +{
> +	struct intel_guc *guc = container_of(guc_fw, struct intel_guc, guc_fw);
> +	const u8 *css_header = guc_fw->uc_fw_blob->data + UOS_CSS_HEADER_OFFSET;
> +	const size_t blobsize = guc_fw->uc_fw_blob->size;
> +	const size_t minsize = UOS_CSS_HEADER_SIZE + UOS_CSS_SIGNING_SIZE;
> +	const size_t maxsize = GUC_WOPCM_SIZE_VALUE + UOS_CSS_SIGNING_SIZE
> +			- 0x8000; /* 32k reserved (8K stack + 24k context) */
> +
> +	DRM_DEBUG_DRIVER("firmware file size %zu (minimum %zu, maximum %zu)\n",
> +		blobsize, minsize, maxsize);
> +
> +	/* Check the size of the blob before examining buffer contents */
> +	if (blobsize < minsize || blobsize > maxsize)
> +		return -ENOEXEC;
> +
> +	guc->fw_major_found = *(u16 *)(css_header + UOS_VER_MAJOR_OFFSET);
> +	guc->fw_minor_found = *(u16 *)(css_header + UOS_VER_MINOR_OFFSET);
> +
> +	if (guc->fw_major_found != guc->fw_major_wanted ||
> +	    guc->fw_minor_found < guc->fw_minor_wanted) {
> +		DRM_ERROR("GuC firmware version %d.%d, required %d.%d\n",
> +			guc->fw_major_found, guc->fw_minor_found,
> +			guc->fw_major_wanted, guc->fw_minor_wanted);
> +		return -ENOEXEC;
> +	}
> +
> +	DRM_DEBUG_DRIVER("firmware version %d.%d OK (minimum %d.%d)\n",
> +			guc->fw_major_found, guc->fw_minor_found,
> +			guc->fw_major_wanted, guc->fw_minor_wanted);
> +
> +	return INTEL_UC_FW_GOOD;
> +}
> +
> +/**
> + * intel_guc_ucode_init() - initiate a firmware loading request
> + *
> + * Called early during driver load, before GEM is initialised.
> + * Driver is single threaded, so no mutex is required.
> + *
> + * This just sets parameters for use when intel_guc_ucode_load()
> + * is called later, after GEM initialisation is complete.
> + */
> +void intel_guc_ucode_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_guc *guc = &dev_priv->guc;
> +	struct intel_uc_fw *guc_fw = &guc->guc_fw;
> +	const char *path;
> +
> +	if (!HAS_GUC_SCHED(dev))
> +		i915.enable_guc_submission = false;
> +
> +	if (!HAS_GUC_UCODE(dev)) {
> +		path = NULL;
> +	} else if (IS_SKYLAKE(dev)) {
> +		path = I915_SKL_GUC_UCODE;
> +		guc->fw_major_wanted = 3;
> +		guc->fw_minor_wanted = 0;
> +	} else {
> +		i915.enable_guc_submission = false;
> +		path = "";	/* unknown device */
> +	}
> +
> +	intel_uc_fw_init(dev, guc_fw, "GuC", path);
> +}
> +
> +/**
> + * intel_guc_ucode_load() - load GuC uCode into the device
> + *
> + * Called from gem_init_hw() during driver loading and also after a GPU reset.
> + *
> + * Calls the common loader to get the firmware registered earlier. On the first
> + * call, this will actually fetch it from the filesystem; thereafter, we will
> + * already either have the blob in a GEM object, or have determined that no
> + * valid firmware image could be found.
> + *
> + * If we have a good firmware image, transfer it to the h/w.
> + */
> +int intel_guc_ucode_load(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
> +	int err;
> +
> +	DRM_DEBUG_DRIVER("%s fw status: fetch %s, load %s\n",
> +		guc_fw->uc_name,
> +		intel_uc_fw_status_repr(guc_fw->uc_fw_fetch_status),
> +		intel_uc_fw_status_repr(guc_fw->uc_fw_load_status));
> +
> +	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_NONE)
> +		return 0;
> +
> +	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_SUCCESS &&
> +	    guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_FAIL)
> +		return -ENOEXEC;
> +
> +	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_PENDING;
> +	err = intel_uc_fw_fetch(guc_fw, guc_ucode_check);
> +	if (err)
> +		goto fail;
> +
> +	err = guc_ucode_xfer(dev_priv);
> +	if (err)
> +		goto fail;
> +
> +	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_SUCCESS;
> +
> +	DRM_DEBUG_DRIVER("%s fw status: fetch %s, load %s\n",
> +		guc_fw->uc_name,
> +		intel_uc_fw_status_repr(guc_fw->uc_fw_fetch_status),
> +		intel_uc_fw_status_repr(guc_fw->uc_fw_load_status));
> +
> +	return 0;
> +
> +fail:
> +	if (guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_PENDING)
> +		guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_FAIL;
> +
> +	DRM_ERROR("Failed to initialize GuC, error %d\n", err);
> +
> +	return err;
> +}
> +
> +/**
> + * intel_guc_ucode_fini() - clean up all allocated resources
> + */
> +void intel_guc_ucode_fini(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
> +
> +	intel_uc_fw_fini(guc_fw);
> +}
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* [PATCH 05/15] drm/i915: GuC-specific firmware loader
  2015-07-03 12:30 [PATCH 00/15 v3] " Dave Gordon
@ 2015-07-03 12:30 ` Dave Gordon
  2015-07-06 14:28   ` Daniel Vetter
  0 siblings, 1 reply; 94+ messages in thread
From: Dave Gordon @ 2015-07-03 12:30 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

This uses the common firmware loader to fetch the firmware image,
then loads it into the GuC's memory via a dedicated DMA engine.

This patch is derived from GuC loading work originally done by
Vinit Azad and Ben Widawsky. Dave Gordon has reworked it to fit
the common firmware loading mechanism and the new firmware layout.

v2:
    Various improvements per review comments by Chris Wilson

v3:
    Removed 'wait' parameter to intel_guc_ucode_load() as prefetch
        is no longer supported in the common firmware loader, per
        Daniel Vetter's request.
    F/w checker callback fn now returns errno rather than bool.

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
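
The size and version checks that guc_ucode_check() performs on the fetched
blob can be sketched in userspace as follows. This is only an illustration
under stated assumptions: read_u16() and sketch_guc_check() are hypothetical
names, the upper size bound is passed in by the caller (the patch derives it
from GUC_WOPCM_SIZE_VALUE, whose value is not shown here), and the sketch
returns 0 on success where the patch returns INTEL_UC_FW_GOOD.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and sizes copied from intel_guc_loader.c in this patch. */
#define UOS_VER_MINOR_OFFSET	0x44
#define UOS_VER_MAJOR_OFFSET	0x46
#define UOS_CSS_HEADER_SIZE	0x80
#define UOS_CSS_SIGNING_SIZE	0x204

/* Hypothetical helper: read a host-endian u16 without assuming alignment. */
static uint16_t read_u16(const uint8_t *blob, size_t offset)
{
	uint16_t val;

	memcpy(&val, blob + offset, sizeof(val));
	return val;
}

/*
 * Hypothetical stand-in for guc_ucode_check(). Returns 0 if the blob is
 * acceptable and -ENOEXEC otherwise. max_size is an assumption supplied
 * by the caller.
 */
static int sketch_guc_check(const uint8_t *blob, size_t size, size_t max_size,
			    uint16_t major_wanted, uint16_t minor_wanted)
{
	const size_t min_size = UOS_CSS_HEADER_SIZE + UOS_CSS_SIGNING_SIZE;
	uint16_t major, minor;

	assert(blob != NULL);

	/* Check the size before examining the header contents. */
	if (size < min_size || size > max_size)
		return -ENOEXEC;

	major = read_u16(blob, UOS_VER_MAJOR_OFFSET);
	minor = read_u16(blob, UOS_VER_MINOR_OFFSET);

	/* Major must match exactly; minor may be newer than required. */
	if (major != major_wanted || minor < minor_wanted)
		return -ENOEXEC;

	return 0;
}
```

With the values set in intel_guc_ucode_init() for Skylake (major 3, minor 0),
a 3.1 image would pass this check while a 4.0 image would be rejected.
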
 drivers/gpu/drm/i915/Makefile           |   3 +
 drivers/gpu/drm/i915/i915_dma.c         |   4 +
 drivers/gpu/drm/i915/i915_drv.h         |  11 +
 drivers/gpu/drm/i915/i915_gem.c         |   8 +
 drivers/gpu/drm/i915/i915_reg.h         |   4 +-
 drivers/gpu/drm/i915/intel_guc.h        |  49 ++++
 drivers/gpu/drm/i915/intel_guc_loader.c | 448 ++++++++++++++++++++++++++++++++
 7 files changed, 526 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/intel_guc.h
 create mode 100644 drivers/gpu/drm/i915/intel_guc_loader.c
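
The firmware layout documented in intel_guc_loader.c implies some simple
offset arithmetic: the trailing 0x204 bytes (256B RSA signature + 256B public
key + 4B) are not copied by DMA, and the RSA signature starts exactly where
the DMA-copied region (CSS header plus uCode) ends. A minimal sketch of that
arithmetic, where struct guc_fw_layout and guc_fw_layout_from_size() are
hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Sizes copied from intel_guc_loader.c in this patch. */
#define UOS_CSS_HEADER_SIZE	0x80
#define UOS_RSA_SIG_SIZE	0x100
#define UOS_CSS_SIGNING_SIZE	0x204

/* Hypothetical summary of where each part of the image sits. */
struct guc_fw_layout {
	size_t dma_size;	/* CSS header + uCode, copied by DMA in one go */
	size_t ucode_only;	/* uCode without the CSS header */
	size_t rsa_offset;	/* byte offset of the RSA signature */
	size_t rsa_size;	/* bytes written to the RSA scratch registers */
};

/* Returns 0 on success, -1 if the image is too small to hold all parts. */
static int guc_fw_layout_from_size(size_t fw_size, struct guc_fw_layout *out)
{
	assert(out != NULL);

	if (fw_size < UOS_CSS_HEADER_SIZE + UOS_CSS_SIGNING_SIZE)
		return -1;

	/* Everything before the signing block goes through the DMA engine. */
	out->dma_size = fw_size - UOS_CSS_SIGNING_SIZE;
	out->ucode_only = out->dma_size - UOS_CSS_HEADER_SIZE;

	/* The signature immediately follows the DMA-copied region. */
	out->rsa_offset = out->dma_size;
	out->rsa_size = UOS_RSA_SIG_SIZE;
	return 0;
}
```

This mirrors the line in guc_ucode_xfer_dma() that sets both offset and
ucode_size to uc_fw_size - UOS_CSS_SIGNING_SIZE.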

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index f1f80fc..62a8c83 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -42,6 +42,9 @@ i915-y += i915_cmd_parser.o \
 # generic ancilliary microcontroller support
 i915-y += intel_uc_loader.o
 
+# general-purpose microcontroller (GuC) support
+i915-y += intel_guc_loader.o
+
 # autogenerated null render state
 i915-y += intel_renderstate_gen6.o \
 	  intel_renderstate_gen7.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c5349fa..730d91b 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -469,6 +469,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 
 cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
+	intel_guc_ucode_fini(dev);
 	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
 	mutex_unlock(&dev->struct_mutex);
@@ -866,6 +867,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 
 	intel_uncore_init(dev);
 
+	intel_guc_ucode_init(dev);
+
 	/* Load CSR Firmware for SKL */
 	intel_csr_ucode_init(dev);
 
@@ -1117,6 +1120,7 @@ int i915_driver_unload(struct drm_device *dev)
 	flush_workqueue(dev_priv->wq);
 
 	mutex_lock(&dev->struct_mutex);
+	intel_guc_ucode_fini(dev);
 	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
 	mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9618f57..a7ccac5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -50,6 +50,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include "intel_guc.h"
 
 /* General customization:
  */
@@ -1687,6 +1688,8 @@ struct drm_i915_private {
 
 	struct i915_virtual_gpu vgpu;
 
+	struct intel_guc guc;
+
 	struct intel_csr csr;
 
 	/* Display CSR-related protection */
@@ -1931,6 +1934,11 @@ static inline struct drm_i915_private *dev_to_i915(struct device *dev)
 	return to_i915(dev_get_drvdata(dev));
 }
 
+static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
+{
+	return container_of(guc, struct drm_i915_private, guc);
+}
+
 /* Iterate over initialised rings */
 #define for_each_ring(ring__, dev_priv__, i__) \
 	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
@@ -2539,6 +2547,9 @@ struct drm_i915_cmd_table {
 
 #define HAS_CSR(dev)	(IS_SKYLAKE(dev))
 
+#define HAS_GUC_UCODE(dev)	(IS_GEN9(dev))
+#define HAS_GUC_SCHED(dev)	(IS_GEN9(dev))
+
 #define INTEL_PCH_DEVICE_ID_MASK		0xff00
 #define INTEL_PCH_IBX_DEVICE_ID_TYPE		0x3b00
 #define INTEL_PCH_CPT_DEVICE_ID_TYPE		0x1c00
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aa8f4c3..80d7890 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5076,6 +5076,14 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 	}
 
+	/*
+	 * We can't enable contexts until all firmware is loaded; if this
+	 * fails, disable GuC submissions and fall back to execlist mode
+	 */
+	ret = intel_guc_ucode_load(dev);
+	if (ret)
+		i915.enable_guc_submission = false;
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 313b1f9..eefb847 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6837,7 +6837,9 @@ enum skl_disp_power_wells {
 #define   GEN9_PGCTL_SSB_EU311_ACK	(1 << 14)
 
 #define GEN7_MISCCPCTL			(0x9424)
-#define   GEN7_DOP_CLOCK_GATE_ENABLE	(1<<0)
+#define   GEN7_DOP_CLOCK_GATE_ENABLE		(1<<0)
+#define   GEN8_DOP_CLOCK_GATE_CFCLK_ENABLE	(1<<2)
+#define   GEN8_DOP_CLOCK_GATE_GUC_ENABLE	(1<<4)
 
 /* IVYBRIDGE DPF */
 #define GEN7_L3CDERRST1			0xB008 /* L3CD Error Status 1 */
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
new file mode 100644
index 0000000..b38d2b0
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+#ifndef _INTEL_GUC_H_
+#define _INTEL_GUC_H_
+
+#include "intel_uc_loader.h"
+#include "intel_guc_fwif.h"
+#include "i915_guc_reg.h"
+
+struct intel_guc {
+	/* Generic uC firmware management */
+	struct intel_uc_fw guc_fw;
+
+	/* GuC-specific additions */
+	uint16_t fw_major_wanted;
+	uint16_t fw_minor_wanted;
+	uint16_t fw_major_found;
+	uint16_t fw_minor_found;
+
+	uint32_t log_flags;
+};
+
+/* intel_guc_loader.c */
+extern void intel_guc_ucode_init(struct drm_device *dev);
+extern int intel_guc_ucode_load(struct drm_device *dev);
+extern void intel_guc_ucode_fini(struct drm_device *dev);
+
+#endif
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
new file mode 100644
index 0000000..4929838
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -0,0 +1,448 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Vinit Azad <vinit.azad@intel.com>
+ *    Ben Widawsky <ben@bwidawsk.net>
+ *    Dave Gordon <david.s.gordon@intel.com>
+ *    Alex Dai <yu.dai@intel.com>
+ */
+#include <linux/firmware.h>
+#include "i915_drv.h"
+#include "intel_guc.h"
+
+/**
+ * DOC: GuC
+ *
+ * intel_guc:
+ * Top level structure of GuC. It handles firmware loading and manages client
+ * pool and doorbells. intel_guc owns an i915_guc_client to replace the legacy
+ * ExecList submission.
+ *
+ * Firmware versioning:
+ * The firmware build process generates a version header file that defines the
+ * major and minor versions, which are built into the firmware's CSS header.
+ * The i915 kernel driver sets the minimum firmware version required per
+ * platform. The firmware installation package installs (as a symbolic link)
+ * the proper version of the firmware.
+ *
+ * GuC address space:
+ * GuC does not allow any gfx GGTT address that falls into range [0, WOPCM_TOP),
+ * which is reserved for Boot ROM, SRAM and WOPCM. Currently this top address is
+ * 512K. In order to exclude the 0-512K address space from GGTT, all gfx objects
+ * used by GuC are pinned with PIN_OFFSET_BIAS along with the size of WOPCM.
+ *
+ * Firmware log:
+ * The firmware log is enabled by setting i915.guc_log_level to a non-negative
+ * level. Log data is printed by reading the debugfs file i915_guc_log_dump.
+ * Reading from i915_guc_load_status prints the firmware loading status and the
+ * values of the scratch registers.
+ *
+ */
+
+#define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
+MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
+
+static u32 get_gttype(struct drm_i915_private *dev_priv)
+{
+	/* XXX: GT type based on PCI device ID? field seems unused by fw */
+	return 0;
+}
+
+static u32 get_core_family(struct drm_i915_private *dev_priv)
+{
+	switch (INTEL_INFO(dev_priv)->gen) {
+	case 8:
+		return GFXCORE_FAMILY_GEN8;
+	case 9:
+		return GFXCORE_FAMILY_GEN9;
+	default:
+		DRM_ERROR("GUC: unknown gen for scheduler init\n");
+		return GFXCORE_FAMILY_FORCE_ULONG;
+	}
+}
+
+static void set_guc_init_params(struct drm_i915_private *dev_priv)
+{
+	struct intel_guc *guc = &dev_priv->guc;
+	u32 params[GUC_CTL_MAX_DWORDS];
+	int i;
+
+	memset(&params, 0, sizeof(params));
+
+	params[GUC_CTL_DEVICE_INFO] |=
+		(get_gttype(dev_priv) << GUC_CTL_GTTYPE_SHIFT) |
+		(get_core_family(dev_priv) << GUC_CTL_COREFAMILY_SHIFT);
+
+	/* GuC ARAT increment is 10 ns. GuC default scheduler quantum is one
+	 * second. This ARAT value is calculated by:
+	 * Scheduler-Quantum-in-ns / ARAT-increment-in-ns = 1000000000 / 10
+	 */
+	params[GUC_CTL_ARAT_HIGH] = 0;
+	params[GUC_CTL_ARAT_LOW] = 100000000;
+
+	params[GUC_CTL_WA] |= GUC_CTL_WA_UK_BY_DRIVER;
+
+	params[GUC_CTL_FEATURE] |= GUC_CTL_DISABLE_SCHEDULER |
+			GUC_CTL_VCS2_ENABLED;
+
+	if (i915.guc_log_level >= 0) {
+		params[GUC_CTL_LOG_PARAMS] = guc->log_flags;
+		params[GUC_CTL_DEBUG] =
+			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
+	}
+
+	I915_WRITE(SOFT_SCRATCH(0), 0);
+
+	for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
+		I915_WRITE(SOFT_SCRATCH(1 + i), params[i]);
+}
+
+/*
+ * Read the GuC status register (GUC_STATUS) and store it in the
+ * specified location; then return a boolean indicating whether
+ * the value matches either of two values representing completion
+ * of the GuC boot process.
+ *
+ * This is used for polling the GuC status in a wait_for_atomic()
+ * loop below.
+ */
+static inline bool guc_ucode_response(struct drm_i915_private *dev_priv,
+				      u32 *status)
+{
+	u32 val = I915_READ(GUC_STATUS);
+	*status = val;
+	return ((val & GS_UKERNEL_MASK) == GS_UKERNEL_READY ||
+		(val & GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE);
+}
+
+/*
+ * Transfer the firmware image to RAM for execution by the microcontroller.
+ *
+ * GuC Firmware layout:
+ * +-------------------------------+  ----
+ * |          CSS header           |  128B
+ * | contains major/minor version  |
+ * +-------------------------------+  ----
+ * |             uCode             |
+ * +-------------------------------+  ----
+ * |         RSA signature         |  256B
+ * +-------------------------------+  ----
+ * |         RSA public Key        |  256B
+ * +-------------------------------+  ----
+ * |       Public key modulus      |    4B
+ * +-------------------------------+  ----
+ *
+ * Architecturally, the DMA engine is bidirectional, and can potentially even
+ * transfer between GTT locations. This functionality is left out of the API
+ * for now as there is no need for it.
+ *
+ * Note that GuC needs the CSS header plus uKernel code to be copied by the
+ * DMA engine in one operation, whereas the RSA signature is loaded via MMIO.
+ */
+
+#define UOS_CSS_HEADER_OFFSET		0
+#define UOS_VER_MINOR_OFFSET		0x44
+#define UOS_VER_MAJOR_OFFSET		0x46
+#define UOS_CSS_HEADER_SIZE		0x80
+#define UOS_RSA_SIG_SIZE		0x100
+#define UOS_CSS_SIGNING_SIZE		0x204
+
+static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
+{
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+	struct drm_i915_gem_object *fw_obj = guc_fw->uc_fw_obj;
+	unsigned long offset;
+	struct sg_table *sg = fw_obj->pages;
+	u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)];
+	int i, ret = 0;
+
+	/* Size of CSS header + uCode; also the offset of the RSA signature */
+	offset = ucode_size = guc_fw->uc_fw_size - UOS_CSS_SIGNING_SIZE;
+
+	/* Copy RSA signature from the fw image to HW for verification */
+	sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, UOS_RSA_SIG_SIZE, offset);
+	for (i = 0; i < UOS_RSA_SIG_SIZE / sizeof(u32); i++)
+		I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]);
+
+	/* Set the source address for the new blob */
+	offset = i915_gem_obj_ggtt_offset(fw_obj);
+	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
+	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
+
+	/* Set the destination. Current uCode expects an 8k stack starting from
+	 * offset 0. */
+	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
+
+	/* XXX: The image is automatically transferred to SRAM after the RSA
+	 * verification. This is why the address space is chosen as such. */
+	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
+
+	I915_WRITE(DMA_COPY_SIZE, ucode_size);
+
+	/* Finally start the DMA */
+	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
+
+	/*
+	 * Spin-wait for the DMA to complete & the GuC to start up.
+	 * NB: Docs recommend not using the interrupt for completion.
+	 * Measurements indicate this should take no more than 20ms, so a
+	 * timeout here indicates that the GuC has failed and is unusable.
+	 * (Higher levels of the driver will attempt to fall back to
+	 * execlist mode if this happens.)
+	 */
+	ret = wait_for_atomic(guc_ucode_response(dev_priv, &status), 100);
+
+	DRM_DEBUG_DRIVER("DMA status 0x%x, GuC status 0x%x\n",
+			I915_READ(DMA_CTRL), status);
+
+	if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
+		DRM_ERROR("%s firmware signature verification failed\n",
+			guc_fw->uc_name);
+		ret = -ENOEXEC;
+	}
+
+	DRM_DEBUG_DRIVER("returning %d\n", ret);
+
+	return ret;
+}
+
+/*
+ * Load the GuC firmware blob into the MinuteIA.
+ */
+static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
+{
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+	struct drm_device *dev = dev_priv->dev;
+	int ret;
+
+	ret = i915_gem_object_set_to_gtt_domain(guc_fw->uc_fw_obj, false);
+	if (ret) {
+		DRM_DEBUG_DRIVER("set-domain failed %d\n", ret);
+		return ret;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(guc_fw->uc_fw_obj, 0, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("pin failed %d\n", ret);
+		return ret;
+	}
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* init WOPCM */
+	I915_WRITE(GUC_WOPCM_SIZE, GUC_WOPCM_SIZE_VALUE);
+	I915_WRITE(DMA_GUC_WOPCM_OFFSET, GUC_WOPCM_OFFSET);
+
+	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
+	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+
+	/* Set MMIO/WA for GuC init */
+	I915_WRITE(DRBMISC1, DOORBELL_ENABLE);
+
+	/* Enable MIA caching. GuC clock gating is disabled. */
+	I915_WRITE(GUC_SHIM_CONTROL, GUC_SHIM_CONTROL_VALUE);
+
+	/* WaC6DisallowByGfxPause */
+	I915_WRITE(GEN6_GFXPAUSE, 0x30FFF);
+
+	if (IS_SKYLAKE(dev))
+		I915_WRITE(GEN9_GT_PM_CONFIG, GEN8_GT_DOORBELL_ENABLE);
+	else
+		I915_WRITE(GEN8_GT_PM_CONFIG, GEN8_GT_DOORBELL_ENABLE);
+
+	if (IS_GEN9(dev)) {
+		/* DOP Clock Gating Enable for GuC clocks */
+		I915_WRITE(GEN7_MISCCPCTL, (GEN8_DOP_CLOCK_GATE_GUC_ENABLE |
+					    I915_READ(GEN7_MISCCPCTL)));
+
+		/* allows for 5us before GT can go to RC6 */
+		I915_WRITE(GUC_ARAT_C6DIS, 0x1FF);
+	}
+
+	set_guc_init_params(dev_priv);
+
+	ret = guc_ucode_xfer_dma(dev_priv);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+
+	/*
+	 * We keep the object pages for reuse during resume. But we can unpin it
+	 * now that DMA has completed, so it doesn't continue to take up space.
+	 */
+	i915_gem_object_ggtt_unpin(guc_fw->uc_fw_obj);
+
+	return ret;
+}
+
+/*
+ * Check the firmware that was found; if it's the wrong size or the wrong
+ * version, return a negative error code. If it's OK, return a positive
+ * status value. Here we can just return INTEL_UC_FW_GOOD; the common loader
+ * code will then save the data for later in a pageable (tmpfs-backed) GEM
+ * object.
+ *
+ * Alternatively (for example if we wanted only part of the image) we could
+ * save the required portion here and then return INTEL_UC_FW_SAVED to tell
+ * the common loader that the data is good, and we've already handled saving
+ * anything we need later.
+ *
+ * The GuC firmware image has the version number embedded at a well-known
+ * offset within the firmware blob; note that major / minor version are
+ * TWO bytes each (i.e. u16), although all pointers and offsets are defined
+ * in terms of bytes (u8).
+ */
+static int guc_ucode_check(struct intel_uc_fw *guc_fw)
+{
+	struct intel_guc *guc = container_of(guc_fw, struct intel_guc, guc_fw);
+	const u8 *css_header = guc_fw->uc_fw_blob->data + UOS_CSS_HEADER_OFFSET;
+	const size_t blobsize = guc_fw->uc_fw_blob->size;
+	const size_t minsize = UOS_CSS_HEADER_SIZE + UOS_CSS_SIGNING_SIZE;
+	const size_t maxsize = GUC_WOPCM_SIZE_VALUE + UOS_CSS_SIGNING_SIZE
+			- 0x8000; /* 32k reserved (8K stack + 24k context) */
+
+	DRM_DEBUG_DRIVER("firmware file size %zu (minimum %zu, maximum %zu)\n",
+		blobsize, minsize, maxsize);
+
+	/* Check the size of the blob before examining buffer contents */
+	if (blobsize < minsize || blobsize > maxsize)
+		return -ENOEXEC;
+
+	guc->fw_major_found = *(u16 *)(css_header + UOS_VER_MAJOR_OFFSET);
+	guc->fw_minor_found = *(u16 *)(css_header + UOS_VER_MINOR_OFFSET);
+
+	if (guc->fw_major_found != guc->fw_major_wanted ||
+	    guc->fw_minor_found < guc->fw_minor_wanted) {
+		DRM_ERROR("GuC firmware version %d.%d, required %d.%d\n",
+			guc->fw_major_found, guc->fw_minor_found,
+			guc->fw_major_wanted, guc->fw_minor_wanted);
+		return -ENOEXEC;
+	}
+
+	DRM_DEBUG_DRIVER("firmware version %d.%d OK (minimum %d.%d)\n",
+			guc->fw_major_found, guc->fw_minor_found,
+			guc->fw_major_wanted, guc->fw_minor_wanted);
+
+	return INTEL_UC_FW_GOOD;
+}
+
+/**
+ * intel_guc_ucode_init() - initiate a firmware loading request
+ *
+ * Called early during driver load, before GEM is initialised.
+ * Driver is single threaded, so no mutex is required.
+ *
+ * This just sets parameters for use when intel_guc_ucode_load()
+ * is called later, after GEM initialisation is complete.
+ */
+void intel_guc_ucode_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct intel_uc_fw *guc_fw = &guc->guc_fw;
+	const char *path;
+
+	if (!HAS_GUC_SCHED(dev))
+		i915.enable_guc_submission = false;
+
+	if (!HAS_GUC_UCODE(dev)) {
+		path = NULL;
+	} else if (IS_SKYLAKE(dev)) {
+		path = I915_SKL_GUC_UCODE;
+		guc->fw_major_wanted = 3;
+		guc->fw_minor_wanted = 0;
+	} else {
+		i915.enable_guc_submission = false;
+		path = "";	/* unknown device */
+	}
+
+	intel_uc_fw_init(dev, guc_fw, "GuC", path);
+}
+
+/**
+ * intel_guc_ucode_load() - load GuC uCode into the device
+ *
+ * Called from i915_gem_init_hw() during driver load and also after a GPU reset.
+ *
+ * Calls the common loader to get the firmware registered earlier. On the first
+ * call, this will actually fetch it from the filesystem; thereafter, we will
+ * already either have the blob in a GEM object, or have determined that no
+ * valid firmware image could be found.
+ *
+ * If we have a good firmware image, transfer it to the h/w.
+ */
+int intel_guc_ucode_load(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+	int err;
+
+	DRM_DEBUG_DRIVER("%s fw status: fetch %s, load %s\n",
+		guc_fw->uc_name,
+		intel_uc_fw_status_repr(guc_fw->uc_fw_fetch_status),
+		intel_uc_fw_status_repr(guc_fw->uc_fw_load_status));
+
+	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_NONE)
+		return 0;
+
+	if (guc_fw->uc_fw_fetch_status == INTEL_UC_FIRMWARE_SUCCESS &&
+	    guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_FAIL)
+		return -ENOEXEC;
+
+	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_PENDING;
+	err = intel_uc_fw_fetch(guc_fw, guc_ucode_check);
+	if (err)
+		goto fail;
+
+	err = guc_ucode_xfer(dev_priv);
+	if (err)
+		goto fail;
+
+	guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_SUCCESS;
+
+	DRM_DEBUG_DRIVER("%s fw status: fetch %s, load %s\n",
+		guc_fw->uc_name,
+		intel_uc_fw_status_repr(guc_fw->uc_fw_fetch_status),
+		intel_uc_fw_status_repr(guc_fw->uc_fw_load_status));
+
+	return 0;
+
+fail:
+	if (guc_fw->uc_fw_load_status == INTEL_UC_FIRMWARE_PENDING)
+		guc_fw->uc_fw_load_status = INTEL_UC_FIRMWARE_FAIL;
+
+	DRM_ERROR("Failed to initialize GuC, error %d\n", err);
+
+	return err;
+}
+
+/**
+ * intel_guc_ucode_fini() - clean up all allocated resources
+ */
+void intel_guc_ucode_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_uc_fw *guc_fw = &dev_priv->guc.guc_fw;
+
+	intel_uc_fw_fini(guc_fw);
+}
-- 
1.9.1



end of thread, other threads:[~2015-07-06 18:09 UTC | newest]

Thread overview: 94+ messages
2015-06-15 18:36 [PATCH 00/15] Batch submission via GuC Dave Gordon
2015-06-15 18:36 ` [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c Dave Gordon
2015-06-15 20:09   ` Chris Wilson
2015-06-17  7:23     ` Dave Gordon
2015-06-17 12:02       ` Daniel Vetter
2015-06-18 11:49         ` Dave Gordon
2015-06-18 12:10           ` Chris Wilson
2015-06-18 18:07             ` Dave Gordon
2015-06-19  8:44               ` Chris Wilson
2015-06-22 11:59                 ` Dave Gordon
2015-06-22 12:37                   ` Chris Wilson
2015-06-23 16:54                     ` Dave Gordon
2015-06-18 14:31           ` Daniel Vetter
2015-06-18 18:28             ` Dave Gordon
2015-06-24  9:32               ` Daniel Vetter
2015-06-25 12:28                 ` Dave Gordon
2015-06-24  9:40               ` Chris Wilson
2015-06-15 18:36 ` [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support Dave Gordon
2015-06-17 12:05   ` Daniel Vetter
2015-06-18 12:11     ` Dave Gordon
2015-06-18 14:49       ` Daniel Vetter
2015-06-18 15:27         ` Chris Wilson
2015-06-18 15:35           ` Daniel Vetter
2015-06-18 15:49             ` Chris Wilson
2015-06-19  8:43         ` Dave Gordon
2015-06-24 10:29           ` Daniel Vetter
2015-07-06 12:44             ` Dave Gordon
2015-07-06 13:24               ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 03/15] drm/i915: Add GuC-related module parameters Dave Gordon
2015-06-15 18:36 ` [PATCH 04/15] drm/i915: Add GuC-related header files Dave Gordon
2015-06-15 20:20   ` Chris Wilson
2015-06-17 15:01     ` Dave Gordon
2015-06-23 18:10       ` Dave Gordon
2015-06-24  7:41     ` Dave Gordon
2015-06-24  9:37       ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
2015-06-15 20:30   ` Chris Wilson
2015-06-18 17:53     ` Yu Dai
2015-06-18 20:12       ` Chris Wilson
2015-06-19 14:34         ` Dave Gordon
2015-06-18 18:54     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status Dave Gordon
2015-06-16  9:40   ` Chris Wilson
2015-06-19  7:49     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open Dave Gordon
2015-06-16  9:35   ` Chris Wilson
2015-06-19  9:42     ` Dave Gordon
2015-06-17 12:18   ` Daniel Vetter
2015-06-19  9:19     ` Dave Gordon
2015-06-24 10:15       ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 08/15] drm/i915: Move execlists defines from .c to .h Dave Gordon
2015-06-16  9:37   ` Chris Wilson
2015-06-17  7:31     ` Dave Gordon
2015-06-17  7:54       ` Chris Wilson
2015-06-17  7:59       ` Chris Wilson
2015-06-22 13:05         ` Dave Gordon
2015-06-15 18:36 ` [PATCH 09/15] drm/i915: GuC submission setup, phase 1 Dave Gordon
2015-06-15 21:32   ` Chris Wilson
2015-06-19 17:02     ` Dave Gordon
2015-06-19 17:22       ` Dave Gordon
2015-06-16 11:44   ` Chris Wilson
2015-06-15 18:36 ` [PATCH 10/15] drm/i915: Enable GuC firmware log Dave Gordon
2015-06-15 21:40   ` Chris Wilson
2015-06-16  9:26   ` Tvrtko Ursulin
2015-06-16 11:40     ` Chris Wilson
2015-06-16 12:29       ` Tvrtko Ursulin
2015-06-15 18:36 ` [PATCH 11/15] drm/i915: Implementation of GuC client Dave Gordon
2015-06-15 21:55   ` Chris Wilson
2015-06-19 17:55     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 12/15] drm/i915: Interrupt routing for GuC submission Dave Gordon
2015-06-16  9:24   ` Chris Wilson
2015-06-17  8:20     ` Dave Gordon
2015-06-17 12:22       ` Daniel Vetter
2015-06-17 12:41         ` Daniel Vetter
2015-06-23 11:33           ` Dave Gordon
2015-06-23 23:48             ` Yu Dai
2015-06-24 10:02               ` Daniel Vetter
2015-06-15 18:36 ` [PATCH 13/15] drm/i915: Integrate GuC-based command submission Dave Gordon
2015-06-16  9:22   ` Chris Wilson
2015-06-19 18:18     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 14/15] drm/i915: Debugfs interface for GuC submission statistics Dave Gordon
2015-06-16  9:28   ` Chris Wilson
2015-06-24  8:27     ` Dave Gordon
2015-06-15 18:36 ` [PATCH 15/15] Documentation/drm: kerneldoc for GuC Dave Gordon
2015-06-15 18:36 ` [PATCH 16/15] drm/i915: Enable GuC submission, where supported Dave Gordon
2015-06-17 12:43 ` [PATCH 00/15] Batch submission via GuC Daniel Vetter
2015-06-25  7:23   ` Dave Gordon
2015-06-25  8:05     ` Chris Wilson
2015-06-24 12:16 ` Daniel Vetter
2015-06-24 12:57   ` Chris Wilson
2015-07-03 12:30 [PATCH 00/15 v3] " Dave Gordon
2015-07-03 12:30 ` [PATCH 05/15] drm/i915: GuC-specific firmware loader Dave Gordon
2015-07-06 14:28   ` Daniel Vetter
2015-07-06 16:37     ` Dave Gordon
2015-07-06 18:12       ` Daniel Vetter
