From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Cc: mika.kuoppala@intel.com
Subject: [PATCH 37/38] drm/i915: Enable userspace to opt-out of implicit fencing
Date: Tue, 20 Sep 2016 09:30:11 +0100
Message-ID: <20160920083012.2754-38-chris@chris-wilson.co.uk>
In-Reply-To: <20160920083012.2754-1-chris@chris-wilson.co.uk>

Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we must always wait for the GPU to finish
before releasing its PTE) and for third parties. However, userspace may
wish to avoid this serialisation if it is either using explicit fencing
between parties or wants more fine-grained access to buffers (e.g. it
may partition the buffer between uses and track fences on ranges rather
than rely on the implicit fences tracking the whole object). It follows
that userspace needs a mechanism to avoid the kernel's serialisation on
its implicit fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag needs to be set
only once for the lifetime of the object, this reduces the convenience of
having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.

Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).

Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
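Not part of the commit: a minimal standalone model of the check this
patch adds to i915_gem_execbuffer_move_to_gpu(), for illustration only.
The names model_object and count_implicit_waits are hypothetical, and
the busy field merely stands in for the kernel's real fence tracking.
Only EXEC_OBJECT_ASYNC and its value are taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from the i915_drm.h hunk in this patch. */
#define EXEC_OBJECT_ASYNC (1 << 6)

/* Hypothetical stand-in for one execbuf object entry. */
struct model_object {
	uint64_t flags; /* per-object execbuf flags */
	int busy;       /* non-zero if earlier rendering is outstanding */
};

/*
 * Count how many objects the new request would have to wait upon.
 * Objects flagged EXEC_OBJECT_ASYNC are skipped, mirroring the
 * "continue" added by this patch; all others are waited upon if busy.
 */
size_t count_implicit_waits(const struct model_object *objs, size_t n)
{
	size_t waits = 0;
	size_t i;

	assert(objs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (objs[i].flags & EXEC_OBJECT_ASYNC)
			continue; /* userspace promised to order this itself */

		if (objs[i].busy)
			waits++;
	}

	return waits;
}
```

In this model a busy object carrying the flag contributes no wait, so
ordering against its earlier users is entirely up to the caller.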
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 include/uapi/drm/i915_drm.h                | 27 ++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)
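
Not part of the commit: a worked check of the arithmetic behind the
__EXEC_OBJECT_UNKNOWN_FLAGS change in i915_drm.h. Negating the bit just
above the highest known flag yields a mask of every higher bit, so
moving the base from EXEC_OBJECT_PAD_TO_SIZE to EXEC_OBJECT_ASYNC is
what makes bit 6 acceptable. The helper exec_flags_valid() and the
UNKNOWN_FLAGS_OLD/NEW names are hypothetical; the flag values are copied
from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the i915_drm.h hunk in this patch. */
#define EXEC_OBJECT_PAD_TO_SIZE (1 << 5)
#define EXEC_OBJECT_ASYNC       (1 << 6)

/* Must-be-zero masks before and after this patch, widened to 64 bits. */
#define UNKNOWN_FLAGS_OLD ((uint64_t)-(EXEC_OBJECT_PAD_TO_SIZE << 1))
#define UNKNOWN_FLAGS_NEW ((uint64_t)-(EXEC_OBJECT_ASYNC << 1))

/* -(1 << 6) covers bits 6..63, -(1 << 7) covers bits 7..63. */
static_assert(UNKNOWN_FLAGS_OLD == UINT64_C(0xFFFFFFFFFFFFFFC0),
	      "old mask rejects bit 6 and above");
static_assert(UNKNOWN_FLAGS_NEW == UINT64_C(0xFFFFFFFFFFFFFF80),
	      "new mask rejects bit 7 and above");

/* Hypothetical helper: non-zero if flags has no reserved bit set. */
int exec_flags_valid(uint64_t flags, uint64_t unknown_mask)
{
	return (flags & unknown_mask) == 0;
}
```

With the old mask, exec_flags_valid(EXEC_OBJECT_ASYNC, ...) is false;
with the new mask it is true, while bit 7 and above remain reserved.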

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b393347554bd..19ee76284371 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -332,6 +332,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
+	case I915_PARAM_HAS_EXEC_ASYNC:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e2d4f937d0b2..7038da9aa68f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1107,6 +1107,9 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
+		if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
+			continue;
+
 		ret = i915_gem_request_await_object
 			(req, obj, obj->base.pending_write_domain);
 		if (ret)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 03725fe89859..a2fa511b46b3 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -388,6 +388,10 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU	 38
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
+ * synchronisation with implicit fencing on individual objects.
+ */
+#define I915_PARAM_HAS_EXEC_ASYNC	 41
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -729,8 +733,29 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED		 (1<<4)
 #define EXEC_OBJECT_PAD_TO_SIZE		 (1<<5)
+/* The kernel implicitly tracks GPU activity on all GEM objects, and
+ * synchronises operations with outstanding rendering. This includes
+ * rendering on other devices if exported via dma-buf. However, sometimes
+ * this tracking is too coarse and the user knows better. For example,
+ * if the object is split into non-overlapping ranges shared between different
+ * clients or engines (i.e. suballocating objects), the implicit tracking
+ * by the kernel assumes that each operation affects the whole object rather
+ * than an individual range, causing needless synchronisation between clients.
+ * The kernel will also forgo any CPU cache flushes prior to rendering from
+ * the object as the client is expected to be also handling such domain
+ * tracking.
+ *
+ * The kernel maintains the implicit tracking in order to manage resources
+ * used by the GPU - this flag only disables the synchronisation prior to
+ * rendering with this object in this execbuf.
+ *
+ * Opting out of implicit synchronisation requires the user to do its own
+ * explicit tracking to avoid rendering corruption. See, for example,
+ * I915_PARAM_HAS_EXEC_FENCE to order execbufs and execute them asynchronously.
+ */
+#define EXEC_OBJECT_ASYNC		 (1<<6)
 /* All remaining bits are MBZ and RESERVED FOR FUTURE USE */
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC<<1)
 	__u64 flags;
 
 	union {
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
