* [PATCH 00/13] Convert requests to use struct fence
@ 2015-12-11 13:11 John.C.Harrison
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware, i.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track execution progress, so it is
definitely still required. However, the basic completion-status side
can be updated to use the ready-made fence implementation and gain all
the advantages that provides.
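
As a rough illustration of the idea (a sketch only, not code from this
series; 'my_request' and 'my_fence_ops' are placeholder names), the
request would embed a fence that is initialised at submission time and
signalled on completion:

  #include <linux/fence.h>

  struct my_request {
  	struct fence fence;	/* completion status lives here */
  	/* ... plus all the other request state: ring, context, ... */
  };

  /* 'my_fence_ops' is assumed to provide the mandatory fence_ops
   * hooks: get_driver_name, get_timeline_name, enable_signaling
   * and wait. */
  static void my_request_emit(struct my_request *req, spinlock_t *lock,
  			    unsigned context, unsigned seqno)
  {
  	fence_init(&req->fence, &my_fence_ops, lock, context, seqno);
  }

  static void my_request_complete(struct my_request *req)
  {
  	fence_signal(&req->fence);	/* wakes waiters, records timestamp */
  }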

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes, without having to make an
IOCTL call into the driver.
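
For example, once the execbuf IOCTL can return a sync fence fd for a
batch (as the later patches in this series allow), a hypothetical
userspace sketch only needs standard poll(2) to wait for completion:

  #include <poll.h>

  /* Returns 1 when the batch has completed, 0 on timeout, -1 on error. */
  static int wait_for_batch(int fence_fd, int timeout_ms)
  {
  	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
  	int ret = poll(&pfd, 1, timeout_ms);

  	if (ret <= 0)
  		return ret;	/* 0 = timeout, -1 = poll error */
  	return (pfd.revents & POLLERR) ? -1 : 1;
  }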

This work has been planned ever since the driver was converted from
being seqno based to being request structure based. This patch series
implements it.

An IGT test to exercise the fence support from userland is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real-world Linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

[Patches against drm-intel-nightly tree fetched 17/11/2015]

John Harrison (10):
  staging/android/sync: Move sync framework out of staging
  android/sync: Improved debug dump to dmesg
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Add per context timelines to fence object
  drm/i915: Delay the freeing of requests until retire time
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Add sync framework support to execbuf IOCTL
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

Maarten Lankhorst (2):
  staging/android/sync: Support sync points created from dma-fences
  staging/android/sync: add sync_fence_create_dma

Peter Lawthers (1):
  android/sync: Fix reversed sense of signaled fence

 drivers/android/Kconfig                    |  28 ++
 drivers/android/Makefile                   |   2 +
 drivers/android/sw_sync.c                  | 260 ++++++++++
 drivers/android/sw_sync.h                  |  59 +++
 drivers/android/sync.c                     | 739 +++++++++++++++++++++++++++++
 drivers/android/sync.h                     | 388 +++++++++++++++
 drivers/android/sync_debug.c               | 280 +++++++++++
 drivers/android/trace/sync.h               |  82 ++++
 drivers/gpu/drm/i915/Kconfig               |   3 +
 drivers/gpu/drm/i915/i915_debugfs.c        |   7 +-
 drivers/gpu/drm/i915/i915_drv.h            |  75 +--
 drivers/gpu/drm/i915/i915_gem.c            | 438 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.c    |  15 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  95 +++-
 drivers/gpu/drm/i915/i915_irq.c            |   2 +-
 drivers/gpu/drm/i915/i915_trace.h          |  13 +-
 drivers/gpu/drm/i915/intel_display.c       |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  13 +
 drivers/gpu/drm/i915/intel_pm.c            |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   9 +
 drivers/staging/android/Kconfig            |  28 --
 drivers/staging/android/Makefile           |   2 -
 drivers/staging/android/sw_sync.c          | 260 ----------
 drivers/staging/android/sw_sync.h          |  59 ---
 drivers/staging/android/sync.c             | 729 ----------------------------
 drivers/staging/android/sync.h             | 356 --------------
 drivers/staging/android/sync_debug.c       | 254 ----------
 drivers/staging/android/trace/sync.h       |  82 ----
 drivers/staging/android/uapi/sw_sync.h     |  32 --
 drivers/staging/android/uapi/sync.h        |  97 ----
 include/uapi/Kbuild                        |   1 +
 include/uapi/drm/i915_drm.h                |  16 +-
 include/uapi/sync/Kbuild                   |   3 +
 include/uapi/sync/sw_sync.h                |  32 ++
 include/uapi/sync/sync.h                   |  97 ++++
 36 files changed, 2600 insertions(+), 1971 deletions(-)
 create mode 100644 drivers/android/sw_sync.c
 create mode 100644 drivers/android/sw_sync.h
 create mode 100644 drivers/android/sync.c
 create mode 100644 drivers/android/sync.h
 create mode 100644 drivers/android/sync_debug.c
 create mode 100644 drivers/android/trace/sync.h
 delete mode 100644 drivers/staging/android/sw_sync.c
 delete mode 100644 drivers/staging/android/sw_sync.h
 delete mode 100644 drivers/staging/android/sync.c
 delete mode 100644 drivers/staging/android/sync.h
 delete mode 100644 drivers/staging/android/sync_debug.c
 delete mode 100644 drivers/staging/android/trace/sync.h
 delete mode 100644 drivers/staging/android/uapi/sw_sync.h
 delete mode 100644 drivers/staging/android/uapi/sync.h
 create mode 100644 include/uapi/sync/Kbuild
 create mode 100644 include/uapi/sync/sw_sync.h
 create mode 100644 include/uapi/sync/sync.h

-- 
1.9.1


* [PATCH 01/13] staging/android/sync: Support sync points created from dma-fences
@ 2015-12-11 13:11 John.C.Harrison
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg,
	Riley Andrews, Maarten Lankhorst

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>

Debug output assumes all sync points are built on top of Android sync
points; once we start creating them from dma-fences, it will NULL-pointer
dereference unless taught about this.
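
As a sketch of the failure mode (illustration only, not part of the
patch): a fence created straight from a dma-fence is not embedded in a
sync_pt, so the old container_of() based parent lookup computes a bogus
timeline pointer:

  struct sync_pt *pt = container_of(f, struct sync_pt, base);	/* f is a bare dma-fence */
  struct sync_timeline *parent = sync_pt_parent(pt);		/* garbage pointer */
  seq_printf(s, "%s", parent->name);				/* invalid dereference */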

v4: Corrected patch ownership.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: devel@driverdev.osuosl.org
Cc: Riley Andrews <riandrews@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
---
 drivers/staging/android/sync_debug.c | 42 +++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
index 91ed2c4..f45d13c 100644
--- a/drivers/staging/android/sync_debug.c
+++ b/drivers/staging/android/sync_debug.c
@@ -82,36 +82,42 @@ static const char *sync_status_str(int status)
 	return "error";
 }
 
-static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
+static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence)
 {
 	int status = 1;
-	struct sync_timeline *parent = sync_pt_parent(pt);
 
-	if (fence_is_signaled_locked(&pt->base))
-		status = pt->base.status;
+	if (fence_is_signaled_locked(pt))
+		status = pt->status;
 
 	seq_printf(s, "  %s%spt %s",
-		   fence ? parent->name : "",
+		   fence && pt->ops->get_timeline_name ?
+		   pt->ops->get_timeline_name(pt) : "",
 		   fence ? "_" : "",
 		   sync_status_str(status));
 
 	if (status <= 0) {
 		struct timespec64 ts64 =
-			ktime_to_timespec64(pt->base.timestamp);
+			ktime_to_timespec64(pt->timestamp);
 
 		seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec);
 	}
 
-	if (parent->ops->timeline_value_str &&
-	    parent->ops->pt_value_str) {
+	if ((!fence || pt->ops->timeline_value_str) &&
+	    pt->ops->fence_value_str) {
 		char value[64];
+		bool success;
 
-		parent->ops->pt_value_str(pt, value, sizeof(value));
-		seq_printf(s, ": %s", value);
-		if (fence) {
-			parent->ops->timeline_value_str(parent, value,
-						    sizeof(value));
-			seq_printf(s, " / %s", value);
+		pt->ops->fence_value_str(pt, value, sizeof(value));
+		success = strlen(value);
+
+		if (success)
+			seq_printf(s, ": %s", value);
+
+		if (success && fence) {
+			pt->ops->timeline_value_str(pt, value, sizeof(value));
+
+			if (strlen(value))
+				seq_printf(s, " / %s", value);
 		}
 	}
 
@@ -138,7 +144,7 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
 	list_for_each(pos, &obj->child_list_head) {
 		struct sync_pt *pt =
 			container_of(pos, struct sync_pt, child_list);
-		sync_print_pt(s, pt, false);
+		sync_print_pt(s, &pt->base, false);
 	}
 	spin_unlock_irqrestore(&obj->child_list_lock, flags);
 }
@@ -153,11 +159,7 @@ static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
 		   sync_status_str(atomic_read(&fence->status)));
 
 	for (i = 0; i < fence->num_fences; ++i) {
-		struct sync_pt *pt =
-			container_of(fence->cbs[i].sync_pt,
-				     struct sync_pt, base);
-
-		sync_print_pt(s, pt, true);
+		sync_print_pt(s, fence->cbs[i].sync_pt, true);
 	}
 
 	spin_lock_irqsave(&fence->wq.lock, flags);
-- 
1.9.1


* [PATCH 02/13] staging/android/sync: add sync_fence_create_dma
@ 2015-12-11 13:11 John.C.Harrison
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg,
	Riley Andrews, Maarten Lankhorst

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>

This allows users of dma-fences to create an Android fence.

v2: Added kerneldoc. (Tvrtko Ursulin).

v4: Updated comments from review feedback by Maarten.
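
A hypothetical caller sketch ('my_dma_fence' is an assumed 'struct
fence *', not something from this patch): wrap an existing dma-fence in
a sync_fence so it can be handed to userspace as a file descriptor; the
sync_fence takes ownership of the reference passed in:

  struct sync_fence *sf;
  int fd = get_unused_fd_flags(O_CLOEXEC);

  if (fd < 0)
  	return fd;

  sf = sync_fence_create_dma("batch", my_dma_fence);
  if (!sf) {
  	put_unused_fd(fd);
  	return -ENOMEM;
  }
  sync_fence_install(sf, fd);	/* fd is now safe to return to userspace */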

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: devel@driverdev.osuosl.org
Cc: Riley Andrews <riandrews@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
---
 drivers/staging/android/sync.c | 13 +++++++++----
 drivers/staging/android/sync.h | 10 ++++++++++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index f83e00c..7f0e919 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
 }
 
 /* TODO: implement a create which takes more than one sync_pt */
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
 {
 	struct sync_fence *fence;
 
@@ -199,16 +199,21 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
 	fence->num_fences = 1;
 	atomic_set(&fence->status, 1);
 
-	fence->cbs[0].sync_pt = &pt->base;
+	fence->cbs[0].sync_pt = pt;
 	fence->cbs[0].fence = fence;
-	if (fence_add_callback(&pt->base, &fence->cbs[0].cb,
-			       fence_check_cb_func))
+	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
 		atomic_dec(&fence->status);
 
 	sync_fence_debug_add(fence);
 
 	return fence;
 }
+EXPORT_SYMBOL(sync_fence_create_dma);
+
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+{
+	return sync_fence_create_dma(name, &pt->base);
+}
 EXPORT_SYMBOL(sync_fence_create);
 
 struct sync_fence *sync_fence_fdget(int fd)
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index 61f8a3a..afa0752 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -254,6 +254,16 @@ void sync_pt_free(struct sync_pt *pt);
  */
 struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
 
+/**
+ * sync_fence_create_dma() - creates a sync fence from dma-fence
+ * @name:	name of fence to create
+ * @pt:	dma-fence to add to the fence
+ *
+ * Creates a fence containing @pt.  Once this is called, the fence takes
+ * ownership of @pt.
+ */
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
+
 /*
  * API for sync_fence consumers
  */
-- 
1.9.1


* [PATCH 03/13] staging/android/sync: Move sync framework out of staging
@ 2015-12-11 13:11 John.C.Harrison
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The sync framework is now used by the i915 driver. Therefore it can be
moved out of staging and into the regular tree. Also, the public
interfaces can actually be made public and exported.

v3: New patch for series.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Geoff Miller <geoff.miller@intel.com>
---
 drivers/android/Kconfig                |  28 ++
 drivers/android/Makefile               |   2 +
 drivers/android/sw_sync.c              | 260 ++++++++++++
 drivers/android/sw_sync.h              |  59 +++
 drivers/android/sync.c                 | 734 +++++++++++++++++++++++++++++++++
 drivers/android/sync.h                 | 366 ++++++++++++++++
 drivers/android/sync_debug.c           | 256 ++++++++++++
 drivers/android/trace/sync.h           |  82 ++++
 drivers/staging/android/Kconfig        |  28 --
 drivers/staging/android/Makefile       |   2 -
 drivers/staging/android/sw_sync.c      | 260 ------------
 drivers/staging/android/sw_sync.h      |  59 ---
 drivers/staging/android/sync.c         | 734 ---------------------------------
 drivers/staging/android/sync.h         | 366 ----------------
 drivers/staging/android/sync_debug.c   | 256 ------------
 drivers/staging/android/trace/sync.h   |  82 ----
 drivers/staging/android/uapi/sw_sync.h |  32 --
 drivers/staging/android/uapi/sync.h    |  97 -----
 include/uapi/Kbuild                    |   1 +
 include/uapi/sync/Kbuild               |   3 +
 include/uapi/sync/sw_sync.h            |  32 ++
 include/uapi/sync/sync.h               |  97 +++++
 22 files changed, 1920 insertions(+), 1916 deletions(-)
 create mode 100644 drivers/android/sw_sync.c
 create mode 100644 drivers/android/sw_sync.h
 create mode 100644 drivers/android/sync.c
 create mode 100644 drivers/android/sync.h
 create mode 100644 drivers/android/sync_debug.c
 create mode 100644 drivers/android/trace/sync.h
 delete mode 100644 drivers/staging/android/sw_sync.c
 delete mode 100644 drivers/staging/android/sw_sync.h
 delete mode 100644 drivers/staging/android/sync.c
 delete mode 100644 drivers/staging/android/sync.h
 delete mode 100644 drivers/staging/android/sync_debug.c
 delete mode 100644 drivers/staging/android/trace/sync.h
 delete mode 100644 drivers/staging/android/uapi/sw_sync.h
 delete mode 100644 drivers/staging/android/uapi/sync.h
 create mode 100644 include/uapi/sync/Kbuild
 create mode 100644 include/uapi/sync/sw_sync.h
 create mode 100644 include/uapi/sync/sync.h

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index bdfc6c6..9edcd8f 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -32,6 +32,34 @@ config ANDROID_BINDER_IPC_32BIT
 
 	  Note that enabling this will break newer Android user-space.
 
+config SYNC
+	bool "Synchronization framework"
+	default n
+	select ANON_INODES
+	select DMA_SHARED_BUFFER
+	---help---
+	  This option enables the framework for synchronization between multiple
+	  drivers.  Sync implementations can take advantage of hardware
+	  synchronization built into devices like GPUs.
+
+config SW_SYNC
+	bool "Software synchronization objects"
+	default n
+	depends on SYNC
+	---help---
+	  A sync object driver that uses a 32bit counter to coordinate
+	  synchronization.  Useful when there is no hardware primitive backing
+	  the synchronization.
+
+config SW_SYNC_USER
+	bool "Userspace API for SW_SYNC"
+	default n
+	depends on SW_SYNC
+	---help---
+	  Provides a user space API to the sw sync object.
+	  *WARNING* improper use of this can result in deadlocking kernel
+	  drivers from userspace.
+
 endif # if ANDROID
 
 endmenu
diff --git a/drivers/android/Makefile b/drivers/android/Makefile
index 3b7e4b0..a1465dd 100644
--- a/drivers/android/Makefile
+++ b/drivers/android/Makefile
@@ -1,3 +1,5 @@
 ccflags-y += -I$(src)			# needed for trace events
 
 obj-$(CONFIG_ANDROID_BINDER_IPC)	+= binder.o
+obj-$(CONFIG_SYNC)			+= sync.o sync_debug.o
+obj-$(CONFIG_SW_SYNC)			+= sw_sync.o
diff --git a/drivers/android/sw_sync.c b/drivers/android/sw_sync.c
new file mode 100644
index 0000000..c4ff167
--- /dev/null
+++ b/drivers/android/sw_sync.c
@@ -0,0 +1,260 @@
+/*
+ * drivers/android/sw_sync.c
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/export.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+
+#include "sw_sync.h"
+
+static int sw_sync_cmp(u32 a, u32 b)
+{
+	if (a == b)
+		return 0;
+
+	return ((s32)a - (s32)b) < 0 ? -1 : 1;
+}
+
+struct sync_pt *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value)
+{
+	struct sw_sync_pt *pt;
+
+	pt = (struct sw_sync_pt *)
+		sync_pt_create(&obj->obj, sizeof(struct sw_sync_pt));
+
+	pt->value = value;
+
+	return (struct sync_pt *)pt;
+}
+EXPORT_SYMBOL(sw_sync_pt_create);
+
+static struct sync_pt *sw_sync_pt_dup(struct sync_pt *sync_pt)
+{
+	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
+	struct sw_sync_timeline *obj =
+		(struct sw_sync_timeline *)sync_pt_parent(sync_pt);
+
+	return (struct sync_pt *)sw_sync_pt_create(obj, pt->value);
+}
+
+static int sw_sync_pt_has_signaled(struct sync_pt *sync_pt)
+{
+	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
+	struct sw_sync_timeline *obj =
+		(struct sw_sync_timeline *)sync_pt_parent(sync_pt);
+
+	return sw_sync_cmp(obj->value, pt->value) >= 0;
+}
+
+static int sw_sync_pt_compare(struct sync_pt *a, struct sync_pt *b)
+{
+	struct sw_sync_pt *pt_a = (struct sw_sync_pt *)a;
+	struct sw_sync_pt *pt_b = (struct sw_sync_pt *)b;
+
+	return sw_sync_cmp(pt_a->value, pt_b->value);
+}
+
+static int sw_sync_fill_driver_data(struct sync_pt *sync_pt,
+				    void *data, int size)
+{
+	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
+
+	if (size < sizeof(pt->value))
+		return -ENOMEM;
+
+	memcpy(data, &pt->value, sizeof(pt->value));
+
+	return sizeof(pt->value);
+}
+
+static void sw_sync_timeline_value_str(struct sync_timeline *sync_timeline,
+				       char *str, int size)
+{
+	struct sw_sync_timeline *timeline =
+		(struct sw_sync_timeline *)sync_timeline;
+	snprintf(str, size, "%d", timeline->value);
+}
+
+static void sw_sync_pt_value_str(struct sync_pt *sync_pt,
+				 char *str, int size)
+{
+	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
+
+	snprintf(str, size, "%d", pt->value);
+}
+
+static struct sync_timeline_ops sw_sync_timeline_ops = {
+	.driver_name = "sw_sync",
+	.dup = sw_sync_pt_dup,
+	.has_signaled = sw_sync_pt_has_signaled,
+	.compare = sw_sync_pt_compare,
+	.fill_driver_data = sw_sync_fill_driver_data,
+	.timeline_value_str = sw_sync_timeline_value_str,
+	.pt_value_str = sw_sync_pt_value_str,
+};
+
+struct sw_sync_timeline *sw_sync_timeline_create(const char *name)
+{
+	struct sw_sync_timeline *obj = (struct sw_sync_timeline *)
+		sync_timeline_create(&sw_sync_timeline_ops,
+				     sizeof(struct sw_sync_timeline),
+				     name);
+
+	return obj;
+}
+EXPORT_SYMBOL(sw_sync_timeline_create);
+
+void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc)
+{
+	obj->value += inc;
+
+	sync_timeline_signal(&obj->obj);
+}
+EXPORT_SYMBOL(sw_sync_timeline_inc);
+
+#ifdef CONFIG_SW_SYNC_USER
+/* *WARNING*
+ *
+ * improper use of this can result in deadlocking kernel drivers from userspace.
+ */
+
+/* opening sw_sync create a new sync obj */
+static int sw_sync_open(struct inode *inode, struct file *file)
+{
+	struct sw_sync_timeline *obj;
+	char task_comm[TASK_COMM_LEN];
+
+	get_task_comm(task_comm, current);
+
+	obj = sw_sync_timeline_create(task_comm);
+	if (!obj)
+		return -ENOMEM;
+
+	file->private_data = obj;
+
+	return 0;
+}
+
+static int sw_sync_release(struct inode *inode, struct file *file)
+{
+	struct sw_sync_timeline *obj = file->private_data;
+
+	sync_timeline_destroy(&obj->obj);
+	return 0;
+}
+
+static long sw_sync_ioctl_create_fence(struct sw_sync_timeline *obj,
+				       unsigned long arg)
+{
+	int fd = get_unused_fd_flags(O_CLOEXEC);
+	int err;
+	struct sync_pt *pt;
+	struct sync_fence *fence;
+	struct sw_sync_create_fence_data data;
+
+	if (fd < 0)
+		return fd;
+
+	if (copy_from_user(&data, (void __user *)arg, sizeof(data))) {
+		err = -EFAULT;
+		goto err;
+	}
+
+	pt = sw_sync_pt_create(obj, data.value);
+	if (!pt) {
+		err = -ENOMEM;
+		goto err;
+	}
+
+	data.name[sizeof(data.name) - 1] = '\0';
+	fence = sync_fence_create(data.name, pt);
+	if (!fence) {
+		sync_pt_free(pt);
+		err = -ENOMEM;
+		goto err;
+	}
+
+	data.fence = fd;
+	if (copy_to_user((void __user *)arg, &data, sizeof(data))) {
+		sync_fence_put(fence);
+		err = -EFAULT;
+		goto err;
+	}
+
+	sync_fence_install(fence, fd);
+
+	return 0;
+
+err:
+	put_unused_fd(fd);
+	return err;
+}
+
+static long sw_sync_ioctl_inc(struct sw_sync_timeline *obj, unsigned long arg)
+{
+	u32 value;
+
+	if (copy_from_user(&value, (void __user *)arg, sizeof(value)))
+		return -EFAULT;
+
+	sw_sync_timeline_inc(obj, value);
+
+	return 0;
+}
+
+static long sw_sync_ioctl(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	struct sw_sync_timeline *obj = file->private_data;
+
+	switch (cmd) {
+	case SW_SYNC_IOC_CREATE_FENCE:
+		return sw_sync_ioctl_create_fence(obj, arg);
+
+	case SW_SYNC_IOC_INC:
+		return sw_sync_ioctl_inc(obj, arg);
+
+	default:
+		return -ENOTTY;
+	}
+}
+
+static const struct file_operations sw_sync_fops = {
+	.owner = THIS_MODULE,
+	.open = sw_sync_open,
+	.release = sw_sync_release,
+	.unlocked_ioctl = sw_sync_ioctl,
+	.compat_ioctl = sw_sync_ioctl,
+};
+
+static struct miscdevice sw_sync_dev = {
+	.minor	= MISC_DYNAMIC_MINOR,
+	.name	= "sw_sync",
+	.fops	= &sw_sync_fops,
+};
+
+static int __init sw_sync_device_init(void)
+{
+	return misc_register(&sw_sync_dev);
+}
+device_initcall(sw_sync_device_init);
+
+#endif /* CONFIG_SW_SYNC_USER */
diff --git a/drivers/android/sw_sync.h b/drivers/android/sw_sync.h
new file mode 100644
index 0000000..4bf8b86
--- /dev/null
+++ b/drivers/android/sw_sync.h
@@ -0,0 +1,59 @@
+/*
+ * drivers/android/sw_sync.h
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_SW_SYNC_H
+#define _LINUX_SW_SYNC_H
+
+#include <linux/types.h>
+#include <linux/kconfig.h>
+#include <uapi/sync/sw_sync.h>
+#include "sync.h"
+
+struct sw_sync_timeline {
+	struct	sync_timeline	obj;
+
+	u32			value;
+};
+
+struct sw_sync_pt {
+	struct sync_pt		pt;
+
+	u32			value;
+};
+
+#if IS_ENABLED(CONFIG_SW_SYNC)
+struct sw_sync_timeline *sw_sync_timeline_create(const char *name);
+void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc);
+
+struct sync_pt *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value);
+#else
+static inline struct sw_sync_timeline *sw_sync_timeline_create(const char *name)
+{
+	return NULL;
+}
+
+static inline void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc)
+{
+}
+
+static inline struct sync_pt *sw_sync_pt_create(struct sw_sync_timeline *obj,
+						u32 value)
+{
+	return NULL;
+}
+#endif /* IS_ENABLED(CONFIG_SW_SYNC) */
+
+#endif /* _LINUX_SW_SYNC_H */
diff --git a/drivers/android/sync.c b/drivers/android/sync.c
new file mode 100644
index 0000000..7f0e919
--- /dev/null
+++ b/drivers/android/sync.c
@@ -0,0 +1,734 @@
+/*
+ * drivers/android/sync.c
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/export.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+
+#include "sync.h"
+
+#define CREATE_TRACE_POINTS
+#include "trace/sync.h"
+
+static const struct fence_ops android_fence_ops;
+static const struct file_operations sync_fence_fops;
+
+struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
+					   int size, const char *name)
+{
+	struct sync_timeline *obj;
+
+	if (size < sizeof(struct sync_timeline))
+		return NULL;
+
+	obj = kzalloc(size, GFP_KERNEL);
+	if (obj == NULL)
+		return NULL;
+
+	kref_init(&obj->kref);
+	obj->ops = ops;
+	obj->context = fence_context_alloc(1);
+	strlcpy(obj->name, name, sizeof(obj->name));
+
+	INIT_LIST_HEAD(&obj->child_list_head);
+	INIT_LIST_HEAD(&obj->active_list_head);
+	spin_lock_init(&obj->child_list_lock);
+
+	sync_timeline_debug_add(obj);
+
+	return obj;
+}
+EXPORT_SYMBOL(sync_timeline_create);
+
+static void sync_timeline_free(struct kref *kref)
+{
+	struct sync_timeline *obj =
+		container_of(kref, struct sync_timeline, kref);
+
+	sync_timeline_debug_remove(obj);
+
+	if (obj->ops->release_obj)
+		obj->ops->release_obj(obj);
+
+	kfree(obj);
+}
+
+static void sync_timeline_get(struct sync_timeline *obj)
+{
+	kref_get(&obj->kref);
+}
+
+static void sync_timeline_put(struct sync_timeline *obj)
+{
+	kref_put(&obj->kref, sync_timeline_free);
+}
+
+void sync_timeline_destroy(struct sync_timeline *obj)
+{
+	obj->destroyed = true;
+	/*
+	 * Ensure timeline is marked as destroyed before
+	 * changing timeline's fences status.
+	 */
+	smp_wmb();
+
+	/*
+	 * signal any children that their parent is going away.
+	 */
+	sync_timeline_signal(obj);
+	sync_timeline_put(obj);
+}
+EXPORT_SYMBOL(sync_timeline_destroy);
+
+void sync_timeline_signal(struct sync_timeline *obj)
+{
+	unsigned long flags;
+	LIST_HEAD(signaled_pts);
+	struct sync_pt *pt, *next;
+
+	trace_sync_timeline(obj);
+
+	spin_lock_irqsave(&obj->child_list_lock, flags);
+
+	list_for_each_entry_safe(pt, next, &obj->active_list_head,
+				 active_list) {
+		if (fence_is_signaled_locked(&pt->base))
+			list_del_init(&pt->active_list);
+	}
+
+	spin_unlock_irqrestore(&obj->child_list_lock, flags);
+}
+EXPORT_SYMBOL(sync_timeline_signal);
+
+struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size)
+{
+	unsigned long flags;
+	struct sync_pt *pt;
+
+	if (size < sizeof(struct sync_pt))
+		return NULL;
+
+	pt = kzalloc(size, GFP_KERNEL);
+	if (pt == NULL)
+		return NULL;
+
+	spin_lock_irqsave(&obj->child_list_lock, flags);
+	sync_timeline_get(obj);
+	fence_init(&pt->base, &android_fence_ops, &obj->child_list_lock,
+		   obj->context, ++obj->value);
+	list_add_tail(&pt->child_list, &obj->child_list_head);
+	INIT_LIST_HEAD(&pt->active_list);
+	spin_unlock_irqrestore(&obj->child_list_lock, flags);
+	return pt;
+}
+EXPORT_SYMBOL(sync_pt_create);
+
+void sync_pt_free(struct sync_pt *pt)
+{
+	fence_put(&pt->base);
+}
+EXPORT_SYMBOL(sync_pt_free);
+
+static struct sync_fence *sync_fence_alloc(int size, const char *name)
+{
+	struct sync_fence *fence;
+
+	fence = kzalloc(size, GFP_KERNEL);
+	if (fence == NULL)
+		return NULL;
+
+	fence->file = anon_inode_getfile("sync_fence", &sync_fence_fops,
+					 fence, 0);
+	if (IS_ERR(fence->file))
+		goto err;
+
+	kref_init(&fence->kref);
+	strlcpy(fence->name, name, sizeof(fence->name));
+
+	init_waitqueue_head(&fence->wq);
+
+	return fence;
+
+err:
+	kfree(fence);
+	return NULL;
+}
+
+static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
+{
+	struct sync_fence_cb *check;
+	struct sync_fence *fence;
+
+	check = container_of(cb, struct sync_fence_cb, cb);
+	fence = check->fence;
+
+	if (atomic_dec_and_test(&fence->status))
+		wake_up_all(&fence->wq);
+}
+
+/* TODO: implement a create which takes more than one sync_pt */
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
+{
+	struct sync_fence *fence;
+
+	fence = sync_fence_alloc(offsetof(struct sync_fence, cbs[1]), name);
+	if (fence == NULL)
+		return NULL;
+
+	fence->num_fences = 1;
+	atomic_set(&fence->status, 1);
+
+	fence->cbs[0].sync_pt = pt;
+	fence->cbs[0].fence = fence;
+	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
+		atomic_dec(&fence->status);
+
+	sync_fence_debug_add(fence);
+
+	return fence;
+}
+EXPORT_SYMBOL(sync_fence_create_dma);
+
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+{
+	return sync_fence_create_dma(name, &pt->base);
+}
+EXPORT_SYMBOL(sync_fence_create);
+
+struct sync_fence *sync_fence_fdget(int fd)
+{
+	struct file *file = fget(fd);
+
+	if (file == NULL)
+		return NULL;
+
+	if (file->f_op != &sync_fence_fops)
+		goto err;
+
+	return file->private_data;
+
+err:
+	fput(file);
+	return NULL;
+}
+EXPORT_SYMBOL(sync_fence_fdget);
+
+void sync_fence_put(struct sync_fence *fence)
+{
+	fput(fence->file);
+}
+EXPORT_SYMBOL(sync_fence_put);
+
+void sync_fence_install(struct sync_fence *fence, int fd)
+{
+	fd_install(fd, fence->file);
+}
+EXPORT_SYMBOL(sync_fence_install);
+
+static void sync_fence_add_pt(struct sync_fence *fence,
+			      int *i, struct fence *pt)
+{
+	fence->cbs[*i].sync_pt = pt;
+	fence->cbs[*i].fence = fence;
+
+	if (!fence_add_callback(pt, &fence->cbs[*i].cb, fence_check_cb_func)) {
+		fence_get(pt);
+		(*i)++;
+	}
+}
+
+struct sync_fence *sync_fence_merge(const char *name,
+				    struct sync_fence *a, struct sync_fence *b)
+{
+	int num_fences = a->num_fences + b->num_fences;
+	struct sync_fence *fence;
+	int i, i_a, i_b;
+	unsigned long size = offsetof(struct sync_fence, cbs[num_fences]);
+
+	fence = sync_fence_alloc(size, name);
+	if (fence == NULL)
+		return NULL;
+
+	atomic_set(&fence->status, num_fences);
+
+	/*
+	 * Assume sync_fence a and b are both ordered and have no
+	 * duplicates with the same context.
+	 *
+	 * If a sync_fence can only be created with sync_fence_merge
+	 * and sync_fence_create, this is a reasonable assumption.
+	 */
+	for (i = i_a = i_b = 0; i_a < a->num_fences && i_b < b->num_fences; ) {
+		struct fence *pt_a = a->cbs[i_a].sync_pt;
+		struct fence *pt_b = b->cbs[i_b].sync_pt;
+
+		if (pt_a->context < pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_a);
+
+			i_a++;
+		} else if (pt_a->context > pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_b);
+
+			i_b++;
+		} else {
+			if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+				sync_fence_add_pt(fence, &i, pt_a);
+			else
+				sync_fence_add_pt(fence, &i, pt_b);
+
+			i_a++;
+			i_b++;
+		}
+	}
+
+	for (; i_a < a->num_fences; i_a++)
+		sync_fence_add_pt(fence, &i, a->cbs[i_a].sync_pt);
+
+	for (; i_b < b->num_fences; i_b++)
+		sync_fence_add_pt(fence, &i, b->cbs[i_b].sync_pt);
+
+	if (num_fences > i)
+		atomic_sub(num_fences - i, &fence->status);
+	fence->num_fences = i;
+
+	sync_fence_debug_add(fence);
+	return fence;
+}
+EXPORT_SYMBOL(sync_fence_merge);
+
+int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
+				 int wake_flags, void *key)
+{
+	struct sync_fence_waiter *wait;
+
+	wait = container_of(curr, struct sync_fence_waiter, work);
+	list_del_init(&wait->work.task_list);
+
+	wait->callback(wait->work.private, wait);
+	return 1;
+}
+
+int sync_fence_wait_async(struct sync_fence *fence,
+			  struct sync_fence_waiter *waiter)
+{
+	int err = atomic_read(&fence->status);
+	unsigned long flags;
+
+	if (err < 0)
+		return err;
+
+	if (!err)
+		return 1;
+
+	init_waitqueue_func_entry(&waiter->work, sync_fence_wake_up_wq);
+	waiter->work.private = fence;
+
+	spin_lock_irqsave(&fence->wq.lock, flags);
+	err = atomic_read(&fence->status);
+	if (err > 0)
+		__add_wait_queue_tail(&fence->wq, &waiter->work);
+	spin_unlock_irqrestore(&fence->wq.lock, flags);
+
+	if (err < 0)
+		return err;
+
+	return !err;
+}
+EXPORT_SYMBOL(sync_fence_wait_async);
+
+int sync_fence_cancel_async(struct sync_fence *fence,
+			     struct sync_fence_waiter *waiter)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&fence->wq.lock, flags);
+	if (!list_empty(&waiter->work.task_list))
+		list_del_init(&waiter->work.task_list);
+	else
+		ret = -ENOENT;
+	spin_unlock_irqrestore(&fence->wq.lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(sync_fence_cancel_async);
+
+int sync_fence_wait(struct sync_fence *fence, long timeout)
+{
+	long ret;
+	int i;
+
+	if (timeout < 0)
+		timeout = MAX_SCHEDULE_TIMEOUT;
+	else
+		timeout = msecs_to_jiffies(timeout);
+
+	trace_sync_wait(fence, 1);
+	for (i = 0; i < fence->num_fences; ++i)
+		trace_sync_pt(fence->cbs[i].sync_pt);
+	ret = wait_event_interruptible_timeout(fence->wq,
+					       atomic_read(&fence->status) <= 0,
+					       timeout);
+	trace_sync_wait(fence, 0);
+
+	if (ret < 0) {
+		return ret;
+	} else if (ret == 0) {
+		if (timeout) {
+			pr_info("fence timeout on [%p] after %dms\n", fence,
+				jiffies_to_msecs(timeout));
+			sync_dump();
+		}
+		return -ETIME;
+	}
+
+	ret = atomic_read(&fence->status);
+	if (ret) {
+		pr_info("fence error %ld on [%p]\n", ret, fence);
+		sync_dump();
+	}
+	return ret;
+}
+EXPORT_SYMBOL(sync_fence_wait);
+
+static const char *android_fence_get_driver_name(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	return parent->ops->driver_name;
+}
+
+static const char *android_fence_get_timeline_name(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	return parent->name;
+}
+
+static void android_fence_release(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+	unsigned long flags;
+
+	spin_lock_irqsave(fence->lock, flags);
+	list_del(&pt->child_list);
+	if (WARN_ON_ONCE(!list_empty(&pt->active_list)))
+		list_del(&pt->active_list);
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	if (parent->ops->free_pt)
+		parent->ops->free_pt(pt);
+
+	sync_timeline_put(parent);
+	fence_free(&pt->base);
+}
+
+static bool android_fence_signaled(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+	int ret;
+
+	ret = parent->ops->has_signaled(pt);
+	if (ret < 0)
+		fence->status = ret;
+	return ret;
+}
+
+static bool android_fence_enable_signaling(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (android_fence_signaled(fence))
+		return false;
+
+	list_add_tail(&pt->active_list, &parent->active_list_head);
+	return true;
+}
+
+static int android_fence_fill_driver_data(struct fence *fence,
+					  void *data, int size)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (!parent->ops->fill_driver_data)
+		return 0;
+	return parent->ops->fill_driver_data(pt, data, size);
+}
+
+static void android_fence_value_str(struct fence *fence,
+				    char *str, int size)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (!parent->ops->pt_value_str) {
+		if (size)
+			*str = 0;
+		return;
+	}
+	parent->ops->pt_value_str(pt, str, size);
+}
+
+static void android_fence_timeline_value_str(struct fence *fence,
+					     char *str, int size)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (!parent->ops->timeline_value_str) {
+		if (size)
+			*str = 0;
+		return;
+	}
+	parent->ops->timeline_value_str(parent, str, size);
+}
+
+static const struct fence_ops android_fence_ops = {
+	.get_driver_name = android_fence_get_driver_name,
+	.get_timeline_name = android_fence_get_timeline_name,
+	.enable_signaling = android_fence_enable_signaling,
+	.signaled = android_fence_signaled,
+	.wait = fence_default_wait,
+	.release = android_fence_release,
+	.fill_driver_data = android_fence_fill_driver_data,
+	.fence_value_str = android_fence_value_str,
+	.timeline_value_str = android_fence_timeline_value_str,
+};
+
+static void sync_fence_free(struct kref *kref)
+{
+	struct sync_fence *fence = container_of(kref, struct sync_fence, kref);
+	int i, status = atomic_read(&fence->status);
+
+	for (i = 0; i < fence->num_fences; ++i) {
+		if (status)
+			fence_remove_callback(fence->cbs[i].sync_pt,
+					      &fence->cbs[i].cb);
+		fence_put(fence->cbs[i].sync_pt);
+	}
+
+	kfree(fence);
+}
+
+static int sync_fence_release(struct inode *inode, struct file *file)
+{
+	struct sync_fence *fence = file->private_data;
+
+	sync_fence_debug_remove(fence);
+
+	kref_put(&fence->kref, sync_fence_free);
+	return 0;
+}
+
+static unsigned int sync_fence_poll(struct file *file, poll_table *wait)
+{
+	struct sync_fence *fence = file->private_data;
+	int status;
+
+	poll_wait(file, &fence->wq, wait);
+
+	status = atomic_read(&fence->status);
+
+	if (!status)
+		return POLLIN;
+	else if (status < 0)
+		return POLLERR;
+	return 0;
+}
+
+static long sync_fence_ioctl_wait(struct sync_fence *fence, unsigned long arg)
+{
+	__s32 value;
+
+	if (copy_from_user(&value, (void __user *)arg, sizeof(value)))
+		return -EFAULT;
+
+	return sync_fence_wait(fence, value);
+}
+
+static long sync_fence_ioctl_merge(struct sync_fence *fence, unsigned long arg)
+{
+	int fd = get_unused_fd_flags(O_CLOEXEC);
+	int err;
+	struct sync_fence *fence2, *fence3;
+	struct sync_merge_data data;
+
+	if (fd < 0)
+		return fd;
+
+	if (copy_from_user(&data, (void __user *)arg, sizeof(data))) {
+		err = -EFAULT;
+		goto err_put_fd;
+	}
+
+	fence2 = sync_fence_fdget(data.fd2);
+	if (fence2 == NULL) {
+		err = -ENOENT;
+		goto err_put_fd;
+	}
+
+	data.name[sizeof(data.name) - 1] = '\0';
+	fence3 = sync_fence_merge(data.name, fence, fence2);
+	if (fence3 == NULL) {
+		err = -ENOMEM;
+		goto err_put_fence2;
+	}
+
+	data.fence = fd;
+	if (copy_to_user((void __user *)arg, &data, sizeof(data))) {
+		err = -EFAULT;
+		goto err_put_fence3;
+	}
+
+	sync_fence_install(fence3, fd);
+	sync_fence_put(fence2);
+	return 0;
+
+err_put_fence3:
+	sync_fence_put(fence3);
+
+err_put_fence2:
+	sync_fence_put(fence2);
+
+err_put_fd:
+	put_unused_fd(fd);
+	return err;
+}
+
+static int sync_fill_pt_info(struct fence *fence, void *data, int size)
+{
+	struct sync_pt_info *info = data;
+	int ret;
+
+	if (size < sizeof(struct sync_pt_info))
+		return -ENOMEM;
+
+	info->len = sizeof(struct sync_pt_info);
+
+	if (fence->ops->fill_driver_data) {
+		ret = fence->ops->fill_driver_data(fence, info->driver_data,
+						   size - sizeof(*info));
+		if (ret < 0)
+			return ret;
+
+		info->len += ret;
+	}
+
+	strlcpy(info->obj_name, fence->ops->get_timeline_name(fence),
+		sizeof(info->obj_name));
+	strlcpy(info->driver_name, fence->ops->get_driver_name(fence),
+		sizeof(info->driver_name));
+	if (fence_is_signaled(fence))
+		info->status = fence->status >= 0 ? 1 : fence->status;
+	else
+		info->status = 0;
+	info->timestamp_ns = ktime_to_ns(fence->timestamp);
+
+	return info->len;
+}
+
+static long sync_fence_ioctl_fence_info(struct sync_fence *fence,
+					unsigned long arg)
+{
+	struct sync_fence_info_data *data;
+	__u32 size;
+	__u32 len = 0;
+	int ret, i;
+
+	if (copy_from_user(&size, (void __user *)arg, sizeof(size)))
+		return -EFAULT;
+
+	if (size < sizeof(struct sync_fence_info_data))
+		return -EINVAL;
+
+	if (size > 4096)
+		size = 4096;
+
+	data = kzalloc(size, GFP_KERNEL);
+	if (data == NULL)
+		return -ENOMEM;
+
+	strlcpy(data->name, fence->name, sizeof(data->name));
+	data->status = atomic_read(&fence->status);
+	if (data->status >= 0)
+		data->status = !data->status;
+
+	len = sizeof(struct sync_fence_info_data);
+
+	for (i = 0; i < fence->num_fences; ++i) {
+		struct fence *pt = fence->cbs[i].sync_pt;
+
+		ret = sync_fill_pt_info(pt, (u8 *)data + len, size - len);
+
+		if (ret < 0)
+			goto out;
+
+		len += ret;
+	}
+
+	data->len = len;
+
+	if (copy_to_user((void __user *)arg, data, len))
+		ret = -EFAULT;
+	else
+		ret = 0;
+
+out:
+	kfree(data);
+
+	return ret;
+}
+
+static long sync_fence_ioctl(struct file *file, unsigned int cmd,
+			     unsigned long arg)
+{
+	struct sync_fence *fence = file->private_data;
+
+	switch (cmd) {
+	case SYNC_IOC_WAIT:
+		return sync_fence_ioctl_wait(fence, arg);
+
+	case SYNC_IOC_MERGE:
+		return sync_fence_ioctl_merge(fence, arg);
+
+	case SYNC_IOC_FENCE_INFO:
+		return sync_fence_ioctl_fence_info(fence, arg);
+
+	default:
+		return -ENOTTY;
+	}
+}
+
+static const struct file_operations sync_fence_fops = {
+	.release = sync_fence_release,
+	.poll = sync_fence_poll,
+	.unlocked_ioctl = sync_fence_ioctl,
+	.compat_ioctl = sync_fence_ioctl,
+};
+
diff --git a/drivers/android/sync.h b/drivers/android/sync.h
new file mode 100644
index 0000000..4ccff01
--- /dev/null
+++ b/drivers/android/sync.h
@@ -0,0 +1,366 @@
+/*
+ * drivers/android/sync.h
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_SYNC_H
+#define _LINUX_SYNC_H
+
+#include <linux/types.h>
+#include <linux/kref.h>
+#include <linux/ktime.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+#include <linux/fence.h>
+
+#include <uapi/sync/sync.h>
+
+struct sync_timeline;
+struct sync_pt;
+struct sync_fence;
+
+/**
+ * struct sync_timeline_ops - sync object implementation ops
+ * @driver_name:	name of the implementation
+ * @dup:		duplicate a sync_pt
+ * @has_signaled:	returns:
+ *			  1 if pt has signaled
+ *			  0 if pt has not signaled
+ *			 <0 on error
+ * @compare:		returns:
+ *			  1 if b will signal before a
+ *			  0 if a and b will signal at the same time
+ *			 -1 if a will signal before b
+ * @free_pt:		called before sync_pt is freed
+ * @release_obj:	called before sync_timeline is freed
+ * @fill_driver_data:	write implementation specific driver data to data.
+ *			  should return an error if there is not enough room
+ *			  as specified by size.  This information is returned
+ *			  to userspace by SYNC_IOC_FENCE_INFO.
+ * @timeline_value_str: fill str with the value of the sync_timeline's counter
+ * @pt_value_str:	fill str with the value of the sync_pt
+ */
+struct sync_timeline_ops {
+	const char *driver_name;
+
+	/* required */
+	struct sync_pt * (*dup)(struct sync_pt *pt);
+
+	/* required */
+	int (*has_signaled)(struct sync_pt *pt);
+
+	/* required */
+	int (*compare)(struct sync_pt *a, struct sync_pt *b);
+
+	/* optional */
+	void (*free_pt)(struct sync_pt *sync_pt);
+
+	/* optional */
+	void (*release_obj)(struct sync_timeline *sync_timeline);
+
+	/* optional */
+	int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);
+
+	/* optional */
+	void (*timeline_value_str)(struct sync_timeline *timeline, char *str,
+				   int size);
+
+	/* optional */
+	void (*pt_value_str)(struct sync_pt *pt, char *str, int size);
+};
+
+/**
+ * struct sync_timeline - sync object
+ * @kref:		reference count on fence.
+ * @ops:		ops that define the implementation of the sync_timeline
+ * @name:		name of the sync_timeline. Useful for debugging
+ * @destroyed:		set when sync_timeline is destroyed
+ * @child_list_head:	list of children sync_pts for this sync_timeline
+ * @child_list_lock:	lock protecting @child_list_head, destroyed, and
+ *			  sync_pt.status
+ * @active_list_head:	list of active (unsignaled/errored) sync_pts
+ * @sync_timeline_list:	membership in global sync_timeline_list
+ */
+struct sync_timeline {
+	struct kref		kref;
+	const struct sync_timeline_ops	*ops;
+	char			name[32];
+
+	/* protected by child_list_lock */
+	bool			destroyed;
+	int			context, value;
+
+	struct list_head	child_list_head;
+	spinlock_t		child_list_lock;
+
+	struct list_head	active_list_head;
+
+#ifdef CONFIG_DEBUG_FS
+	struct list_head	sync_timeline_list;
+#endif
+};
+
+/**
+ * struct sync_pt - sync point
+ * @base:		base fence object
+ * @child_list:		membership in sync_timeline.child_list_head
+ * @active_list:	membership in sync_timeline.active_list_head
+ *
+ * The signaled/error status and the signal timestamp now live in the
+ * embedded struct fence.
+ */
+struct sync_pt {
+	struct fence base;
+
+	struct list_head	child_list;
+	struct list_head	active_list;
+};
+
+static inline struct sync_timeline *sync_pt_parent(struct sync_pt *pt)
+{
+	return container_of(pt->base.lock, struct sync_timeline,
+			    child_list_lock);
+}
+
+struct sync_fence_cb {
+	struct fence_cb cb;
+	struct fence *sync_pt;
+	struct sync_fence *fence;
+};
+
+/**
+ * struct sync_fence - sync fence
+ * @file:		file representing this fence
+ * @kref:		reference count on fence.
+ * @name:		name of sync_fence.  Useful for debugging
+ * @num_fences:		number of sync_pts in the fence
+ * @cbs:		per-sync_pt callback information.  immutable once fence
+ *			  is created
+ * @status:		0: signaled, >0:active, <0: error
+ *
+ * @wq:			wait queue for fence signaling
+ * @sync_fence_list:	membership in global fence list
+ */
+struct sync_fence {
+	struct file		*file;
+	struct kref		kref;
+	char			name[32];
+#ifdef CONFIG_DEBUG_FS
+	struct list_head	sync_fence_list;
+#endif
+	int num_fences;
+
+	wait_queue_head_t	wq;
+	atomic_t		status;
+
+	struct sync_fence_cb	cbs[];
+};
+
+struct sync_fence_waiter;
+typedef void (*sync_callback_t)(struct sync_fence *fence,
+				struct sync_fence_waiter *waiter);
+
+/**
+ * struct sync_fence_waiter - metadata for asynchronous waiter on a fence
+ * @waiter_list:	membership in sync_fence.waiter_list_head
+ * @callback:		function pointer to call when fence signals
+ * @callback_data:	pointer to pass to @callback
+ */
+struct sync_fence_waiter {
+	wait_queue_t work;
+	sync_callback_t callback;
+};
+
+static inline void sync_fence_waiter_init(struct sync_fence_waiter *waiter,
+					  sync_callback_t callback)
+{
+	INIT_LIST_HEAD(&waiter->work.task_list);
+	waiter->callback = callback;
+}
+
+/*
+ * API for sync_timeline implementers
+ */
+
+/**
+ * sync_timeline_create() - creates a sync object
+ * @ops:	specifies the implementation ops for the object
+ * @size:	size to allocate for this obj
+ * @name:	sync_timeline name
+ *
+ * Creates a new sync_timeline which will use the implementation specified by
+ * @ops.  @size bytes will be allocated allowing for implementation specific
+ * data to be kept after the generic sync_timeline struct.
+ */
+struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
+					   int size, const char *name);
+
+/**
+ * sync_timeline_destroy() - destroys a sync object
+ * @obj:	sync_timeline to destroy
+ *
+ * A sync implementation should call this when the @obj is going away
+ * (i.e. module unload.)  @obj won't actually be freed until all its children
+ * sync_pts are freed.
+ */
+void sync_timeline_destroy(struct sync_timeline *obj);
+
+/**
+ * sync_timeline_signal() - signal a status change on a sync_timeline
+ * @obj:	sync_timeline to signal
+ *
+ * A sync implementation should call this any time one of its sync_pts
+ * has signaled or has an error condition.
+ */
+void sync_timeline_signal(struct sync_timeline *obj);
+
+/**
+ * sync_pt_create() - creates a sync pt
+ * @parent:	sync_pt's parent sync_timeline
+ * @size:	size to allocate for this pt
+ *
+ * Creates a new sync_pt as a child of @parent.  @size bytes will be
+ * allocated allowing for implementation specific data to be kept after
+ * the generic sync_timeline struct.
+ */
+struct sync_pt *sync_pt_create(struct sync_timeline *parent, int size);
+
+/**
+ * sync_pt_free() - frees a sync pt
+ * @pt:		sync_pt to free
+ *
+ * This should only be called on sync_pts which have been created but
+ * not added to a fence.
+ */
+void sync_pt_free(struct sync_pt *pt);
+
+/**
+ * sync_fence_create() - creates a sync fence
+ * @name:	name of fence to create
+ * @pt:		sync_pt to add to the fence
+ *
+ * Creates a fence containing @pt.  Once this is called, the fence takes
+ * ownership of @pt.
+ */
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
+
+/**
+ * sync_fence_create_dma() - creates a sync fence from dma-fence
+ * @name:	name of fence to create
+ * @pt:	dma-fence to add to the fence
+ *
+ * Creates a fence containing @pt.  Once this is called, the fence takes
+ * ownership of @pt.
+ */
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
+
+/*
+ * API for sync_fence consumers
+ */
+
+/**
+ * sync_fence_merge() - merge two fences
+ * @name:	name of new fence
+ * @a:		fence a
+ * @b:		fence b
+ *
+ * Creates a new fence which contains copies of all the sync_pts in both
+ * @a and @b.  @a and @b remain valid, independent fences.
+ */
+struct sync_fence *sync_fence_merge(const char *name,
+				    struct sync_fence *a, struct sync_fence *b);
+
+/**
+ * sync_fence_fdget() - get a fence from an fd
+ * @fd:		fd referencing a fence
+ *
+ * Ensures @fd references a valid fence, increments the refcount of the backing
+ * file, and returns the fence.
+ */
+struct sync_fence *sync_fence_fdget(int fd);
+
+/**
+ * sync_fence_put() - puts a reference of a sync fence
+ * @fence:	fence to put
+ *
+ * Puts a reference on @fence.  If this is the last reference, the fence and
+ * all its sync_pts will be freed
+ */
+void sync_fence_put(struct sync_fence *fence);
+
+/**
+ * sync_fence_install() - installs a fence into a file descriptor
+ * @fence:	fence to install
+ * @fd:		file descriptor in which to install the fence
+ *
+ * Installs @fence into @fd.  @fd's should be acquired through
+ * get_unused_fd_flags(O_CLOEXEC).
+ */
+void sync_fence_install(struct sync_fence *fence, int fd);
+
+/**
+ * sync_fence_wait_async() - registers an async wait on the fence
+ * @fence:		fence to wait on
+ * @waiter:		waiter callback struct
+ *
+ * Returns 1 if @fence has already signaled.
+ *
+ * Registers a callback to be called when @fence signals or has an error.
+ * @waiter should be initialized with sync_fence_waiter_init().
+ */
+int sync_fence_wait_async(struct sync_fence *fence,
+			  struct sync_fence_waiter *waiter);
+
+/**
+ * sync_fence_cancel_async() - cancels an async wait
+ * @fence:		fence to wait on
+ * @waiter:		waiter callback struct
+ *
+ * returns 0 if waiter was removed from fence's async waiter list.
+ * returns -ENOENT if waiter was not found on fence's async waiter list.
+ *
+ * Cancels a previously registered async wait.  Will fail gracefully if
+ * @waiter was never registered or if @fence has already signaled @waiter.
+ */
+int sync_fence_cancel_async(struct sync_fence *fence,
+			    struct sync_fence_waiter *waiter);
+
+/**
+ * sync_fence_wait() - wait on fence
+ * @fence:	fence to wait on
+ * @timeout:	timeout in ms
+ *
+ * Wait for @fence to be signaled or have an error.  Waits indefinitely
+ * if @timeout < 0
+ */
+int sync_fence_wait(struct sync_fence *fence, long timeout);
+
+#ifdef CONFIG_DEBUG_FS
+
+void sync_timeline_debug_add(struct sync_timeline *obj);
+void sync_timeline_debug_remove(struct sync_timeline *obj);
+void sync_fence_debug_add(struct sync_fence *fence);
+void sync_fence_debug_remove(struct sync_fence *fence);
+void sync_dump(void);
+
+#else
+# define sync_timeline_debug_add(obj)
+# define sync_timeline_debug_remove(obj)
+# define sync_fence_debug_add(fence)
+# define sync_fence_debug_remove(fence)
+# define sync_dump()
+#endif
+int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
+				 int wake_flags, void *key);
+
+#endif /* _LINUX_SYNC_H */
diff --git a/drivers/android/sync_debug.c b/drivers/android/sync_debug.c
new file mode 100644
index 0000000..f45d13c
--- /dev/null
+++ b/drivers/android/sync_debug.c
@@ -0,0 +1,256 @@
+/*
+ * drivers/android/sync_debug.c
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/export.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+#include <linux/time64.h>
+#include "sync.h"
+
+#ifdef CONFIG_DEBUG_FS
+
+static LIST_HEAD(sync_timeline_list_head);
+static DEFINE_SPINLOCK(sync_timeline_list_lock);
+static LIST_HEAD(sync_fence_list_head);
+static DEFINE_SPINLOCK(sync_fence_list_lock);
+
+void sync_timeline_debug_add(struct sync_timeline *obj)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_timeline_list_lock, flags);
+	list_add_tail(&obj->sync_timeline_list, &sync_timeline_list_head);
+	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+}
+
+void sync_timeline_debug_remove(struct sync_timeline *obj)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_timeline_list_lock, flags);
+	list_del(&obj->sync_timeline_list);
+	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+}
+
+void sync_fence_debug_add(struct sync_fence *fence)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_fence_list_lock, flags);
+	list_add_tail(&fence->sync_fence_list, &sync_fence_list_head);
+	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
+}
+
+void sync_fence_debug_remove(struct sync_fence *fence)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_fence_list_lock, flags);
+	list_del(&fence->sync_fence_list);
+	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
+}
+
+static const char *sync_status_str(int status)
+{
+	if (status == 0)
+		return "signaled";
+
+	if (status > 0)
+		return "active";
+
+	return "error";
+}
+
+static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence)
+{
+	int status = 1;
+
+	if (fence_is_signaled_locked(pt))
+		status = pt->status;
+
+	seq_printf(s, "  %s%spt %s",
+		   fence && pt->ops->get_timeline_name ?
+		   pt->ops->get_timeline_name(pt) : "",
+		   fence ? "_" : "",
+		   sync_status_str(status));
+
+	if (status <= 0) {
+		struct timespec64 ts64 =
+			ktime_to_timespec64(pt->timestamp);
+
+		seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec);
+	}
+
+	if ((!fence || pt->ops->timeline_value_str) &&
+	    pt->ops->fence_value_str) {
+		char value[64];
+		bool success;
+
+		pt->ops->fence_value_str(pt, value, sizeof(value));
+		success = strlen(value);
+
+		if (success)
+			seq_printf(s, ": %s", value);
+
+		if (success && fence) {
+			pt->ops->timeline_value_str(pt, value, sizeof(value));
+
+			if (strlen(value))
+				seq_printf(s, " / %s", value);
+		}
+	}
+
+	seq_puts(s, "\n");
+}
+
+static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
+{
+	struct list_head *pos;
+	unsigned long flags;
+
+	seq_printf(s, "%s %s", obj->name, obj->ops->driver_name);
+
+	if (obj->ops->timeline_value_str) {
+		char value[64];
+
+		obj->ops->timeline_value_str(obj, value, sizeof(value));
+		seq_printf(s, ": %s", value);
+	}
+
+	seq_puts(s, "\n");
+
+	spin_lock_irqsave(&obj->child_list_lock, flags);
+	list_for_each(pos, &obj->child_list_head) {
+		struct sync_pt *pt =
+			container_of(pos, struct sync_pt, child_list);
+		sync_print_pt(s, &pt->base, false);
+	}
+	spin_unlock_irqrestore(&obj->child_list_lock, flags);
+}
+
+static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
+{
+	wait_queue_t *pos;
+	unsigned long flags;
+	int i;
+
+	seq_printf(s, "[%p] %s: %s\n", fence, fence->name,
+		   sync_status_str(atomic_read(&fence->status)));
+
+	for (i = 0; i < fence->num_fences; ++i)
+		sync_print_pt(s, fence->cbs[i].sync_pt, true);
+
+	spin_lock_irqsave(&fence->wq.lock, flags);
+	list_for_each_entry(pos, &fence->wq.task_list, task_list) {
+		struct sync_fence_waiter *waiter;
+
+		if (pos->func != &sync_fence_wake_up_wq)
+			continue;
+
+		waiter = container_of(pos, struct sync_fence_waiter, work);
+
+		seq_printf(s, "waiter %pF\n", waiter->callback);
+	}
+	spin_unlock_irqrestore(&fence->wq.lock, flags);
+}
+
+static int sync_debugfs_show(struct seq_file *s, void *unused)
+{
+	unsigned long flags;
+	struct list_head *pos;
+
+	seq_puts(s, "objs:\n--------------\n");
+
+	spin_lock_irqsave(&sync_timeline_list_lock, flags);
+	list_for_each(pos, &sync_timeline_list_head) {
+		struct sync_timeline *obj =
+			container_of(pos, struct sync_timeline,
+				     sync_timeline_list);
+
+		sync_print_obj(s, obj);
+		seq_puts(s, "\n");
+	}
+	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+
+	seq_puts(s, "fences:\n--------------\n");
+
+	spin_lock_irqsave(&sync_fence_list_lock, flags);
+	list_for_each(pos, &sync_fence_list_head) {
+		struct sync_fence *fence =
+			container_of(pos, struct sync_fence, sync_fence_list);
+
+		sync_print_fence(s, fence);
+		seq_puts(s, "\n");
+	}
+	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
+	return 0;
+}
+
+static int sync_debugfs_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, sync_debugfs_show, inode->i_private);
+}
+
+static const struct file_operations sync_debugfs_fops = {
+	.open           = sync_debugfs_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
+static __init int sync_debugfs_init(void)
+{
+	debugfs_create_file("sync", S_IRUGO, NULL, NULL, &sync_debugfs_fops);
+	return 0;
+}
+late_initcall(sync_debugfs_init);
+
+#define DUMP_CHUNK 256
+static char sync_dump_buf[64 * 1024];
+void sync_dump(void)
+{
+	struct seq_file s = {
+		.buf = sync_dump_buf,
+		.size = sizeof(sync_dump_buf) - 1,
+	};
+	int i;
+
+	sync_debugfs_show(&s, NULL);
+
+	for (i = 0; i < s.count; i += DUMP_CHUNK) {
+		if ((s.count - i) > DUMP_CHUNK) {
+			char c = s.buf[i + DUMP_CHUNK];
+
+			s.buf[i + DUMP_CHUNK] = 0;
+			pr_cont("%s", s.buf + i);
+			s.buf[i + DUMP_CHUNK] = c;
+		} else {
+			s.buf[s.count] = 0;
+			pr_cont("%s", s.buf + i);
+		}
+	}
+}
+
+#endif
diff --git a/drivers/android/trace/sync.h b/drivers/android/trace/sync.h
new file mode 100644
index 0000000..7dcf2fe
--- /dev/null
+++ b/drivers/android/trace/sync.h
@@ -0,0 +1,82 @@
+#undef TRACE_SYSTEM
+#define TRACE_INCLUDE_PATH ../../drivers/android/trace
+#define TRACE_SYSTEM sync
+
+#if !defined(_TRACE_SYNC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SYNC_H
+
+#include "../sync.h"
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(sync_timeline,
+	TP_PROTO(struct sync_timeline *timeline),
+
+	TP_ARGS(timeline),
+
+	TP_STRUCT__entry(
+			__string(name, timeline->name)
+			__array(char, value, 32)
+	),
+
+	TP_fast_assign(
+			__assign_str(name, timeline->name);
+			if (timeline->ops->timeline_value_str) {
+				timeline->ops->timeline_value_str(timeline,
+							__entry->value,
+							sizeof(__entry->value));
+			} else {
+				__entry->value[0] = '\0';
+			}
+	),
+
+	TP_printk("name=%s value=%s", __get_str(name), __entry->value)
+);
+
+TRACE_EVENT(sync_wait,
+	TP_PROTO(struct sync_fence *fence, int begin),
+
+	TP_ARGS(fence, begin),
+
+	TP_STRUCT__entry(
+			__string(name, fence->name)
+			__field(s32, status)
+			__field(u32, begin)
+	),
+
+	TP_fast_assign(
+			__assign_str(name, fence->name);
+			__entry->status = atomic_read(&fence->status);
+			__entry->begin = begin;
+	),
+
+	TP_printk("%s name=%s state=%d", __entry->begin ? "begin" : "end",
+			__get_str(name), __entry->status)
+);
+
+TRACE_EVENT(sync_pt,
+	TP_PROTO(struct fence *pt),
+
+	TP_ARGS(pt),
+
+	TP_STRUCT__entry(
+		__string(timeline, pt->ops->get_timeline_name(pt))
+		__array(char, value, 32)
+	),
+
+	TP_fast_assign(
+		__assign_str(timeline, pt->ops->get_timeline_name(pt));
+		if (pt->ops->fence_value_str) {
+			pt->ops->fence_value_str(pt, __entry->value,
+							sizeof(__entry->value));
+		} else {
+			__entry->value[0] = '\0';
+		}
+	),
+
+	TP_printk("name=%s value=%s", __get_str(timeline), __entry->value)
+);
+
+#endif /* if !defined(_TRACE_SYNC_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
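
These tracepoints are consumed by the sync core the same way as before the
move; condensed from sync_fence_wait() in sync.c, the emission pattern is
roughly:

	trace_sync_wait(fence, 1);			/* begin */
	for (i = 0; i < fence->num_fences; ++i)
		trace_sync_pt(fence->cbs[i].sync_pt);
	/* ... wait_event_interruptible_timeout(...) ... */
	trace_sync_wait(fence, 0);			/* end */

(sync.c defines CREATE_TRACE_POINTS before including this header, which is
what instantiates the events.)
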
diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 42b1512..4b18fee 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -38,34 +38,6 @@ config ANDROID_LOW_MEMORY_KILLER
 	  scripts (/init.rc), and it defines priority values with minimum free memory size
 	  for each priority.
 
-config SYNC
-	bool "Synchronization framework"
-	default n
-	select ANON_INODES
-	select DMA_SHARED_BUFFER
-	---help---
-	  This option enables the framework for synchronization between multiple
-	  drivers.  Sync implementations can take advantage of hardware
-	  synchronization built into devices like GPUs.
-
-config SW_SYNC
-	bool "Software synchronization objects"
-	default n
-	depends on SYNC
-	---help---
-	  A sync object driver that uses a 32bit counter to coordinate
-	  synchronization.  Useful when there is no hardware primitive backing
-	  the synchronization.
-
-config SW_SYNC_USER
-	bool "Userspace API for SW_SYNC"
-	default n
-	depends on SW_SYNC
-	---help---
-	  Provides a user space API to the sw sync object.
-	  *WARNING* improper use of this can result in deadlocking kernel
-	  drivers from userspace.
-
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index c7b6c99..355ad0e 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -6,5 +6,3 @@ obj-$(CONFIG_ASHMEM)			+= ashmem.o
 obj-$(CONFIG_ANDROID_TIMED_OUTPUT)	+= timed_output.o
 obj-$(CONFIG_ANDROID_TIMED_GPIO)	+= timed_gpio.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)	+= lowmemorykiller.o
-obj-$(CONFIG_SYNC)			+= sync.o sync_debug.o
-obj-$(CONFIG_SW_SYNC)			+= sw_sync.o
diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c
deleted file mode 100644
index c4ff167..0000000
--- a/drivers/staging/android/sw_sync.c
+++ /dev/null
@@ -1,260 +0,0 @@
-/*
- * drivers/base/sw_sync.c
- *
- * Copyright (C) 2012 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/export.h>
-#include <linux/file.h>
-#include <linux/fs.h>
-#include <linux/miscdevice.h>
-#include <linux/syscalls.h>
-#include <linux/uaccess.h>
-
-#include "sw_sync.h"
-
-static int sw_sync_cmp(u32 a, u32 b)
-{
-	if (a == b)
-		return 0;
-
-	return ((s32)a - (s32)b) < 0 ? -1 : 1;
-}
-
-struct sync_pt *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value)
-{
-	struct sw_sync_pt *pt;
-
-	pt = (struct sw_sync_pt *)
-		sync_pt_create(&obj->obj, sizeof(struct sw_sync_pt));
-
-	pt->value = value;
-
-	return (struct sync_pt *)pt;
-}
-EXPORT_SYMBOL(sw_sync_pt_create);
-
-static struct sync_pt *sw_sync_pt_dup(struct sync_pt *sync_pt)
-{
-	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-	struct sw_sync_timeline *obj =
-		(struct sw_sync_timeline *)sync_pt_parent(sync_pt);
-
-	return (struct sync_pt *)sw_sync_pt_create(obj, pt->value);
-}
-
-static int sw_sync_pt_has_signaled(struct sync_pt *sync_pt)
-{
-	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-	struct sw_sync_timeline *obj =
-		(struct sw_sync_timeline *)sync_pt_parent(sync_pt);
-
-	return sw_sync_cmp(obj->value, pt->value) >= 0;
-}
-
-static int sw_sync_pt_compare(struct sync_pt *a, struct sync_pt *b)
-{
-	struct sw_sync_pt *pt_a = (struct sw_sync_pt *)a;
-	struct sw_sync_pt *pt_b = (struct sw_sync_pt *)b;
-
-	return sw_sync_cmp(pt_a->value, pt_b->value);
-}
-
-static int sw_sync_fill_driver_data(struct sync_pt *sync_pt,
-				    void *data, int size)
-{
-	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-
-	if (size < sizeof(pt->value))
-		return -ENOMEM;
-
-	memcpy(data, &pt->value, sizeof(pt->value));
-
-	return sizeof(pt->value);
-}
-
-static void sw_sync_timeline_value_str(struct sync_timeline *sync_timeline,
-				       char *str, int size)
-{
-	struct sw_sync_timeline *timeline =
-		(struct sw_sync_timeline *)sync_timeline;
-	snprintf(str, size, "%d", timeline->value);
-}
-
-static void sw_sync_pt_value_str(struct sync_pt *sync_pt,
-				 char *str, int size)
-{
-	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-
-	snprintf(str, size, "%d", pt->value);
-}
-
-static struct sync_timeline_ops sw_sync_timeline_ops = {
-	.driver_name = "sw_sync",
-	.dup = sw_sync_pt_dup,
-	.has_signaled = sw_sync_pt_has_signaled,
-	.compare = sw_sync_pt_compare,
-	.fill_driver_data = sw_sync_fill_driver_data,
-	.timeline_value_str = sw_sync_timeline_value_str,
-	.pt_value_str = sw_sync_pt_value_str,
-};
-
-struct sw_sync_timeline *sw_sync_timeline_create(const char *name)
-{
-	struct sw_sync_timeline *obj = (struct sw_sync_timeline *)
-		sync_timeline_create(&sw_sync_timeline_ops,
-				     sizeof(struct sw_sync_timeline),
-				     name);
-
-	return obj;
-}
-EXPORT_SYMBOL(sw_sync_timeline_create);
-
-void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc)
-{
-	obj->value += inc;
-
-	sync_timeline_signal(&obj->obj);
-}
-EXPORT_SYMBOL(sw_sync_timeline_inc);
-
-#ifdef CONFIG_SW_SYNC_USER
-/* *WARNING*
- *
- * improper use of this can result in deadlocking kernel drivers from userspace.
- */
-
-/* opening sw_sync create a new sync obj */
-static int sw_sync_open(struct inode *inode, struct file *file)
-{
-	struct sw_sync_timeline *obj;
-	char task_comm[TASK_COMM_LEN];
-
-	get_task_comm(task_comm, current);
-
-	obj = sw_sync_timeline_create(task_comm);
-	if (!obj)
-		return -ENOMEM;
-
-	file->private_data = obj;
-
-	return 0;
-}
-
-static int sw_sync_release(struct inode *inode, struct file *file)
-{
-	struct sw_sync_timeline *obj = file->private_data;
-
-	sync_timeline_destroy(&obj->obj);
-	return 0;
-}
-
-static long sw_sync_ioctl_create_fence(struct sw_sync_timeline *obj,
-				       unsigned long arg)
-{
-	int fd = get_unused_fd_flags(O_CLOEXEC);
-	int err;
-	struct sync_pt *pt;
-	struct sync_fence *fence;
-	struct sw_sync_create_fence_data data;
-
-	if (fd < 0)
-		return fd;
-
-	if (copy_from_user(&data, (void __user *)arg, sizeof(data))) {
-		err = -EFAULT;
-		goto err;
-	}
-
-	pt = sw_sync_pt_create(obj, data.value);
-	if (!pt) {
-		err = -ENOMEM;
-		goto err;
-	}
-
-	data.name[sizeof(data.name) - 1] = '\0';
-	fence = sync_fence_create(data.name, pt);
-	if (!fence) {
-		sync_pt_free(pt);
-		err = -ENOMEM;
-		goto err;
-	}
-
-	data.fence = fd;
-	if (copy_to_user((void __user *)arg, &data, sizeof(data))) {
-		sync_fence_put(fence);
-		err = -EFAULT;
-		goto err;
-	}
-
-	sync_fence_install(fence, fd);
-
-	return 0;
-
-err:
-	put_unused_fd(fd);
-	return err;
-}
-
-static long sw_sync_ioctl_inc(struct sw_sync_timeline *obj, unsigned long arg)
-{
-	u32 value;
-
-	if (copy_from_user(&value, (void __user *)arg, sizeof(value)))
-		return -EFAULT;
-
-	sw_sync_timeline_inc(obj, value);
-
-	return 0;
-}
-
-static long sw_sync_ioctl(struct file *file, unsigned int cmd,
-			  unsigned long arg)
-{
-	struct sw_sync_timeline *obj = file->private_data;
-
-	switch (cmd) {
-	case SW_SYNC_IOC_CREATE_FENCE:
-		return sw_sync_ioctl_create_fence(obj, arg);
-
-	case SW_SYNC_IOC_INC:
-		return sw_sync_ioctl_inc(obj, arg);
-
-	default:
-		return -ENOTTY;
-	}
-}
-
-static const struct file_operations sw_sync_fops = {
-	.owner = THIS_MODULE,
-	.open = sw_sync_open,
-	.release = sw_sync_release,
-	.unlocked_ioctl = sw_sync_ioctl,
-	.compat_ioctl = sw_sync_ioctl,
-};
-
-static struct miscdevice sw_sync_dev = {
-	.minor	= MISC_DYNAMIC_MINOR,
-	.name	= "sw_sync",
-	.fops	= &sw_sync_fops,
-};
-
-static int __init sw_sync_device_init(void)
-{
-	return misc_register(&sw_sync_dev);
-}
-device_initcall(sw_sync_device_init);
-
-#endif /* CONFIG_SW_SYNC_USER */
diff --git a/drivers/staging/android/sw_sync.h b/drivers/staging/android/sw_sync.h
deleted file mode 100644
index c87ae9e..0000000
--- a/drivers/staging/android/sw_sync.h
+++ /dev/null
@@ -1,59 +0,0 @@
-/*
- * include/linux/sw_sync.h
- *
- * Copyright (C) 2012 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#ifndef _LINUX_SW_SYNC_H
-#define _LINUX_SW_SYNC_H
-
-#include <linux/types.h>
-#include <linux/kconfig.h>
-#include "sync.h"
-#include "uapi/sw_sync.h"
-
-struct sw_sync_timeline {
-	struct	sync_timeline	obj;
-
-	u32			value;
-};
-
-struct sw_sync_pt {
-	struct sync_pt		pt;
-
-	u32			value;
-};
-
-#if IS_ENABLED(CONFIG_SW_SYNC)
-struct sw_sync_timeline *sw_sync_timeline_create(const char *name);
-void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc);
-
-struct sync_pt *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value);
-#else
-static inline struct sw_sync_timeline *sw_sync_timeline_create(const char *name)
-{
-	return NULL;
-}
-
-static inline void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc)
-{
-}
-
-static inline struct sync_pt *sw_sync_pt_create(struct sw_sync_timeline *obj,
-						u32 value)
-{
-	return NULL;
-}
-#endif /* IS_ENABLED(CONFIG_SW_SYNC) */
-
-#endif /* _LINUX_SW_SYNC_H */
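
The sw_sync kernel API itself is unchanged by the move. As a sketch of how a
caller (e.g. a test harness) drives a timeline, assuming the header's new home
under drivers/android/:

#include "sw_sync.h"

static int sw_sync_demo(void)
{
	struct sw_sync_timeline *tl;
	struct sync_pt *pt;
	struct sync_fence *f;

	tl = sw_sync_timeline_create("demo");
	if (!tl)
		return -ENOMEM;

	/* point that signals once the timeline counter reaches 1 */
	pt = sw_sync_pt_create(tl, 1);
	if (!pt)
		goto err;

	f = sync_fence_create("demo_fence", pt);	/* takes ownership of pt */
	if (!f) {
		sync_pt_free(pt);
		goto err;
	}

	sw_sync_timeline_inc(tl, 1);	/* counter hits 1: fence signals */
	sync_fence_wait(f, 100);	/* returns promptly */

	sync_fence_put(f);
	sync_timeline_destroy(&tl->obj);
	return 0;

err:
	sync_timeline_destroy(&tl->obj);
	return -ENOMEM;
}
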
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
deleted file mode 100644
index 7f0e919..0000000
--- a/drivers/staging/android/sync.c
+++ /dev/null
@@ -1,734 +0,0 @@
-/*
- * drivers/base/sync.c
- *
- * Copyright (C) 2012 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#include <linux/debugfs.h>
-#include <linux/export.h>
-#include <linux/file.h>
-#include <linux/fs.h>
-#include <linux/kernel.h>
-#include <linux/poll.h>
-#include <linux/sched.h>
-#include <linux/seq_file.h>
-#include <linux/slab.h>
-#include <linux/uaccess.h>
-#include <linux/anon_inodes.h>
-
-#include "sync.h"
-
-#define CREATE_TRACE_POINTS
-#include "trace/sync.h"
-
-static const struct fence_ops android_fence_ops;
-static const struct file_operations sync_fence_fops;
-
-struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
-					   int size, const char *name)
-{
-	struct sync_timeline *obj;
-
-	if (size < sizeof(struct sync_timeline))
-		return NULL;
-
-	obj = kzalloc(size, GFP_KERNEL);
-	if (obj == NULL)
-		return NULL;
-
-	kref_init(&obj->kref);
-	obj->ops = ops;
-	obj->context = fence_context_alloc(1);
-	strlcpy(obj->name, name, sizeof(obj->name));
-
-	INIT_LIST_HEAD(&obj->child_list_head);
-	INIT_LIST_HEAD(&obj->active_list_head);
-	spin_lock_init(&obj->child_list_lock);
-
-	sync_timeline_debug_add(obj);
-
-	return obj;
-}
-EXPORT_SYMBOL(sync_timeline_create);
-
-static void sync_timeline_free(struct kref *kref)
-{
-	struct sync_timeline *obj =
-		container_of(kref, struct sync_timeline, kref);
-
-	sync_timeline_debug_remove(obj);
-
-	if (obj->ops->release_obj)
-		obj->ops->release_obj(obj);
-
-	kfree(obj);
-}
-
-static void sync_timeline_get(struct sync_timeline *obj)
-{
-	kref_get(&obj->kref);
-}
-
-static void sync_timeline_put(struct sync_timeline *obj)
-{
-	kref_put(&obj->kref, sync_timeline_free);
-}
-
-void sync_timeline_destroy(struct sync_timeline *obj)
-{
-	obj->destroyed = true;
-	/*
-	 * Ensure timeline is marked as destroyed before
-	 * changing timeline's fences status.
-	 */
-	smp_wmb();
-
-	/*
-	 * signal any children that their parent is going away.
-	 */
-	sync_timeline_signal(obj);
-	sync_timeline_put(obj);
-}
-EXPORT_SYMBOL(sync_timeline_destroy);
-
-void sync_timeline_signal(struct sync_timeline *obj)
-{
-	unsigned long flags;
-	LIST_HEAD(signaled_pts);
-	struct sync_pt *pt, *next;
-
-	trace_sync_timeline(obj);
-
-	spin_lock_irqsave(&obj->child_list_lock, flags);
-
-	list_for_each_entry_safe(pt, next, &obj->active_list_head,
-				 active_list) {
-		if (fence_is_signaled_locked(&pt->base))
-			list_del_init(&pt->active_list);
-	}
-
-	spin_unlock_irqrestore(&obj->child_list_lock, flags);
-}
-EXPORT_SYMBOL(sync_timeline_signal);
-
-struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size)
-{
-	unsigned long flags;
-	struct sync_pt *pt;
-
-	if (size < sizeof(struct sync_pt))
-		return NULL;
-
-	pt = kzalloc(size, GFP_KERNEL);
-	if (pt == NULL)
-		return NULL;
-
-	spin_lock_irqsave(&obj->child_list_lock, flags);
-	sync_timeline_get(obj);
-	fence_init(&pt->base, &android_fence_ops, &obj->child_list_lock,
-		   obj->context, ++obj->value);
-	list_add_tail(&pt->child_list, &obj->child_list_head);
-	INIT_LIST_HEAD(&pt->active_list);
-	spin_unlock_irqrestore(&obj->child_list_lock, flags);
-	return pt;
-}
-EXPORT_SYMBOL(sync_pt_create);
-
-void sync_pt_free(struct sync_pt *pt)
-{
-	fence_put(&pt->base);
-}
-EXPORT_SYMBOL(sync_pt_free);
-
-static struct sync_fence *sync_fence_alloc(int size, const char *name)
-{
-	struct sync_fence *fence;
-
-	fence = kzalloc(size, GFP_KERNEL);
-	if (fence == NULL)
-		return NULL;
-
-	fence->file = anon_inode_getfile("sync_fence", &sync_fence_fops,
-					 fence, 0);
-	if (IS_ERR(fence->file))
-		goto err;
-
-	kref_init(&fence->kref);
-	strlcpy(fence->name, name, sizeof(fence->name));
-
-	init_waitqueue_head(&fence->wq);
-
-	return fence;
-
-err:
-	kfree(fence);
-	return NULL;
-}
-
-static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
-{
-	struct sync_fence_cb *check;
-	struct sync_fence *fence;
-
-	check = container_of(cb, struct sync_fence_cb, cb);
-	fence = check->fence;
-
-	if (atomic_dec_and_test(&fence->status))
-		wake_up_all(&fence->wq);
-}
-
-/* TODO: implement a create which takes more that one sync_pt */
-struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
-{
-	struct sync_fence *fence;
-
-	fence = sync_fence_alloc(offsetof(struct sync_fence, cbs[1]), name);
-	if (fence == NULL)
-		return NULL;
-
-	fence->num_fences = 1;
-	atomic_set(&fence->status, 1);
-
-	fence->cbs[0].sync_pt = pt;
-	fence->cbs[0].fence = fence;
-	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
-		atomic_dec(&fence->status);
-
-	sync_fence_debug_add(fence);
-
-	return fence;
-}
-EXPORT_SYMBOL(sync_fence_create_dma);
-
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
-{
-	return sync_fence_create_dma(name, &pt->base);
-}
-EXPORT_SYMBOL(sync_fence_create);
-
-struct sync_fence *sync_fence_fdget(int fd)
-{
-	struct file *file = fget(fd);
-
-	if (file == NULL)
-		return NULL;
-
-	if (file->f_op != &sync_fence_fops)
-		goto err;
-
-	return file->private_data;
-
-err:
-	fput(file);
-	return NULL;
-}
-EXPORT_SYMBOL(sync_fence_fdget);
-
-void sync_fence_put(struct sync_fence *fence)
-{
-	fput(fence->file);
-}
-EXPORT_SYMBOL(sync_fence_put);
-
-void sync_fence_install(struct sync_fence *fence, int fd)
-{
-	fd_install(fd, fence->file);
-}
-EXPORT_SYMBOL(sync_fence_install);
-
-static void sync_fence_add_pt(struct sync_fence *fence,
-			      int *i, struct fence *pt)
-{
-	fence->cbs[*i].sync_pt = pt;
-	fence->cbs[*i].fence = fence;
-
-	if (!fence_add_callback(pt, &fence->cbs[*i].cb, fence_check_cb_func)) {
-		fence_get(pt);
-		(*i)++;
-	}
-}
-
-struct sync_fence *sync_fence_merge(const char *name,
-				    struct sync_fence *a, struct sync_fence *b)
-{
-	int num_fences = a->num_fences + b->num_fences;
-	struct sync_fence *fence;
-	int i, i_a, i_b;
-	unsigned long size = offsetof(struct sync_fence, cbs[num_fences]);
-
-	fence = sync_fence_alloc(size, name);
-	if (fence == NULL)
-		return NULL;
-
-	atomic_set(&fence->status, num_fences);
-
-	/*
-	 * Assume sync_fence a and b are both ordered and have no
-	 * duplicates with the same context.
-	 *
-	 * If a sync_fence can only be created with sync_fence_merge
-	 * and sync_fence_create, this is a reasonable assumption.
-	 */
-	for (i = i_a = i_b = 0; i_a < a->num_fences && i_b < b->num_fences; ) {
-		struct fence *pt_a = a->cbs[i_a].sync_pt;
-		struct fence *pt_b = b->cbs[i_b].sync_pt;
-
-		if (pt_a->context < pt_b->context) {
-			sync_fence_add_pt(fence, &i, pt_a);
-
-			i_a++;
-		} else if (pt_a->context > pt_b->context) {
-			sync_fence_add_pt(fence, &i, pt_b);
-
-			i_b++;
-		} else {
-			if (pt_a->seqno - pt_b->seqno <= INT_MAX)
-				sync_fence_add_pt(fence, &i, pt_a);
-			else
-				sync_fence_add_pt(fence, &i, pt_b);
-
-			i_a++;
-			i_b++;
-		}
-	}
-
-	for (; i_a < a->num_fences; i_a++)
-		sync_fence_add_pt(fence, &i, a->cbs[i_a].sync_pt);
-
-	for (; i_b < b->num_fences; i_b++)
-		sync_fence_add_pt(fence, &i, b->cbs[i_b].sync_pt);
-
-	if (num_fences > i)
-		atomic_sub(num_fences - i, &fence->status);
-	fence->num_fences = i;
-
-	sync_fence_debug_add(fence);
-	return fence;
-}
-EXPORT_SYMBOL(sync_fence_merge);
-
-int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
-				 int wake_flags, void *key)
-{
-	struct sync_fence_waiter *wait;
-
-	wait = container_of(curr, struct sync_fence_waiter, work);
-	list_del_init(&wait->work.task_list);
-
-	wait->callback(wait->work.private, wait);
-	return 1;
-}
-
-int sync_fence_wait_async(struct sync_fence *fence,
-			  struct sync_fence_waiter *waiter)
-{
-	int err = atomic_read(&fence->status);
-	unsigned long flags;
-
-	if (err < 0)
-		return err;
-
-	if (!err)
-		return 1;
-
-	init_waitqueue_func_entry(&waiter->work, sync_fence_wake_up_wq);
-	waiter->work.private = fence;
-
-	spin_lock_irqsave(&fence->wq.lock, flags);
-	err = atomic_read(&fence->status);
-	if (err > 0)
-		__add_wait_queue_tail(&fence->wq, &waiter->work);
-	spin_unlock_irqrestore(&fence->wq.lock, flags);
-
-	if (err < 0)
-		return err;
-
-	return !err;
-}
-EXPORT_SYMBOL(sync_fence_wait_async);
-
-int sync_fence_cancel_async(struct sync_fence *fence,
-			     struct sync_fence_waiter *waiter)
-{
-	unsigned long flags;
-	int ret = 0;
-
-	spin_lock_irqsave(&fence->wq.lock, flags);
-	if (!list_empty(&waiter->work.task_list))
-		list_del_init(&waiter->work.task_list);
-	else
-		ret = -ENOENT;
-	spin_unlock_irqrestore(&fence->wq.lock, flags);
-	return ret;
-}
-EXPORT_SYMBOL(sync_fence_cancel_async);
-
-int sync_fence_wait(struct sync_fence *fence, long timeout)
-{
-	long ret;
-	int i;
-
-	if (timeout < 0)
-		timeout = MAX_SCHEDULE_TIMEOUT;
-	else
-		timeout = msecs_to_jiffies(timeout);
-
-	trace_sync_wait(fence, 1);
-	for (i = 0; i < fence->num_fences; ++i)
-		trace_sync_pt(fence->cbs[i].sync_pt);
-	ret = wait_event_interruptible_timeout(fence->wq,
-					       atomic_read(&fence->status) <= 0,
-					       timeout);
-	trace_sync_wait(fence, 0);
-
-	if (ret < 0) {
-		return ret;
-	} else if (ret == 0) {
-		if (timeout) {
-			pr_info("fence timeout on [%p] after %dms\n", fence,
-				jiffies_to_msecs(timeout));
-			sync_dump();
-		}
-		return -ETIME;
-	}
-
-	ret = atomic_read(&fence->status);
-	if (ret) {
-		pr_info("fence error %ld on [%p]\n", ret, fence);
-		sync_dump();
-	}
-	return ret;
-}
-EXPORT_SYMBOL(sync_fence_wait);
-
-static const char *android_fence_get_driver_name(struct fence *fence)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-
-	return parent->ops->driver_name;
-}
-
-static const char *android_fence_get_timeline_name(struct fence *fence)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-
-	return parent->name;
-}
-
-static void android_fence_release(struct fence *fence)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-	unsigned long flags;
-
-	spin_lock_irqsave(fence->lock, flags);
-	list_del(&pt->child_list);
-	if (WARN_ON_ONCE(!list_empty(&pt->active_list)))
-		list_del(&pt->active_list);
-	spin_unlock_irqrestore(fence->lock, flags);
-
-	if (parent->ops->free_pt)
-		parent->ops->free_pt(pt);
-
-	sync_timeline_put(parent);
-	fence_free(&pt->base);
-}
-
-static bool android_fence_signaled(struct fence *fence)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-	int ret;
-
-	ret = parent->ops->has_signaled(pt);
-	if (ret < 0)
-		fence->status = ret;
-	return ret;
-}
-
-static bool android_fence_enable_signaling(struct fence *fence)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-
-	if (android_fence_signaled(fence))
-		return false;
-
-	list_add_tail(&pt->active_list, &parent->active_list_head);
-	return true;
-}
-
-static int android_fence_fill_driver_data(struct fence *fence,
-					  void *data, int size)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-
-	if (!parent->ops->fill_driver_data)
-		return 0;
-	return parent->ops->fill_driver_data(pt, data, size);
-}
-
-static void android_fence_value_str(struct fence *fence,
-				    char *str, int size)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-
-	if (!parent->ops->pt_value_str) {
-		if (size)
-			*str = 0;
-		return;
-	}
-	parent->ops->pt_value_str(pt, str, size);
-}
-
-static void android_fence_timeline_value_str(struct fence *fence,
-					     char *str, int size)
-{
-	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
-	struct sync_timeline *parent = sync_pt_parent(pt);
-
-	if (!parent->ops->timeline_value_str) {
-		if (size)
-			*str = 0;
-		return;
-	}
-	parent->ops->timeline_value_str(parent, str, size);
-}
-
-static const struct fence_ops android_fence_ops = {
-	.get_driver_name = android_fence_get_driver_name,
-	.get_timeline_name = android_fence_get_timeline_name,
-	.enable_signaling = android_fence_enable_signaling,
-	.signaled = android_fence_signaled,
-	.wait = fence_default_wait,
-	.release = android_fence_release,
-	.fill_driver_data = android_fence_fill_driver_data,
-	.fence_value_str = android_fence_value_str,
-	.timeline_value_str = android_fence_timeline_value_str,
-};
-
-static void sync_fence_free(struct kref *kref)
-{
-	struct sync_fence *fence = container_of(kref, struct sync_fence, kref);
-	int i, status = atomic_read(&fence->status);
-
-	for (i = 0; i < fence->num_fences; ++i) {
-		if (status)
-			fence_remove_callback(fence->cbs[i].sync_pt,
-					      &fence->cbs[i].cb);
-		fence_put(fence->cbs[i].sync_pt);
-	}
-
-	kfree(fence);
-}
-
-static int sync_fence_release(struct inode *inode, struct file *file)
-{
-	struct sync_fence *fence = file->private_data;
-
-	sync_fence_debug_remove(fence);
-
-	kref_put(&fence->kref, sync_fence_free);
-	return 0;
-}
-
-static unsigned int sync_fence_poll(struct file *file, poll_table *wait)
-{
-	struct sync_fence *fence = file->private_data;
-	int status;
-
-	poll_wait(file, &fence->wq, wait);
-
-	status = atomic_read(&fence->status);
-
-	if (!status)
-		return POLLIN;
-	else if (status < 0)
-		return POLLERR;
-	return 0;
-}
-
-static long sync_fence_ioctl_wait(struct sync_fence *fence, unsigned long arg)
-{
-	__s32 value;
-
-	if (copy_from_user(&value, (void __user *)arg, sizeof(value)))
-		return -EFAULT;
-
-	return sync_fence_wait(fence, value);
-}
-
-static long sync_fence_ioctl_merge(struct sync_fence *fence, unsigned long arg)
-{
-	int fd = get_unused_fd_flags(O_CLOEXEC);
-	int err;
-	struct sync_fence *fence2, *fence3;
-	struct sync_merge_data data;
-
-	if (fd < 0)
-		return fd;
-
-	if (copy_from_user(&data, (void __user *)arg, sizeof(data))) {
-		err = -EFAULT;
-		goto err_put_fd;
-	}
-
-	fence2 = sync_fence_fdget(data.fd2);
-	if (fence2 == NULL) {
-		err = -ENOENT;
-		goto err_put_fd;
-	}
-
-	data.name[sizeof(data.name) - 1] = '\0';
-	fence3 = sync_fence_merge(data.name, fence, fence2);
-	if (fence3 == NULL) {
-		err = -ENOMEM;
-		goto err_put_fence2;
-	}
-
-	data.fence = fd;
-	if (copy_to_user((void __user *)arg, &data, sizeof(data))) {
-		err = -EFAULT;
-		goto err_put_fence3;
-	}
-
-	sync_fence_install(fence3, fd);
-	sync_fence_put(fence2);
-	return 0;
-
-err_put_fence3:
-	sync_fence_put(fence3);
-
-err_put_fence2:
-	sync_fence_put(fence2);
-
-err_put_fd:
-	put_unused_fd(fd);
-	return err;
-}
-
-static int sync_fill_pt_info(struct fence *fence, void *data, int size)
-{
-	struct sync_pt_info *info = data;
-	int ret;
-
-	if (size < sizeof(struct sync_pt_info))
-		return -ENOMEM;
-
-	info->len = sizeof(struct sync_pt_info);
-
-	if (fence->ops->fill_driver_data) {
-		ret = fence->ops->fill_driver_data(fence, info->driver_data,
-						   size - sizeof(*info));
-		if (ret < 0)
-			return ret;
-
-		info->len += ret;
-	}
-
-	strlcpy(info->obj_name, fence->ops->get_timeline_name(fence),
-		sizeof(info->obj_name));
-	strlcpy(info->driver_name, fence->ops->get_driver_name(fence),
-		sizeof(info->driver_name));
-	if (fence_is_signaled(fence))
-		info->status = fence->status >= 0 ? 1 : fence->status;
-	else
-		info->status = 0;
-	info->timestamp_ns = ktime_to_ns(fence->timestamp);
-
-	return info->len;
-}
-
-static long sync_fence_ioctl_fence_info(struct sync_fence *fence,
-					unsigned long arg)
-{
-	struct sync_fence_info_data *data;
-	__u32 size;
-	__u32 len = 0;
-	int ret, i;
-
-	if (copy_from_user(&size, (void __user *)arg, sizeof(size)))
-		return -EFAULT;
-
-	if (size < sizeof(struct sync_fence_info_data))
-		return -EINVAL;
-
-	if (size > 4096)
-		size = 4096;
-
-	data = kzalloc(size, GFP_KERNEL);
-	if (data == NULL)
-		return -ENOMEM;
-
-	strlcpy(data->name, fence->name, sizeof(data->name));
-	data->status = atomic_read(&fence->status);
-	if (data->status >= 0)
-		data->status = !data->status;
-
-	len = sizeof(struct sync_fence_info_data);
-
-	for (i = 0; i < fence->num_fences; ++i) {
-		struct fence *pt = fence->cbs[i].sync_pt;
-
-		ret = sync_fill_pt_info(pt, (u8 *)data + len, size - len);
-
-		if (ret < 0)
-			goto out;
-
-		len += ret;
-	}
-
-	data->len = len;
-
-	if (copy_to_user((void __user *)arg, data, len))
-		ret = -EFAULT;
-	else
-		ret = 0;
-
-out:
-	kfree(data);
-
-	return ret;
-}
-
-static long sync_fence_ioctl(struct file *file, unsigned int cmd,
-			     unsigned long arg)
-{
-	struct sync_fence *fence = file->private_data;
-
-	switch (cmd) {
-	case SYNC_IOC_WAIT:
-		return sync_fence_ioctl_wait(fence, arg);
-
-	case SYNC_IOC_MERGE:
-		return sync_fence_ioctl_merge(fence, arg);
-
-	case SYNC_IOC_FENCE_INFO:
-		return sync_fence_ioctl_fence_info(fence, arg);
-
-	default:
-		return -ENOTTY;
-	}
-}
-
-static const struct file_operations sync_fence_fops = {
-	.release = sync_fence_release,
-	.poll = sync_fence_poll,
-	.unlocked_ioctl = sync_fence_ioctl,
-	.compat_ioctl = sync_fence_ioctl,
-};
-
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
deleted file mode 100644
index afa0752..0000000
--- a/drivers/staging/android/sync.h
+++ /dev/null
@@ -1,366 +0,0 @@
-/*
- * include/linux/sync.h
- *
- * Copyright (C) 2012 Google, Inc.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#ifndef _LINUX_SYNC_H
-#define _LINUX_SYNC_H
-
-#include <linux/types.h>
-#include <linux/kref.h>
-#include <linux/ktime.h>
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <linux/wait.h>
-#include <linux/fence.h>
-
-#include "uapi/sync.h"
-
-struct sync_timeline;
-struct sync_pt;
-struct sync_fence;
-
-/**
- * struct sync_timeline_ops - sync object implementation ops
- * @driver_name:	name of the implementation
- * @dup:		duplicate a sync_pt
- * @has_signaled:	returns:
- *			  1 if pt has signaled
- *			  0 if pt has not signaled
- *			 <0 on error
- * @compare:		returns:
- *			  1 if b will signal before a
- *			  0 if a and b will signal at the same time
- *			 -1 if a will signal before b
- * @free_pt:		called before sync_pt is freed
- * @release_obj:	called before sync_timeline is freed
- * @fill_driver_data:	write implementation specific driver data to data.
- *			  should return an error if there is not enough room
- *			  as specified by size.  This information is returned
- *			  to userspace by SYNC_IOC_FENCE_INFO.
- * @timeline_value_str: fill str with the value of the sync_timeline's counter
- * @pt_value_str:	fill str with the value of the sync_pt
- */
-struct sync_timeline_ops {
-	const char *driver_name;
-
-	/* required */
-	struct sync_pt * (*dup)(struct sync_pt *pt);
-
-	/* required */
-	int (*has_signaled)(struct sync_pt *pt);
-
-	/* required */
-	int (*compare)(struct sync_pt *a, struct sync_pt *b);
-
-	/* optional */
-	void (*free_pt)(struct sync_pt *sync_pt);
-
-	/* optional */
-	void (*release_obj)(struct sync_timeline *sync_timeline);
-
-	/* optional */
-	int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);
-
-	/* optional */
-	void (*timeline_value_str)(struct sync_timeline *timeline, char *str,
-				   int size);
-
-	/* optional */
-	void (*pt_value_str)(struct sync_pt *pt, char *str, int size);
-};
-
-/**
- * struct sync_timeline - sync object
- * @kref:		reference count on fence.
- * @ops:		ops that define the implementation of the sync_timeline
- * @name:		name of the sync_timeline. Useful for debugging
- * @destroyed:		set when sync_timeline is destroyed
- * @child_list_head:	list of children sync_pts for this sync_timeline
- * @child_list_lock:	lock protecting @child_list_head, destroyed, and
- *			  sync_pt.status
- * @active_list_head:	list of active (unsignaled/errored) sync_pts
- * @sync_timeline_list:	membership in global sync_timeline_list
- */
-struct sync_timeline {
-	struct kref		kref;
-	const struct sync_timeline_ops	*ops;
-	char			name[32];
-
-	/* protected by child_list_lock */
-	bool			destroyed;
-	int			context, value;
-
-	struct list_head	child_list_head;
-	spinlock_t		child_list_lock;
-
-	struct list_head	active_list_head;
-
-#ifdef CONFIG_DEBUG_FS
-	struct list_head	sync_timeline_list;
-#endif
-};
-
-/**
- * struct sync_pt - sync point
- * @fence:		base fence class
- * @child_list:		membership in sync_timeline.child_list_head
- * @active_list:	membership in sync_timeline.active_list_head
- * @signaled_list:	membership in temporary signaled_list on stack
- * @fence:		sync_fence to which the sync_pt belongs
- * @pt_list:		membership in sync_fence.pt_list_head
- * @status:		1: signaled, 0:active, <0: error
- * @timestamp:		time which sync_pt status transitioned from active to
- *			  signaled or error.
- */
-struct sync_pt {
-	struct fence base;
-
-	struct list_head	child_list;
-	struct list_head	active_list;
-};
-
-static inline struct sync_timeline *sync_pt_parent(struct sync_pt *pt)
-{
-	return container_of(pt->base.lock, struct sync_timeline,
-			    child_list_lock);
-}
-
-struct sync_fence_cb {
-	struct fence_cb cb;
-	struct fence *sync_pt;
-	struct sync_fence *fence;
-};
-
-/**
- * struct sync_fence - sync fence
- * @file:		file representing this fence
- * @kref:		reference count on fence.
- * @name:		name of sync_fence.  Useful for debugging
- * @pt_list_head:	list of sync_pts in the fence.  immutable once fence
- *			  is created
- * @status:		0: signaled, >0:active, <0: error
- *
- * @wq:			wait queue for fence signaling
- * @sync_fence_list:	membership in global fence list
- */
-struct sync_fence {
-	struct file		*file;
-	struct kref		kref;
-	char			name[32];
-#ifdef CONFIG_DEBUG_FS
-	struct list_head	sync_fence_list;
-#endif
-	int num_fences;
-
-	wait_queue_head_t	wq;
-	atomic_t		status;
-
-	struct sync_fence_cb	cbs[];
-};
-
-struct sync_fence_waiter;
-typedef void (*sync_callback_t)(struct sync_fence *fence,
-				struct sync_fence_waiter *waiter);
-
-/**
- * struct sync_fence_waiter - metadata for asynchronous waiter on a fence
- * @waiter_list:	membership in sync_fence.waiter_list_head
- * @callback:		function pointer to call when fence signals
- * @callback_data:	pointer to pass to @callback
- */
-struct sync_fence_waiter {
-	wait_queue_t work;
-	sync_callback_t callback;
-};
-
-static inline void sync_fence_waiter_init(struct sync_fence_waiter *waiter,
-					  sync_callback_t callback)
-{
-	INIT_LIST_HEAD(&waiter->work.task_list);
-	waiter->callback = callback;
-}
-
-/*
- * API for sync_timeline implementers
- */
-
-/**
- * sync_timeline_create() - creates a sync object
- * @ops:	specifies the implementation ops for the object
- * @size:	size to allocate for this obj
- * @name:	sync_timeline name
- *
- * Creates a new sync_timeline which will use the implementation specified by
- * @ops.  @size bytes will be allocated allowing for implementation specific
- * data to be kept after the generic sync_timeline struct.
- */
-struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
-					   int size, const char *name);
-
-/**
- * sync_timeline_destroy() - destroys a sync object
- * @obj:	sync_timeline to destroy
- *
- * A sync implementation should call this when the @obj is going away
- * (i.e. module unload.)  @obj won't actually be freed until all its children
- * sync_pts are freed.
- */
-void sync_timeline_destroy(struct sync_timeline *obj);
-
-/**
- * sync_timeline_signal() - signal a status change on a sync_timeline
- * @obj:	sync_timeline to signal
- *
- * A sync implementation should call this any time one of it's sync_pts
- * has signaled or has an error condition.
- */
-void sync_timeline_signal(struct sync_timeline *obj);
-
-/**
- * sync_pt_create() - creates a sync pt
- * @parent:	sync_pt's parent sync_timeline
- * @size:	size to allocate for this pt
- *
- * Creates a new sync_pt as a child of @parent.  @size bytes will be
- * allocated allowing for implementation specific data to be kept after
- * the generic sync_timeline struct.
- */
-struct sync_pt *sync_pt_create(struct sync_timeline *parent, int size);
-
-/**
- * sync_pt_free() - frees a sync pt
- * @pt:		sync_pt to free
- *
- * This should only be called on sync_pts which have been created but
- * not added to a fence.
- */
-void sync_pt_free(struct sync_pt *pt);
-
-/**
- * sync_fence_create() - creates a sync fence
- * @name:	name of fence to create
- * @pt:		sync_pt to add to the fence
- *
- * Creates a fence containg @pt.  Once this is called, the fence takes
- * ownership of @pt.
- */
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
-
-/**
- * sync_fence_create_dma() - creates a sync fence from dma-fence
- * @name:	name of fence to create
- * @pt:	dma-fence to add to the fence
- *
- * Creates a fence containg @pt.  Once this is called, the fence takes
- * ownership of @pt.
- */
-struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
-
-/*
- * API for sync_fence consumers
- */
-
-/**
- * sync_fence_merge() - merge two fences
- * @name:	name of new fence
- * @a:		fence a
- * @b:		fence b
- *
- * Creates a new fence which contains copies of all the sync_pts in both
- * @a and @b.  @a and @b remain valid, independent fences.
- */
-struct sync_fence *sync_fence_merge(const char *name,
-				    struct sync_fence *a, struct sync_fence *b);
-
-/**
- * sync_fence_fdget() - get a fence from an fd
- * @fd:		fd referencing a fence
- *
- * Ensures @fd references a valid fence, increments the refcount of the backing
- * file, and returns the fence.
- */
-struct sync_fence *sync_fence_fdget(int fd);
-
-/**
- * sync_fence_put() - puts a reference of a sync fence
- * @fence:	fence to put
- *
- * Puts a reference on @fence.  If this is the last reference, the fence and
- * all it's sync_pts will be freed
- */
-void sync_fence_put(struct sync_fence *fence);
-
-/**
- * sync_fence_install() - installs a fence into a file descriptor
- * @fence:	fence to install
- * @fd:		file descriptor in which to install the fence
- *
- * Installs @fence into @fd.  @fd's should be acquired through
- * get_unused_fd_flags(O_CLOEXEC).
- */
-void sync_fence_install(struct sync_fence *fence, int fd);
-
-/**
- * sync_fence_wait_async() - registers and async wait on the fence
- * @fence:		fence to wait on
- * @waiter:		waiter callback struck
- *
- * Returns 1 if @fence has already signaled.
- *
- * Registers a callback to be called when @fence signals or has an error.
- * @waiter should be initialized with sync_fence_waiter_init().
- */
-int sync_fence_wait_async(struct sync_fence *fence,
-			  struct sync_fence_waiter *waiter);
-
-/**
- * sync_fence_cancel_async() - cancels an async wait
- * @fence:		fence to wait on
- * @waiter:		waiter callback struck
- *
- * returns 0 if waiter was removed from fence's async waiter list.
- * returns -ENOENT if waiter was not found on fence's async waiter list.
- *
- * Cancels a previously registered async wait.  Will fail gracefully if
- * @waiter was never registered or if @fence has already signaled @waiter.
- */
-int sync_fence_cancel_async(struct sync_fence *fence,
-			    struct sync_fence_waiter *waiter);
-
-/**
- * sync_fence_wait() - wait on fence
- * @fence:	fence to wait on
- * @tiemout:	timeout in ms
- *
- * Wait for @fence to be signaled or have an error.  Waits indefinitely
- * if @timeout < 0
- */
-int sync_fence_wait(struct sync_fence *fence, long timeout);
-
-#ifdef CONFIG_DEBUG_FS
-
-void sync_timeline_debug_add(struct sync_timeline *obj);
-void sync_timeline_debug_remove(struct sync_timeline *obj);
-void sync_fence_debug_add(struct sync_fence *fence);
-void sync_fence_debug_remove(struct sync_fence *fence);
-void sync_dump(void);
-
-#else
-# define sync_timeline_debug_add(obj)
-# define sync_timeline_debug_remove(obj)
-# define sync_fence_debug_add(fence)
-# define sync_fence_debug_remove(fence)
-# define sync_dump()
-#endif
-int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
-				 int wake_flags, void *key);
-
-#endif /* _LINUX_SYNC_H */
diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
deleted file mode 100644
index f45d13c..0000000
--- a/drivers/staging/android/sync_debug.c
+++ /dev/null
@@ -1,256 +0,0 @@
-/*
- * drivers/base/sync.c
- *
- * Copyright (C) 2012 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#include <linux/debugfs.h>
-#include <linux/export.h>
-#include <linux/file.h>
-#include <linux/fs.h>
-#include <linux/kernel.h>
-#include <linux/poll.h>
-#include <linux/sched.h>
-#include <linux/seq_file.h>
-#include <linux/slab.h>
-#include <linux/uaccess.h>
-#include <linux/anon_inodes.h>
-#include <linux/time64.h>
-#include "sync.h"
-
-#ifdef CONFIG_DEBUG_FS
-
-static LIST_HEAD(sync_timeline_list_head);
-static DEFINE_SPINLOCK(sync_timeline_list_lock);
-static LIST_HEAD(sync_fence_list_head);
-static DEFINE_SPINLOCK(sync_fence_list_lock);
-
-void sync_timeline_debug_add(struct sync_timeline *obj)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&sync_timeline_list_lock, flags);
-	list_add_tail(&obj->sync_timeline_list, &sync_timeline_list_head);
-	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
-}
-
-void sync_timeline_debug_remove(struct sync_timeline *obj)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&sync_timeline_list_lock, flags);
-	list_del(&obj->sync_timeline_list);
-	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
-}
-
-void sync_fence_debug_add(struct sync_fence *fence)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&sync_fence_list_lock, flags);
-	list_add_tail(&fence->sync_fence_list, &sync_fence_list_head);
-	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
-}
-
-void sync_fence_debug_remove(struct sync_fence *fence)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&sync_fence_list_lock, flags);
-	list_del(&fence->sync_fence_list);
-	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
-}
-
-static const char *sync_status_str(int status)
-{
-	if (status == 0)
-		return "signaled";
-
-	if (status > 0)
-		return "active";
-
-	return "error";
-}
-
-static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence)
-{
-	int status = 1;
-
-	if (fence_is_signaled_locked(pt))
-		status = pt->status;
-
-	seq_printf(s, "  %s%spt %s",
-		   fence && pt->ops->get_timeline_name ?
-		   pt->ops->get_timeline_name(pt) : "",
-		   fence ? "_" : "",
-		   sync_status_str(status));
-
-	if (status <= 0) {
-		struct timespec64 ts64 =
-			ktime_to_timespec64(pt->timestamp);
-
-		seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec);
-	}
-
-	if ((!fence || pt->ops->timeline_value_str) &&
-	    pt->ops->fence_value_str) {
-		char value[64];
-		bool success;
-
-		pt->ops->fence_value_str(pt, value, sizeof(value));
-		success = strlen(value);
-
-		if (success)
-			seq_printf(s, ": %s", value);
-
-		if (success && fence) {
-			pt->ops->timeline_value_str(pt, value, sizeof(value));
-
-			if (strlen(value))
-				seq_printf(s, " / %s", value);
-		}
-	}
-
-	seq_puts(s, "\n");
-}
-
-static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
-{
-	struct list_head *pos;
-	unsigned long flags;
-
-	seq_printf(s, "%s %s", obj->name, obj->ops->driver_name);
-
-	if (obj->ops->timeline_value_str) {
-		char value[64];
-
-		obj->ops->timeline_value_str(obj, value, sizeof(value));
-		seq_printf(s, ": %s", value);
-	}
-
-	seq_puts(s, "\n");
-
-	spin_lock_irqsave(&obj->child_list_lock, flags);
-	list_for_each(pos, &obj->child_list_head) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, child_list);
-		sync_print_pt(s, &pt->base, false);
-	}
-	spin_unlock_irqrestore(&obj->child_list_lock, flags);
-}
-
-static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
-{
-	wait_queue_t *pos;
-	unsigned long flags;
-	int i;
-
-	seq_printf(s, "[%p] %s: %s\n", fence, fence->name,
-		   sync_status_str(atomic_read(&fence->status)));
-
-	for (i = 0; i < fence->num_fences; ++i) {
-		sync_print_pt(s, fence->cbs[i].sync_pt, true);
-	}
-
-	spin_lock_irqsave(&fence->wq.lock, flags);
-	list_for_each_entry(pos, &fence->wq.task_list, task_list) {
-		struct sync_fence_waiter *waiter;
-
-		if (pos->func != &sync_fence_wake_up_wq)
-			continue;
-
-		waiter = container_of(pos, struct sync_fence_waiter, work);
-
-		seq_printf(s, "waiter %pF\n", waiter->callback);
-	}
-	spin_unlock_irqrestore(&fence->wq.lock, flags);
-}
-
-static int sync_debugfs_show(struct seq_file *s, void *unused)
-{
-	unsigned long flags;
-	struct list_head *pos;
-
-	seq_puts(s, "objs:\n--------------\n");
-
-	spin_lock_irqsave(&sync_timeline_list_lock, flags);
-	list_for_each(pos, &sync_timeline_list_head) {
-		struct sync_timeline *obj =
-			container_of(pos, struct sync_timeline,
-				     sync_timeline_list);
-
-		sync_print_obj(s, obj);
-		seq_puts(s, "\n");
-	}
-	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
-
-	seq_puts(s, "fences:\n--------------\n");
-
-	spin_lock_irqsave(&sync_fence_list_lock, flags);
-	list_for_each(pos, &sync_fence_list_head) {
-		struct sync_fence *fence =
-			container_of(pos, struct sync_fence, sync_fence_list);
-
-		sync_print_fence(s, fence);
-		seq_puts(s, "\n");
-	}
-	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
-	return 0;
-}
-
-static int sync_debugfs_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, sync_debugfs_show, inode->i_private);
-}
-
-static const struct file_operations sync_debugfs_fops = {
-	.open           = sync_debugfs_open,
-	.read           = seq_read,
-	.llseek         = seq_lseek,
-	.release        = single_release,
-};
-
-static __init int sync_debugfs_init(void)
-{
-	debugfs_create_file("sync", S_IRUGO, NULL, NULL, &sync_debugfs_fops);
-	return 0;
-}
-late_initcall(sync_debugfs_init);
-
-#define DUMP_CHUNK 256
-static char sync_dump_buf[64 * 1024];
-void sync_dump(void)
-{
-	struct seq_file s = {
-		.buf = sync_dump_buf,
-		.size = sizeof(sync_dump_buf) - 1,
-	};
-	int i;
-
-	sync_debugfs_show(&s, NULL);
-
-	for (i = 0; i < s.count; i += DUMP_CHUNK) {
-		if ((s.count - i) > DUMP_CHUNK) {
-			char c = s.buf[i + DUMP_CHUNK];
-
-			s.buf[i + DUMP_CHUNK] = 0;
-			pr_cont("%s", s.buf + i);
-			s.buf[i + DUMP_CHUNK] = c;
-		} else {
-			s.buf[s.count] = 0;
-			pr_cont("%s", s.buf + i);
-		}
-	}
-}
-
-#endif
diff --git a/drivers/staging/android/trace/sync.h b/drivers/staging/android/trace/sync.h
deleted file mode 100644
index 77edb97..0000000
--- a/drivers/staging/android/trace/sync.h
+++ /dev/null
@@ -1,82 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_INCLUDE_PATH ../../drivers/staging/android/trace
-#define TRACE_SYSTEM sync
-
-#if !defined(_TRACE_SYNC_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_SYNC_H
-
-#include "../sync.h"
-#include <linux/tracepoint.h>
-
-TRACE_EVENT(sync_timeline,
-	TP_PROTO(struct sync_timeline *timeline),
-
-	TP_ARGS(timeline),
-
-	TP_STRUCT__entry(
-			__string(name, timeline->name)
-			__array(char, value, 32)
-	),
-
-	TP_fast_assign(
-			__assign_str(name, timeline->name);
-			if (timeline->ops->timeline_value_str) {
-				timeline->ops->timeline_value_str(timeline,
-							__entry->value,
-							sizeof(__entry->value));
-			} else {
-				__entry->value[0] = '\0';
-			}
-	),
-
-	TP_printk("name=%s value=%s", __get_str(name), __entry->value)
-);
-
-TRACE_EVENT(sync_wait,
-	TP_PROTO(struct sync_fence *fence, int begin),
-
-	TP_ARGS(fence, begin),
-
-	TP_STRUCT__entry(
-			__string(name, fence->name)
-			__field(s32, status)
-			__field(u32, begin)
-	),
-
-	TP_fast_assign(
-			__assign_str(name, fence->name);
-			__entry->status = atomic_read(&fence->status);
-			__entry->begin = begin;
-	),
-
-	TP_printk("%s name=%s state=%d", __entry->begin ? "begin" : "end",
-			__get_str(name), __entry->status)
-);
-
-TRACE_EVENT(sync_pt,
-	TP_PROTO(struct fence *pt),
-
-	TP_ARGS(pt),
-
-	TP_STRUCT__entry(
-		__string(timeline, pt->ops->get_timeline_name(pt))
-		__array(char, value, 32)
-	),
-
-	TP_fast_assign(
-		__assign_str(timeline, pt->ops->get_timeline_name(pt));
-		if (pt->ops->fence_value_str) {
-			pt->ops->fence_value_str(pt, __entry->value,
-							sizeof(__entry->value));
-		} else {
-			__entry->value[0] = '\0';
-		}
-	),
-
-	TP_printk("name=%s value=%s", __get_str(timeline), __entry->value)
-);
-
-#endif /* if !defined(_TRACE_SYNC_H) || defined(TRACE_HEADER_MULTI_READ) */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
diff --git a/drivers/staging/android/uapi/sw_sync.h b/drivers/staging/android/uapi/sw_sync.h
deleted file mode 100644
index 9b5d486..0000000
--- a/drivers/staging/android/uapi/sw_sync.h
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright (C) 2012 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#ifndef _UAPI_LINUX_SW_SYNC_H
-#define _UAPI_LINUX_SW_SYNC_H
-
-#include <linux/types.h>
-
-struct sw_sync_create_fence_data {
-	__u32	value;
-	char	name[32];
-	__s32	fence; /* fd of new fence */
-};
-
-#define SW_SYNC_IOC_MAGIC	'W'
-
-#define SW_SYNC_IOC_CREATE_FENCE	_IOWR(SW_SYNC_IOC_MAGIC, 0,\
-		struct sw_sync_create_fence_data)
-#define SW_SYNC_IOC_INC			_IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
-
-#endif /* _UAPI_LINUX_SW_SYNC_H */
diff --git a/drivers/staging/android/uapi/sync.h b/drivers/staging/android/uapi/sync.h
deleted file mode 100644
index e964c75..0000000
--- a/drivers/staging/android/uapi/sync.h
+++ /dev/null
@@ -1,97 +0,0 @@
-/*
- * Copyright (C) 2012 Google, Inc.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#ifndef _UAPI_LINUX_SYNC_H
-#define _UAPI_LINUX_SYNC_H
-
-#include <linux/ioctl.h>
-#include <linux/types.h>
-
-/**
- * struct sync_merge_data - data passed to merge ioctl
- * @fd2:	file descriptor of second fence
- * @name:	name of new fence
- * @fence:	returns the fd of the new fence to userspace
- */
-struct sync_merge_data {
-	__s32	fd2; /* fd of second fence */
-	char	name[32]; /* name of new fence */
-	__s32	fence; /* fd on newly created fence */
-};
-
-/**
- * struct sync_pt_info - detailed sync_pt information
- * @len:		length of sync_pt_info including any driver_data
- * @obj_name:		name of parent sync_timeline
- * @driver_name:	name of driver implementing the parent
- * @status:		status of the sync_pt 0:active 1:signaled <0:error
- * @timestamp_ns:	timestamp of status change in nanoseconds
- * @driver_data:	any driver dependent data
- */
-struct sync_pt_info {
-	__u32	len;
-	char	obj_name[32];
-	char	driver_name[32];
-	__s32	status;
-	__u64	timestamp_ns;
-
-	__u8	driver_data[0];
-};
-
-/**
- * struct sync_fence_info_data - data returned from fence info ioctl
- * @len:	ioctl caller writes the size of the buffer its passing in.
- *		ioctl returns length of sync_fence_data returned to userspace
- *		including pt_info.
- * @name:	name of fence
- * @status:	status of fence. 1: signaled 0:active <0:error
- * @pt_info:	a sync_pt_info struct for every sync_pt in the fence
- */
-struct sync_fence_info_data {
-	__u32	len;
-	char	name[32];
-	__s32	status;
-
-	__u8	pt_info[0];
-};
-
-#define SYNC_IOC_MAGIC		'>'
-
-/**
- * DOC: SYNC_IOC_WAIT - wait for a fence to signal
- *
- * pass timeout in milliseconds.  Waits indefinitely timeout < 0.
- */
-#define SYNC_IOC_WAIT		_IOW(SYNC_IOC_MAGIC, 0, __s32)
-
-/**
- * DOC: SYNC_IOC_MERGE - merge two fences
- *
- * Takes a struct sync_merge_data.  Creates a new fence containing copies of
- * the sync_pts in both the calling fd and sync_merge_data.fd2.  Returns the
- * new fence's fd in sync_merge_data.fence
- */
-#define SYNC_IOC_MERGE		_IOWR(SYNC_IOC_MAGIC, 1, struct sync_merge_data)
-
-/**
- * DOC: SYNC_IOC_FENCE_INFO - get detailed information on a fence
- *
- * Takes a struct sync_fence_info_data with extra space allocated for pt_info.
- * Caller should write the size of the buffer into len.  On return, len is
- * updated to reflect the total size of the sync_fence_info_data including
- * pt_info.
- *
- * pt_info is a buffer containing sync_pt_infos for every sync_pt in the fence.
- * To iterate over the sync_pt_infos, use the sync_pt_info.len field.
- */
-#define SYNC_IOC_FENCE_INFO	_IOWR(SYNC_IOC_MAGIC, 2,\
-	struct sync_fence_info_data)
-
-#endif /* _UAPI_LINUX_SYNC_H */
diff --git a/include/uapi/Kbuild b/include/uapi/Kbuild
index 245aa6e..7c415d0 100644
--- a/include/uapi/Kbuild
+++ b/include/uapi/Kbuild
@@ -13,3 +13,4 @@ header-y += drm/
 header-y += xen/
 header-y += scsi/
 header-y += misc/
+header-y += sync/
diff --git a/include/uapi/sync/Kbuild b/include/uapi/sync/Kbuild
new file mode 100644
index 0000000..2716ffe
--- /dev/null
+++ b/include/uapi/sync/Kbuild
@@ -0,0 +1,3 @@
+# sync Header export list
+header-y += sw_sync.h
+header-y += sync.h
diff --git a/include/uapi/sync/sw_sync.h b/include/uapi/sync/sw_sync.h
new file mode 100644
index 0000000..9b5d486
--- /dev/null
+++ b/include/uapi/sync/sw_sync.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_SW_SYNC_H
+#define _UAPI_LINUX_SW_SYNC_H
+
+#include <linux/types.h>
+
+struct sw_sync_create_fence_data {
+	__u32	value;
+	char	name[32];
+	__s32	fence; /* fd of new fence */
+};
+
+#define SW_SYNC_IOC_MAGIC	'W'
+
+#define SW_SYNC_IOC_CREATE_FENCE	_IOWR(SW_SYNC_IOC_MAGIC, 0,\
+		struct sw_sync_create_fence_data)
+#define SW_SYNC_IOC_INC			_IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
+
+#endif /* _UAPI_LINUX_SW_SYNC_H */
diff --git a/include/uapi/sync/sync.h b/include/uapi/sync/sync.h
new file mode 100644
index 0000000..e964c75
--- /dev/null
+++ b/include/uapi/sync/sync.h
@@ -0,0 +1,97 @@
+/*
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_SYNC_H
+#define _UAPI_LINUX_SYNC_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/**
+ * struct sync_merge_data - data passed to merge ioctl
+ * @fd2:	file descriptor of second fence
+ * @name:	name of new fence
+ * @fence:	returns the fd of the new fence to userspace
+ */
+struct sync_merge_data {
+	__s32	fd2; /* fd of second fence */
+	char	name[32]; /* name of new fence */
+	__s32	fence; /* fd on newly created fence */
+};
+
+/**
+ * struct sync_pt_info - detailed sync_pt information
+ * @len:		length of sync_pt_info including any driver_data
+ * @obj_name:		name of parent sync_timeline
+ * @driver_name:	name of driver implementing the parent
+ * @status:		status of the sync_pt 0:active 1:signaled <0:error
+ * @timestamp_ns:	timestamp of status change in nanoseconds
+ * @driver_data:	any driver dependent data
+ */
+struct sync_pt_info {
+	__u32	len;
+	char	obj_name[32];
+	char	driver_name[32];
+	__s32	status;
+	__u64	timestamp_ns;
+
+	__u8	driver_data[0];
+};
+
+/**
+ * struct sync_fence_info_data - data returned from fence info ioctl
+ * @len:	ioctl caller writes the size of the buffer its passing in.
+ *		ioctl returns length of sync_fence_data returned to userspace
+ *		including pt_info.
+ * @name:	name of fence
+ * @status:	status of fence. 1: signaled 0:active <0:error
+ * @pt_info:	a sync_pt_info struct for every sync_pt in the fence
+ */
+struct sync_fence_info_data {
+	__u32	len;
+	char	name[32];
+	__s32	status;
+
+	__u8	pt_info[0];
+};
+
+#define SYNC_IOC_MAGIC		'>'
+
+/**
+ * DOC: SYNC_IOC_WAIT - wait for a fence to signal
+ *
+ * pass timeout in milliseconds.  Waits indefinitely timeout < 0.
+ */
+#define SYNC_IOC_WAIT		_IOW(SYNC_IOC_MAGIC, 0, __s32)
+
+/**
+ * DOC: SYNC_IOC_MERGE - merge two fences
+ *
+ * Takes a struct sync_merge_data.  Creates a new fence containing copies of
+ * the sync_pts in both the calling fd and sync_merge_data.fd2.  Returns the
+ * new fence's fd in sync_merge_data.fence
+ */
+#define SYNC_IOC_MERGE		_IOWR(SYNC_IOC_MAGIC, 1, struct sync_merge_data)
+
+/**
+ * DOC: SYNC_IOC_FENCE_INFO - get detailed information on a fence
+ *
+ * Takes a struct sync_fence_info_data with extra space allocated for pt_info.
+ * Caller should write the size of the buffer into len.  On return, len is
+ * updated to reflect the total size of the sync_fence_info_data including
+ * pt_info.
+ *
+ * pt_info is a buffer containing sync_pt_infos for every sync_pt in the fence.
+ * To iterate over the sync_pt_infos, use the sync_pt_info.len field.
+ */
+#define SYNC_IOC_FENCE_INFO	_IOWR(SYNC_IOC_MAGIC, 2,\
+	struct sync_fence_info_data)
+
+#endif /* _UAPI_LINUX_SYNC_H */
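
For illustration only (not part of the patch), a minimal userspace
sequence chaining these two ioctls might look as follows; the helper
name is hypothetical and fd1/fd2 are assumed to be fence fds obtained
from a driver that exports them:

#include <string.h>
#include <sys/ioctl.h>
#include <sync/sync.h>

/* Merge two fence fds, then wait up to 100ms on the combined fence. */
static int merge_and_wait(int fd1, int fd2)
{
	struct sync_merge_data merge;
	__s32 timeout = 100;	/* milliseconds; < 0 waits forever */

	memset(&merge, 0, sizeof(merge));
	merge.fd2 = fd2;
	strncpy(merge.name, "merged", sizeof(merge.name) - 1);

	if (ioctl(fd1, SYNC_IOC_MERGE, &merge) < 0)
		return -1;

	/* merge.fence now holds the fd of the newly created fence. */
	return ioctl(merge.fence, SYNC_IOC_WAIT, &timeout);
}
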
-- 
1.9.1


* [PATCH 04/13] android/sync: Improved debug dump to dmesg
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (2 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 03/13] staging/android/sync: Move sync framework out of staging John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-17 17:36   ` Jesse Barnes
  2015-12-11 13:11 ` [PATCH 05/13] drm/i915: Convert requests to use struct fence John.C.Harrison
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The sync code has a facility for dumping current state information via
debugfs. It also has a way to re-use the same code for dumping to the
kernel log on an internal error. However, the redirection was rather
clunky and split the output across multiple prints at arbitrary
boundaries. This made it difficult to read and could result in output
from different sources being randomly interspersed.

This patch improves the redirection code to split the output on line
feed boundaries instead. It also adds support for highlighting the
offending fence object that caused the state dump in the first place.
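
Roughly, the splitting scheme behaves like this simplified userspace
analogue (illustrative only, not the kernel code in the diff below):

#include <stdio.h>
#include <string.h>

/* Print buf one line at a time, marking lines that mention key. */
static void dump_lines(char *buf, const char *key)
{
	char *start = buf, *end;

	while ((end = strchr(start, '\n'))) {
		*end = '\0';
		if (key && strstr(start, key))
			printf("*** %s ***\n", start);
		else
			printf("%s\n", start);
		start = end + 1;
	}
}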

v4: New patch in series.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/android/sync.c       |  9 ++++++--
 drivers/android/sync.h       |  5 +++--
 drivers/android/sync_debug.c | 50 ++++++++++++++++++++++++++++++++------------
 3 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/drivers/android/sync.c b/drivers/android/sync.c
index 7f0e919..db4a54b 100644
--- a/drivers/android/sync.c
+++ b/drivers/android/sync.c
@@ -86,6 +86,11 @@ static void sync_timeline_put(struct sync_timeline *obj)
 
 void sync_timeline_destroy(struct sync_timeline *obj)
 {
+	if (!list_empty(&obj->active_list_head)) {
+		pr_info("destroying timeline with outstanding fences!\n");
+		sync_dump_timeline(obj);
+	}
+
 	obj->destroyed = true;
 	/*
 	 * Ensure timeline is marked as destroyed before
@@ -397,7 +402,7 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
 		if (timeout) {
 			pr_info("fence timeout on [%p] after %dms\n", fence,
 				jiffies_to_msecs(timeout));
-			sync_dump();
+			sync_dump(fence);
 		}
 		return -ETIME;
 	}
@@ -405,7 +410,7 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
 	ret = atomic_read(&fence->status);
 	if (ret) {
 		pr_info("fence error %ld on [%p]\n", ret, fence);
-		sync_dump();
+		sync_dump(fence);
 	}
 	return ret;
 }
diff --git a/drivers/android/sync.h b/drivers/android/sync.h
index 4ccff01..d57fa0a 100644
--- a/drivers/android/sync.h
+++ b/drivers/android/sync.h
@@ -351,14 +351,15 @@ void sync_timeline_debug_add(struct sync_timeline *obj);
 void sync_timeline_debug_remove(struct sync_timeline *obj);
 void sync_fence_debug_add(struct sync_fence *fence);
 void sync_fence_debug_remove(struct sync_fence *fence);
-void sync_dump(void);
+void sync_dump(struct sync_fence *fence);
+void sync_dump_timeline(struct sync_timeline *timeline);
 
 #else
 # define sync_timeline_debug_add(obj)
 # define sync_timeline_debug_remove(obj)
 # define sync_fence_debug_add(fence)
 # define sync_fence_debug_remove(fence)
-# define sync_dump()
+# define sync_dump(fence)
 #endif
 int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
 				 int wake_flags, void *key);
diff --git a/drivers/android/sync_debug.c b/drivers/android/sync_debug.c
index f45d13c..9b87e0a 100644
--- a/drivers/android/sync_debug.c
+++ b/drivers/android/sync_debug.c
@@ -229,28 +229,52 @@ late_initcall(sync_debugfs_init);
 
 #define DUMP_CHUNK 256
 static char sync_dump_buf[64 * 1024];
-void sync_dump(void)
+
+static void sync_dump_dfs(struct seq_file *s, void *targetPtr)
+{
+	char *start, *end;
+	char targetStr[100];
+
+	if (targetPtr)
+		snprintf(targetStr, sizeof(targetStr) - 1, "%p", targetPtr);
+
+	start = end = s->buf;
+	while ((end = strchr(end, '\n'))) {
+		*end = 0;
+		if (targetPtr && strstr(start, targetStr))
+			pr_info("*** %s ***\n", start);
+		else
+			pr_info("%s\n", start);
+		start = ++end;
+	}
+
+	if ((start - s->buf) < s->count)
+		pr_info("%u vs %u: >?>%s<?<\n", (uint32_t)(start - s->buf), (uint32_t)s->count, start);
+}
+
+void sync_dump(struct sync_fence *targetPtr)
 {
 	struct seq_file s = {
 		.buf = sync_dump_buf,
 		.size = sizeof(sync_dump_buf) - 1,
 	};
-	int i;
 
 	sync_debugfs_show(&s, NULL);
 
-	for (i = 0; i < s.count; i += DUMP_CHUNK) {
-		if ((s.count - i) > DUMP_CHUNK) {
-			char c = s.buf[i + DUMP_CHUNK];
+	sync_dump_dfs(&s, targetPtr);
+}
 
-			s.buf[i + DUMP_CHUNK] = 0;
-			pr_cont("%s", s.buf + i);
-			s.buf[i + DUMP_CHUNK] = c;
-		} else {
-			s.buf[s.count] = 0;
-			pr_cont("%s", s.buf + i);
-		}
-	}
+void sync_dump_timeline(struct sync_timeline *timeline)
+{
+	struct seq_file s = {
+		.buf = sync_dump_buf,
+		.size = sizeof(sync_dump_buf) - 1,
+	};
+
+	pr_info("timeline: %p\n", timeline);
+	sync_print_obj(&s, timeline);
+
+	sync_dump_dfs(&s, NULL);
 }
 
 #endif
-- 
1.9.1


* [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (3 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 04/13] android/sync: Improved debug dump to dmesg John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-17 17:43   ` Jesse Barnes
  2015-12-11 13:11 ` [PATCH 06/13] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track execution progress, so it is
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that provides.

This patch takes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.
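
In outline, the mapping is as follows (hypothetical wrapper names,
condensed from the diff below):

/* Reference counting now rides on the embedded fence... */
static inline struct drm_i915_gem_request *
request_get(struct drm_i915_gem_request *req)
{
	fence_get(&req->fence);		/* was: kref_get(&req->ref) */
	return req;
}

/* ...and completion chains to the fence core's signaled test. */
static inline bool request_done(struct drm_i915_gem_request *req)
{
	return fence_is_signaled(&req->fence);
}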

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  5 +--
 drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem.c         | 56 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 6 files changed, 81 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7415606..5b31186 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			task = NULL;
 			if (req->pid)
 				task = pid_task(req->pid, PIDTYPE_PID);
-			seq_printf(m, "    %x @ %d: %s [%d]\n",
+			seq_printf(m, "    %x @ %d: %s [%d], fence = %u.%u\n",
 				   req->seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
-				   task ? task->pid : -1);
+				   task ? task->pid : -1,
+				   req->fence.context, req->fence.seqno);
 			rcu_read_unlock();
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 436149e..aa5cba7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -51,6 +51,7 @@
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
 #include "intel_guc.h"
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2174,7 +2175,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/**
+	 * Underlying object for implementing the signal/wait stuff.
+	 * NB: Never call fence_later() or return this fence object to user
+	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+	 * etc., there is no guarantee at all about the validity or
+	 * sequentiality of the fence's seqno! It is also unsafe to let
+	 * anything outside of the i915 driver get hold of the fence object
+	 * as the clean up when decrementing the reference count requires
+	 * holding the driver mutex lock.
+	 */
+	struct fence fence;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2251,7 +2262,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2271,7 +2288,7 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
@@ -2279,7 +2296,7 @@ static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void
@@ -2291,7 +2308,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
 		return;
 
 	dev = req->ring->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
 		mutex_unlock(&dev->struct_mutex);
 }
 
@@ -2308,12 +2325,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -2916,18 +2927,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	u32 seqno;
-
-	BUG_ON(req == NULL);
-
-	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
-	return i915_seqno_passed(seqno, req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e4056a3..a1b4dbd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2617,12 +2617,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
 	struct intel_context *ctx = req->ctx;
 
+	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
+
 	if (req->file_priv)
 		i915_gem_request_remove_from_client(req);
 
@@ -2638,6 +2640,45 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet.*/
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	return req->ring->name;
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+};
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2659,7 +2700,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->ring = ring;
 	req->ctx  = ctx;
@@ -2674,6 +2714,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -4723,7 +4765,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j;
+	int ret, i, j, fence_base;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4793,12 +4835,16 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (ret)
 		goto out;
 
+	fence_base = fence_context_alloc(I915_NUM_RINGS);
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
+		ring->fence_context = fence_base + i;
+
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06180dc..b8c8f9b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1920,6 +1920,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c9b081f..f4a6403 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,6 +2158,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	INIT_LIST_HEAD(&ring->buffers);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 58b1976..4547645 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,6 +348,9 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	unsigned fence_context;
+	spinlock_t fence_lock;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1


* [PATCH 06/13] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (4 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 05/13] drm/i915: Convert requests to use struct fence John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-11 13:11 ` [PATCH 07/13] drm/i915: Add per context timelines to fence object John.C.Harrison
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 18 +++++++++---------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5b31186..18dfb56 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   ring->get_seqno(ring, true),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index aa5cba7..caf7897 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2263,8 +2263,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a1b4dbd..0801738 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,7 +1165,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 	timeout = jiffies + 1;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return 0;
 
 		if (time_after_eq(jiffies, timeout))
@@ -1173,7 +1173,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 		cpu_relax_lowlatency();
 	}
-	if (i915_gem_request_completed(req, false))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	return -EAGAIN;
@@ -1217,7 +1217,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = timeout ?
@@ -1257,7 +1257,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2758,7 +2758,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2899,7 +2899,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2923,7 +2923,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_completed(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
@@ -3029,7 +3029,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3141,7 +3141,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a5dd528..510365e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11313,7 +11313,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index ebd6735..c207a3a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7170,7 +7170,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -7186,7 +7186,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1


* [PATCH 07/13] drm/i915: Add per context timelines to fence object
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (5 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 06/13] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-17 17:49   ` Jesse Barnes
  2015-12-11 13:11 ` [PATCH 08/13] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. That is the intention, as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buf fence API.

To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds a per-context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.
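
A condensed sketch of the per-timeline seqno allocation (the full
version is in the diff below):

static unsigned next_seqno(struct i915_fence_timeline *tl)
{
	unsigned seqno = tl->next;

	/* Wrap past zero: zero is reserved for invalid sync points. */
	if (++tl->next == 0)
		tl->next = 1;

	return seqno;
}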

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 25 ++++++++---
 drivers/gpu/drm/i915/i915_gem.c         | 80 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 111 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index caf7897..7d6a7c0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -841,6 +841,15 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_fence_timeline {
+	char        name[32];
+	unsigned    fence_context;
+	unsigned    next;
+
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -885,6 +894,7 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
+		struct i915_fence_timeline fence_timeline;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
@@ -2177,13 +2187,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/**
 	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never call fence_later() or return this fence object to user
-	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
-	 * etc., there is no guarantee at all about the validity or
-	 * sequentiality of the fence's seqno! It is also unsafe to let
-	 * anything outside of the i915 driver get hold of the fence object
-	 * as the clean up when decrementing the reference count requires
-	 * holding the driver mutex lock.
+	 * NB: Never return this fence object to user land! It is unsafe to
+	 * let anything outside of the i915 driver get hold of the fence
+	 * object as the clean up when decrementing the reference count
+	 * requires holding the driver mutex lock.
 	 */
 	struct fence fence;
 
@@ -2263,6 +2270,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0801738..7a37fb7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2665,9 +2665,32 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
 
 static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_fence,
-						 typeof(*req), fence);
-	return req->ring->name;
+	struct drm_i915_gem_request *req;
+	struct i915_fence_timeline *timeline;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	timeline = &req->ctx->engine[req->ring->id].fence_timeline;
+
+	return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence, char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	/* Last signalled timeline value ??? */
+	snprintf(str, size, "? [%d]"/*, timeline->value*/, req->ring->get_seqno(req->ring, true));
+}
+
+static void i915_gem_request_fence_value_str(struct fence *req_fence, char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
@@ -2677,8 +2700,49 @@ static const struct fence_ops i915_gem_request_fops = {
 	.release		= i915_gem_request_free,
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.fence_value_str	= i915_gem_request_fence_value_str,
+	.timeline_value_str	= i915_gem_request_timeline_value_str,
 };
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring)
+{
+	struct i915_fence_timeline *timeline;
+
+	timeline = &ctx->engine[ring->id].fence_timeline;
+
+	if (timeline->ring)
+		return 0;
+
+	timeline->fence_context = fence_context_alloc(1);
+
+	/*
+	 * Start the timeline from seqno 0 as this is a special value
+	 * that is reserved for invalid sync points.
+	 */
+	timeline->next       = 1;
+	timeline->ctx        = ctx;
+	timeline->ring       = ring;
+
+	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d", timeline->fence_context, ring->name, ctx->user_handle);
+
+	return 0;
+}
+
+static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+	unsigned seqno;
+
+	seqno = timeline->next;
+
+	/* Reserve zero for invalid */
+	if (++timeline->next == 0)
+		timeline->next = 1;
+
+	return seqno;
+}
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2714,7 +2778,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
-	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
+	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
+		   ctx->engine[ring->id].fence_timeline.fence_context,
+		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
@@ -4765,7 +4831,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j, fence_base;
+	int ret, i, j;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4835,16 +4901,12 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (ret)
 		goto out;
 
-	fence_base = fence_context_alloc(I915_NUM_RINGS);
-
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
-		ring->fence_context = fence_base + i;
-
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 43b1c73..2798ddc 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -266,7 +266,7 @@ i915_gem_create_context(struct drm_device *dev,
 {
 	const bool is_global_default_ctx = file_priv == NULL;
 	struct intel_context *ctx;
-	int ret = 0;
+	int i, ret = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
@@ -274,6 +274,19 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (!i915.enable_execlists) {
+		struct intel_engine_cs *ring;
+
+		/* Create a per context timeline for fences */
+		for_each_ring(ring, to_i915(dev), i) {
+			ret = i915_create_fence_timeline(dev, ctx, ring);
+			if (ret) {
+				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n", ring->name, ctx);
+				goto err_destroy;
+			}
+		}
+	}
+
 	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b8c8f9b..2b56651 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2489,6 +2489,14 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		goto error_ringbuf;
 	}
 
+	/* Create a per context timeline for fences */
+	ret = i915_create_fence_timeline(dev, ctx, ring);
+	if (ret) {
+		DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
+			  ring->name, ctx);
+		goto error_ringbuf;
+	}
+
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4547645..356b6a8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -349,7 +349,6 @@ struct  intel_engine_cs {
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
-	unsigned fence_context;
 	spinlock_t fence_lock;
 };
 
-- 
1.9.1


* [PATCH 08/13] drm/i915: Delay the freeing of requests until retire time
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (6 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 07/13] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-11 13:11 ` [PATCH 09/13] drm/i915: Interrupt driven fences John.C.Harrison
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
had to be held at the point where the count reached zero. This was fine
while all references were held internally to the driver. However, the
plan is to allow the underlying fence object (and hence the request
itself) to be returned to other drivers and to userland. External
users cannot be expected to acquire a driver private mutex lock.

Rather than attempt to disentangle the request structure from the
driver mutex lock, the decision was to defer the free code until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed free list. The driver's
retire worker thread will then process the list and actually call the
free function on the requests.
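
In sketch form (condensed from the hunks below), the fence release
callback only queues the request; the actual free happens later from
the retire worker with the mutex held:

static void request_release(struct fence *f)
{
	struct drm_i915_gem_request *req =
		container_of(f, typeof(*req), fence);
	struct intel_engine_cs *ring = req->ring;

	/* No mutex required here, just the list spinlock. */
	spin_lock(&ring->delayed_free_lock);
	list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
	spin_unlock(&ring->delayed_free_lock);
	/* The retire worker splices delayed_free_list and frees. */
}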

v2: New patch in series.

v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
to 'link' rather than 'list'. Update list processing to be more
efficient/safer with respect to spinlocks.

v4: Changed to use basic spinlocks rather than IRQ ones - missed
update from earlier feedback by Tvrtko.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 22 +++-----------------
 drivers/gpu/drm/i915/i915_gem.c         | 37 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
 drivers/gpu/drm/i915/intel_pm.c         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
 7 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d6a7c0..fbf591f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2185,14 +2185,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	/**
-	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never return this fence object to user land! It is unsafe to
-	 * let anything outside of the i915 driver get hold of the fence
-	 * object as the clean up when decrementing the reference count
-	 * requires holding the driver mutex lock.
-	 */
+	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head delayed_free_link;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2305,21 +2300,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
 	if (!req)
 		return;
 
-	dev = req->ring->dev;
-	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7a37fb7..f6c3e96 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2617,10 +2617,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+	struct intel_engine_cs *ring = req->ring;
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+
+	/*
+	 * Need to add the request to a deferred dereference list to be
+	 * processed at a mutex lock safe time.
+	 */
+	spin_lock(&ring->delayed_free_lock);
+	list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
+	spin_unlock(&ring->delayed_free_lock);
+
+	queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
 	struct intel_context *ctx = req->ctx;
 
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
@@ -2697,7 +2713,7 @@ static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
 	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
-	.release		= i915_gem_request_free,
+	.release		= i915_gem_request_release,
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.fence_value_str	= i915_gem_request_fence_value_str,
@@ -2951,6 +2967,9 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_request *req, *req_next;
+	LIST_HEAD(list_head);
+
 	WARN_ON(i915_verify_lists(ring->dev));
 
 	/* Retire requests first as we use it above for the early return.
@@ -2994,6 +3013,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	/* Really free any requests that were recently unreferenced */
+	spin_lock(&ring->delayed_free_lock);
+	list_splice_init(&ring->delayed_free_list, &list_head);
+	spin_unlock(&ring->delayed_free_lock);
+	list_for_each_entry_safe(req, req_next, &list_head, delayed_free_link) {
+		list_del(&req->delayed_free_link);
+		i915_gem_request_free(req);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
@@ -3184,7 +3212,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], reset_counter, true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  file->driver_priv);
-		i915_gem_request_unreference__unlocked(req[i]);
+		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
 
@@ -4179,7 +4207,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
-	i915_gem_request_unreference__unlocked(target);
+	i915_gem_request_unreference(target);
 
 	return ret;
 }
@@ -5036,6 +5064,7 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 510365e..9291a1d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11256,7 +11256,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
-		i915_gem_request_unreference__unlocked(mmio_flip->req);
+		i915_gem_request_unreference(mmio_flip->req);
 	}
 
 	intel_do_mmio_flip(mmio_flip);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2b56651..06a398a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1920,7 +1920,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
+	spin_lock_init(&ring->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c207a3a..e2d34a6 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7174,7 +7174,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(req);
+	i915_gem_request_unreference(req);
 	kfree(boost);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f4a6403..e5573e7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,7 +2158,9 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	INIT_LIST_HEAD(&ring->buffers);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
+	spin_lock_init(&ring->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 356b6a8..77384ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -301,6 +301,10 @@ struct  intel_engine_cs {
 	 */
 	u32 last_submitted_seqno;
 
+	/* deferred free list to allow unreferencing requests outside the driver */
+	struct list_head delayed_free_list;
+	spinlock_t delayed_free_lock;
+
 	bool gpu_caches_dirty;
 
 	wait_queue_head_t irq_queue;
-- 
1.9.1


* [PATCH 09/13] drm/i915: Interrupt driven fences
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (7 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 08/13] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-11 15:30   ` John Harrison
  2015-12-11 13:11 ` [PATCH 10/13] drm/i915: Updated request structure tracing John.C.Harrison
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.
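
The ordering in __i915_add_request() is the important part
(condensed):

	i915_gem_request_submit(request);	/* 1: on the signal list */
	ring->emit_request(request);		/* 2: seqno write + IRQ  */

The interrupt can only fire after step 2, by which time the fence is
already on the signal list, so a completion cannot be lost.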

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.
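
So the wait path is, in essence (condensed from the
__i915_wait_request() hunk below):

	if (i915_gem_request_completed(req))
		return 0;

	/* Turn on the completion interrupt for this request... */
	fence_enable_sw_signaling(&req->fence);

	/* ...and immediately re-check to close the race window. */
	if (i915_gem_request_completed(req))
		return 0;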

Lastly, the ring clean up code may need to cancel outstanding
requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.
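
The cancellation path then reduces to (condensed from the
i915_gem_request_retire() hunk below):

	/* Still on the signal pending list: mark it failed and signal. */
	if (!list_empty(&request->signal_link)) {
		request->cancelled = true;
		request->fence.status = -EIO;	/* no explicit 'fail' API */
		fence_signal_locked(&request->fence);
	}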

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signaling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  10 ++
 drivers/gpu/drm/i915/i915_gem.c         | 188 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 197 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbf591f..d013c6d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2187,7 +2187,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head signal_link;
+	struct list_head unsignal_link;
 	struct list_head delayed_free_link;
+	bool cancelled;
+	bool irq_enabled;
+	bool signal_requested;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2265,6 +2270,11 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+void i915_gem_request_submit(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
+
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
 			       struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f6c3e96..f71215f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,6 +1165,8 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 	timeout = jiffies + 1;
 	while (!need_resched()) {
+		i915_gem_request_notify(req->ring, false);
+
 		if (i915_gem_request_completed(req))
 			return 0;
 
@@ -1173,6 +1175,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 		cpu_relax_lowlatency();
 	}
+
+	i915_gem_request_notify(req->ring, false);
+
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1214,9 +1219,14 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (list_empty(&req->list))
+	if (i915_gem_request_completed(req))
 		return 0;
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	fence_enable_sw_signaling(&req->fence);
+
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1377,6 +1387,19 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/* In case the request is still in the signal pending list */
+	if (!list_empty(&request->signal_link)) {
+		/*
+		 * The request must be marked as cancelled and the underlying
+		 * fence as both failed. NB: There is no explicit fence fail
+		 * API, there is only a manual poke and signal.
+		 */
+		request->cancelled = true;
+		/* How to propagate to any associated sync_fence??? */
+		request->fence.status = -EIO;
+		fence_signal_locked(&request->fence);
+	}
+
 	i915_gem_request_unreference(request);
 }
 
@@ -2535,6 +2558,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else {
@@ -2653,25 +2682,135 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
 		i915_gem_context_unreference(ctx);
 	}
 
+	if (req->irq_enabled)
+		req->ring->irq_put(req->ring);
+
 	kmem_cache_free(req->i915->requests, req);
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	unsigned long flags;
+
+	/*
+	 * Always enable signal processing for the request's fence object
+	 * before that request is submitted to the hardware. Thus there is no
+	 * race condition whereby the interrupt could pop out before the
+	 * request has been added to the signal list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 */
+	i915_gem_request_reference(req);
+	spin_lock_irqsave(&req->ring->fence_lock, flags);
+	WARN_ON(!list_empty(&req->signal_link));
+	list_add_tail(&req->signal_link, &req->ring->fence_signal_list);
+	spin_unlock_irqrestore(&req->ring->fence_lock, flags);
+
+	/*
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	 /* Have interrupts already been requested? */
+	 if (req->signal_requested)
+		i915_gem_request_enable_interrupt(req, false);
+}
+
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked)
+{
+	if (req->irq_enabled)
+		return;
+
+	WARN_ON(!req->ring->irq_get(req->ring));
+	req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->ring, fence_locked);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	/*
+	 * No need to actually enable interrupt based processing until the
+	 * request has been submitted to the hardware. At which point
+	 * 'i915_gem_request_submit()' is called. So only really enable
+	 * signalling in there. Just set a flag to say that interrupts are
+	 * wanted when the request is eventually submitted. On the other hand
+	 * if the request has already been submitted then interrupts do need
+	 * to be enabled now.
+	 */
+
+	req->signal_requested = true;
+
+	if (!list_empty(&req->signal_link))
+		i915_gem_request_enable_interrupt(req, true);
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+	if (list_empty(&ring->fence_signal_list))
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&ring->fence_lock, flags);
+
+	seqno = ring->get_seqno(ring, false);
+
+	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				break;
+		}
 
-	return i915_seqno_passed(seqno, req->seqno);
+		/*
+		 * Start by removing the fence from the signal list otherwise
+		 * the retire code can run concurrently and get confused.
+		 */
+		list_del_init(&req->signal_link);
+
+		if (!req->cancelled) {
+			fence_signal_locked(&req->fence);
+		}
+
+		if (req->irq_enabled) {
+			req->ring->irq_put(req->ring);
+			req->irq_enabled = false;
+		}
+
+		/* Can't unreference here because that might grab fence_lock */
+		list_add_tail(&req->unsignal_link, &ring->fence_unsignal_list);
+	}
+
+	if (!fence_locked)
+		spin_unlock_irqrestore(&ring->fence_lock, flags);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2711,7 +2850,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence, char *str,
 
 static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_release,
 	.get_driver_name	= i915_gem_request_get_driver_name,
@@ -2794,6 +2932,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	INIT_LIST_HEAD(&req->signal_link);
 	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
 		   ctx->engine[ring->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
@@ -2831,6 +2970,11 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
 	intel_ring_reserved_space_cancel(req->ringbuf);
 
+	req->cancelled = true;
+	/* How to propagate to any associated sync_fence??? */
+	req->fence.status = -EINVAL;
+	fence_signal_locked(&req->fence);
+
 	i915_gem_request_unreference(req);
 }
 
@@ -2924,6 +3068,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_request_retire(request);
 	}
 
+	/*
+	 * Tidy up anything left over. This includes a call to
+	 * i915_gem_request_notify() which will make sure that any requests
+	 * that were on the signal pending list get also cleaned up.
+	 */
+	i915_gem_retire_requests_ring(ring);
+
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
 	 * all space when retiring the request (to prevent HEADs colliding
@@ -2969,9 +3120,17 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *req_next;
 	LIST_HEAD(list_head);
+	unsigned long flags;
 
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(ring, false);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -3013,6 +3172,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	/* Tidy up any requests that were recently signalled */
+	spin_lock_irqsave(&ring->fence_lock, flags);
+	list_splice_init(&ring->fence_unsignal_list, &list_head);
+	spin_unlock_irqrestore(&ring->fence_lock, flags);
+	list_for_each_entry_safe(req, req_next, &list_head, unsignal_link) {
+		list_del(&req->unsignal_link);
+		i915_gem_request_unreference(req);
+	}
+
 	/* Really free any requests that were recently unreferenced */
 	spin_lock(&ring->delayed_free_lock);
 	list_splice_init(&ring->delayed_free_list, &list_head);
@@ -5064,6 +5232,8 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 68b094b..74f8552 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -981,6 +981,8 @@ static void notify_ring(struct intel_engine_cs *ring)
 
 	trace_i915_gem_request_notify(ring);
 
+	i915_gem_request_notify(ring, false);
+
 	wake_up_all(&ring->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06a398a..76fc245 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1920,6 +1920,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
 	spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e5573e7..1dec252 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,6 +2158,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	INIT_LIST_HEAD(&ring->buffers);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
 	spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 77384ed..9d09edb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -354,6 +354,8 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+	struct list_head fence_unsignal_list;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 10/13] drm/i915: Updated request structure tracing
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (8 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 09/13] drm/i915: Interrupt driven fences John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-11 13:11 ` [PATCH 11/13] android/sync: Fix reversed sense of signaled fence John.C.Harrison
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  6 +++++-
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 13 ++++++++-----
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f71215f..4817015 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2776,13 +2776,16 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 	unsigned long flags;
 	u32 seqno;
 
-	if (list_empty(&ring->fence_signal_list))
+	if (list_empty(&ring->fence_signal_list)) {
+		trace_i915_gem_request_notify(ring, 0);
 		return;
+	}
 
 	if (!fence_locked)
 		spin_lock_irqsave(&ring->fence_lock, flags);
 
 	seqno = ring->get_seqno(ring, false);
+	trace_i915_gem_request_notify(ring, seqno);
 
 	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -2798,6 +2801,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 
 		if (!req->cancelled) {
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
 		}
 
 		if (req->irq_enabled) {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 74f8552..d280e05 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -979,8 +979,6 @@ static void notify_ring(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	trace_i915_gem_request_notify(ring);
-
 	i915_gem_request_notify(ring, false);
 
 	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 04fe849..41a026d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -561,23 +561,26 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring),
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+	    TP_ARGS(ring, seqno),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->seqno = seqno;
+			   __entry->is_empty = list_empty(&ring->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 11/13] android/sync: Fix reversed sense of signaled fence
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (9 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 10/13] drm/i915: Updated request structure tracing John.C.Harrison
@ 2015-12-11 13:11 ` John.C.Harrison
  2015-12-11 15:57   ` Tvrtko Ursulin
  2015-12-11 13:12 ` [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:11 UTC (permalink / raw)
  To: Intel-GFX

From: Peter Lawthers <peter.lawthers@intel.com>

In the 3.14 kernel, a signaled fence was indicated by the status field
== 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates error,
and status > 0 indicates active.

This patch wraps the check for a signaled fence in a function so that
callers no longer need to know the underlying implementation.
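
For illustration, a caller-side sketch (handle_signaled() is just a
stand-in for whatever the caller does on completion):

	/* Old 3.14-era check, broken on 4.x kernels: */
	if (fence->status == 1)
		handle_signaled(fence);

	/* New, implementation agnostic check: */
	if (sync_fence_is_signaled(fence) == 1)
		handle_signaled(fence);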

v3: New patch for series.

Change-Id: I8e565e49683e3efeb9474656cd84cf4add6ad6a2
Tracked-On: https://jira01.devtools.intel.com/browse/ACD-308
Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
---
 drivers/android/sync.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/android/sync.h b/drivers/android/sync.h
index d57fa0a..75532d8 100644
--- a/drivers/android/sync.h
+++ b/drivers/android/sync.h
@@ -345,6 +345,27 @@ int sync_fence_cancel_async(struct sync_fence *fence,
  */
 int sync_fence_wait(struct sync_fence *fence, long timeout);
 
+/**
+ * sync_fence_is_signaled() - Return an indication if the fence is signaled
+ * @fence:	fence to check
+ *
+ * returns 1 if fence is signaled
+ * returns 0 if fence is not signaled
+ * returns < 0 if fence is in error state
+ */
+static inline int
+sync_fence_is_signaled(struct sync_fence *fence)
+{
+	int status;
+
+	status = atomic_read(&fence->status);
+	if (status == 0)
+		return 1;
+	if (status > 0)
+		return 0;
+	return status;
+}
+
 #ifdef CONFIG_DEBUG_FS
 
 void sync_timeline_debug_add(struct sync_timeline *obj);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (10 preceding siblings ...)
  2015-12-11 13:11 ` [PATCH 11/13] android/sync: Fix reversed sense of signaled fence John.C.Harrison
@ 2015-12-11 13:12 ` John.C.Harrison
  2015-12-11 15:29   ` Tvrtko Ursulin
  2015-12-11 13:12 ` [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
  13 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:12 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Various projects desire a mechanism for managing dependencies between
work items asynchronously. This can also include work items across
completely different and independent systems. For example, an
application may want to retrieve a frame from a video-in device, use
it for rendering on a GPU, then send it to the video-out device for
display, all without having to stall waiting for completion along
the way. The sync framework allows this. It encapsulates
synchronisation events in file descriptors. The application can
request a sync point for the completion of each piece of work. Drivers
should also take sync points in with each new work request and not
schedule the work to start until the sync has been signalled.

This patch adds sync framework support to the exec buffer IOCTL. A
sync point can be passed in to stall execution of the batch buffer
until signalled. And a sync point can be returned after each batch
buffer submission which will be signalled upon that batch buffer's
completion.
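
From user land the flow then looks roughly like the following
(hypothetical snippet; the flag and field names are those added to the
uapi header by this patch, drm_fd and in_fence_fd are assumed to exist,
everything else is illustrative):

	struct drm_i915_gem_execbuffer2 eb2;
	int out_fence_fd;

	memset(&eb2, 0, sizeof(eb2));
	/* ... fill in buffer list, batch length, context etc. as usual ... */

	eb2.flags |= I915_EXEC_WAIT_FENCE | I915_EXEC_CREATE_FENCE;
	eb2.rsvd2 = (__u64)in_fence_fd;	/* input: fence to wait on */

	if (drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb2))
		return -errno;

	out_fence_fd = (int)eb2.rsvd2;	/* output: signalled on completion */
	/* poll() or sync_wait() on out_fence_fd, then close() it */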

At present, the input sync point is simply waited on synchronously
inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
this will be handled asynchronously inside the scheduler and the IOCTL
can return without having to wait.

Note also that the scheduler will re-order the execution of batch
buffers, e.g. because a batch buffer is stalled on a sync point and
cannot be submitted yet but other, independent, batch buffers are
being presented to the driver. This means that the timeline within the
sync points returned cannot be global to the engine. Instead they must
be kept per context per engine (the scheduler may not re-order batches
within a context). Hence the timeline cannot be based on the existing
seqno values but must be a new implementation.

This patch is a port of work by several people that has been pulled
across from Android. It has been updated several times across several
patches. Rather than attempt to port each individual patch, this
version is the finished product as a single patch. The various
contributors/authors along the way (in addition to myself) were:
  Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
  Tvrtko Ursulin <tvrtko.ursulin@intel.com>
  Michel Thierry <michel.thierry@intel.com>
  Arun Siluvery <arun.siluvery@linux.intel.com>

v2: New patch in series.

v3: Updated to use the new 'sync_fence_is_signaled' API rather than
having to know about the internal meaning of the 'fence::status' field
(which recently got inverted!) [work by Peter Lawthers].

Updated after review comments by Daniel Vetter. Removed '#ifdef
CONFIG_SYNC' and added 'select SYNC' to the Kconfig instead. Moved the
fd installation of fences to the end of the execbuff call in order
to remove the need to use 'sys_close' to clean up on failure.

Updated after review comments by Tvrtko Ursulin. Removed the
'fence_external' flag as redundant. Converted DRM_ERRORs to
DRM_DEBUGs. Changed one second wait to a wait forever when waiting on
incoming fences.

v4: Re-instated missing return of fd to user land that somehow got
lost in the anti-sys_close() re-factor.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/i915/Kconfig               |  3 +
 drivers/gpu/drm/i915/i915_drv.h            |  6 ++
 drivers/gpu/drm/i915/i915_gem.c            | 89 +++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 95 ++++++++++++++++++++++++++++--
 include/uapi/drm/i915_drm.h                | 16 ++++-
 5 files changed, 200 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 1d96fe1..cb5d5b2 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -22,6 +22,9 @@ config DRM_I915
 	select ACPI_VIDEO if ACPI
 	select ACPI_BUTTON if ACPI
 	select MMU_NOTIFIER
+	# ANDROID is required for SYNC
+	select ANDROID
+	select SYNC
 	help
 	  Choose this option if you have a system that has "Intel Graphics
 	  Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d013c6d..194bca0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2278,6 +2278,12 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
 			       struct intel_engine_cs *ring);
+struct sync_fence;
+int i915_create_sync_fence(struct drm_i915_gem_request *req,
+			   struct sync_fence **sync_fence, int *fence_fd);
+void i915_install_sync_fence_fd(struct drm_i915_gem_request *req,
+				struct sync_fence *sync_fence, int fence_fd);
+bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *fence);
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4817015..279d79f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -37,6 +37,7 @@
 #include <linux/swap.h>
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
+#include <../drivers/android/sync.h>
 
 #define RQ_BUG_ON(expr)
 
@@ -2560,7 +2561,13 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 
 	/*
 	 * Add the fence to the pending list before emitting the commands to
-	 * generate a seqno notification interrupt.
+	 * generate a seqno notification interrupt. This will also enable
+	 * interrupts if 'signal_requested' has been set.
+	 *
+	 * For example, if an exported sync point has been requested for this
+	 * request then it can be waited on without the driver's knowledge,
+	 * i.e. without calling __i915_wait_request(). Thus interrupts must
+	 * be enabled from the start rather than only on demand.
 	 */
 	i915_gem_request_submit(request);
 
@@ -2901,6 +2908,86 @@ static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
 	return seqno;
 }
 
+int i915_create_sync_fence(struct drm_i915_gem_request *req,
+			   struct sync_fence **sync_fence, int *fence_fd)
+{
+	char ring_name[] = "i915_ring0";
+	int fd;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		DRM_DEBUG("No available file descriptors!\n");
+		*fence_fd = -1;
+		return fd;
+	}
+
+	ring_name[9] += req->ring->id;
+	*sync_fence = sync_fence_create_dma(ring_name, &req->fence);
+	if (!*sync_fence) {
+		put_unused_fd(fd);
+		*fence_fd = -1;
+		return -ENOMEM;
+	}
+
+	*fence_fd = fd;
+
+	return 0;
+}
+
+void i915_install_sync_fence_fd(struct drm_i915_gem_request *req,
+				struct sync_fence *sync_fence, int fence_fd)
+{
+	sync_fence_install(sync_fence, fence_fd);
+
+	/*
+	 * NB: The corresponding put happens automatically on file close
+	 * from sync_fence_release() via the fops callback.
+	 */
+	fence_get(&req->fence);
+
+	/*
+	 * The sync framework adds a callback to the fence. The fence
+	 * framework calls 'enable_signalling' when a callback is added.
+	 * Thus this flag should have been set by now. If not then
+	 * 'enable_signalling' must be called explicitly because exporting
+	 * a fence to user land means it can be waited on asynchronously and
+	 * thus must be signalled asynchronously.
+	 */
+	WARN_ON(!req->signal_requested);
+}
+
+bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *sync_fence)
+{
+	struct fence *dma_fence;
+	struct drm_i915_gem_request *req;
+	int i;
+
+	if (sync_fence_is_signaled(sync_fence))
+		return true;
+
+	for(i = 0; i < sync_fence->num_fences; i++) {
+		dma_fence = sync_fence->cbs[i].sync_pt;
+
+		/* No need to worry about dead points: */
+		if (fence_is_signaled(dma_fence))
+			continue;
+
+		/* Can't ignore other people's points: */
+		if(dma_fence->ops != &i915_gem_request_fops)
+			return false;
+
+		req = container_of(dma_fence, typeof(*req), fence);
+
+		/* Can't ignore points on other rings: */
+		if (req->ring != ring)
+			return false;
+
+		/* Same ring means guaranteed to be in order so ignore it. */
+	}
+
+	return true;
+}
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bfc4c17..5f629f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -26,6 +26,7 @@
  *
  */
 
+#include <linux/syscalls.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -33,6 +34,7 @@
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
 #include <linux/uaccess.h>
+#include <../drivers/android/sync.h>
 
 #define  __EXEC_OBJECT_HAS_PIN (1<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -1322,6 +1324,38 @@ eb_get_batch(struct eb_vmas *eb)
 	return vma->obj;
 }
 
+static int i915_early_fence_wait(struct intel_engine_cs *ring, int fence_fd)
+{
+	struct sync_fence *fence;
+	int ret = 0;
+
+	if (fence_fd < 0) {
+		DRM_DEBUG("Invalid wait fence fd %d on ring %d\n", fence_fd,
+			  (int) ring->id);
+		return 1;
+	}
+
+	fence = sync_fence_fdget(fence_fd);
+	if (fence == NULL) {
+		DRM_DEBUG("Invalid wait fence %d on ring %d\n", fence_fd,
+			  (int) ring->id);
+		return 1;
+	}
+
+	if (!sync_fence_is_signaled(fence)) {
+		/*
+		 * Wait forever for the fence to be signalled. This is safe
+		 * because the mutex lock has not yet been acquired and
+		 * the wait is interruptible.
+		 */
+		if (!i915_safe_to_ignore_fence(ring, fence))
+			ret = sync_fence_wait(fence, -1);
+	}
+
+	sync_fence_put(fence);
+	return ret;
+}
+
 static int
 i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_file *file,
@@ -1341,6 +1375,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	u32 dispatch_flags;
 	int ret;
 	bool need_relocs;
+	int fd_fence_complete = -1;
+	int fd_fence_wait = lower_32_bits(args->rsvd2);
+	struct sync_fence *sync_fence;
+
+	/*
+	 * Make sure a broken fence handle is not returned no matter
+	 * how early an error might be hit. Note that rsvd2 has to be
+	 * saved away first because it is also an input parameter!
+	 */
+	if (args->flags & I915_EXEC_CREATE_FENCE)
+		args->rsvd2 = (__u64) -1;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1424,6 +1469,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		dispatch_flags |= I915_DISPATCH_RS;
 	}
 
+	/*
+	 * Without a GPU scheduler, any fence waits must be done up front.
+	 */
+	if (args->flags & I915_EXEC_WAIT_FENCE) {
+		ret = i915_early_fence_wait(ring, fd_fence_wait);
+		if (ret < 0)
+			return ret;
+
+		args->flags &= ~I915_EXEC_WAIT_FENCE;
+	}
+
 	intel_runtime_pm_get(dev_priv);
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1571,8 +1627,41 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->batch_obj               = batch_obj;
 	params->ctx                     = ctx;
 
+	if (args->flags & I915_EXEC_CREATE_FENCE) {
+		/*
+		 * Caller has requested a sync fence.
+		 * User interrupts will be enabled to make sure that
+		 * the timeline is signalled on completion.
+		 */
+		ret = i915_create_sync_fence(params->request, &sync_fence,
+					     &fd_fence_complete);
+		if (ret) {
+			DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
+				  ring->id, ctx);
+			goto err_batch_unpin;
+		}
+	}
+
 	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
 
+	if (fd_fence_complete != -1) {
+		if (ret) {
+			sync_fence_put(sync_fence);
+			put_unused_fd(fd_fence_complete);
+		} else {
+			/*
+			 * Install the fence into the pre-allocated file
+			 * descriptor to the fence object so that user land
+			 * can wait on it...
+			 */
+			i915_install_sync_fence_fd(params->request,
+						   sync_fence, fd_fence_complete);
+
+			/* Return the fence through the rsvd2 field */
+			args->rsvd2 = (__u64) fd_fence_complete;
+		}
+	}
+
 err_batch_unpin:
 	/*
 	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
@@ -1602,6 +1691,7 @@ pre_mutex_err:
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
+
 	return ret;
 }
 
@@ -1707,11 +1797,6 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	if (args->rsvd2 != 0) {
-		DRM_DEBUG("dirty rvsd2 field\n");
-		return -EINVAL;
-	}
-
 	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	if (exec2_list == NULL)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 67cebe6..86f7921 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -250,7 +250,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_HWS_ADDR		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_HWS_ADDR, struct drm_i915_gem_init)
 #define DRM_IOCTL_I915_GEM_INIT		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_INIT, struct drm_i915_gem_init)
 #define DRM_IOCTL_I915_GEM_EXECBUFFER	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER, struct drm_i915_gem_execbuffer)
-#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
 #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
 #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
 #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
@@ -695,7 +695,7 @@ struct drm_i915_gem_exec_object2 {
 	__u64 flags;
 
 	__u64 rsvd1;
-	__u64 rsvd2;
+	__u64 rsvd2;	/* Used for fence fd */
 };
 
 struct drm_i915_gem_execbuffer2 {
@@ -776,7 +776,17 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_RESOURCE_STREAMER     (1<<15)
 
-#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_RESOURCE_STREAMER<<1)
+/** Caller supplies a sync fence fd in the rsvd2 field.
+ * Wait for it to be signalled before starting the work
+ */
+#define I915_EXEC_WAIT_FENCE		(1<<16)
+
+/** Caller wants a sync fence fd for this execbuffer.
+ *  It will be returned in rsvd2
+ */
+#define I915_EXEC_CREATE_FENCE		(1<<17)
+
+#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_CREATE_FENCE<<1)
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (11 preceding siblings ...)
  2015-12-11 13:12 ` [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
@ 2015-12-11 13:12 ` John.C.Harrison
  2015-12-11 14:28   ` Tvrtko Ursulin
  2015-12-11 14:55   ` Chris Wilson
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
  13 siblings, 2 replies; 74+ messages in thread
From: John.C.Harrison @ 2015-12-11 13:12 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The notify function can be called many times without the seqno
changing. Many of these duplicate calls exist to prevent races due to
the requirement of not enabling interrupts until requested. However,
when interrupts are enabled the IRQ handler can be called multiple times
without the ring's seqno value changing. This patch reduces the
overhead of these extra calls by caching the last processed seqno
value and early exiting if it has not changed.

v3: New patch for series.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 14 +++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 279d79f..3c88678 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2457,6 +2457,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 
 		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
 			ring->semaphore.sync_seqno[j] = 0;
+
+		ring->last_irq_seqno = 0;
 	}
 
 	return 0;
@@ -2788,11 +2790,14 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 		return;
 	}
 
-	if (!fence_locked)
-		spin_lock_irqsave(&ring->fence_lock, flags);
-
 	seqno = ring->get_seqno(ring, false);
 	trace_i915_gem_request_notify(ring, seqno);
+	if (seqno == ring->last_irq_seqno)
+		return;
+	ring->last_irq_seqno = seqno;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&ring->fence_lock, flags);
 
 	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -3163,7 +3168,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * Tidy up anything left over. This includes a call to
 	 * i915_gem_request_notify() which will make sure that any requests
 	 * that were on the signal pending list get also cleaned up.
+	 * NB: The seqno cache must be cleared otherwise the notify call will
+	 * simply return immediately.
 	 */
+	ring->last_irq_seqno = 0;
 	i915_gem_retire_requests_ring(ring);
 
 	/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9d09edb..1987abd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -356,6 +356,7 @@ struct  intel_engine_cs {
 	spinlock_t fence_lock;
 	struct list_head fence_signal_list;
 	struct list_head fence_unsignal_list;
+	uint32_t last_irq_seqno;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-11 13:12 ` [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2015-12-11 14:28   ` Tvrtko Ursulin
  2015-12-14 11:58     ` John Harrison
  2015-12-11 14:55   ` Chris Wilson
  1 sibling, 1 reply; 74+ messages in thread
From: Tvrtko Ursulin @ 2015-12-11 14:28 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX



On 11/12/15 13:12, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The notify function can be called many times without the seqno
> changing. Many of these duplicate calls exist to prevent races due to
> the requirement of not enabling interrupts until requested. However,
> when interrupts are enabled the IRQ handler can be called multiple times
> without the ring's seqno value changing. This patch reduces the
> overhead of these extra calls by caching the last processed seqno
> value and early exiting if it has not changed.
>
> v3: New patch for series.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c         | 14 +++++++++++---
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>   2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 279d79f..3c88678 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2457,6 +2457,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
>
>   		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
>   			ring->semaphore.sync_seqno[j] = 0;
> +
> +		ring->last_irq_seqno = 0;
>   	}
>
>   	return 0;
> @@ -2788,11 +2790,14 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
>   		return;
>   	}
>
> -	if (!fence_locked)
> -		spin_lock_irqsave(&ring->fence_lock, flags);
> -
>   	seqno = ring->get_seqno(ring, false);
>   	trace_i915_gem_request_notify(ring, seqno);
> +	if (seqno == ring->last_irq_seqno)
> +		return;
> +	ring->last_irq_seqno = seqno;

Hmmm.. do you want to make the check "seqno <= ring->last_irq_seqno" ?

Is there a possibility for some weird timing or caching issue where two
callers get in and last_irq_seqno goes backwards? Not sure that it would
cause a problem, but the pattern is unusual and hard for me to understand.

Also, the check and the assignment would need to be under the spinlock, I think.
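
Something along these lines perhaps (untested sketch):

	if (!fence_locked)
		spin_lock_irqsave(&ring->fence_lock, flags);

	seqno = ring->get_seqno(ring, false);
	trace_i915_gem_request_notify(ring, seqno);
	if (seqno == ring->last_irq_seqno)
		goto out_unlock;	/* check + update now under the lock */
	ring->last_irq_seqno = seqno;

	...

out_unlock:
	if (!fence_locked)
		spin_unlock_irqrestore(&ring->fence_lock, flags);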

> +
> +	if (!fence_locked)
> +		spin_lock_irqsave(&ring->fence_lock, flags);
>
>   	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
>   		if (!req->cancelled) {
> @@ -3163,7 +3168,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   	 * Tidy up anything left over. This includes a call to
>   	 * i915_gem_request_notify() which will make sure that any requests
>   	 * that were on the signal pending list get also cleaned up.
> +	 * NB: The seqno cache must be cleared otherwise the notify call will
> +	 * simply return immediately.
>   	 */
> +	ring->last_irq_seqno = 0;
>   	i915_gem_retire_requests_ring(ring);
>
>   	/* Having flushed all requests from all queues, we know that all
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 9d09edb..1987abd 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -356,6 +356,7 @@ struct  intel_engine_cs {
>   	spinlock_t fence_lock;
>   	struct list_head fence_signal_list;
>   	struct list_head fence_unsignal_list;
> +	uint32_t last_irq_seqno;
>   };
>
>   bool intel_ring_initialized(struct intel_engine_cs *ring);
>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-11 13:12 ` [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
  2015-12-11 14:28   ` Tvrtko Ursulin
@ 2015-12-11 14:55   ` Chris Wilson
  2015-12-11 15:35     ` John Harrison
  1 sibling, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2015-12-11 14:55 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Dec 11, 2015 at 01:12:01PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The notify function can be called many times without the seqno
> changing. Many of these duplicate calls exist to prevent races due to
> the requirement of not enabling interrupts until requested. However,
> when interrupts are enabled the IRQ handler can be called multiple times
> without the ring's seqno value changing. This patch reduces the
> overhead of these extra calls by caching the last processed seqno
> value and early exiting if it has not changed.

This is just plain wrong. Every user-interrupt is preceded by a seqno
update.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL
  2015-12-11 13:12 ` [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
@ 2015-12-11 15:29   ` Tvrtko Ursulin
  2015-12-14 11:46     ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Tvrtko Ursulin @ 2015-12-11 15:29 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX



On 11/12/15 13:12, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> Various projects desire a mechanism for managing dependencies between
> work items asynchronously. This can also include work items across
> completely different and independent systems. For example, an
> application may want to retrieve a frame from a video-in device, use
> it for rendering on a GPU, then send it to the video-out device for
> display, all without having to stall waiting for completion along
> the way. The sync framework allows this. It encapsulates
> synchronisation events in file descriptors. The application can
> request a sync point for the completion of each piece of work. Drivers
> should also take sync points in with each new work request and not
> schedule the work to start until the sync has been signalled.
>
> This patch adds sync framework support to the exec buffer IOCTL. A
> sync point can be passed in to stall execution of the batch buffer
> until signalled. And a sync point can be returned after each batch
> buffer submission which will be signalled upon that batch buffer's
> completion.
>
> At present, the input sync point is simply waited on synchronously
> inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
> this will be handled asynchronously inside the scheduler and the IOCTL
> can return without having to wait.
>
> Note also that the scheduler will re-order the execution of batch
> buffers, e.g. because a batch buffer is stalled on a sync point and
> cannot be submitted yet but other, independent, batch buffers are
> being presented to the driver. This means that the timeline within the
> sync points returned cannot be global to the engine. Instead they must
> be kept per context per engine (the scheduler may not re-order batches
> within a context). Hence the timeline cannot be based on the existing
> seqno values but must be a new implementation.
>
> This patch is a port of work by several people that has been pulled
> across from Android. It has been updated several times across several
> patches. Rather than attempt to port each individual patch, this
> version is the finished product as a single patch. The various
> contributors/authors along the way (in addition to myself) were:
>    Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
>    Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>    Michel Thierry <michel.thierry@intel.com>
>    Arun Siluvery <arun.siluvery@linux.intel.com>
>
> v2: New patch in series.
>
> v3: Updated to use the new 'sync_fence_is_signaled' API rather than
> having to know about the internal meaning of the 'fence::status' field
> (which recently got inverted!) [work by Peter Lawthers].
>
> Updated after review comments by Daniel Vetter. Removed '#ifdef
> CONFIG_SYNC' and added 'select SYNC' to the Kconfig instead. Moved the
> fd installation of fences to the end of the execbuff call in order
> to remove the need to use 'sys_close' to clean up on failure.
>
> Updated after review comments by Tvrtko Ursulin. Removed the
> 'fence_external' flag as redundant. Converted DRM_ERRORs to
> DRM_DEBUGs. Changed one second wait to a wait forever when waiting on
> incoming fences.
>
> v4: Re-instated missing return of fd to user land that somehow got
> lost in the anti-sys_close() re-factor.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/Kconfig               |  3 +
>   drivers/gpu/drm/i915/i915_drv.h            |  6 ++
>   drivers/gpu/drm/i915/i915_gem.c            | 89 +++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 95 ++++++++++++++++++++++++++++--
>   include/uapi/drm/i915_drm.h                | 16 ++++-
>   5 files changed, 200 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 1d96fe1..cb5d5b2 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -22,6 +22,9 @@ config DRM_I915
>   	select ACPI_VIDEO if ACPI
>   	select ACPI_BUTTON if ACPI
>   	select MMU_NOTIFIER

select MMU_NOTIFIER is not upstream! :)

> +	# ANDROID is required for SYNC
> +	select ANDROID
> +	select SYNC
>   	help
>   	  Choose this option if you have a system that has "Intel Graphics
>   	  Media Accelerator" or "HD Graphics" integrated graphics,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d013c6d..194bca0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2278,6 +2278,12 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>   int i915_create_fence_timeline(struct drm_device *dev,
>   			       struct intel_context *ctx,
>   			       struct intel_engine_cs *ring);
> +struct sync_fence;
> +int i915_create_sync_fence(struct drm_i915_gem_request *req,
> +			   struct sync_fence **sync_fence, int *fence_fd);
> +void i915_install_sync_fence_fd(struct drm_i915_gem_request *req,
> +				struct sync_fence *sync_fence, int fence_fd);
> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *fence);
>
>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4817015..279d79f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,7 @@
>   #include <linux/swap.h>
>   #include <linux/pci.h>
>   #include <linux/dma-buf.h>
> +#include <../drivers/android/sync.h>
>
>   #define RQ_BUG_ON(expr)
>
> @@ -2560,7 +2561,13 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>
>   	/*
>   	 * Add the fence to the pending list before emitting the commands to
> -	 * generate a seqno notification interrupt.
> +	 * generate a seqno notification interrupt. This will also enable
> +	 * interrupts if 'signal_requested' has been set.
> +	 *
> +	 * For example, if an exported sync point has been requested for this
> +	 * request then it can be waited on without the driver's knowledge,
> +	 * i.e. without calling __i915_wait_request(). Thus interrupts must
> +	 * be enabled from the start rather than only on demand.
>   	 */
>   	i915_gem_request_submit(request);
>
> @@ -2901,6 +2908,86 @@ static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
>   	return seqno;
>   }
>
> +int i915_create_sync_fence(struct drm_i915_gem_request *req,
> +			   struct sync_fence **sync_fence, int *fence_fd)
> +{
> +	char ring_name[] = "i915_ring0";
> +	int fd;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		DRM_DEBUG("No available file descriptors!\n");
> +		*fence_fd = -1;
> +		return fd;
> +	}
> +
> +	ring_name[9] += req->ring->id;

I think this will possibly blow up if CONFIG_DEBUG_RODATA is set, which 
is the case on most kernels.

So I think you need to make a local copy with kstrdup and free it after 
calling sync_fence_create_dma.
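
Something like this maybe (untested sketch; this assumes
sync_fence_create_dma() copies the name, so it is safe to free it
straight after the call):

	char *ring_name;

	ring_name = kstrdup("i915_ring0", GFP_KERNEL);
	if (!ring_name) {
		put_unused_fd(fd);
		*fence_fd = -1;
		return -ENOMEM;
	}
	ring_name[9] += req->ring->id;
	*sync_fence = sync_fence_create_dma(ring_name, &req->fence);
	kfree(ring_name);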

> +	*sync_fence = sync_fence_create_dma(ring_name, &req->fence);
> +	if (!*sync_fence) {
> +		put_unused_fd(fd);
> +		*fence_fd = -1;
> +		return -ENOMEM;
> +	}
> +
> +	*fence_fd = fd;
> +
> +	return 0;
> +}
> +
> +void i915_install_sync_fence_fd(struct drm_i915_gem_request *req,
> +				struct sync_fence *sync_fence, int fence_fd)
> +{
> +	sync_fence_install(sync_fence, fence_fd);
> +
> +	/*
> +	 * NB: The corresponding put happens automatically on file close
> +	 * from sync_fence_release() via the fops callback.
> +	 */
> +	fence_get(&req->fence);
> +
> +	/*
> +	 * The sync framework adds a callback to the fence. The fence
> +	 * framework calls 'enable_signalling' when a callback is added.
> +	 * Thus this flag should have been set by now. If not then
> +	 * 'enable_signalling' must be called explicitly because exporting
> +	 * a fence to user land means it can be waited on asynchronously and
> +	 * thus must be signalled asynchronously.
> +	 */
> +	WARN_ON(!req->signal_requested);
> +}
> +
> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *sync_fence)
> +{
> +	struct fence *dma_fence;
> +	struct drm_i915_gem_request *req;
> +	int i;
> +
> +	if (sync_fence_is_signaled(sync_fence))
> +		return true;
> +
> +	for(i = 0; i < sync_fence->num_fences; i++) {
> +		dma_fence = sync_fence->cbs[i].sync_pt;
> +
> +		/* No need to worry about dead points: */
> +		if (fence_is_signaled(dma_fence))
> +			continue;
> +
> +		/* Can't ignore other people's points: */

Maybe add "unsignaled" to qualify.

> +		if(dma_fence->ops != &i915_gem_request_fops)
> +			return false;
> +
> +		req = container_of(dma_fence, typeof(*req), fence);
> +
> +		/* Can't ignore points on other rings: */
> +		if (req->ring != ring)
> +			return false;
> +
> +		/* Same ring means guaranteed to be in order so ignore it. */
> +	}
> +
> +	return true;
> +}
> +
>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx,
>   			   struct drm_i915_gem_request **req_out)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index bfc4c17..5f629f8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -26,6 +26,7 @@
>    *
>    */
>
> +#include <linux/syscalls.h>
>   #include <drm/drmP.h>
>   #include <drm/i915_drm.h>
>   #include "i915_drv.h"
> @@ -33,6 +34,7 @@
>   #include "intel_drv.h"
>   #include <linux/dma_remapping.h>
>   #include <linux/uaccess.h>
> +#include <../drivers/android/sync.h>
>
>   #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> @@ -1322,6 +1324,38 @@ eb_get_batch(struct eb_vmas *eb)
>   	return vma->obj;
>   }
>
> +static int i915_early_fence_wait(struct intel_engine_cs *ring, int fence_fd)
> +{
> +	struct sync_fence *fence;
> +	int ret = 0;
> +
> +	if (fence_fd < 0) {
> +		DRM_DEBUG("Invalid wait fence fd %d on ring %d\n", fence_fd,
> +			  (int) ring->id);
> +		return 1;

Suggest adding kerneldoc describing return values from this function.

It wasn't immediately clear to me what a return value of one means.

But I am also not sure that an invalid fd shouldn't be an outright error
instead of allowing execbuf to continue.
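
Something like this perhaps (a sketch, matching what the code above
actually returns):

	/**
	 * i915_early_fence_wait - wait on an incoming sync fence before execution
	 * @ring: ring/engine the new work is destined for
	 * @fence_fd: sync fence file descriptor, or < 0 for none
	 *
	 * Returns 0 on success (fence signalled or safe to ignore), a
	 * negative error code if the wait itself failed, or 1 if @fence_fd
	 * was not a usable fence and the wait was skipped.
	 */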

> +	}
> +
> +	fence = sync_fence_fdget(fence_fd);
> +	if (fence == NULL) {
> +		DRM_DEBUG("Invalid wait fence %d on ring %d\n", fence_fd,
> +			  (int) ring->id);
> +		return 1;
> +	}
> +
> +	if (!sync_fence_is_signaled(fence)) {

Minor comment, but i915_safe_to_ignore_fence checks this as well so you 
could remove it here.

> +		/*
> +		 * Wait forever for the fence to be signalled. This is safe
> +		 * because the mutex lock has not yet been acquired and
> +		 * the wait is interruptible.
> +		 */
> +		if (!i915_safe_to_ignore_fence(ring, fence))
> +			ret = sync_fence_wait(fence, -1);
> +	}
> +
> +	sync_fence_put(fence);
> +	return ret;
> +}
> +
>   static int
>   i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		       struct drm_file *file,
> @@ -1341,6 +1375,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	u32 dispatch_flags;
>   	int ret;
>   	bool need_relocs;
> +	int fd_fence_complete = -1;
> +	int fd_fence_wait = lower_32_bits(args->rsvd2);
> +	struct sync_fence *sync_fence;
> +
> +	/*
> +	 * Make sure a broken fence handle is not returned no matter
> +	 * how early an error might be hit. Note that rsvd2 has to be
> +	 * saved away first because it is also an input parameter!
> +	 */

Instead of the 2nd sentence maybe say something like "Note that we have
already saved rsvd2 for later use since it is also an input parameter!".
As written I was expecting the code following the comment to do that,
and then was confused when it didn't. Or maybe my attention span is too
short.

> +	if (args->flags & I915_EXEC_CREATE_FENCE)
> +		args->rsvd2 = (__u64) -1;
>
>   	if (!i915_gem_check_execbuffer(args))
>   		return -EINVAL;
> @@ -1424,6 +1469,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		dispatch_flags |= I915_DISPATCH_RS;
>   	}
>
> +	/*
> +	 * Without a GPU scheduler, any fence waits must be done up front.
> +	 */
> +	if (args->flags & I915_EXEC_WAIT_FENCE) {
> +		ret = i915_early_fence_wait(ring, fd_fence_wait);
> +		if (ret < 0)
> +			return ret;
> +
> +		args->flags &= ~I915_EXEC_WAIT_FENCE;
> +	}
> +
>   	intel_runtime_pm_get(dev_priv);
>
>   	ret = i915_mutex_lock_interruptible(dev);
> @@ -1571,8 +1627,41 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	params->batch_obj               = batch_obj;
>   	params->ctx                     = ctx;
>
> +	if (args->flags & I915_EXEC_CREATE_FENCE) {
> +		/*
> +		 * Caller has requested a sync fence.
> +		 * User interrupts will be enabled to make sure that
> +		 * the timeline is signalled on completion.
> +		 */

Is it signaled or signalled? There is a lot of usage of both throughout 
the patches and I as a non-native speaker am amu^H^H^Hconfused. ;)

> +		ret = i915_create_sync_fence(params->request, &sync_fence,
> +					     &fd_fence_complete);
> +		if (ret) {
> +			DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
> +				  ring->id, ctx);
> +			goto err_batch_unpin;
> +		}
> +	}
> +
>   	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
>
> +	if (fd_fence_complete != -1) {
> +		if (ret) {
> +			sync_fence_put(sync_fence);
> +			put_unused_fd(fd_fence_complete);
> +		} else {
> +			/*
> +			 * Install the fence into the pre-allocated file
> +			 * descriptor to the fence object so that user land
> +			 * can wait on it...
> +			 */
> +			i915_install_sync_fence_fd(params->request,
> +						   sync_fence, fd_fence_complete);
> +
> +			/* Return the fence through the rsvd2 field */
> +			args->rsvd2 = (__u64) fd_fence_complete;
> +		}
> +	}
> +
>   err_batch_unpin:
>   	/*
>   	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
> @@ -1602,6 +1691,7 @@ pre_mutex_err:
>   	/* intel_gpu_busy should also get a ref, so it will free when the device
>   	 * is really idle. */
>   	intel_runtime_pm_put(dev_priv);
> +
>   	return ret;
>   }
>
> @@ -1707,11 +1797,6 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
>   		return -EINVAL;
>   	}
>
> -	if (args->rsvd2 != 0) {
> -		DRM_DEBUG("dirty rvsd2 field\n");
> -		return -EINVAL;
> -	}
> -
>   	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
>   			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
>   	if (exec2_list == NULL)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 67cebe6..86f7921 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -250,7 +250,7 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_HWS_ADDR		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_HWS_ADDR, struct drm_i915_gem_init)
>   #define DRM_IOCTL_I915_GEM_INIT		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_INIT, struct drm_i915_gem_init)
>   #define DRM_IOCTL_I915_GEM_EXECBUFFER	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER, struct drm_i915_gem_execbuffer)
> -#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
>   #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
>   #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
>   #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
> @@ -695,7 +695,7 @@ struct drm_i915_gem_exec_object2 {
>   	__u64 flags;
>
>   	__u64 rsvd1;
> -	__u64 rsvd2;
> +	__u64 rsvd2;	/* Used for fence fd */
>   };
>
>   struct drm_i915_gem_execbuffer2 {
> @@ -776,7 +776,17 @@ struct drm_i915_gem_execbuffer2 {
>    */
>   #define I915_EXEC_RESOURCE_STREAMER     (1<<15)
>
> -#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_RESOURCE_STREAMER<<1)
> +/** Caller supplies a sync fence fd in the rsvd2 field.
> + * Wait for it to be signalled before starting the work
> + */
> +#define I915_EXEC_WAIT_FENCE		(1<<16)
> +
> +/** Caller wants a sync fence fd for this execbuffer.
> + *  It will be returned in rsvd2
> + */
> +#define I915_EXEC_CREATE_FENCE		(1<<17)
> +
> +#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_CREATE_FENCE<<1)
>
>   #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
>   #define i915_execbuffer2_set_context_id(eb2, context) \
>
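
As an aside, for reference, the userspace flow these two flags enable
would be roughly this (sketch only, untested, error handling omitted;
'fd' is the opened DRM device):

	struct drm_i915_gem_execbuffer2 execbuf = { 0 };
	int fence_fd;

	/* ... fill in buffers, batch and context as usual ... */

	/* ask the kernel to create a fence fd for this batch */
	execbuf.flags |= I915_EXEC_CREATE_FENCE;
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
	fence_fd = (int)execbuf.rsvd2;	/* fd comes back in rsvd2 */

	/* later: make another submission wait for that fence */
	execbuf.flags |= I915_EXEC_WAIT_FENCE;
	execbuf.rsvd2 = (__u64)fence_fd;	/* only the low 32 bits are read */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);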

Regards,

Tvrtko

* Re: [PATCH 09/13] drm/i915: Interrupt driven fences
  2015-12-11 13:11 ` [PATCH 09/13] drm/i915: Interrupt driven fences John.C.Harrison
@ 2015-12-11 15:30   ` John Harrison
  2015-12-11 16:07     ` Tvrtko Ursulin
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2015-12-11 15:30 UTC (permalink / raw)
  To: Intel-GFX

Reply moved from an earlier patch set which has now been superseded by 
this set...

On 11/12/2015 12:17, Tvrtko Ursulin wrote:
>
> Hi,
>
> Some random comments, mostly from the point of view of solving the 
> thundering herd problem.
>
> On 23/11/15 11:34, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The intended usage model for struct fence is that the signalled status
>> should be set on demand rather than polled. That is, there should not
>> be a need for a 'signaled' function to be called every time the status
>> is queried. Instead, 'something' should be done to enable a signal
>> callback from the hardware which will update the state directly. In
>> the case of requests, this is the seqno update interrupt. The idea is
>> that this callback will only be enabled on demand when something
>> actually tries to wait on the fence.
>>
>> This change removes the polling test and replaces it with the callback
>> scheme. Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 'poke
>> me' list when a new seqno pops out and signals any matching
>> fence/request. The fence is then removed from the list so the entire
>> request stack does not need to be scanned every time. Note that the
>> fence is added to the list before the commands to generate the seqno
>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>> race free if the interrupt is already enabled.
>>
>> Note that the interrupt is only enabled on demand (i.e. when
>> __wait_request() is called). Thus there is still a potential race when
>> enabling the interrupt as the request may already have completed.
>> However, this is simply solved by calling the interrupt processing
>> code immediately after enabling the interrupt and thereby checking for
>> already completed requests.
>>
>> Lastly, the ring clean up code has the possibility to cancel
>> outstanding requests (e.g. because TDR has reset the ring). These
>> requests will never get signalled and so must be removed from the
>> signal list manually. This is done by setting a 'cancelled' flag and
>> then calling the regular notify/retire code path rather than
>> attempting to duplicate the list manipulation and clean up code in
>> multiple places. This also avoids any race condition where the
>> cancellation request might occur after/during the completion interrupt
>> actually arriving.
>>
>> v2: Updated to take advantage of the request unreference no longer
>> requiring the mutex lock.
>>
>> v3: Move the signal list processing around to prevent unsubmitted
>> requests being added to the list. This was occurring on Android
>> because the native sync implementation calls the
>> fence->enable_signalling API immediately on fence creation.
>>
>> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>> 'link' instead of 'list'. Added support for returning an error code on
>> a cancelled fence. Update list processing to be more efficient/safer
>> with respect to spinlocks.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         |  10 ++
>>   drivers/gpu/drm/i915/i915_gem.c         | 187 
>> ++++++++++++++++++++++++++++++--
>>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>   drivers/gpu/drm/i915/intel_lrc.c        |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>>   6 files changed, 196 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index fbf591f..d013c6d 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2187,7 +2187,12 @@ void i915_gem_track_fb(struct 
>> drm_i915_gem_object *old,
>>   struct drm_i915_gem_request {
>>       /** Underlying object for implementing the signal/wait stuff. */
>>       struct fence fence;
>> +    struct list_head signal_link;
>> +    struct list_head unsignal_link;
>>       struct list_head delayed_free_link;
>> +    bool cancelled;
>> +    bool irq_enabled;
>> +    bool signal_requested;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2265,6 +2270,11 @@ int i915_gem_request_alloc(struct 
>> intel_engine_cs *ring,
>>                  struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>
>> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request 
>> *req,
>> +                       bool fence_locked);
>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool 
>> fence_locked);
>> +
>>   int i915_create_fence_timeline(struct drm_device *dev,
>>                      struct intel_context *ctx,
>>                      struct intel_engine_cs *ring);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 171ae5f..2a0b346 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1165,6 +1165,8 @@ static int __i915_spin_request(struct 
>> drm_i915_gem_request *req)
>>
>>       timeout = jiffies + 1;
>>       while (!need_resched()) {
>> +        i915_gem_request_notify(req->ring, false);
>> +
>
> This looks a bit heavyweight, hammering on the spinlock and 
> interrupts on-off. Why is it required to call this here instead of 
> relying on interrupts which have already been enabled?

It isn't required. It is just an attempt to implement the super-fast 
spin wait rather than waiting for the full interrupt latency. Chris W 
seems to think polling for completion for a brief period before going to 
sleep is a significant performance improvement. With interrupt based 
completion, the spin requires running the interrupt processing code on 
each iteration, otherwise you are still waiting for whatever latency is 
in the interrupt path itself.

>
>>           if (i915_gem_request_completed(req))
>>               return 0;
>>
>> @@ -1173,6 +1175,9 @@ static int __i915_spin_request(struct 
>> drm_i915_gem_request *req)
>>
>>           cpu_relax_lowlatency();
>>       }
>> +
>> +    i915_gem_request_notify(req->ring, false);
>> +
>>       if (i915_gem_request_completed(req))
>>           return 0;
>>
>> @@ -1214,9 +1219,14 @@ int __i915_wait_request(struct 
>> drm_i915_gem_request *req,
>>
>>       WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>
>> -    if (list_empty(&req->list))
>> +    if (i915_gem_request_completed(req))
>>           return 0;
>>
>> +    /*
>> +     * Enable interrupt completion of the request.
>> +     */
>> +    fence_enable_sw_signaling(&req->fence);
>> +
>
> There is duplicated user interrupt handling in this function now. Code 
> which does irq get/put directly is still there, but I think it should 
> be removed.
I'm not sure exactly what you think is duplicated. The fence function just 
calls back to the i915 code which sets stuff up for interrupt signalling 
(if it hasn't already been done). If the fence is already set up then 
the call is a no-op.

>
>>       if (i915_gem_request_completed(req))
>>           return 0;
>>
>> @@ -1377,6 +1387,19 @@ static void i915_gem_request_retire(struct 
>> drm_i915_gem_request *request)
>>       list_del_init(&request->list);
>>       i915_gem_request_remove_from_client(request);
>>
>> +    /* In case the request is still in the signal pending list */
>
> When would this happen? (comment hint)

TDR, pre-emption, anything else which can abort a request/batch buffer 
before it has completed. Will update the comment.

>
>> +    if (!list_empty(&request->signal_link)) {
>> +        /*
>> +         * The request must be marked as cancelled and the underlying
>> +         * fence as both failed. NB: There is no explicit fence fail
>> +         * API, there is only a manual poke and signal.
>> +         */
>> +        request->cancelled = true;
>> +        /* How to propagate to any associated sync_fence??? */
>> +        request->fence.status = -EIO;
>> +        fence_signal_locked(&request->fence);
>
> Does the cancelled request have to stay on the signal list? If it could 
> be moved outside of the irq hot list it might be beneficial for list 
> length and simplifying the code in i915_gem_request_notify.
The point of leaving it on the list is to get all the other processing 
without duplicating any code. E.g. later on the scheduler hooks into the 
notify function as it needs to know when each request has completed.


>
>> +    }
>> +
>>       i915_gem_request_unreference(request);
>>   }
>>
>> @@ -2535,6 +2558,12 @@ void __i915_add_request(struct 
>> drm_i915_gem_request *request,
>>        */
>>       request->postfix = intel_ring_get_tail(ringbuf);
>>
>> +    /*
>> +     * Add the fence to the pending list before emitting the 
>> commands to
>> +     * generate a seqno notification interrupt.
>> +     */
>> +    i915_gem_request_submit(request);
>> +
>>       if (i915.enable_execlists)
>>           ret = ring->emit_request(request);
>>       else {
>> @@ -2654,25 +2683,135 @@ static void i915_gem_request_free(struct 
>> drm_i915_gem_request *req)
>>           i915_gem_context_unreference(ctx);
>>       }
>>
>> +    if (req->irq_enabled)
>> +        req->ring->irq_put(req->ring);
>> +
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +/*
>> + * The request is about to be submitted to the hardware so add the 
>> fence to
>> + * the list of signalable fences.
>> + *
>> + * NB: This does not necessarily enable interrupts yet. That only 
>> occurs on
>> + * demand when the request is actually waited on. However, adding it 
>> to the
>> + * list early ensures that there is no race condition where the 
>> interrupt
>> + * could pop out prematurely and thus be completely lost. The race 
>> is merely
>> + * that the interrupt must be manually checked for after being enabled.
>> + */
>> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>   {
>> -    /* Interrupt driven fences are not implemented yet.*/
>> -    WARN(true, "This should not be called!");
>> -    return true;
>> +    unsigned long flags;
>> +
>> +    /*
>> +     * Always enable signal processing for the request's fence object
>> +     * before that request is submitted to the hardware. Thus there 
>> is no
>> +     * race condition whereby the interrupt could pop out before the
>> +     * request has been added to the signal list. Hence no need to 
>> check
>> +     * for completion, undo the list add and return false.
>> +     */
>> +    i915_gem_request_reference(req);
>> +    spin_lock_irqsave(&req->ring->fence_lock, flags);
>
> This only gets called from non-irq context so could use spin_lock_irq.
The fence_lock is acquired from within i915_gem_request_notify() which 
is called within the IRQ handler and therefore at IRQ context.

>
>> + WARN_ON(!list_empty(&req->signal_link));
>> +    list_add_tail(&req->signal_link, &req->ring->fence_signal_list);
>> +    spin_unlock_irqrestore(&req->ring->fence_lock, flags);
>> +
>> +    /*
>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>> +     * race where the request could complete before the interrupt has
>> +     * been enabled. Thus care must be taken at that point.
>> +     */
>> +
>> +     /* Have interrupts already been requested? */
>> +     if (req->signal_requested)
>
>> Can some code path enable signaling before the execbuf has completed?
Once the scheduler has arrived, a wait request call can be made long 
before the request has been submitted to the hardware. Likewise, if a 
userland fence is requested for the batch buffer, it is allocated 
before execution begins and the internals of the fence code 
automatically enable signalling.

>> + i915_gem_request_enable_interrupt(req, false);
>> +}
>> +
>> +/*
>> + * The request is being actively waited on, so enable interrupt based
>> + * completion signalling.
>> + */
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request 
>> *req,
>> +                       bool fence_locked)
>> +{
>> +    if (req->irq_enabled)
>> +        return;
>> +
>> +    WARN_ON(!req->ring->irq_get(req->ring));
>> +    req->irq_enabled = true;
>
> Probably shouldn't set the flag if irq_get failed, so as not to 
> unbalance the irq refcount.
Will add an if around the WARN.

>
>> +
>> +    /*
>> +     * Because the interrupt is only enabled on demand, there is a race
>> +     * where the interrupt can fire before anyone is looking for it. So
>> +     * do an explicit check for missed interrupts.
>> +     */
>> +    i915_gem_request_notify(req->ring, fence_locked);
>>   }
>>
>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>   {
>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>                            typeof(*req), fence);
>> +
>> +    /*
>> +     * No need to actually enable interrupt based processing until the
>> +     * request has been submitted to the hardware. At which point
>> +     * 'i915_gem_request_submit()' is called. So only really enable
>> +     * signalling in there. Just set a flag to say that interrupts are
>> +     * wanted when the request is eventually submitted. On the other 
>> hand
>> +     * if the request has already been submitted then interrupts do 
>> need
>> +     * to be enabled now.
>> +     */
>> +
>> +    req->signal_requested = true;
>> +
>> +    if (!list_empty(&req->signal_link))
>> +        i915_gem_request_enable_interrupt(req, true);
>> +
>> +    return true;
>> +}
>> +
>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool 
>> fence_locked)
>
> I think the name could be improved since this made me expect a request 
> in the parameter list.
>
> Maybe i915_gem_notify_requests ?
The idea behind the naming is that all the request code is prefixed with 
'i915_gem_request_'. This is the notify function of that set of interfaces.

>
>> +{
>> +    struct drm_i915_gem_request *req, *req_next;
>> +    unsigned long flags;
>>       u32 seqno;
>>
>> -    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +    if (list_empty(&ring->fence_signal_list))
>> +        return;
>> +
>> +    if (!fence_locked)
>> +        spin_lock_irqsave(&ring->fence_lock, flags);
>> +
>> +    seqno = ring->get_seqno(ring, false);
>
> Could read the seqno outside the spinlock, although not that important 
> since the primary caller of this is (or should be) the irq handler.
That fix is already done in patch #13 of the new series.

>
>> +
>> +    list_for_each_entry_safe(req, req_next, 
>> &ring->fence_signal_list, signal_link) {
>> +        if (!req->cancelled) {
>> +            if (!i915_seqno_passed(seqno, req->seqno))
>> +                break;
>> +        }
>> +
>> +        /*
>> +         * Start by removing the fence from the signal list otherwise
>> +         * the retire code can run concurrently and get confused.
>> +         */
>> +        list_del_init(&req->signal_link);
>> +
>> +        if (!req->cancelled) {
>> +            fence_signal_locked(&req->fence);
>> +        }
>> +
>> +        if (req->irq_enabled) {
>> +            req->ring->irq_put(req->ring);
>> +            req->irq_enabled = false;
>> +        }
>> +
>> +        /* Can't unreference here because that might grab fence_lock */
>> +        list_add_tail(&req->unsignal_link, &ring->fence_unsignal_list);
>> +    }
>>
>> -    return i915_seqno_passed(seqno, req->seqno);
>> +    if (!fence_locked)
>> +        spin_unlock_irqrestore(&ring->fence_lock, flags);
>>   }
>>
>>   static const char *i915_gem_request_get_driver_name(struct fence 
>> *req_fence)
>> @@ -2712,7 +2851,6 @@ static void 
>> i915_gem_request_fence_value_str(struct fence *req_fence, char *str,
>>
>>   static const struct fence_ops i915_gem_request_fops = {
>>       .enable_signaling    = i915_gem_request_enable_signaling,
>> -    .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>>       .release        = i915_gem_request_release,
>>       .get_driver_name    = i915_gem_request_get_driver_name,
>> @@ -2795,6 +2933,7 @@ int i915_gem_request_alloc(struct 
>> intel_engine_cs *ring,
>>           goto err;
>>       }
>>
>> +    INIT_LIST_HEAD(&req->signal_link);
>>       fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>> ctx->engine[ring->id].fence_timeline.fence_context,
>> i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
>> @@ -2832,6 +2971,11 @@ void i915_gem_request_cancel(struct 
>> drm_i915_gem_request *req)
>>   {
>>       intel_ring_reserved_space_cancel(req->ringbuf);
>>
>> +    req->cancelled = true;
>> +    /* How to propagate to any associated sync_fence??? */
>> +    req->fence.status = -EINVAL;
>> +    fence_signal_locked(&req->fence);
>
> Same question from before, could you just move it to unsignaled list 
> straight away? (And drop interrupts?)
Again, it makes for a much simpler driver if the request completion code 
is kept only inside the notify function and not replicated in multiple 
places.

>
>> +
>>       i915_gem_request_unreference(req);
>>   }
>>
>> @@ -2925,6 +3069,13 @@ static void i915_gem_reset_ring_cleanup(struct 
>> drm_i915_private *dev_priv,
>>           i915_gem_request_retire(request);
>>       }
>>
>> +    /*
>> +     * Tidy up anything left over. This includes a call to
>> +     * i915_gem_request_notify() which will make sure that any requests
>> +     * that were on the signal pending list get also cleaned up.
>> +     */
>> +    i915_gem_retire_requests_ring(ring);
>> +
>>       /* Having flushed all requests from all queues, we know that all
>>        * ringbuffers must now be empty. However, since we do not reclaim
>>        * all space when retiring the request (to prevent HEADs colliding
>> @@ -2974,6 +3125,13 @@ i915_gem_retire_requests_ring(struct 
>> intel_engine_cs *ring)
>>
>>       WARN_ON(i915_verify_lists(ring->dev));
>>
>> +    /*
>> +     * If no-one has waited on a request recently then interrupts will
>> +     * not have been enabled and thus no requests will ever be 
>> marked as
>> +     * completed. So do an interrupt check now.
>> +     */
>> +    i915_gem_request_notify(ring, false);
>> +
>>       /* Retire requests first as we use it above for the early return.
>>        * If we retire requests last, we may use a later seqno and so 
>> clear
>>        * the requests lists without clearing the active list, leading to
>> @@ -3015,6 +3173,15 @@ i915_gem_retire_requests_ring(struct 
>> intel_engine_cs *ring)
>>           i915_gem_request_assign(&ring->trace_irq_req, NULL);
>>       }
>>
>> +    /* Tidy up any requests that were recently signalled */
>> +    spin_lock_irqsave(&ring->fence_lock, flags);
>
> Could also use spin_lock_irq here I think.
>
>> + list_splice_init(&ring->fence_unsignal_list, &list_head);
>> +    spin_unlock_irqrestore(&ring->fence_lock, flags);
>> +    list_for_each_entry_safe(req, req_next, &list_head, 
>> unsignal_link) {
>> +        list_del(&req->unsignal_link);
>> +        i915_gem_request_unreference(req);
>> +    }
>> +
>>       /* Really free any requests that were recently unreferenced */
>>       spin_lock_irqsave(&ring->delayed_free_lock, flags);
>>       list_splice_init(&ring->delayed_free_list, &list_head);
>> @@ -5066,6 +5233,8 @@ init_ring_lists(struct intel_engine_cs *ring)
>>   {
>>       INIT_LIST_HEAD(&ring->active_list);
>>       INIT_LIST_HEAD(&ring->request_list);
>> +    INIT_LIST_HEAD(&ring->fence_signal_list);
>> +    INIT_LIST_HEAD(&ring->fence_unsignal_list);
>>       INIT_LIST_HEAD(&ring->delayed_free_list);
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c 
>> b/drivers/gpu/drm/i915/i915_irq.c
>> index 68b094b..74f8552 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -981,6 +981,8 @@ static void notify_ring(struct intel_engine_cs 
>> *ring)
>>
>>       trace_i915_gem_request_notify(ring);
>>
>> +    i915_gem_request_notify(ring, false);
>> +
>>       wake_up_all(&ring->irq_queue);
>
> This is the big one - I think to solve the thundering herd problem 
> this patch should also include the removal of ring->irq_queue since I 
> don't think it needs to keep existing after it.
>
> For example adding a req->wait_queue on which __i915_wait_request 
> would wait instead and doing a wake_up_all on it from 
> i915_gem_request_notify?
>
> Because in this form, when I ran this patch series it caused 
> the number of context switches to go up more than six times on one 
> workload. Interrupts also went up but only by 50% in comparison and 
> time spent in irq handlers doubled.
>
> But context switches and 10% more cpu time look the most dramatic.
>
> So it would be interesting to see if simply moving the wait queue to 
> be per request could fix the majority of that or not.

This idea was already posted in one version of the thundering herd thread. 
I haven't implemented it yet as I haven't had the time to look into 
exactly what is required. But yes, in theory the fence code should mean 
the irq_queue is obsolete. The waiters can simply register a callback on 
the request's underlying fence object. When the interrupt handler 
signals the fence, that fence will process its callback list and wake 
the waiter. All nicely targeted and no-one gets woken unnecessarily.
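
As a rough sketch of what that could look like (hand-written and
untested; 'struct request_waiter' and the exact loop details are made up
for illustration, and timeouts/reset handling are ignored):

	struct request_waiter {
		struct fence_cb cb;
		struct task_struct *task;
	};

	static void request_waiter_wake(struct fence *f, struct fence_cb *cb)
	{
		struct request_waiter *w = container_of(cb, typeof(*w), cb);

		wake_up_process(w->task);
	}

	/* in __i915_wait_request(), replacing the wait on ring->irq_queue */
	struct request_waiter waiter = { .task = current };

	/* fence_add_callback() returns -ENOENT if already signalled */
	if (fence_add_callback(&req->fence, &waiter.cb, request_waiter_wake))
		return 0;

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (fence_is_signaled(&req->fence))
			break;
		if (signal_pending(current)) {
			ret = -ERESTARTSYS;
			break;
		}
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	fence_remove_callback(&req->fence, &waiter.cb);

That way only the waiter(s) of the request that actually completed get
woken.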

>
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 06a398a..76fc245 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1920,6 +1920,8 @@ static int logical_ring_init(struct drm_device 
>> *dev, struct intel_engine_cs *rin
>>       ring->dev = dev;
>>       INIT_LIST_HEAD(&ring->active_list);
>>       INIT_LIST_HEAD(&ring->request_list);
>> +    INIT_LIST_HEAD(&ring->fence_signal_list);
>> +    INIT_LIST_HEAD(&ring->fence_unsignal_list);
>>       INIT_LIST_HEAD(&ring->delayed_free_list);
>>       spin_lock_init(&ring->fence_lock);
>>       spin_lock_init(&ring->delayed_free_lock);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index e5573e7..1dec252 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2158,6 +2158,8 @@ static int intel_init_ring_buffer(struct 
>> drm_device *dev,
>>       INIT_LIST_HEAD(&ring->request_list);
>>       INIT_LIST_HEAD(&ring->execlist_queue);
>>       INIT_LIST_HEAD(&ring->buffers);
>> +    INIT_LIST_HEAD(&ring->fence_signal_list);
>> +    INIT_LIST_HEAD(&ring->fence_unsignal_list);
>>       INIT_LIST_HEAD(&ring->delayed_free_list);
>>       spin_lock_init(&ring->fence_lock);
>>       spin_lock_init(&ring->delayed_free_lock);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 77384ed..9d09edb 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -354,6 +354,8 @@ struct  intel_engine_cs {
>>       u32 (*get_cmd_length_mask)(u32 cmd_header);
>>
>>       spinlock_t fence_lock;
>> +    struct list_head fence_signal_list;
>> +    struct list_head fence_unsignal_list;
>>   };
>>
>>   bool intel_ring_initialized(struct intel_engine_cs *ring);
>>
>
> Regards,
>
> Tvrtko
>

* Re: [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-11 14:55   ` Chris Wilson
@ 2015-12-11 15:35     ` John Harrison
  2015-12-11 16:07       ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2015-12-11 15:35 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 11/12/2015 14:55, Chris Wilson wrote:
> On Fri, Dec 11, 2015 at 01:12:01PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The notify function can be called many times without the seqno
>> changing. A large number of duplicates are to prevent races due to the
>> requirement of not enabling interrupts until requested. However, when
>> interrupts are enabled the IRQ handle can be called multiple times
>> without the ring's seqno value changing. This patch reduces the
>> overhead of these extra calls by caching the last processed seqno
>> value and early exiting if it has not changed.
> This is just plain wrong. Every user-interrupt is preceded by a seqno
> update.
Except that multiple interrupts can be coalesced if they occur too close 
together. The driver's IRQ handler still gets called for each individual 
interrupt, but the first time it runs it sees the seqno for the last one. 
Thus all the processing gets done on the first invocation. The multiple 
subsequent invocations (I have seen up to four I believe) then have 
nothing to do.
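
I.e. the patch effectively does this at the top of
i915_gem_request_notify() (sketch; 'last_irq_seqno' stands in for
whatever the cached field ends up being called):

	seqno = ring->get_seqno(ring, false);
	if (seqno == ring->last_irq_seqno)
		return;	/* coalesced interrupt, nothing new has completed */
	ring->last_irq_seqno = seqno;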

> -Chris
>

* Re: [PATCH 11/13] android/sync: Fix reversed sense of signaled fence
  2015-12-11 13:11 ` [PATCH 11/13] android/sync: Fix reversed sense of signaled fence John.C.Harrison
@ 2015-12-11 15:57   ` Tvrtko Ursulin
  2015-12-14 11:22     ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Tvrtko Ursulin @ 2015-12-11 15:57 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 11/12/15 13:11, John.C.Harrison@Intel.com wrote:
> From: Peter Lawthers <peter.lawthers@intel.com>
>
> In the 3.14 kernel, a signaled fence was indicated by the status field
> == 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates error,
> and status > 0 indicates active.
>
> This patch wraps the check for a signaled fence in a function so that
> callers no longer need to know the underlying implementation.
>
> v3: New patch for series.
>
> Change-Id: I8e565e49683e3efeb9474656cd84cf4add6ad6a2
> Tracked-On: https://jira01.devtools.intel.com/browse/ACD-308
> Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
> ---
>   drivers/android/sync.h | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
>
> diff --git a/drivers/android/sync.h b/drivers/android/sync.h
> index d57fa0a..75532d8 100644
> --- a/drivers/android/sync.h
> +++ b/drivers/android/sync.h
> @@ -345,6 +345,27 @@ int sync_fence_cancel_async(struct sync_fence *fence,
>    */
>   int sync_fence_wait(struct sync_fence *fence, long timeout);
>
> +/**
> + * sync_fence_is_signaled() - Return an indication if the fence is signaled
> + * @fence:	fence to check
> + *
> + * returns 1 if fence is signaled
> + * returns 0 if fence is not signaled
> + * returns < 0 if fence is in error state
> + */
> +static inline int
> +sync_fence_is_signaled(struct sync_fence *fence)
> +{
> +	int status;
> +
> +	status = atomic_read(&fence->status);
> +	if (status == 0)
> +		return 1;
> +	if (status > 0)
> +		return 0;
> +	return status;
> +}

Not so important, but it could simply return bool, like "return status 
<= 0", since it is called "is_signaled" and it is only used in boolean 
mode in future patches.
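
I.e. something like:

	static inline bool
	sync_fence_is_signaled(struct sync_fence *fence)
	{
		return atomic_read(&fence->status) <= 0;
	}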

Regards,

Tvrtko

* Re: [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-11 15:35     ` John Harrison
@ 2015-12-11 16:07       ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-12-11 16:07 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Fri, Dec 11, 2015 at 03:35:54PM +0000, John Harrison wrote:
> On 11/12/2015 14:55, Chris Wilson wrote:
> >On Fri, Dec 11, 2015 at 01:12:01PM +0000, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>The notify function can be called many times without the seqno
> >>changing. A large number of duplicates are to prevent races due to the
> >>requirement of not enabling interrupts until requested. However, when
> >>interrupts are enabled the IRQ handle can be called multiple times
> >>without the ring's seqno value changing. This patch reduces the
> >>overhead of these extra calls by caching the last processed seqno
> >>value and early exiting if it has not changed.
> >This is just plain wrong. Every user-interrupt is preceded by a seqno
> >update.
> Except that multiple interrupts can be coalesced if they occur too
> close together. The driver's IRQ handler still gets called for each
> individual interrupt, but the first time it runs it sees the seqno
> for the last one. Thus all the processing gets done on the first
> invocation. The multiple subsequent invocations (I have seen up to
> four I believe) then have nothing to do.

Yes. That is not what you implied above, nor by talking about caching the
seqno -- which is already cached. There is a reason why we don't do this
in the interrupt handler and are not about to do so again.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 09/13] drm/i915: Interrupt driven fences
  2015-12-11 15:30   ` John Harrison
@ 2015-12-11 16:07     ` Tvrtko Ursulin
  0 siblings, 0 replies; 74+ messages in thread
From: Tvrtko Ursulin @ 2015-12-11 16:07 UTC (permalink / raw)
  To: John Harrison, Intel-GFX



On 11/12/15 15:30, John Harrison wrote:
> Reply moved from an earlier patch set which has now been superseded by
> this set...
>
> On 11/12/2015 12:17, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> Some random comments, mostly from the point of view of solving the
>> thundering herd problem.
>>
>> On 23/11/15 11:34, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The intended usage model for struct fence is that the signalled status
>>> should be set on demand rather than polled. That is, there should not
>>> be a need for a 'signaled' function to be called every time the status
>>> is queried. Instead, 'something' should be done to enable a signal
>>> callback from the hardware which will update the state directly. In
>>> the case of requests, this is the seqno update interrupt. The idea is
>>> that this callback will only be enabled on demand when something
>>> actually tries to wait on the fence.
>>>
>>> This change removes the polling test and replaces it with the callback
>>> scheme. Each fence is added to a 'please poke me' list at the start of
>>> i915_add_request(). The interrupt handler then scans through the 'poke
>>> me' list when a new seqno pops out and signals any matching
>>> fence/request. The fence is then removed from the list so the entire
>>> request stack does not need to be scanned every time. Note that the
>>> fence is added to the list before the commands to generate the seqno
>>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>>> race free if the interrupt is already enabled.
>>>
>>> Note that the interrupt is only enabled on demand (i.e. when
>>> __wait_request() is called). Thus there is still a potential race when
>>> enabling the interrupt as the request may already have completed.
>>> However, this is simply solved by calling the interrupt processing
>>> code immediately after enabling the interrupt and thereby checking for
>>> already completed requests.
>>>
>>> Lastly, the ring clean up code has the possibility to cancel
>>> outstanding requests (e.g. because TDR has reset the ring). These
>>> requests will never get signalled and so must be removed from the
>>> signal list manually. This is done by setting a 'cancelled' flag and
>>> then calling the regular notify/retire code path rather than
>>> attempting to duplicate the list manipulation and clean up code in
>>> multiple places. This also avoids any race condition where the
>>> cancellation request might occur after/during the completion interrupt
>>> actually arriving.
>>>
>>> v2: Updated to take advantage of the request unreference no longer
>>> requiring the mutex lock.
>>>
>>> v3: Move the signal list processing around to prevent unsubmitted
>>> requests being added to the list. This was occurring on Android
>>> because the native sync implementation calls the
>>> fence->enable_signalling API immediately on fence creation.
>>>
>>> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>>> 'link' instead of 'list'. Added support for returning an error code on
>>> a cancelled fence. Update list processing to be more efficient/safer
>>> with respect to spinlocks.
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  10 ++
>>>   drivers/gpu/drm/i915/i915_gem.c         | 187
>>> ++++++++++++++++++++++++++++++--
>>>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>>   drivers/gpu/drm/i915/intel_lrc.c        |   2 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>>>   6 files changed, 196 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index fbf591f..d013c6d 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2187,7 +2187,12 @@ void i915_gem_track_fb(struct
>>> drm_i915_gem_object *old,
>>>   struct drm_i915_gem_request {
>>>       /** Underlying object for implementing the signal/wait stuff. */
>>>       struct fence fence;
>>> +    struct list_head signal_link;
>>> +    struct list_head unsignal_link;
>>>       struct list_head delayed_free_link;
>>> +    bool cancelled;
>>> +    bool irq_enabled;
>>> +    bool signal_requested;
>>>
>>>       /** On Which ring this request was generated */
>>>       struct drm_i915_private *i915;
>>> @@ -2265,6 +2270,11 @@ int i915_gem_request_alloc(struct
>>> intel_engine_cs *ring,
>>>                  struct drm_i915_gem_request **req_out);
>>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>>
>>> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
>>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request
>>> *req,
>>> +                       bool fence_locked);
>>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool
>>> fence_locked);
>>> +
>>>   int i915_create_fence_timeline(struct drm_device *dev,
>>>                      struct intel_context *ctx,
>>>                      struct intel_engine_cs *ring);
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index 171ae5f..2a0b346 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1165,6 +1165,8 @@ static int __i915_spin_request(struct
>>> drm_i915_gem_request *req)
>>>
>>>       timeout = jiffies + 1;
>>>       while (!need_resched()) {
>>> +        i915_gem_request_notify(req->ring, false);
>>> +
>>
>> This looks a bit heavyweight, hammering on the spinlock and
>> interrupts on-off. Why is it required to call this here instead of
>> relying on interrupts which have already been enabled?
>
> It isn't required. It is just an attempt to implement the super-fast
> spin wait rather than waiting for the full interrupt latency. Chris W
> seems to think polling for completion for a brief period before going to
> sleep is a significant performance improvement. With interrupt based
> completion, the spin requires running the interrupt processing
> code on each iteration, otherwise you are still waiting for whatever
> latency is in the interrupt path itself.

It is fine to remove it then, I think. The primary point of busy waits 
is to avoid the setup cost of turning on interrupts and the scheduling 
latency of putting the waiter to sleep.

Your code already enables user interrupts before the busy wait, so you 
can spin on i915_gem_request_completed(), which will get signaled from 
the irq handler at little cost.
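
Something like this should be enough (sketch, mirroring the existing
loop minus the notify calls):

	timeout = jiffies + 1;
	while (!need_resched()) {
		if (i915_gem_request_completed(req))
			return 0;

		if (time_after_eq(jiffies, timeout))
			break;

		cpu_relax_lowlatency();
	}

	if (i915_gem_request_completed(req))
		return 0;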

>>
>>>           if (i915_gem_request_completed(req))
>>>               return 0;
>>>
>>> @@ -1173,6 +1175,9 @@ static int __i915_spin_request(struct
>>> drm_i915_gem_request *req)
>>>
>>>           cpu_relax_lowlatency();
>>>       }
>>> +
>>> +    i915_gem_request_notify(req->ring, false);
>>> +
>>>       if (i915_gem_request_completed(req))
>>>           return 0;
>>>
>>> @@ -1214,9 +1219,14 @@ int __i915_wait_request(struct
>>> drm_i915_gem_request *req,
>>>
>>>       WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>>
>>> -    if (list_empty(&req->list))
>>> +    if (i915_gem_request_completed(req))
>>>           return 0;
>>>
>>> +    /*
>>> +     * Enable interrupt completion of the request.
>>> +     */
>>> +    fence_enable_sw_signaling(&req->fence);
>>> +
>>
>> There is duplicated user interrupt handling in this function now. Code
>> which does irq get/put directly is still there, but I think it should
>> be removed.
> I'm not sure exactly what you think is duplicated. The fence function just
> calls back to the i915 code which sets stuff up for interrupt signalling
> (if it hasn't already been done). If the fence is already set up then
> the call is a no-op.

__i915_wait_request explicitly does ring->irq_get and irq_put after you 
have already enabled them via fence_enable_sw_signaling. So it is 
redundant in that respect.

You also probably need to fix up the magical fault injection code which 
is dev_priv->gpu_error.test_irq_rings.

>
>>
>>>       if (i915_gem_request_completed(req))
>>>           return 0;
>>>
>>> @@ -1377,6 +1387,19 @@ static void i915_gem_request_retire(struct
>>> drm_i915_gem_request *request)
>>>       list_del_init(&request->list);
>>>       i915_gem_request_remove_from_client(request);
>>>
>>> +    /* In case the request is still in the signal pending list */
>>
>> When would this happen? (comment hint)
>
> TDR, pre-emption, anything else which can abort a request/batch buffer
> before it has completed. Will update the comment.
>
>>
>>> +    if (!list_empty(&request->signal_link)) {
>>> +        /*
>>> +         * The request must be marked as cancelled and the underlying
>>> +         * fence as both failed. NB: There is no explicit fence fail
>>> +         * API, there is only a manual poke and signal.
>>> +         */
>>> +        request->cancelled = true;
>>> +        /* How to propagate to any associated sync_fence??? */
>>> +        request->fence.status = -EIO;
>>> +        fence_signal_locked(&request->fence);
>>
>> Does the cancelled request have to stay on the signal list? If it could
>> be moved outside of the irq hot list it might be beneficial for list
>> length and simplifying the code in i915_gem_request_notify.
> The point of leaving it on the list is to get all the other processing
> without duplicating any code. E.g. later on the scheduler hooks into the
> notify function as it needs to know when each request has completed.

I was thinking along the lines of making the signal_list as short as 
possible, so as to do as little as possible in the expensive irq handler.

>
>>
>>> +    }
>>> +
>>>       i915_gem_request_unreference(request);
>>>   }
>>>
>>> @@ -2535,6 +2558,12 @@ void __i915_add_request(struct
>>> drm_i915_gem_request *request,
>>>        */
>>>       request->postfix = intel_ring_get_tail(ringbuf);
>>>
>>> +    /*
>>> +     * Add the fence to the pending list before emitting the
>>> commands to
>>> +     * generate a seqno notification interrupt.
>>> +     */
>>> +    i915_gem_request_submit(request);
>>> +
>>>       if (i915.enable_execlists)
>>>           ret = ring->emit_request(request);
>>>       else {
>>> @@ -2654,25 +2683,135 @@ static void i915_gem_request_free(struct
>>> drm_i915_gem_request *req)
>>>           i915_gem_context_unreference(ctx);
>>>       }
>>>
>>> +    if (req->irq_enabled)
>>> +        req->ring->irq_put(req->ring);
>>> +
>>>       kmem_cache_free(req->i915->requests, req);
>>>   }
>>>
>>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +/*
>>> + * The request is about to be submitted to the hardware so add the
>>> fence to
>>> + * the list of signalable fences.
>>> + *
>>> + * NB: This does not necessarily enable interrupts yet. That only
>>> occurs on
>>> + * demand when the request is actually waited on. However, adding it
>>> to the
>>> + * list early ensures that there is no race condition where the
>>> interrupt
>>> + * could pop out prematurely and thus be completely lost. The race
>>> is merely
>>> + * that the interrupt must be manually checked for after being enabled.
>>> + */
>>> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>>   {
>>> -    /* Interrupt driven fences are not implemented yet.*/
>>> -    WARN(true, "This should not be called!");
>>> -    return true;
>>> +    unsigned long flags;
>>> +
>>> +    /*
>>> +     * Always enable signal processing for the request's fence object
>>> +     * before that request is submitted to the hardware. Thus there
>>> is no
>>> +     * race condition whereby the interrupt could pop out before the
>>> +     * request has been added to the signal list. Hence no need to
>>> check
>>> +     * for completion, undo the list add and return false.
>>> +     */
>>> +    i915_gem_request_reference(req);
>>> +    spin_lock_irqsave(&req->ring->fence_lock, flags);
>>
>> This only gets called from non-irq context so could use spin_lock_irq.
> The fence_lock is acquired from within i915_gem_request_notify() which
> is called within the IRQ handler and therefore at IRQ context.

Correct, fence_lock needs to be irq safe, but i915_gem_request_submit is 
only called from process context, no?

So you don't need to use spin_lock_irqsave, just the cheaper spin_lock_irq.
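
For reference, the difference is just (illustrative snippet, not from
the patch):

	/* process context only: interrupts are known to be enabled */
	spin_lock_irq(&req->ring->fence_lock);
	list_add_tail(&req->signal_link, &req->ring->fence_signal_list);
	spin_unlock_irq(&req->ring->fence_lock);

versus spin_lock_irqsave/spin_unlock_irqrestore, which saves and
restores the interrupt state and is only needed when the caller might
already be running with interrupts disabled (e.g. from the irq handler
via i915_gem_request_notify).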

>>
>>> + WARN_ON(!list_empty(&req->signal_link));
>>> +    list_add_tail(&req->signal_link, &req->ring->fence_signal_list);
>>> +    spin_unlock_irqrestore(&req->ring->fence_lock, flags);
>>> +
>>> +    /*
>>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>>> +     * race where the request could complete before the interrupt has
>>> +     * been enabled. Thus care must be taken at that point.
>>> +     */
>>> +
>>> +     /* Have interrupts already been requested? */
>>> +     if (req->signal_requested)
>>
>> Can some code path enable signaling before the execbuf has completed?
> Once the scheduler has arrived, a wait request call can be made long
> before the request has been submitted to the hardware. Likewise, if a
> userland fence is requested for the batch buffer, it is allocated
> before execution begins and the internals of the fence code
> automatically enable signalling.

I forget exactly what I meant there, but it looked strange and still does. 
I suppose the question is why it is not sufficient to enable interrupts 
in enable_signaling?

>
>>> + i915_gem_request_enable_interrupt(req, false);
>>> +}
>>> +
>>> +/*
>>> + * The request is being actively waited on, so enable interrupt based
>>> + * completion signalling.
>>> + */
>>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request
>>> *req,
>>> +                       bool fence_locked)
>>> +{
>>> +    if (req->irq_enabled)
>>> +        return;
>>> +
>>> +    WARN_ON(!req->ring->irq_get(req->ring));
>>> +    req->irq_enabled = true;
>>
>> Probably shouldn't set the flag if irq_get failed, so as not to
>> unbalance the irq refcount.
> Will add an if around the WARN.
>
>>
>>> +
>>> +    /*
>>> +     * Because the interrupt is only enabled on demand, there is a race
>>> +     * where the interrupt can fire before anyone is looking for it. So
>>> +     * do an explicit check for missed interrupts.
>>> +     */
>>> +    i915_gem_request_notify(req->ring, fence_locked);
>>>   }
>>>
>>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>>   {
>>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>>                            typeof(*req), fence);
>>> +
>>> +    /*
>>> +     * No need to actually enable interrupt based processing until the
>>> +     * request has been submitted to the hardware. At which point
>>> +     * 'i915_gem_request_submit()' is called. So only really enable
>>> +     * signalling in there. Just set a flag to say that interrupts are
>>> +     * wanted when the request is eventually submitted. On the other
>>> hand
>>> +     * if the request has already been submitted then interrupts do
>>> need
>>> +     * to be enabled now.
>>> +     */
>>> +
>>> +    req->signal_requested = true;
>>> +
>>> +    if (!list_empty(&req->signal_link))
>>> +        i915_gem_request_enable_interrupt(req, true);
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool
>>> fence_locked)
>>
>> I think the name could be improved since this made me expect a request
>> in the parameter list.
>>
>> Maybe i915_gem_notify_requests ?
> The idea behind the naming is that all the request code is prefixed with
> 'i915_gem_request_'. This is the notify function of that set of interfaces.
>
>>
>>> +{
>>> +    struct drm_i915_gem_request *req, *req_next;
>>> +    unsigned long flags;
>>>       u32 seqno;
>>>
>>> -    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>>> +    if (list_empty(&ring->fence_signal_list))
>>> +        return;
>>> +
>>> +    if (!fence_locked)
>>> +        spin_lock_irqsave(&ring->fence_lock, flags);
>>> +
>>> +    seqno = ring->get_seqno(ring, false);
>>
>> Could read the seqno outside the spinlock, although not that important
>> since the primary caller of this is (or should be) the irq handler.
> That fix is already done in patch #13 of the new series.
>
>>
>>> +
>>> +    list_for_each_entry_safe(req, req_next,
>>> &ring->fence_signal_list, signal_link) {
>>> +        if (!req->cancelled) {
>>> +            if (!i915_seqno_passed(seqno, req->seqno))
>>> +                break;
>>> +        }
>>> +
>>> +        /*
>>> +         * Start by removing the fence from the signal list otherwise
>>> +         * the retire code can run concurrently and get confused.
>>> +         */
>>> +        list_del_init(&req->signal_link);
>>> +
>>> +        if (!req->cancelled) {
>>> +            fence_signal_locked(&req->fence);
>>> +        }
>>> +
>>> +        if (req->irq_enabled) {
>>> +            req->ring->irq_put(req->ring);
>>> +            req->irq_enabled = false;
>>> +        }
>>> +
>>> +        /* Can't unreference here because that might grab fence_lock */
>>> +        list_add_tail(&req->unsignal_link, &ring->fence_unsignal_list);
>>> +    }
>>>
>>> -    return i915_seqno_passed(seqno, req->seqno);
>>> +    if (!fence_locked)
>>> +        spin_unlock_irqrestore(&ring->fence_lock, flags);
>>>   }
>>>
>>>   static const char *i915_gem_request_get_driver_name(struct fence
>>> *req_fence)
>>> @@ -2712,7 +2851,6 @@ static void
>>> i915_gem_request_fence_value_str(struct fence *req_fence, char *str,
>>>
>>>   static const struct fence_ops i915_gem_request_fops = {
>>>       .enable_signaling    = i915_gem_request_enable_signaling,
>>> -    .signaled        = i915_gem_request_is_completed,
>>>       .wait            = fence_default_wait,
>>>       .release        = i915_gem_request_release,
>>>       .get_driver_name    = i915_gem_request_get_driver_name,
>>> @@ -2795,6 +2933,7 @@ int i915_gem_request_alloc(struct
>>> intel_engine_cs *ring,
>>>           goto err;
>>>       }
>>>
>>> +    INIT_LIST_HEAD(&req->signal_link);
>>>       fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>>> ctx->engine[ring->id].fence_timeline.fence_context,
>>> i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
>>>
>>> @@ -2832,6 +2971,11 @@ void i915_gem_request_cancel(struct
>>> drm_i915_gem_request *req)
>>>   {
>>>       intel_ring_reserved_space_cancel(req->ringbuf);
>>>
>>> +    req->cancelled = true;
>>> +    /* How to propagate to any associated sync_fence??? */
>>> +    req->fence.status = -EINVAL;
>>> +    fence_signal_locked(&req->fence);
>>
>> Same question from before, could you just move it to unsignaled list
>> straight away? (And drop interrupts?)
> Again, it makes for a much simpler driver if the request completion code
> is kept only inside the notify function and not replicated in multiple places.
>
>>
>>> +
>>>       i915_gem_request_unreference(req);
>>>   }
>>>
>>> @@ -2925,6 +3069,13 @@ static void i915_gem_reset_ring_cleanup(struct
>>> drm_i915_private *dev_priv,
>>>           i915_gem_request_retire(request);
>>>       }
>>>
>>> +    /*
>>> +     * Tidy up anything left over. This includes a call to
>>> +     * i915_gem_request_notify() which will make sure that any requests
>>> +     * that were on the signal pending list get also cleaned up.
>>> +     */
>>> +    i915_gem_retire_requests_ring(ring);
>>> +
>>>       /* Having flushed all requests from all queues, we know that all
>>>        * ringbuffers must now be empty. However, since we do not reclaim
>>>        * all space when retiring the request (to prevent HEADs colliding
>>> @@ -2974,6 +3125,13 @@ i915_gem_retire_requests_ring(struct
>>> intel_engine_cs *ring)
>>>
>>>       WARN_ON(i915_verify_lists(ring->dev));
>>>
>>> +    /*
>>> +     * If no-one has waited on a request recently then interrupts will
>>> +     * not have been enabled and thus no requests will ever be
>>> marked as
>>> +     * completed. So do an interrupt check now.
>>> +     */
>>> +    i915_gem_request_notify(ring, false);
>>> +
>>>       /* Retire requests first as we use it above for the early return.
>>>        * If we retire requests last, we may use a later seqno and so
>>> clear
>>>        * the requests lists without clearing the active list, leading to
>>> @@ -3015,6 +3173,15 @@ i915_gem_retire_requests_ring(struct
>>> intel_engine_cs *ring)
>>>           i915_gem_request_assign(&ring->trace_irq_req, NULL);
>>>       }
>>>
>>> +    /* Tidy up any requests that were recently signalled */
>>> +    spin_lock_irqsave(&ring->fence_lock, flags);
>>
>> Could also use spin_lock_irq here I think.
>>
>>> + list_splice_init(&ring->fence_unsignal_list, &list_head);
>>> +    spin_unlock_irqrestore(&ring->fence_lock, flags);
>>> +    list_for_each_entry_safe(req, req_next, &list_head,
>>> unsignal_link) {
>>> +        list_del(&req->unsignal_link);
>>> +        i915_gem_request_unreference(req);
>>> +    }
>>> +
>>>       /* Really free any requests that were recently unreferenced */
>>>       spin_lock_irqsave(&ring->delayed_free_lock, flags);
>>>       list_splice_init(&ring->delayed_free_list, &list_head);
>>> @@ -5066,6 +5233,8 @@ init_ring_lists(struct intel_engine_cs *ring)
>>>   {
>>>       INIT_LIST_HEAD(&ring->active_list);
>>>       INIT_LIST_HEAD(&ring->request_list);
>>> +    INIT_LIST_HEAD(&ring->fence_signal_list);
>>> +    INIT_LIST_HEAD(&ring->fence_unsignal_list);
>>>       INIT_LIST_HEAD(&ring->delayed_free_list);
>>>   }
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c
>>> b/drivers/gpu/drm/i915/i915_irq.c
>>> index 68b094b..74f8552 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -981,6 +981,8 @@ static void notify_ring(struct intel_engine_cs
>>> *ring)
>>>
>>>       trace_i915_gem_request_notify(ring);
>>>
>>> +    i915_gem_request_notify(ring, false);
>>> +
>>>       wake_up_all(&ring->irq_queue);
>>
>> This is the big one - I think to solve the thundering herd problem
>> this patch should also include the removal of ring->irq_queue since I
>> don't think it needs to keep existing after it.
>>
>> For example adding a req->wait_queue on which __i915_wait_request
>> would wait instead and doing a wake_up_all on it from
>> i915_gem_request_notify?
>>
>> Because in this form, when I ran this patch series it caused
>> the number of context switches to go up more than six times on one
>> workload. Interrupts also went up but only by 50% in comparison and
>> time spent in irq handlers doubled.
>>
>> But context switches and 10% more cpu time look the most dramatic.
>>
>> So it would be interesting to see if simply moving the wait queue to
>> be per request could fix the majority of that or not.
>
> This idea was posted in one version of the thundering herd thread already.
> I haven't implemented it yet as I haven't had the time to look into
> exactly what is required. But yes, in theory the fence code should mean
> the irq_queue is obsolete. The waiters can simply register a callback on
> the request's underlying fence object. When the interrupt handler
> signals the fence, that fence will process its callback list and wake
> the waiter. All nicely targeted and no-one gets woken unnecessarily.

Cool, I think we need that, if nothing else for the 6x more context 
switches per second this patch causes (13k vs 2k for pure nightly with 
the NBody test - which is good for stressing wait request, interrupts 
and busy spinning).
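
For reference, a minimal sketch of what such a targeted wake-up might
look like, using the stock struct fence callback API
(fence_add_callback() and friends); the req_waiter type and the function
names here are illustrative only, not something in the series:

struct req_waiter {
	struct fence_cb base;
	struct task_struct *task;
};

static void req_waiter_wake(struct fence *fence, struct fence_cb *cb)
{
	struct req_waiter *waiter = container_of(cb, typeof(*waiter), base);

	/* Wake only the one task waiting on this request. */
	wake_up_process(waiter->task);
}

static int wait_on_request_fence(struct drm_i915_gem_request *req)
{
	struct req_waiter waiter = { .task = current };

	/* fence_add_callback() returns -ENOENT if already signalled. */
	if (fence_add_callback(&req->fence, &waiter.base, req_waiter_wake))
		return 0;

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (fence_is_signaled(&req->fence) || signal_pending(current))
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	fence_remove_callback(&req->fence, &waiter.base);

	return fence_is_signaled(&req->fence) ? 0 : -ERESTARTSYS;
}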

Regards,

Tvrtko

* Re: [PATCH 11/13] android/sync: Fix reversed sense of signaled fence
  2015-12-11 15:57   ` Tvrtko Ursulin
@ 2015-12-14 11:22     ` John Harrison
  2015-12-14 12:37       ` Tvrtko Ursulin
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2015-12-14 11:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 11/12/2015 15:57, Tvrtko Ursulin wrote:
>
> On 11/12/15 13:11, John.C.Harrison@Intel.com wrote:
>> From: Peter Lawthers <peter.lawthers@intel.com>
>>
>> In the 3.14 kernel, a signaled fence was indicated by the status field
>> == 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates 
>> error,
>> and status > 0 indicates active.
>>
>> This patch wraps the check for a signaled fence in a function so that
>> callers no longer need to know the underlying implementation.
>>
>> v3: New patch for series.
>>
>> Change-Id: I8e565e49683e3efeb9474656cd84cf4add6ad6a2
>> Tracked-On: https://jira01.devtools.intel.com/browse/ACD-308
>> Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
>> ---
>>   drivers/android/sync.h | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/android/sync.h b/drivers/android/sync.h
>> index d57fa0a..75532d8 100644
>> --- a/drivers/android/sync.h
>> +++ b/drivers/android/sync.h
>> @@ -345,6 +345,27 @@ int sync_fence_cancel_async(struct sync_fence 
>> *fence,
>>    */
>>   int sync_fence_wait(struct sync_fence *fence, long timeout);
>>
>> +/**
>> + * sync_fence_is_signaled() - Return an indication if the fence is 
>> signaled
>> + * @fence:    fence to check
>> + *
>> + * returns 1 if fence is signaled
>> + * returns 0 if fence is not signaled
>> + * returns < 0 if fence is in error state
>> + */
>> +static inline int
>> +sync_fence_is_signaled(struct sync_fence *fence)
>> +{
>> +    int status;
>> +
>> +    status = atomic_read(&fence->status);
>> +    if (status == 0)
>> +        return 1;
>> +    if (status > 0)
>> +        return 0;
>> +    return status;
>> +}
>
> Not so important but could simply return bool, like "return status <= 
> 0"? Since it is called "is_signaled" and it is only used in boolean 
> mode in future patches.

There is no point in throwing away the error code unnecessarily. It can 
be useful in debug output and indeed will show up in the scheduler 
status dump via debugfs.

>
> Regards,
>
> Tvrtko


* Re: [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL
  2015-12-11 15:29   ` Tvrtko Ursulin
@ 2015-12-14 11:46     ` John Harrison
  2015-12-14 12:23       ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2015-12-14 11:46 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 11/12/2015 15:29, Tvrtko Ursulin wrote:
>
>
> On 11/12/15 13:12, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Various projects desire a mechanism for managing dependencies between
>> work items asynchronously. This can also include work items across
>> completely different and independent systems. For example, an
>> application wants to retrieve a frame from a video-in device, use it
>> for rendering on a GPU, then send it to the video-out device for
>> display, all without having to stall waiting for completion along
>> the way. The sync framework allows this. It encapsulates
>> synchronisation events in file descriptors. The application can
>> request a sync point for the completion of each piece of work. Drivers
>> should also take sync points in with each new work request and not
>> schedule the work to start until the sync has been signalled.
>>
>> This patch adds sync framework support to the exec buffer IOCTL. A
>> sync point can be passed in to stall execution of the batch buffer
>> until signalled. And a sync point can be returned after each batch
>> buffer submission which will be signalled upon that batch buffer's
>> completion.
>>
>> At present, the input sync point is simply waited on synchronously
>> inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
>> this will be handled asynchronously inside the scheduler and the IOCTL
>> can return without having to wait.
>>
>> Note also that the scheduler will re-order the execution of batch
>> buffers, e.g. because a batch buffer is stalled on a sync point and
>> cannot be submitted yet but other, independent, batch buffers are
>> being presented to the driver. This means that the timeline within the
>> sync points returned cannot be global to the engine. Instead they must
>> be kept per context per engine (the scheduler may not re-order batches
>> within a context). Hence the timeline cannot be based on the existing
>> seqno values but must be a new implementation.
>>
>> This patch is a port of work by several people that has been pulled
>> across from Android. It has been updated several times across several
>> patches. Rather than attempt to port each individual patch, this
>> version is the finished product as a single patch. The various
>> contributors/authors along the way (in addition to myself) were:
>>    Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
>>    Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>    Michel Thierry <michel.thierry@intel.com>
>>    Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> v2: New patch in series.
>>
>> v3: Updated to use the new 'sync_fence_is_signaled' API rather than
>> having to know about the internal meaning of the 'fence::status' field
>> (which recently got inverted!) [work by Peter Lawthers].
>>
>> Updated after review comments by Daniel Vetter. Removed '#ifdef
>> CONFIG_SYNC' and add 'select SYNC' to the Kconfig instead. Moved the
>> fd installation of fences to the end of the execbuff call to in order
>> to remove the need to use 'sys_close' to clean up on failure.
>>
>> Updated after review comments by Tvrtko Ursulin. Removed the
>> 'fence_external' flag as redundant. Converted DRM_ERRORs to
>> DRM_DEBUGs. Changed one second wait to a wait forever when waiting on
>> incoming fences.
>>
>> v4: Re-instated missing return of fd to user land that somehow got
>> lost in the anti-sys_close() re-factor.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/Kconfig               |  3 +
>>   drivers/gpu/drm/i915/i915_drv.h            |  6 ++
>>   drivers/gpu/drm/i915/i915_gem.c            | 89 
>> +++++++++++++++++++++++++++-
>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 95 
>> ++++++++++++++++++++++++++++--
>>   include/uapi/drm/i915_drm.h                | 16 ++++-
>>   5 files changed, 200 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
>> index 1d96fe1..cb5d5b2 100644
>> --- a/drivers/gpu/drm/i915/Kconfig
>> +++ b/drivers/gpu/drm/i915/Kconfig
>> @@ -22,6 +22,9 @@ config DRM_I915
>>       select ACPI_VIDEO if ACPI
>>       select ACPI_BUTTON if ACPI
>>       select MMU_NOTIFIER
>
> select MMU_NOTIFIER is not upstream! :)
Oops. That's from a patch not being upstreamed. It is required for 
OCL2.0 which is something we are using to test the scheduler.

>
>> +    # ANDROID is required for SYNC
>> +    select ANDROID
>> +    select SYNC
>>       help
>>         Choose this option if you have a system that has "Intel Graphics
>>         Media Accelerator" or "HD Graphics" integrated graphics,
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index d013c6d..194bca0 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2278,6 +2278,12 @@ void i915_gem_request_notify(struct 
>> intel_engine_cs *ring, bool fence_locked);
>>   int i915_create_fence_timeline(struct drm_device *dev,
>>                      struct intel_context *ctx,
>>                      struct intel_engine_cs *ring);
>> +struct sync_fence;
>> +int i915_create_sync_fence(struct drm_i915_gem_request *req,
>> +               struct sync_fence **sync_fence, int *fence_fd);
>> +void i915_install_sync_fence_fd(struct drm_i915_gem_request *req,
>> +                struct sync_fence *sync_fence, int fence_fd);
>> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct 
>> sync_fence *fence);
>>
>>   static inline bool i915_gem_request_completed(struct 
>> drm_i915_gem_request *req)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 4817015..279d79f 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -37,6 +37,7 @@
>>   #include <linux/swap.h>
>>   #include <linux/pci.h>
>>   #include <linux/dma-buf.h>
>> +#include <../drivers/android/sync.h>
>>
>>   #define RQ_BUG_ON(expr)
>>
>> @@ -2560,7 +2561,13 @@ void __i915_add_request(struct 
>> drm_i915_gem_request *request,
>>
>>       /*
>>        * Add the fence to the pending list before emitting the 
>> commands to
>> -     * generate a seqno notification interrupt.
>> +     * generate a seqno notification interrupt. This will also enable
>> +     * interrupts if 'signal_requested' has been set.
>> +     *
>> +     * For example, if an exported sync point has been requested for this
>> +     * request then it can be waited on without the driver's knowledge,
>> +     * i.e. without calling __i915_wait_request(). Thus interrupts must
>> +     * be enabled from the start rather than only on demand.
>>        */
>>       i915_gem_request_submit(request);
>>
>> @@ -2901,6 +2908,86 @@ static unsigned 
>> i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
>>       return seqno;
>>   }
>>
>> +int i915_create_sync_fence(struct drm_i915_gem_request *req,
>> +               struct sync_fence **sync_fence, int *fence_fd)
>> +{
>> +    char ring_name[] = "i915_ring0";
>> +    int fd;
>> +
>> +    fd = get_unused_fd_flags(O_CLOEXEC);
>> +    if (fd < 0) {
>> +        DRM_DEBUG("No available file descriptors!\n");
>> +        *fence_fd = -1;
>> +        return fd;
>> +    }
>> +
>> +    ring_name[9] += req->ring->id;
>
> I think this will possibly blow up if CONFIG_DEBUG_RODATA is set, 
> which is the case on most kernels.
>
> So I think you need to make a local copy with kstrdup and free it 
> after calling sync_fence_create_dma.
What will blow up? The ring_name local is a stack array not a pointer to 
the data segment. Did you miss the '[]'?
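
For the record, the distinction being argued about, as a generic C
illustration (not driver code):

	char  on_stack[] = "i915_ring0";	/* array: bytes are copied onto
						 * the stack, so writable */
	char *in_rodata  = "i915_ring0";	/* pointer: bytes live in .rodata */

	on_stack[9] += 1;	/* fine */
	in_rodata[9] += 1;	/* faults when CONFIG_DEBUG_RODATA
				 * write-protects string literals */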

>
>> +    *sync_fence = sync_fence_create_dma(ring_name, &req->fence);
>> +    if (!*sync_fence) {
>> +        put_unused_fd(fd);
>> +        *fence_fd = -1;
>> +        return -ENOMEM;
>> +    }
>> +
>> +    *fence_fd = fd;
>> +
>> +    return 0;
>> +}
>> +
>> +void i915_install_sync_fence_fd(struct drm_i915_gem_request *req,
>> +                struct sync_fence *sync_fence, int fence_fd)
>> +{
>> +    sync_fence_install(sync_fence, fence_fd);
>> +
>> +    /*
>> +     * NB: The corresponding put happens automatically on file close
>> +     * from sync_fence_release() via the fops callback.
>> +     */
>> +    fence_get(&req->fence);
>> +
>> +    /*
>> +     * The sync framework adds a callback to the fence. The fence
>> +     * framework calls 'enable_signalling' when a callback is added.
>> +     * Thus this flag should have been set by now. If not then
>> +     * 'enable_signalling' must be called explicitly because exporting
>> +     * a fence to user land means it can be waited on asynchronously and
>> +     * thus must be signalled asynchronously.
>> +     */
>> +    WARN_ON(!req->signal_requested);
>> +}
>> +
>> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct 
>> sync_fence *sync_fence)
>> +{
>> +    struct fence *dma_fence;
>> +    struct drm_i915_gem_request *req;
>> +    int i;
>> +
>> +    if (sync_fence_is_signaled(sync_fence))
>> +        return true;
>> +
>> +    for(i = 0; i < sync_fence->num_fences; i++) {
>> +        dma_fence = sync_fence->cbs[i].sync_pt;
>> +
>> +        /* No need to worry about dead points: */
>> +        if (fence_is_signaled(dma_fence))
>> +            continue;
>> +
>> +        /* Can't ignore other people's points: */
>
> Maybe add "unsignaled" to qualify.
The test above filters out anything that is signalled (or errored). 
Stating that again on each subsequent test seems unnecessarily verbose.

>
>> +        if(dma_fence->ops != &i915_gem_request_fops)
>> +            return false;
>> +
>> +        req = container_of(dma_fence, typeof(*req), fence);
>> +
>> +        /* Can't ignore points on other rings: */
>> +        if (req->ring != ring)
>> +            return false;
>> +
>> +        /* Same ring means guaranteed to be in order so ignore it. */
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct intel_context *ctx,
>>                  struct drm_i915_gem_request **req_out)
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
>> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index bfc4c17..5f629f8 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -26,6 +26,7 @@
>>    *
>>    */
>>
>> +#include <linux/syscalls.h>
>>   #include <drm/drmP.h>
>>   #include <drm/i915_drm.h>
>>   #include "i915_drv.h"
>> @@ -33,6 +34,7 @@
>>   #include "intel_drv.h"
>>   #include <linux/dma_remapping.h>
>>   #include <linux/uaccess.h>
>> +#include <../drivers/android/sync.h>
>>
>>   #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
>> @@ -1322,6 +1324,38 @@ eb_get_batch(struct eb_vmas *eb)
>>       return vma->obj;
>>   }
>>
>> +static int i915_early_fence_wait(struct intel_engine_cs *ring, int 
>> fence_fd)
>> +{
>> +    struct sync_fence *fence;
>> +    int ret = 0;
>> +
>> +    if (fence_fd < 0) {
>> +        DRM_DEBUG("Invalid wait fence fd %d on ring %d\n", fence_fd,
>> +              (int) ring->id);
>> +        return 1;
>
> Suggest adding kerneldoc describing return values from this function.
>
> It wasn't immediately clear to me what one means.
To be honest, I think the return value of one is left over from an 
earlier iteration of the code which did things slightly differently. It 
could probably just return zero or a -error code.

> But I am also not sure that invalid fd shouldn't be an outright error 
> instead of allowing execbuf to contiue.
Wasn't sure if it was possible for a fence to be invalidated behind the 
back of an application. And you don't want to reject rendering from one 
app just because another app went splat. I guess even if the underlying 
fence has been destroyed, the fd will still be private to the current 
app and hence valid. So maybe it should just return -EINVAL or some such 
if the fd itself is toast.
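
With that change the helper might look something like this (a sketch
based on the code above, treating a bad fd as a hard error):

static int i915_early_fence_wait(struct intel_engine_cs *ring, int fence_fd)
{
	struct sync_fence *fence;
	int ret = 0;

	if (fence_fd < 0)
		return -EINVAL;

	fence = sync_fence_fdget(fence_fd);
	if (!fence)
		return -EINVAL;	/* fd is not a sync fence */

	/*
	 * Wait forever for the fence to be signalled. This is safe
	 * because the mutex lock has not yet been acquired and the
	 * wait is interruptible.
	 */
	if (!i915_safe_to_ignore_fence(ring, fence))
		ret = sync_fence_wait(fence, -1);

	sync_fence_put(fence);
	return ret;
}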


>
>> +    }
>> +
>> +    fence = sync_fence_fdget(fence_fd);
>> +    if (fence == NULL) {
>> +        DRM_DEBUG("Invalid wait fence %d on ring %d\n", fence_fd,
>> +              (int) ring->id);
>> +        return 1;
>> +    }
>> +
>> +    if (!sync_fence_is_signaled(fence)) {
>
> Minor comment, but i915_safe_to_ignore_fence checks this as well so 
> you could remove it here.
>
>> +        /*
>> +         * Wait forever for the fence to be signalled. This is safe
>> +         * because the the mutex lock has not yet been acquired and
>> +         * the wait is interruptible.
>> +         */
>> +        if (!i915_safe_to_ignore_fence(ring, fence))
>> +            ret = sync_fence_wait(fence, -1);
>> +    }
>> +
>> +    sync_fence_put(fence);
>> +    return ret;
>> +}
>> +
>>   static int
>>   i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>                  struct drm_file *file,
>> @@ -1341,6 +1375,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, 
>> void *data,
>>       u32 dispatch_flags;
>>       int ret;
>>       bool need_relocs;
>> +    int fd_fence_complete = -1;
>> +    int fd_fence_wait = lower_32_bits(args->rsvd2);
>> +    struct sync_fence *sync_fence;
>> +
>> +    /*
>> +     * Make sure a broken fence handle is not returned no matter
>> +     * how early an error might be hit. Note that rsvd2 has to be
>> +     * saved away first because it is also an input parameter!
>> +     */
>
> Instead of the 2nd sentence maybe say something like "Note that we 
> have saved rsvd2 already for later use since it is also an input 
> parameter!". As written I was expecting the code following the 
> comment to do that, and then was confused when it didn't. Or maybe my 
> attention span is too short.
Will update the comment for those who can't remember what they read two 
lines earlier...

>
>> +    if (args->flags & I915_EXEC_CREATE_FENCE)
>> +        args->rsvd2 = (__u64) -1;
>>
>>       if (!i915_gem_check_execbuffer(args))
>>           return -EINVAL;
>> @@ -1424,6 +1469,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, 
>> void *data,
>>           dispatch_flags |= I915_DISPATCH_RS;
>>       }
>>
>> +    /*
>> +     * Without a GPU scheduler, any fence waits must be done up front.
>> +     */
>> +    if (args->flags & I915_EXEC_WAIT_FENCE) {
>> +        ret = i915_early_fence_wait(ring, fd_fence_wait);
>> +        if (ret < 0)
>> +            return ret;
>> +
>> +        args->flags &= ~I915_EXEC_WAIT_FENCE;
>> +    }
>> +
>>       intel_runtime_pm_get(dev_priv);
>>
>>       ret = i915_mutex_lock_interruptible(dev);
>> @@ -1571,8 +1627,41 @@ i915_gem_do_execbuffer(struct drm_device *dev, 
>> void *data,
>>       params->batch_obj               = batch_obj;
>>       params->ctx                     = ctx;
>>
>> +    if (args->flags & I915_EXEC_CREATE_FENCE) {
>> +        /*
>> +         * Caller has requested a sync fence.
>> +         * User interrupts will be enabled to make sure that
>> +         * the timeline is signalled on completion.
>> +         */
>
> Is it signaled or signalled? There is a lot of usage of both 
> throughout the patches and I as a non-native speaker am 
> amu^H^H^Hconfused. ;)
It depends which side of the Atlantic you are on. British English uses a 
double 'l', American English just a single one. So largely it depends 
who wrote the code/comment and who (if anyone!) has reviewed it.

>
>> +        ret = i915_create_sync_fence(params->request, &sync_fence,
>> +                         &fd_fence_complete);
>> +        if (ret) {
>> +            DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
>> +                  ring->id, ctx);
>> +            goto err_batch_unpin;
>> +        }
>> +    }
>> +
>>       ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
>>
>> +    if (fd_fence_complete != -1) {
>> +        if (ret) {
>> +            sync_fence_put(sync_fence);
>> +            put_unused_fd(fd_fence_complete);
>> +        } else {
>> +            /*
>> +             * Install the fence into the pre-allocated file
>> +             * descriptor to the fence object so that user land
>> +             * can wait on it...
>> +             */
>> +            i915_install_sync_fence_fd(params->request,
>> +                           sync_fence, fd_fence_complete);
>> +
>> +            /* Return the fence through the rsvd2 field */
>> +            args->rsvd2 = (__u64) fd_fence_complete;
>> +        }
>> +    }
>> +
>>   err_batch_unpin:
>>       /*
>>        * FIXME: We crucially rely upon the active tracking for the 
>> (ppgtt)
>> @@ -1602,6 +1691,7 @@ pre_mutex_err:
>>       /* intel_gpu_busy should also get a ref, so it will free when 
>> the device
>>        * is really idle. */
>>       intel_runtime_pm_put(dev_priv);
>> +
>>       return ret;
>>   }
>>
>> @@ -1707,11 +1797,6 @@ i915_gem_execbuffer2(struct drm_device *dev, 
>> void *data,
>>           return -EINVAL;
>>       }
>>
>> -    if (args->rsvd2 != 0) {
>> -        DRM_DEBUG("dirty rvsd2 field\n");
>> -        return -EINVAL;
>> -    }
>> -
>>       exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
>>                    GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
>>       if (exec2_list == NULL)
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 67cebe6..86f7921 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -250,7 +250,7 @@ typedef struct _drm_i915_sarea {
>>   #define DRM_IOCTL_I915_HWS_ADDR DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_I915_HWS_ADDR, struct drm_i915_gem_init)
>>   #define DRM_IOCTL_I915_GEM_INIT DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_INIT, struct drm_i915_gem_init)
>>   #define DRM_IOCTL_I915_GEM_EXECBUFFER DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_EXECBUFFER, struct drm_i915_gem_execbuffer)
>> -#define DRM_IOCTL_I915_GEM_EXECBUFFER2 DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
>> +#define DRM_IOCTL_I915_GEM_EXECBUFFER2 DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
>>   #define DRM_IOCTL_I915_GEM_PIN DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
>>   #define DRM_IOCTL_I915_GEM_UNPIN    DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
>>   #define DRM_IOCTL_I915_GEM_BUSY DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
>> @@ -695,7 +695,7 @@ struct drm_i915_gem_exec_object2 {
>>       __u64 flags;
>>
>>       __u64 rsvd1;
>> -    __u64 rsvd2;
>> +    __u64 rsvd2;    /* Used for fence fd */
>>   };
>>
>>   struct drm_i915_gem_execbuffer2 {
>> @@ -776,7 +776,17 @@ struct drm_i915_gem_execbuffer2 {
>>    */
>>   #define I915_EXEC_RESOURCE_STREAMER     (1<<15)
>>
>> -#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_RESOURCE_STREAMER<<1)
>> +/** Caller supplies a sync fence fd in the rsvd2 field.
>> + * Wait for it to be signalled before starting the work
>> + */
>> +#define I915_EXEC_WAIT_FENCE        (1<<16)
>> +
>> +/** Caller wants a sync fence fd for this execbuffer.
>> + *  It will be returned in rsvd2
>> + */
>> +#define I915_EXEC_CREATE_FENCE        (1<<17)
>> +
>> +#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_CREATE_FENCE<<1)
>>
>>   #define I915_EXEC_CONTEXT_ID_MASK    (0xffffffff)
>>   #define i915_execbuffer2_set_context_id(eb2, context) \
>>
>
> Regards,
>
> Tvrtko
>


* Re: [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-11 14:28   ` Tvrtko Ursulin
@ 2015-12-14 11:58     ` John Harrison
  2015-12-14 12:52       ` Tvrtko Ursulin
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2015-12-14 11:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 11/12/2015 14:28, Tvrtko Ursulin wrote:
> On 11/12/15 13:12, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The notify function can be called many times without the seqno
>> changing. A large number of duplicates are to prevent races due to the
>> requirement of not enabling interrupts until requested. However, when
>> interrupts are enabled the IRQ handler can be called multiple times
>> without the ring's seqno value changing. This patch reduces the
>> overhead of these extra calls by caching the last processed seqno
>> value and early exiting if it has not changed.
>>
>> v3: New patch for series.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c         | 14 +++++++++++---
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>>   2 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 279d79f..3c88678 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2457,6 +2457,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 
>> seqno)
>>
>>           for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
>>               ring->semaphore.sync_seqno[j] = 0;
>> +
>> +        ring->last_irq_seqno = 0;
>>       }
>>
>>       return 0;
>> @@ -2788,11 +2790,14 @@ void i915_gem_request_notify(struct 
>> intel_engine_cs *ring, bool fence_locked)
>>           return;
>>       }
>>
>> -    if (!fence_locked)
>> -        spin_lock_irqsave(&ring->fence_lock, flags);
>> -
>>       seqno = ring->get_seqno(ring, false);
>>       trace_i915_gem_request_notify(ring, seqno);
>> +    if (seqno == ring->last_irq_seqno)
>> +        return;
>> +    ring->last_irq_seqno = seqno;
>
> Hmmm.. do you want to make the check "seqno <= ring->last_irq_seqno" ?
>
> Is there a possibility for some weird timing or caching issue where 
> two callers get in and last_irq_seqno goes backwards? Not sure that it 
> would cause a problem, but pattern is unusual and hard to understand 
> for me.
The check is simply to prevent repeat processing of identical seqno 
values. The 'last_' value is never used for anything more complicated. 
If there is a very rare race condition where the repeat processing can 
still happen, it doesn't really matter too much.

> Also check and the assignment would need to be under the spinlock I 
> think.

The whole point is to not grab the spinlock if there is no work to do. 
Hence the seqno read and test must be done first. The assignment could 
potentially be done after the lock, but if two different threads have 
made it that far concurrently then it doesn't really matter who does the 
write first. Most likely they are both processing the same seqno, and in 
the really rare case of two concurrent threads actually reading two 
different (and both new) seqno values there is no guarantee about which 
will take the lock first. So you are back in the above situation: it 
doesn't really matter if a later pass finds an 'incorrect' last value 
and goes through the processing sequence with no work to do.


>> +
>> +    if (!fence_locked)
>> +        spin_lock_irqsave(&ring->fence_lock, flags);
>>
>>>       list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
>>           if (!req->cancelled) {
>> @@ -3163,7 +3168,10 @@ static void i915_gem_reset_ring_cleanup(struct 
>> drm_i915_private *dev_priv,
>>        * Tidy up anything left over. This includes a call to
>>>        * i915_gem_request_notify() which will make sure that any
>>>        * requests that were on the signal pending list also get cleaned up.
>>> +     * NB: The seqno cache must be cleared otherwise the notify call
>>> +     * will simply return immediately.
>>        */
>> +    ring->last_irq_seqno = 0;
>>       i915_gem_retire_requests_ring(ring);
>>
>>       /* Having flushed all requests from all queues, we know that all
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 9d09edb..1987abd 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -356,6 +356,7 @@ struct  intel_engine_cs {
>>       spinlock_t fence_lock;
>>       struct list_head fence_signal_list;
>>       struct list_head fence_unsignal_list;
>> +    uint32_t last_irq_seqno;
>>   };
>>
>>   bool intel_ring_initialized(struct intel_engine_cs *ring);
>>
>
> Regards,
>
> Tvrtko


* Re: [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL
  2015-12-14 11:46     ` John Harrison
@ 2015-12-14 12:23       ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-12-14 12:23 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Mon, Dec 14, 2015 at 11:46:22AM +0000, John Harrison wrote:
> >>@@ -1341,6 +1375,17 @@ i915_gem_do_execbuffer(struct drm_device
> >>*dev, void *data,
> >>      u32 dispatch_flags;
> >>      int ret;
> >>      bool need_relocs;
> >>+    int fd_fence_complete = -1;
> >>+    int fd_fence_wait = lower_32_bits(args->rsvd2);
> >>+    struct sync_fence *sync_fence;
> >>+
> >>+    /*
> >>+     * Make sure a broken fence handle is not returned no matter
> >>+     * how early an error might be hit. Note that rsvd2 has to be
> >>+     * saved away first because it is also an input parameter!
> >>+     */
> >
> >Instead of the 2nd sentence maybe say something like "Note that we
> >have saved rsvd2 already for later use since it is also in input
> >parameter!". Like written I was expecting the code following the
> >comment to do that, and then was confused when it didn't. Or maybe
> >my attention span is too short.
> Will update the comment for those who can't remember what they read
> two lines earlier...

Honestly, I thought the complaint here would be that the user's input
parameter is being modified on the error path, breaking the ABI,
i.e. drmIoctl() will not work.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
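
One possible shape for that fix, as a sketch (rsvd2_in is an
illustrative local, not something in the patch):

	u64 rsvd2_in = args->rsvd2;	/* rsvd2 is an in/out field, save it */

	if (args->flags & I915_EXEC_CREATE_FENCE)
		args->rsvd2 = (__u64) -1;

	/* ... existing execbuffer body ... */

	/*
	 * On any failure, restore the caller's input so that drmIoctl()
	 * can transparently restart the ioctl on EINTR/EAGAIN.
	 */
	if (ret)
		args->rsvd2 = rsvd2_in;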

* Re: [PATCH 11/13] android/sync: Fix reversed sense of signaled fence
  2015-12-14 11:22     ` John Harrison
@ 2015-12-14 12:37       ` Tvrtko Ursulin
  0 siblings, 0 replies; 74+ messages in thread
From: Tvrtko Ursulin @ 2015-12-14 12:37 UTC (permalink / raw)
  To: John Harrison, Intel-GFX


On 14/12/15 11:22, John Harrison wrote:
> On 11/12/2015 15:57, Tvrtko Ursulin wrote:
>>
>> On 11/12/15 13:11, John.C.Harrison@Intel.com wrote:
>>> From: Peter Lawthers <peter.lawthers@intel.com>
>>>
>>> In the 3.14 kernel, a signaled fence was indicated by the status field
>>> == 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates
>>> error,
>>> and status > 0 indicates active.
>>>
>>> This patch wraps the check for a signaled fence in a function so that
>>> callers no longer need to know the underlying implementation.
>>>
>>> v3: New patch for series.
>>>
>>> Change-Id: I8e565e49683e3efeb9474656cd84cf4add6ad6a2
>>> Tracked-On: https://jira01.devtools.intel.com/browse/ACD-308
>>> Signed-off-by: Peter Lawthers <peter.lawthers@intel.com>
>>> ---
>>>   drivers/android/sync.h | 21 +++++++++++++++++++++
>>>   1 file changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/android/sync.h b/drivers/android/sync.h
>>> index d57fa0a..75532d8 100644
>>> --- a/drivers/android/sync.h
>>> +++ b/drivers/android/sync.h
>>> @@ -345,6 +345,27 @@ int sync_fence_cancel_async(struct sync_fence
>>> *fence,
>>>    */
>>>   int sync_fence_wait(struct sync_fence *fence, long timeout);
>>>
>>> +/**
>>> + * sync_fence_is_signaled() - Return an indication if the fence is
>>> signaled
>>> + * @fence:    fence to check
>>> + *
>>> + * returns 1 if fence is signaled
>>> + * returns 0 if fence is not signaled
>>> + * returns < 0 if fence is in error state
>>> + */
>>> +static inline int
>>> +sync_fence_is_signaled(struct sync_fence *fence)
>>> +{
>>> +    int status;
>>> +
>>> +    status = atomic_read(&fence->status);
>>> +    if (status == 0)
>>> +        return 1;
>>> +    if (status > 0)
>>> +        return 0;
>>> +    return status;
>>> +}
>>
>> Not so important but could simply return bool, like "return status <=
>> 0"? Since it is called "is_signaled" and it is only used in boolean
>> mode in future patches.
>
> There is no point in throwing away the error code unnecessarily. It can
> be useful in debug output and indeed will show up in the scheduler
> status dump via debugfs.

You could still grab it directly from that call site, or add another 
accessor like sync_fence_get_error(). Just saying that it may be good to 
decouple more from the sync_fence implementation, since the internals 
have changed once already.
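
For example, something along these lines (a sketch of the suggested
split, not an existing API):

static inline bool sync_fence_is_signaled(struct sync_fence *fence)
{
	/* 0 == signalled, < 0 == error (also signalled), > 0 == active */
	return atomic_read(&fence->status) <= 0;
}

static inline int sync_fence_get_error(struct sync_fence *fence)
{
	int status = atomic_read(&fence->status);

	return status < 0 ? status : 0;
}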

Regards,

Tvrtko

* Re: [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2015-12-14 11:58     ` John Harrison
@ 2015-12-14 12:52       ` Tvrtko Ursulin
  0 siblings, 0 replies; 74+ messages in thread
From: Tvrtko Ursulin @ 2015-12-14 12:52 UTC (permalink / raw)
  To: John Harrison, Intel-GFX


On 14/12/15 11:58, John Harrison wrote:
> On 11/12/2015 14:28, Tvrtko Ursulin wrote:
>> On 11/12/15 13:12, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The notify function can be called many times without the seqno
>>> changing. A large number of duplicates are to prevent races due to the
>>> requirement of not enabling interrupts until requested. However, when
>>> interrupts are enabled the IRQ handler can be called multiple times
>>> without the ring's seqno value changing. This patch reduces the
>>> overhead of these extra calls by caching the last processed seqno
>>> value and early exiting if it has not changed.
>>>
>>> v3: New patch for series.
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c         | 14 +++++++++++---
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>>>   2 files changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index 279d79f..3c88678 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -2457,6 +2457,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32
>>> seqno)
>>>
>>>           for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
>>>               ring->semaphore.sync_seqno[j] = 0;
>>> +
>>> +        ring->last_irq_seqno = 0;
>>>       }
>>>
>>>       return 0;
>>> @@ -2788,11 +2790,14 @@ void i915_gem_request_notify(struct
>>> intel_engine_cs *ring, bool fence_locked)
>>>           return;
>>>       }
>>>
>>> -    if (!fence_locked)
>>> -        spin_lock_irqsave(&ring->fence_lock, flags);
>>> -
>>>       seqno = ring->get_seqno(ring, false);
>>>       trace_i915_gem_request_notify(ring, seqno);
>>> +    if (seqno == ring->last_irq_seqno)
>>> +        return;
>>> +    ring->last_irq_seqno = seqno;
>>
>> Hmmm.. do you want to make the check "seqno <= ring->last_irq_seqno" ?
>>
>> Is there a possibility for some weird timing or caching issue where
>> two callers get in and last_irq_seqno goes backwards? Not sure that it
>> would cause a problem, but pattern is unusual and hard to understand
>> for me.
> The check is simply to prevent repeat processing of identical seqno
> values. The 'last_' value is never used for anything more complicated.
> If there is a very rare race condition where the repeat processing can
> still happen, it doesn't really matter too much.
>
>> Also check and the assignment would need to be under the spinlock I
>> think.
>
> The whole point is to not grab the spinlock if there is no work to do.
> Hence the seqno read and test must be done first. The assignment could
> potentially be done after the lock, but if two different threads have
> made it that far concurrently then it doesn't really matter who does the
> write first. Most likely they are both processing the same seqno, and in
> the really rare case of two concurrent threads actually reading two
> different (and both new) seqno values there is no guarantee about which
> will take the lock first. So you are back in the above situation: it
> doesn't really matter if a later pass finds an 'incorrect' last value
> and goes through the processing sequence with no work to do.

I think it would be good to put that in the comment then. :)

I.e. that you don't care about multiple notify passes running if the 
timing works out that way, or that ring->last_irq_seqno does not always 
reflect the last processed seqno. Etc.
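
Such a comment might read something like the following, placed above
the check in i915_gem_request_notify() (the wording is a sketch only):

	seqno = ring->get_seqno(ring, false);
	trace_i915_gem_request_notify(ring, seqno);

	/*
	 * The unlocked check and update of last_irq_seqno below are
	 * intentionally racy: the field is purely an optimisation to
	 * avoid taking fence_lock when the seqno has not moved. Two
	 * concurrent callers may both see a new value and both run the
	 * signal list processing, and a later caller may see a stale
	 * value and make one redundant (but harmless) pass. None of
	 * this affects correctness, so taking the lock just for this
	 * check would defeat its purpose.
	 */
	if (seqno == ring->last_irq_seqno)
		return;
	ring->last_irq_seqno = seqno;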

Regards,

Tvrtko

* Re: [PATCH 02/13] staging/android/sync: add sync_fence_create_dma
  2015-12-11 13:11 ` [PATCH 02/13] staging/android/sync: add sync_fence_create_dma John.C.Harrison
@ 2015-12-17 17:29   ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2015-12-17 17:29 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg,
	Riley Andrews, Maarten Lankhorst

On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> 
> This allows users of dma fences to create an Android fence.
> 
> v2: Added kerneldoc. (Tvrtko Ursulin).
> 
> v4: Updated comments from review feedback by Maarten.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> Cc: devel@driverdev.osuosl.org
> Cc: Riley Andrews <riandrews@android.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> ---
>  drivers/staging/android/sync.c | 13 +++++++++----
>  drivers/staging/android/sync.h | 10 ++++++++++
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
> index f83e00c..7f0e919 100644
> --- a/drivers/staging/android/sync.c
> +++ b/drivers/staging/android/sync.c
> @@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
>  }
>  
>  /* TODO: implement a create which takes more that one sync_pt */
> -struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
> +struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
>  {
>  	struct sync_fence *fence;
>  
> @@ -199,16 +199,21 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
>  	fence->num_fences = 1;
>  	atomic_set(&fence->status, 1);
>  
> -	fence->cbs[0].sync_pt = &pt->base;
> +	fence->cbs[0].sync_pt = pt;
>  	fence->cbs[0].fence = fence;
> -	if (fence_add_callback(&pt->base, &fence->cbs[0].cb,
> -			       fence_check_cb_func))
> +	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
>  		atomic_dec(&fence->status);
>  
>  	sync_fence_debug_add(fence);
>  
>  	return fence;
>  }
> +EXPORT_SYMBOL(sync_fence_create_dma);
> +
> +struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
> +{
> +	return sync_fence_create_dma(name, &pt->base);
> +}
>  EXPORT_SYMBOL(sync_fence_create);
>  
>  struct sync_fence *sync_fence_fdget(int fd)
> diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
> index 61f8a3a..afa0752 100644
> --- a/drivers/staging/android/sync.h
> +++ b/drivers/staging/android/sync.h
> @@ -254,6 +254,16 @@ void sync_pt_free(struct sync_pt *pt);
>   */
>  struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
>  
> +/**
> + * sync_fence_create_dma() - creates a sync fence from dma-fence
> + * @name:	name of fence to create
> + * @pt:	dma-fence to add to the fence
> + *
> + * Creates a fence containing @pt.  Once this is called, the fence takes
> + * ownership of @pt.
> + */
> +struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
> +
>  /*
>   * API for sync_fence consumers
>   */
> 

I've been using this one for a while, so:
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
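
For reference, a minimal usage sketch of the new entry point; dma_fence
here stands for any initialised struct fence the driver already holds,
and per the kerneldoc the sync fence takes ownership of the reference
passed in:

	struct sync_fence *sf;
	int fd;

	/* The sync_fence consumes a reference, so take one for it. */
	fence_get(dma_fence);
	sf = sync_fence_create_dma("my_timeline", dma_fence);
	if (!sf) {
		fence_put(dma_fence);
		return -ENOMEM;
	}

	fd = get_unused_fd_flags(O_CLOEXEC);
	if (fd < 0) {
		sync_fence_put(sf);
		return fd;
	}

	/* Hand the fence to user land as a file descriptor. */
	sync_fence_install(sf, fd);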

* Re: [Intel-gfx] [PATCH 01/13] staging/android/sync: Support sync points created from dma-fences
  2015-12-11 13:11 ` [PATCH 01/13] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
@ 2015-12-17 17:32   ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2015-12-17 17:32 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg,
	Maarten Lankhorst, Riley Andrews

On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> 
> Debug output assumes all sync points are built on top of Android sync points
> and will NULL ptr deref when we start creating them from dma-fences,
> unless taught about this.
> 
> v4: Corrected patch ownership.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: devel@driverdev.osuosl.org
> Cc: Riley Andrews <riandrews@android.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Arve Hjønnevåg <arve@android.com>
> ---
>  drivers/staging/android/sync_debug.c | 42 +++++++++++++++++++-----------------
>  1 file changed, 22 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
> index 91ed2c4..f45d13c 100644
> --- a/drivers/staging/android/sync_debug.c
> +++ b/drivers/staging/android/sync_debug.c
> @@ -82,36 +82,42 @@ static const char *sync_status_str(int status)
>  	return "error";
>  }
>  
> -static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
> +static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence)
>  {
>  	int status = 1;
> -	struct sync_timeline *parent = sync_pt_parent(pt);
>  
> -	if (fence_is_signaled_locked(&pt->base))
> -		status = pt->base.status;
> +	if (fence_is_signaled_locked(pt))
> +		status = pt->status;
>  
>  	seq_printf(s, "  %s%spt %s",
> -		   fence ? parent->name : "",
> +		   fence && pt->ops->get_timeline_name ?
> +		   pt->ops->get_timeline_name(pt) : "",
>  		   fence ? "_" : "",
>  		   sync_status_str(status));
>  
>  	if (status <= 0) {
>  		struct timespec64 ts64 =
> -			ktime_to_timespec64(pt->base.timestamp);
> +			ktime_to_timespec64(pt->timestamp);
>  
>  		seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec);
>  	}
>  
> -	if (parent->ops->timeline_value_str &&
> -	    parent->ops->pt_value_str) {
> +	if ((!fence || pt->ops->timeline_value_str) &&
> +	    pt->ops->fence_value_str) {
>  		char value[64];
> +		bool success;
>  
> -		parent->ops->pt_value_str(pt, value, sizeof(value));
> -		seq_printf(s, ": %s", value);
> -		if (fence) {
> -			parent->ops->timeline_value_str(parent, value,
> -						    sizeof(value));
> -			seq_printf(s, " / %s", value);
> +		pt->ops->fence_value_str(pt, value, sizeof(value));
> +		success = strlen(value);
> +
> +		if (success)
> +			seq_printf(s, ": %s", value);
> +
> +		if (success && fence) {
> +			pt->ops->timeline_value_str(pt, value, sizeof(value));
> +
> +			if (strlen(value))
> +				seq_printf(s, " / %s", value);
>  		}
>  	}
>  
> @@ -138,7 +144,7 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
>  	list_for_each(pos, &obj->child_list_head) {
>  		struct sync_pt *pt =
>  			container_of(pos, struct sync_pt, child_list);
> -		sync_print_pt(s, pt, false);
> +		sync_print_pt(s, &pt->base, false);
>  	}
>  	spin_unlock_irqrestore(&obj->child_list_lock, flags);
>  }
> @@ -153,11 +159,7 @@ static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
>  		   sync_status_str(atomic_read(&fence->status)));
>  
>  	for (i = 0; i < fence->num_fences; ++i) {
> -		struct sync_pt *pt =
> -			container_of(fence->cbs[i].sync_pt,
> -				     struct sync_pt, base);
> -
> -		sync_print_pt(s, pt, true);
> +		sync_print_pt(s, fence->cbs[i].sync_pt, true);
>  	}
>  
>  	spin_lock_irqsave(&fence->wq.lock, flags);
> 

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

* Re: [PATCH 03/13] staging/android/sync: Move sync framework out of staging
  2015-12-11 13:11 ` [PATCH 03/13] staging/android/sync: Move sync framework out of staging John.C.Harrison
@ 2015-12-17 17:35   ` Jesse Barnes
  2015-12-21 10:03     ` Daniel Vetter
  0 siblings, 1 reply; 74+ messages in thread
From: Jesse Barnes @ 2015-12-17 17:35 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The sync framework is now used by the i915 driver. Therefore it can be
> moved out of staging and into the regular tree. Also, the public
> interfaces can actually be made public and exported.
> 
> v3: New patch for series.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Geoff Miller <geoff.miller@intel.com>
> ---
>  drivers/android/Kconfig                |  28 ++
>  drivers/android/Makefile               |   2 +
>  drivers/android/sw_sync.c              | 260 ++++++++++++
>  drivers/android/sw_sync.h              |  59 +++
>  drivers/android/sync.c                 | 734 +++++++++++++++++++++++++++++++++
>  drivers/android/sync.h                 | 366 ++++++++++++++++
>  drivers/android/sync_debug.c           | 256 ++++++++++++
>  drivers/android/trace/sync.h           |  82 ++++
>  drivers/staging/android/Kconfig        |  28 --
>  drivers/staging/android/Makefile       |   2 -
>  drivers/staging/android/sw_sync.c      | 260 ------------
>  drivers/staging/android/sw_sync.h      |  59 ---
>  drivers/staging/android/sync.c         | 734 ---------------------------------
>  drivers/staging/android/sync.h         | 366 ----------------
>  drivers/staging/android/sync_debug.c   | 256 ------------
>  drivers/staging/android/trace/sync.h   |  82 ----
>  drivers/staging/android/uapi/sw_sync.h |  32 --
>  drivers/staging/android/uapi/sync.h    |  97 -----
>  include/uapi/Kbuild                    |   1 +
>  include/uapi/sync/Kbuild               |   3 +
>  include/uapi/sync/sw_sync.h            |  32 ++
>  include/uapi/sync/sync.h               |  97 +++++
>  22 files changed, 1920 insertions(+), 1916 deletions(-)
>  create mode 100644 drivers/android/sw_sync.c
>  create mode 100644 drivers/android/sw_sync.h
>  create mode 100644 drivers/android/sync.c
>  create mode 100644 drivers/android/sync.h
>  create mode 100644 drivers/android/sync_debug.c
>  create mode 100644 drivers/android/trace/sync.h
>  delete mode 100644 drivers/staging/android/sw_sync.c
>  delete mode 100644 drivers/staging/android/sw_sync.h
>  delete mode 100644 drivers/staging/android/sync.c
>  delete mode 100644 drivers/staging/android/sync.h
>  delete mode 100644 drivers/staging/android/sync_debug.c
>  delete mode 100644 drivers/staging/android/trace/sync.h
>  delete mode 100644 drivers/staging/android/uapi/sw_sync.h
>  delete mode 100644 drivers/staging/android/uapi/sync.h
>  create mode 100644 include/uapi/sync/Kbuild
>  create mode 100644 include/uapi/sync/sw_sync.h
>  create mode 100644 include/uapi/sync/sync.h
> 
> diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
> index bdfc6c6..9edcd8f 100644
> --- a/drivers/android/Kconfig
> +++ b/drivers/android/Kconfig
> @@ -32,6 +32,34 @@ config ANDROID_BINDER_IPC_32BIT
>  
>  	  Note that enabling this will break newer Android user-space.
>  
> +config SYNC
> +	bool "Synchronization framework"
> +	default n
> +	select ANON_INODES
> +	select DMA_SHARED_BUFFER
> +	---help---
> +	  This option enables the framework for synchronization between multiple
> +	  drivers.  Sync implementations can take advantage of hardware
> +	  synchronization built into devices like GPUs.
> +
> +config SW_SYNC
> +	bool "Software synchronization objects"
> +	default n
> +	depends on SYNC
> +	---help---
> +	  A sync object driver that uses a 32bit counter to coordinate
> +	  synchronization.  Useful when there is no hardware primitive backing
> +	  the synchronization.
> +
> +config SW_SYNC_USER
> +	bool "Userspace API for SW_SYNC"
> +	default n
> +	depends on SW_SYNC
> +	---help---
> +	  Provides a user space API to the sw sync object.
> +	  *WARNING* improper use of this can result in deadlocking kernel
> +	  drivers from userspace.
> +
>  endif # if ANDROID

IIRC we wanted to drop the user ABI altogether?  I think we can de-stage
this even before we push the new ABI on the i915 side to expose the sync
points (since we'll need an open source userspace for that), and any
changes/cleanups can happen outside of staging.

Thanks,
Jesse


* Re: [PATCH 04/13] android/sync: Improved debug dump to dmesg
  2015-12-11 13:11 ` [PATCH 04/13] android/sync: Improved debug dump to dmesg John.C.Harrison
@ 2015-12-17 17:36   ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2015-12-17 17:36 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The sync code has a facility for dumping current state information via
> debugfs. It also has a way to re-use the same code for dumping to the
> kernel log on an internal error. However, the redirection was rather
> clunky and split the output across multiple prints at arbitrary
> boundaries. This made it difficult to read and could result in output
> from different sources being randomly interspersed.
> 
> This patch improves the redirection code to split the output on line
> feed boundaries instead. It also adds support for highlighting the
> offending fence object that caused the state dump in the first place.
> 
> v4: New patch in series.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/android/sync.c       |  9 ++++++--
>  drivers/android/sync.h       |  5 +++--
>  drivers/android/sync_debug.c | 50 ++++++++++++++++++++++++++++++++------------
>  3 files changed, 47 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/android/sync.c b/drivers/android/sync.c
> index 7f0e919..db4a54b 100644
> --- a/drivers/android/sync.c
> +++ b/drivers/android/sync.c
> @@ -86,6 +86,11 @@ static void sync_timeline_put(struct sync_timeline *obj)
>  
>  void sync_timeline_destroy(struct sync_timeline *obj)
>  {
> +	if (!list_empty(&obj->active_list_head)) {
> +		pr_info("destroying timeline with outstanding fences!\n");
> +		sync_dump_timeline(obj);
> +	}
> +
>  	obj->destroyed = true;
>  	/*
>  	 * Ensure timeline is marked as destroyed before
> @@ -397,7 +402,7 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
>  		if (timeout) {
>  			pr_info("fence timeout on [%p] after %dms\n", fence,
>  				jiffies_to_msecs(timeout));
> -			sync_dump();
> +			sync_dump(fence);
>  		}
>  		return -ETIME;
>  	}
> @@ -405,7 +410,7 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
>  	ret = atomic_read(&fence->status);
>  	if (ret) {
>  		pr_info("fence error %ld on [%p]\n", ret, fence);
> -		sync_dump();
> +		sync_dump(fence);
>  	}
>  	return ret;
>  }
> diff --git a/drivers/android/sync.h b/drivers/android/sync.h
> index 4ccff01..d57fa0a 100644
> --- a/drivers/android/sync.h
> +++ b/drivers/android/sync.h
> @@ -351,14 +351,15 @@ void sync_timeline_debug_add(struct sync_timeline *obj);
>  void sync_timeline_debug_remove(struct sync_timeline *obj);
>  void sync_fence_debug_add(struct sync_fence *fence);
>  void sync_fence_debug_remove(struct sync_fence *fence);
> -void sync_dump(void);
> +void sync_dump(struct sync_fence *fence);
> +void sync_dump_timeline(struct sync_timeline *timeline);
>  
>  #else
>  # define sync_timeline_debug_add(obj)
>  # define sync_timeline_debug_remove(obj)
>  # define sync_fence_debug_add(fence)
>  # define sync_fence_debug_remove(fence)
> -# define sync_dump()
> +# define sync_dump(fence)
>  #endif
>  int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
>  				 int wake_flags, void *key);
> diff --git a/drivers/android/sync_debug.c b/drivers/android/sync_debug.c
> index f45d13c..9b87e0a 100644
> --- a/drivers/android/sync_debug.c
> +++ b/drivers/android/sync_debug.c
> @@ -229,28 +229,52 @@ late_initcall(sync_debugfs_init);
>  
>  #define DUMP_CHUNK 256
>  static char sync_dump_buf[64 * 1024];
> -void sync_dump(void)
> +
> +static void sync_dump_dfs(struct seq_file *s, void *targetPtr)
> +{
> +	char *start, *end;
> +	char targetStr[100];
> +
> +	if (targetPtr)
> +		snprintf(targetStr, sizeof(targetStr) - 1, "%p", targetPtr);
> +
> +	start = end = s->buf;
> +	while( (end = strchr(end, '\n'))) {
> +		*end = 0;
> +		if (targetPtr && strstr(start, targetStr))
> +			pr_info("*** %s ***\n", start);
> +		else
> +			pr_info("%s\n", start);
> +		start = ++end;
> +	}
> +
> +	if ((start - s->buf) < s->count)
> +		pr_info("%d vs %d: >?>%s<?<\n", (uint32_t) (start - s->buf), (uint32_t) s->count, start);
> +}
> +
> +void sync_dump(struct sync_fence *targetPtr)
>  {
>  	struct seq_file s = {
>  		.buf = sync_dump_buf,
>  		.size = sizeof(sync_dump_buf) - 1,
>  	};
> -	int i;
>  
>  	sync_debugfs_show(&s, NULL);
>  
> -	for (i = 0; i < s.count; i += DUMP_CHUNK) {
> -		if ((s.count - i) > DUMP_CHUNK) {
> -			char c = s.buf[i + DUMP_CHUNK];
> +	sync_dump_dfs(&s, targetPtr);
> +}
>  
> -			s.buf[i + DUMP_CHUNK] = 0;
> -			pr_cont("%s", s.buf + i);
> -			s.buf[i + DUMP_CHUNK] = c;
> -		} else {
> -			s.buf[s.count] = 0;
> -			pr_cont("%s", s.buf + i);
> -		}
> -	}
> +void sync_dump_timeline(struct sync_timeline *timeline)
> +{
> +	struct seq_file s = {
> +		.buf = sync_dump_buf,
> +		.size = sizeof(sync_dump_buf) - 1,
> +	};
> +
> +	pr_info("timeline: %p\n", timeline);
> +	sync_print_obj(&s, timeline);
> +
> +	sync_dump_dfs(&s, NULL);
>  }
>  
>  #endif
> 

I guess the Android guys might have feedback here, but it seems fine to me.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

* Re: [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2015-12-11 13:11 ` [PATCH 05/13] drm/i915: Convert requests to use struct fence John.C.Harrison
@ 2015-12-17 17:43   ` Jesse Barnes
  2016-01-04 17:20     ` Jesse Barnes
  0 siblings, 1 reply; 74+ messages in thread
From: Jesse Barnes @ 2015-12-17 17:43 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There is a construct in the linux kernel called 'struct fence' that is
> intended to keep track of work that is executed on hardware. I.e. it
> solves the basic problem that the drivers 'struct
> drm_i915_gem_request' is trying to address. The request structure does
> quite a lot more than simply track the execution progress so is very
> definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain
> all the advantages that provides.
> 
> This patch makes the first step of integrating a struct fence into the
> request. It replaces the explicit reference count with that of the
> fence. It also replaces the 'is completed' test with the fence's
> equivalent. Currently, that simply chains on to the original request
> implementation. A future patch will improve this.
> 
> v3: Updated after review comments by Tvrtko Ursulin. Added fence
> context/seqno pair to the debugfs request info. Renamed fence 'driver
> name' to just 'i915'. Removed BUG_ONs.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  5 +--
>  drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++-------------
>  drivers/gpu/drm/i915/i915_gem.c         | 56 ++++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>  6 files changed, 81 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 7415606..5b31186 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>  			task = NULL;
>  			if (req->pid)
>  				task = pid_task(req->pid, PIDTYPE_PID);
> -			seq_printf(m, "    %x @ %d: %s [%d]\n",
> +			seq_printf(m, "    %x @ %d: %s [%d], fence = %u.%u\n",
>  				   req->seqno,
>  				   (int) (jiffies - req->emitted_jiffies),
>  				   task ? task->comm : "<unknown>",
> -				   task ? task->pid : -1);
> +				   task ? task->pid : -1,
> +				   req->fence.context, req->fence.seqno);
>  			rcu_read_unlock();
>  		}
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 436149e..aa5cba7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -51,6 +51,7 @@
>  #include <linux/kref.h>
>  #include <linux/pm_qos.h>
>  #include "intel_guc.h"
> +#include <linux/fence.h>
>  
>  /* General customization:
>   */
> @@ -2174,7 +2175,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   * initial reference taken using kref_init
>   */
>  struct drm_i915_gem_request {
> -	struct kref ref;
> +	/**
> +	 * Underlying object for implementing the signal/wait stuff.
> +	 * NB: Never call fence_later() or return this fence object to user
> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> +	 * etc., there is no guarantee at all about the validity or
> +	 * sequentiality of the fence's seqno! It is also unsafe to let
> +	 * anything outside of the i915 driver get hold of the fence object
> +	 * as the clean up when decrementing the reference count requires
> +	 * holding the driver mutex lock.
> +	 */
> +	struct fence fence;
>  
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2251,7 +2262,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct intel_context *ctx,
>  			   struct drm_i915_gem_request **req_out);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> +					      bool lazy_coherency)
> +{
> +	return fence_is_signaled(&req->fence);
> +}
> +
>  int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>  				   struct drm_file *file);
>  
> @@ -2271,7 +2288,7 @@ static inline struct drm_i915_gem_request *
>  i915_gem_request_reference(struct drm_i915_gem_request *req)
>  {
>  	if (req)
> -		kref_get(&req->ref);
> +		fence_get(&req->fence);
>  	return req;
>  }
>  
> @@ -2279,7 +2296,7 @@ static inline void
>  i915_gem_request_unreference(struct drm_i915_gem_request *req)
>  {
>  	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> -	kref_put(&req->ref, i915_gem_request_free);
> +	fence_put(&req->fence);
>  }
>  
>  static inline void
> @@ -2291,7 +2308,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>  		return;
>  
>  	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>  		mutex_unlock(&dev->struct_mutex);
>  }
>  
> @@ -2308,12 +2325,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>  }
>  
>  /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
>   * A command that requires special handling by the command parser.
>   */
>  struct drm_i915_cmd_descriptor {
> @@ -2916,18 +2927,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>  	return (int32_t)(seq1 - seq2) >= 0;
>  }
>  
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> -{
> -	u32 seqno;
> -
> -	BUG_ON(req == NULL);
> -
> -	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
> -
> -	return i915_seqno_passed(seqno, req->seqno);
> -}
> -
>  int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>  int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e4056a3..a1b4dbd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2617,12 +2617,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>  	}
>  }
>  
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free(struct fence *req_fence)
>  {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
>  	struct intel_context *ctx = req->ctx;
>  
> +	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> +
>  	if (req->file_priv)
>  		i915_gem_request_remove_from_client(req);
>  
> @@ -2638,6 +2640,45 @@ void i915_gem_request_free(struct kref *req_ref)
>  	kmem_cache_free(req->i915->requests, req);
>  }
>  
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +{
> +	/* Interrupt driven fences are not implemented yet.*/
> +	WARN(true, "This should not be called!");
> +	return true;
> +}
> +
> +static bool i915_gem_request_is_completed(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	u32 seqno;
> +
> +	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +
> +	return i915_seqno_passed(seqno, req->seqno);
> +}
> +
> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> +{
> +	return "i915";
> +}
> +
> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	return req->ring->name;
> +}
> +
> +static const struct fence_ops i915_gem_request_fops = {
> +	.enable_signaling	= i915_gem_request_enable_signaling,
> +	.signaled		= i915_gem_request_is_completed,
> +	.wait			= fence_default_wait,
> +	.release		= i915_gem_request_free,
> +	.get_driver_name	= i915_gem_request_get_driver_name,
> +	.get_timeline_name	= i915_gem_request_get_timeline_name,
> +};
> +
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct intel_context *ctx,
>  			   struct drm_i915_gem_request **req_out)
> @@ -2659,7 +2700,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  	if (ret)
>  		goto err;
>  
> -	kref_init(&req->ref);
>  	req->i915 = dev_priv;
>  	req->ring = ring;
>  	req->ctx  = ctx;
> @@ -2674,6 +2714,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  		goto err;
>  	}
>  
> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
> +
>  	/*
>  	 * Reserve space in the ring buffer for all the commands required to
>  	 * eventually emit this request. This is to guarantee that the
> @@ -4723,7 +4765,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_engine_cs *ring;
> -	int ret, i, j;
> +	int ret, i, j, fence_base;
>  
>  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>  		return -EIO;
> @@ -4793,12 +4835,16 @@ i915_gem_init_hw(struct drm_device *dev)
>  	if (ret)
>  		goto out;
>  
> +	fence_base = fence_context_alloc(I915_NUM_RINGS);
> +
>  	/* Now it is safe to go back round and do everything else: */
>  	for_each_ring(ring, dev_priv, i) {
>  		struct drm_i915_gem_request *req;
>  
>  		WARN_ON(!ring->default_context);
>  
> +		ring->fence_context = fence_base + i;
> +
>  		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>  		if (ret) {
>  			i915_gem_cleanup_ringbuffer(dev);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 06180dc..b8c8f9b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1920,6 +1920,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	ring->dev = dev;
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
> +	spin_lock_init(&ring->fence_lock);
>  	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>  	init_waitqueue_head(&ring->irq_queue);
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index c9b081f..f4a6403 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2158,6 +2158,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	INIT_LIST_HEAD(&ring->request_list);
>  	INIT_LIST_HEAD(&ring->execlist_queue);
>  	INIT_LIST_HEAD(&ring->buffers);
> +	spin_lock_init(&ring->fence_lock);
>  	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>  	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 58b1976..4547645 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -348,6 +348,9 @@ struct  intel_engine_cs {
>  	 * to encode the command length in the header).
>  	 */
>  	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	unsigned fence_context;
> +	spinlock_t fence_lock;
>  };
>  
>  bool intel_ring_initialized(struct intel_engine_cs *ring);
> 

Chris has an equivalent patch that does a little more (interrupt-driven waits, a custom i915 wait function, etc.).  Can you review that instead, assuming it's sufficient?

http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=f062e706740d87befb8e7cd7ea337f98f0b24f52
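
For context on the "little more": the enable_signaling stub in this
patch (the one that just WARNs) is the piece that grows into the
interrupt-driven path. Very roughly, with the error handling
hand-waved away, and not taken from Chris's actual code:

static bool i915_gem_request_enable_signaling(struct fence *req_fence)
{
	struct drm_i915_gem_request *req = container_of(req_fence,
						 typeof(*req), fence);

	/* Already completed: tell the fence core no signal is coming. */
	if (i915_gem_request_completed(req, false))
		return false;

	/*
	 * Enable user interrupts on the ring; the IRQ handler is then
	 * responsible for calling fence_signal() once the seqno passes.
	 */
	return req->ring->irq_get(req->ring);
}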

Thanks,
Jesse

* Re: [PATCH 07/13] drm/i915: Add per context timelines to fence object
  2015-12-11 13:11 ` [PATCH 07/13] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2015-12-17 17:49   ` Jesse Barnes
  2015-12-21 10:16     ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Jesse Barnes @ 2015-12-17 17:49 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buff fence API.
> 
> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.
> 
> v2: New patch in series.
> 
> v3: Renamed/retyped timeline structure fields after review comments by
> Tvrtko Ursulin.
> 
> Added context information to the timeline's name string for better
> identification in debugfs output.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 25 ++++++++---
>  drivers/gpu/drm/i915/i915_gem.c         | 80 +++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++-
>  drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>  5 files changed, 111 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index caf7897..7d6a7c0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -841,6 +841,15 @@ struct i915_ctx_hang_stats {
>  	bool banned;
>  };
>  
> +struct i915_fence_timeline {
> +	char        name[32];
> +	unsigned    fence_context;
> +	unsigned    next;
> +
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *ring;
> +};
> +
>  /* This must match up with the value previously used for execbuf2.rsvd1. */
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
> @@ -885,6 +894,7 @@ struct intel_context {
>  		struct drm_i915_gem_object *state;
>  		struct intel_ringbuffer *ringbuf;
>  		int pin_count;
> +		struct i915_fence_timeline fence_timeline;
>  	} engine[I915_NUM_RINGS];
>  
>  	struct list_head link;
> @@ -2177,13 +2187,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>  struct drm_i915_gem_request {
>  	/**
>  	 * Underlying object for implementing the signal/wait stuff.
> -	 * NB: Never call fence_later() or return this fence object to user
> -	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> -	 * etc., there is no guarantee at all about the validity or
> -	 * sequentiality of the fence's seqno! It is also unsafe to let
> -	 * anything outside of the i915 driver get hold of the fence object
> -	 * as the clean up when decrementing the reference count requires
> -	 * holding the driver mutex lock.
> +	 * NB: Never return this fence object to user land! It is unsafe to
> +	 * let anything outside of the i915 driver get hold of the fence
> +	 * object as the clean up when decrementing the reference count
> +	 * requires holding the driver mutex lock.
>  	 */
>  	struct fence fence;
>  
> @@ -2263,6 +2270,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct drm_i915_gem_request **req_out);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>  
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct intel_context *ctx,
> +			       struct intel_engine_cs *ring);
> +
>  static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
>  	return fence_is_signaled(&req->fence);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0801738..7a37fb7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2665,9 +2665,32 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>  
>  static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>  {
> -	struct drm_i915_gem_request *req = container_of(req_fence,
> -						 typeof(*req), fence);
> -	return req->ring->name;
> +	struct drm_i915_gem_request *req;
> +	struct i915_fence_timeline *timeline;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +	timeline = &req->ctx->engine[req->ring->id].fence_timeline;
> +
> +	return timeline->name;
> +}
> +
> +static void i915_gem_request_timeline_value_str(struct fence *req_fence, char *str, int size)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +
> +	/* Last signalled timeline value ??? */
> +	snprintf(str, size, "? [%d]"/*, timeline->value*/, req->ring->get_seqno(req->ring, true));
> +}
> +
> +static void i915_gem_request_fence_value_str(struct fence *req_fence, char *str, int size)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +
> +	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
>  }
>  
>  static const struct fence_ops i915_gem_request_fops = {
> @@ -2677,8 +2700,49 @@ static const struct fence_ops i915_gem_request_fops = {
>  	.release		= i915_gem_request_free,
>  	.get_driver_name	= i915_gem_request_get_driver_name,
>  	.get_timeline_name	= i915_gem_request_get_timeline_name,
> +	.fence_value_str	= i915_gem_request_fence_value_str,
> +	.timeline_value_str	= i915_gem_request_timeline_value_str,
>  };
>  
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct intel_context *ctx,
> +			       struct intel_engine_cs *ring)
> +{
> +	struct i915_fence_timeline *timeline;
> +
> +	timeline = &ctx->engine[ring->id].fence_timeline;
> +
> +	if (timeline->ring)
> +		return 0;
> +
> +	timeline->fence_context = fence_context_alloc(1);
> +
> +	/*
> +	 * Start the timeline from seqno 0 as this is a special value
> +	 * that is reserved for invalid sync points.
> +	 */
> +	timeline->next       = 1;
> +	timeline->ctx        = ctx;
> +	timeline->ring       = ring;
> +
> +	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d", timeline->fence_context, ring->name, ctx->user_handle);
> +
> +	return 0;
> +}
> +
> +static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
> +{
> +	unsigned seqno;
> +
> +	seqno = timeline->next;
> +
> +	/* Reserve zero for invalid */
> +	if (++timeline->next == 0 )
> +		timeline->next = 1;
> +
> +	return seqno;
> +}
> +
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct intel_context *ctx,
>  			   struct drm_i915_gem_request **req_out)
> @@ -2714,7 +2778,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  		goto err;
>  	}
>  
> -	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
> +		   ctx->engine[ring->id].fence_timeline.fence_context,
> +		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
>  
>  	/*
>  	 * Reserve space in the ring buffer for all the commands required to
> @@ -4765,7 +4831,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_engine_cs *ring;
> -	int ret, i, j, fence_base;
> +	int ret, i, j;
>  
>  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>  		return -EIO;
> @@ -4835,16 +4901,12 @@ i915_gem_init_hw(struct drm_device *dev)
>  	if (ret)
>  		goto out;
>  
> -	fence_base = fence_context_alloc(I915_NUM_RINGS);
> -
>  	/* Now it is safe to go back round and do everything else: */
>  	for_each_ring(ring, dev_priv, i) {
>  		struct drm_i915_gem_request *req;
>  
>  		WARN_ON(!ring->default_context);
>  
> -		ring->fence_context = fence_base + i;
> -
>  		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>  		if (ret) {
>  			i915_gem_cleanup_ringbuffer(dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43b1c73..2798ddc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -266,7 +266,7 @@ i915_gem_create_context(struct drm_device *dev,
>  {
>  	const bool is_global_default_ctx = file_priv == NULL;
>  	struct intel_context *ctx;
> -	int ret = 0;
> +	int i, ret = 0;
>  
>  	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>  
> @@ -274,6 +274,19 @@ i915_gem_create_context(struct drm_device *dev,
>  	if (IS_ERR(ctx))
>  		return ctx;
>  
> +	if (!i915.enable_execlists) {
> +		struct intel_engine_cs *ring;
> +
> +		/* Create a per context timeline for fences */
> +		for_each_ring(ring, to_i915(dev), i) {
> +			ret = i915_create_fence_timeline(dev, ctx, ring);
> +			if (ret) {
> +				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n", ring->name, ctx);
> +				goto err_destroy;
> +			}
> +		}
> +	}
> +
>  	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
>  		/* We may need to do things with the shrinker which
>  		 * require us to immediately switch back to the default
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b8c8f9b..2b56651 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2489,6 +2489,14 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
>  		goto error_ringbuf;
>  	}
>  
> +	/* Create a per context timeline for fences */
> +	ret = i915_create_fence_timeline(dev, ctx, ring);
> +	if (ret) {
> +		DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
> +			  ring->name, ctx);
> +		goto error_ringbuf;
> +	}
> +
>  	ctx->engine[ring->id].ringbuf = ringbuf;
>  	ctx->engine[ring->id].state = ctx_obj;
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 4547645..356b6a8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -349,7 +349,6 @@ struct  intel_engine_cs {
>  	 */
>  	u32 (*get_cmd_length_mask)(u32 cmd_header);
>  
> -	unsigned fence_context;
>  	spinlock_t fence_lock;
>  };
>  
> 

Yeah we definitely want this, but it'll have to be reconciled with the different request->fence patches.  I'm not sure if it would be easier to move to per-context seqnos first or go this route and deal with the mismatch between global and per-ctx.
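
To spell out the mismatch: after this patch every request carries two
unrelated numbers. A sketch (the debug helper and its name are
invented):

static void example_show_seqno_spaces(struct drm_i915_gem_request *req)
{
	pr_info("fence %u.%u vs global hw seqno %u\n",
		req->fence.context,	/* allocated per ctx/ring pair */
		req->fence.seqno,	/* per-context, from timeline->next */
		req->seqno);		/* global, ordered across contexts */
}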

Jesse

* Re: [PATCH 03/13] staging/android/sync: Move sync framework out of staging
  2015-12-17 17:35   ` Jesse Barnes
@ 2015-12-21 10:03     ` Daniel Vetter
  2015-12-21 14:20       ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Daniel Vetter @ 2015-12-21 10:03 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Thu, Dec 17, 2015 at 09:35:03AM -0800, Jesse Barnes wrote:
> On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The sync framework is now used by the i915 driver. Therefore it can be
> > moved out of staging and into the regular tree. Also, the public
> > interfaces can actually be made public and exported.
> > 
> > v3: New patch for series.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Geoff Miller <geoff.miller@intel.com>
> > ---
> >  drivers/android/Kconfig                |  28 ++
> >  drivers/android/Makefile               |   2 +
> >  drivers/android/sw_sync.c              | 260 ++++++++++++
> >  drivers/android/sw_sync.h              |  59 +++
> >  drivers/android/sync.c                 | 734 +++++++++++++++++++++++++++++++++
> >  drivers/android/sync.h                 | 366 ++++++++++++++++
> >  drivers/android/sync_debug.c           | 256 ++++++++++++
> >  drivers/android/trace/sync.h           |  82 ++++
> >  drivers/staging/android/Kconfig        |  28 --
> >  drivers/staging/android/Makefile       |   2 -
> >  drivers/staging/android/sw_sync.c      | 260 ------------
> >  drivers/staging/android/sw_sync.h      |  59 ---
> >  drivers/staging/android/sync.c         | 734 ---------------------------------
> >  drivers/staging/android/sync.h         | 366 ----------------
> >  drivers/staging/android/sync_debug.c   | 256 ------------
> >  drivers/staging/android/trace/sync.h   |  82 ----
> >  drivers/staging/android/uapi/sw_sync.h |  32 --
> >  drivers/staging/android/uapi/sync.h    |  97 -----
> >  include/uapi/Kbuild                    |   1 +
> >  include/uapi/sync/Kbuild               |   3 +
> >  include/uapi/sync/sw_sync.h            |  32 ++
> >  include/uapi/sync/sync.h               |  97 +++++
> >  22 files changed, 1920 insertions(+), 1916 deletions(-)
> >  create mode 100644 drivers/android/sw_sync.c
> >  create mode 100644 drivers/android/sw_sync.h
> >  create mode 100644 drivers/android/sync.c
> >  create mode 100644 drivers/android/sync.h
> >  create mode 100644 drivers/android/sync_debug.c
> >  create mode 100644 drivers/android/trace/sync.h
> >  delete mode 100644 drivers/staging/android/sw_sync.c
> >  delete mode 100644 drivers/staging/android/sw_sync.h
> >  delete mode 100644 drivers/staging/android/sync.c
> >  delete mode 100644 drivers/staging/android/sync.h
> >  delete mode 100644 drivers/staging/android/sync_debug.c
> >  delete mode 100644 drivers/staging/android/trace/sync.h
> >  delete mode 100644 drivers/staging/android/uapi/sw_sync.h
> >  delete mode 100644 drivers/staging/android/uapi/sync.h
> >  create mode 100644 include/uapi/sync/Kbuild
> >  create mode 100644 include/uapi/sync/sw_sync.h
> >  create mode 100644 include/uapi/sync/sync.h
> > 
> > diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
> > index bdfc6c6..9edcd8f 100644
> > --- a/drivers/android/Kconfig
> > +++ b/drivers/android/Kconfig
> > @@ -32,6 +32,34 @@ config ANDROID_BINDER_IPC_32BIT
> >  
> >  	  Note that enabling this will break newer Android user-space.
> >  
> > +config SYNC
> > +	bool "Synchronization framework"
> > +	default n
> > +	select ANON_INODES
> > +	select DMA_SHARED_BUFFER
> > +	---help---
> > +	  This option enables the framework for synchronization between multiple
> > +	  drivers.  Sync implementations can take advantage of hardware
> > +	  synchronization built into devices like GPUs.
> > +
> > +config SW_SYNC
> > +	bool "Software synchronization objects"
> > +	default n
> > +	depends on SYNC
> > +	---help---
> > +	  A sync object driver that uses a 32bit counter to coordinate
> > +	  synchronization.  Useful when there is no hardware primitive backing
> > +	  the synchronization.
> > +
> > +config SW_SYNC_USER
> > +	bool "Userspace API for SW_SYNC"
> > +	default n
> > +	depends on SW_SYNC
> > +	---help---
> > +	  Provides a user space API to the sw sync object.
> > +	  *WARNING* improper use of this can result in deadlocking kernel
> > +	  drivers from userspace.
> > +
> >  endif # if ANDROID
> 
> IIRC we wanted to drop the user ABI altogether?  I think we can de-stage
> this even before we push the new ABI on the i915 side to expose the sync
> points (since we'll need an open source userspace for that), and any
> changes/cleanups can happen outside of staging.

Just a heads-up: Gustavo Padovan from Collabora is working to de-stage all
the syncpt stuff. Greg KH merged the TODO update for that work for 4.5,
which covers consensus (including ack from Google's Greg Hackmann on the
plan). Given that, I think it'd be best to freeload on that effort. But
that means we need to push all the android/syncpt stuff down in the
series. I hope that works, and there are no functional dependencies in the
fence conversion/scheduler core?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 07/13] drm/i915: Add per context timelines to fence object
  2015-12-17 17:49   ` Jesse Barnes
@ 2015-12-21 10:16     ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-12-21 10:16 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Thu, Dec 17, 2015 at 09:49:27AM -0800, Jesse Barnes wrote:
> Yeah we definitely want this, but it'll have to be reconciled with the different request->fence patches.  I'm not sure if it would be easier to move to per-context seqnos first or go this route and deal with the mismatch between global and per-ctx.

This patch doesn't do independent per-context seqno, so the point is moot.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 03/13] staging/android/sync: Move sync framework out of staging
  2015-12-21 10:03     ` Daniel Vetter
@ 2015-12-21 14:20       ` John Harrison
  2015-12-21 15:46         ` Daniel Vetter
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2015-12-21 14:20 UTC (permalink / raw)
  To: Daniel Vetter, Jesse Barnes; +Cc: Intel-GFX

On 21/12/2015 10:03, Daniel Vetter wrote:
> On Thu, Dec 17, 2015 at 09:35:03AM -0800, Jesse Barnes wrote:
>> On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The sync framework is now used by the i915 driver. Therefore it can be
>>> moved out of staging and into the regular tree. Also, the public
>>> interfaces can actually be made public and exported.
>>>
>>> v3: New patch for series.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Geoff Miller <geoff.miller@intel.com>
>>> ---
>>>   drivers/android/Kconfig                |  28 ++
>>>   drivers/android/Makefile               |   2 +
>>>   drivers/android/sw_sync.c              | 260 ++++++++++++
>>>   drivers/android/sw_sync.h              |  59 +++
>>>   drivers/android/sync.c                 | 734 +++++++++++++++++++++++++++++++++
>>>   drivers/android/sync.h                 | 366 ++++++++++++++++
>>>   drivers/android/sync_debug.c           | 256 ++++++++++++
>>>   drivers/android/trace/sync.h           |  82 ++++
>>>   drivers/staging/android/Kconfig        |  28 --
>>>   drivers/staging/android/Makefile       |   2 -
>>>   drivers/staging/android/sw_sync.c      | 260 ------------
>>>   drivers/staging/android/sw_sync.h      |  59 ---
>>>   drivers/staging/android/sync.c         | 734 ---------------------------------
>>>   drivers/staging/android/sync.h         | 366 ----------------
>>>   drivers/staging/android/sync_debug.c   | 256 ------------
>>>   drivers/staging/android/trace/sync.h   |  82 ----
>>>   drivers/staging/android/uapi/sw_sync.h |  32 --
>>>   drivers/staging/android/uapi/sync.h    |  97 -----
>>>   include/uapi/Kbuild                    |   1 +
>>>   include/uapi/sync/Kbuild               |   3 +
>>>   include/uapi/sync/sw_sync.h            |  32 ++
>>>   include/uapi/sync/sync.h               |  97 +++++
>>>   22 files changed, 1920 insertions(+), 1916 deletions(-)
>>>   create mode 100644 drivers/android/sw_sync.c
>>>   create mode 100644 drivers/android/sw_sync.h
>>>   create mode 100644 drivers/android/sync.c
>>>   create mode 100644 drivers/android/sync.h
>>>   create mode 100644 drivers/android/sync_debug.c
>>>   create mode 100644 drivers/android/trace/sync.h
>>>   delete mode 100644 drivers/staging/android/sw_sync.c
>>>   delete mode 100644 drivers/staging/android/sw_sync.h
>>>   delete mode 100644 drivers/staging/android/sync.c
>>>   delete mode 100644 drivers/staging/android/sync.h
>>>   delete mode 100644 drivers/staging/android/sync_debug.c
>>>   delete mode 100644 drivers/staging/android/trace/sync.h
>>>   delete mode 100644 drivers/staging/android/uapi/sw_sync.h
>>>   delete mode 100644 drivers/staging/android/uapi/sync.h
>>>   create mode 100644 include/uapi/sync/Kbuild
>>>   create mode 100644 include/uapi/sync/sw_sync.h
>>>   create mode 100644 include/uapi/sync/sync.h
>>>
>>> diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
>>> index bdfc6c6..9edcd8f 100644
>>> --- a/drivers/android/Kconfig
>>> +++ b/drivers/android/Kconfig
>>> @@ -32,6 +32,34 @@ config ANDROID_BINDER_IPC_32BIT
>>>   
>>>   	  Note that enabling this will break newer Android user-space.
>>>   
>>> +config SYNC
>>> +	bool "Synchronization framework"
>>> +	default n
>>> +	select ANON_INODES
>>> +	select DMA_SHARED_BUFFER
>>> +	---help---
>>> +	  This option enables the framework for synchronization between multiple
>>> +	  drivers.  Sync implementations can take advantage of hardware
>>> +	  synchronization built into devices like GPUs.
>>> +
>>> +config SW_SYNC
>>> +	bool "Software synchronization objects"
>>> +	default n
>>> +	depends on SYNC
>>> +	---help---
>>> +	  A sync object driver that uses a 32bit counter to coordinate
>>> +	  synchronization.  Useful when there is no hardware primitive backing
>>> +	  the synchronization.
>>> +
>>> +config SW_SYNC_USER
>>> +	bool "Userspace API for SW_SYNC"
>>> +	default n
>>> +	depends on SW_SYNC
>>> +	---help---
>>> +	  Provides a user space API to the sw sync object.
>>> +	  *WARNING* improper use of this can result in deadlocking kernel
>>> +	  drivers from userspace.
>>> +
>>>   endif # if ANDROID
>> IIRC we wanted to drop the user ABI altogether?  I think we can de-stage
>> this even before we push the new ABI on the i915 side to expose the sync
>> points (since we'll need an open source userspace for that), and any
>> changes/cleanups can happen outside of staging.
> Just a heads-up: Gustavo Padovan from Collabora is working to de-stage all
> the syncpt stuff. Greg KH merged the TODO update for that work for 4.5,
> which covers consensus (including ack from Google's Greg Hackmann on the
> plan). Given that, I think it'd be best to freeload on that effort. But
> that means we need to push all the android/syncpt stuff down in the
> series. I hope that works, and there are no functional dependencies in the
> fence conversion/scheduler core?
>
> Thanks, Daniel

Do you have any idea on the timescale for their destaging? Is it 
definitely, definitely happening or still just a 'some point in the 
future it would be nice to...'? The Android driver certainly needs it 
and I believe Jesse's bufferless work will require it too. So sooner is 
greatly preferable to later.

As long as the rest of the fence conversion still goes in, it's not 
all that much effort to extract the sync code. It was all previously 
behind '#if CONFIG_SYNC' anyway, so the system can definitely be made 
to work without it (see the sketch below). There are two main patches 
that add it in, one in the fence series and another in the scheduler 
series. However, if you just drop those there might well be a big 
bunch of merge conflicts to resolve in the subsequent patches.
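
To illustrate, the guarded code is all of the shape below: with
CONFIG_SYNC (and SW_SYNC/SW_SYNC_USER on top of it) left out of the
config, none of it gets built. Sketch only, the helper name is
invented:

static int example_request_to_sync_fd(struct drm_i915_gem_request *req)
{
#ifdef CONFIG_SYNC
	/* create and install a native sync point for this request */
	return example_sync_fence_install(req);
#else
	return -ENODEV;	/* sync framework not built in */
#endif
}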

John.


* Re: [PATCH 03/13] staging/android/sync: Move sync framework out of staging
  2015-12-21 14:20       ` John Harrison
@ 2015-12-21 15:46         ` Daniel Vetter
  2015-12-22 12:14           ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Daniel Vetter @ 2015-12-21 15:46 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Mon, Dec 21, 2015 at 02:20:59PM +0000, John Harrison wrote:
> On 21/12/2015 10:03, Daniel Vetter wrote:
> >On Thu, Dec 17, 2015 at 09:35:03AM -0800, Jesse Barnes wrote:
> >>On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>>The sync framework is now used by the i915 driver. Therefore it can be
> >>>moved out of staging and into the regular tree. Also, the public
> >>>interfaces can actually be made public and exported.
> >>>
> >>>v3: New patch for series.
> >>>
> >>>Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>Signed-off-by: Geoff Miller <geoff.miller@intel.com>
> >>>---
> >>>  drivers/android/Kconfig                |  28 ++
> >>>  drivers/android/Makefile               |   2 +
> >>>  drivers/android/sw_sync.c              | 260 ++++++++++++
> >>>  drivers/android/sw_sync.h              |  59 +++
> >>>  drivers/android/sync.c                 | 734 +++++++++++++++++++++++++++++++++
> >>>  drivers/android/sync.h                 | 366 ++++++++++++++++
> >>>  drivers/android/sync_debug.c           | 256 ++++++++++++
> >>>  drivers/android/trace/sync.h           |  82 ++++
> >>>  drivers/staging/android/Kconfig        |  28 --
> >>>  drivers/staging/android/Makefile       |   2 -
> >>>  drivers/staging/android/sw_sync.c      | 260 ------------
> >>>  drivers/staging/android/sw_sync.h      |  59 ---
> >>>  drivers/staging/android/sync.c         | 734 ---------------------------------
> >>>  drivers/staging/android/sync.h         | 366 ----------------
> >>>  drivers/staging/android/sync_debug.c   | 256 ------------
> >>>  drivers/staging/android/trace/sync.h   |  82 ----
> >>>  drivers/staging/android/uapi/sw_sync.h |  32 --
> >>>  drivers/staging/android/uapi/sync.h    |  97 -----
> >>>  include/uapi/Kbuild                    |   1 +
> >>>  include/uapi/sync/Kbuild               |   3 +
> >>>  include/uapi/sync/sw_sync.h            |  32 ++
> >>>  include/uapi/sync/sync.h               |  97 +++++
> >>>  22 files changed, 1920 insertions(+), 1916 deletions(-)
> >>>  create mode 100644 drivers/android/sw_sync.c
> >>>  create mode 100644 drivers/android/sw_sync.h
> >>>  create mode 100644 drivers/android/sync.c
> >>>  create mode 100644 drivers/android/sync.h
> >>>  create mode 100644 drivers/android/sync_debug.c
> >>>  create mode 100644 drivers/android/trace/sync.h
> >>>  delete mode 100644 drivers/staging/android/sw_sync.c
> >>>  delete mode 100644 drivers/staging/android/sw_sync.h
> >>>  delete mode 100644 drivers/staging/android/sync.c
> >>>  delete mode 100644 drivers/staging/android/sync.h
> >>>  delete mode 100644 drivers/staging/android/sync_debug.c
> >>>  delete mode 100644 drivers/staging/android/trace/sync.h
> >>>  delete mode 100644 drivers/staging/android/uapi/sw_sync.h
> >>>  delete mode 100644 drivers/staging/android/uapi/sync.h
> >>>  create mode 100644 include/uapi/sync/Kbuild
> >>>  create mode 100644 include/uapi/sync/sw_sync.h
> >>>  create mode 100644 include/uapi/sync/sync.h
> >>>
> >>>diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
> >>>index bdfc6c6..9edcd8f 100644
> >>>--- a/drivers/android/Kconfig
> >>>+++ b/drivers/android/Kconfig
> >>>@@ -32,6 +32,34 @@ config ANDROID_BINDER_IPC_32BIT
> >>>  	  Note that enabling this will break newer Android user-space.
> >>>+config SYNC
> >>>+	bool "Synchronization framework"
> >>>+	default n
> >>>+	select ANON_INODES
> >>>+	select DMA_SHARED_BUFFER
> >>>+	---help---
> >>>+	  This option enables the framework for synchronization between multiple
> >>>+	  drivers.  Sync implementations can take advantage of hardware
> >>>+	  synchronization built into devices like GPUs.
> >>>+
> >>>+config SW_SYNC
> >>>+	bool "Software synchronization objects"
> >>>+	default n
> >>>+	depends on SYNC
> >>>+	---help---
> >>>+	  A sync object driver that uses a 32bit counter to coordinate
> >>>+	  synchronization.  Useful when there is no hardware primitive backing
> >>>+	  the synchronization.
> >>>+
> >>>+config SW_SYNC_USER
> >>>+	bool "Userspace API for SW_SYNC"
> >>>+	default n
> >>>+	depends on SW_SYNC
> >>>+	---help---
> >>>+	  Provides a user space API to the sw sync object.
> >>>+	  *WARNING* improper use of this can result in deadlocking kernel
> >>>+	  drivers from userspace.
> >>>+
> >>>  endif # if ANDROID
> >>IIRC we wanted to drop the user ABI altogether?  I think we can de-stage
> >>this even before we push the new ABI on the i915 side to expose the sync
> >>points (since we'll need an open source userspace for that), and any
> >>changes/cleanups can happen outside of staging.
> >Just a heads-up: Gustavo Padovan from Collabora is working to de-stage all
> >the syncpt stuff. Greg KH merged the TODO update for that work for 4.5,
> >which covers consensus (including ack from Google's Greg Hackmann on the
> >plan). Given that, I think it'd be best to freeload on that effort. But
> >that means we need to push all the android/syncpt stuff down in the
> >series. I hope that works, and there are no functional dependencies in the
> >fence conversion/scheduler core?
> >
> >Thanks, Daniel
> 
> Do you have any idea on the timescale for their destaging? Is it definitely,
> definitely happening or still just a 'some point in the future it would be
> nice to...'? The Android driver certainly needs it and I believe Jesse's
> bufferless work will require it too. So sooner is greatly preferable to
> later.

He's working on it, under a contract to make it happen. Afaik a bunch of
the work is done already; the big bits left are cleaning up some of the
internals and doing some review on the ABI & test coverage.

> As long as the rest of the fence conversion still goes in then it's not all
> that much effort to extract the sync code. It was all previously '#if
> CONFIG_SYNC' anyway so the system can definitely be made to work without it.
> There are two main patches that add it in, one in the fence series and
> another in the scheduler series. However, if you just drop those there might
> well be a big bunch of merge conflicts to resolve in the subsequent patches.

Rebasing to avoid the dependency would be good I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 03/13] staging/android/sync: Move sync framework out of staging
  2015-12-21 15:46         ` Daniel Vetter
@ 2015-12-22 12:14           ` John Harrison
  0 siblings, 0 replies; 74+ messages in thread
From: John Harrison @ 2015-12-22 12:14 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX

On 21/12/2015 15:46, Daniel Vetter wrote:
> On Mon, Dec 21, 2015 at 02:20:59PM +0000, John Harrison wrote:
>> On 21/12/2015 10:03, Daniel Vetter wrote:
>>> On Thu, Dec 17, 2015 at 09:35:03AM -0800, Jesse Barnes wrote:
>>>> On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> The sync framework is now used by the i915 driver. Therefore it can be
>>>>> moved out of staging and into the regular tree. Also, the public
>>>>> interfaces can actually be made public and exported.
>>>>>
>>>>> v3: New patch for series.
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Signed-off-by: Geoff Miller <geoff.miller@intel.com>
>>>>> ---
>>>>>   drivers/android/Kconfig                |  28 ++
>>>>>   drivers/android/Makefile               |   2 +
>>>>>   drivers/android/sw_sync.c              | 260 ++++++++++++
>>>>>   drivers/android/sw_sync.h              |  59 +++
>>>>>   drivers/android/sync.c                 | 734 +++++++++++++++++++++++++++++++++
>>>>>   drivers/android/sync.h                 | 366 ++++++++++++++++
>>>>>   drivers/android/sync_debug.c           | 256 ++++++++++++
>>>>>   drivers/android/trace/sync.h           |  82 ++++
>>>>>   drivers/staging/android/Kconfig        |  28 --
>>>>>   drivers/staging/android/Makefile       |   2 -
>>>>>   drivers/staging/android/sw_sync.c      | 260 ------------
>>>>>   drivers/staging/android/sw_sync.h      |  59 ---
>>>>>   drivers/staging/android/sync.c         | 734 ---------------------------------
>>>>>   drivers/staging/android/sync.h         | 366 ----------------
>>>>>   drivers/staging/android/sync_debug.c   | 256 ------------
>>>>>   drivers/staging/android/trace/sync.h   |  82 ----
>>>>>   drivers/staging/android/uapi/sw_sync.h |  32 --
>>>>>   drivers/staging/android/uapi/sync.h    |  97 -----
>>>>>   include/uapi/Kbuild                    |   1 +
>>>>>   include/uapi/sync/Kbuild               |   3 +
>>>>>   include/uapi/sync/sw_sync.h            |  32 ++
>>>>>   include/uapi/sync/sync.h               |  97 +++++
>>>>>   22 files changed, 1920 insertions(+), 1916 deletions(-)
>>>>>   create mode 100644 drivers/android/sw_sync.c
>>>>>   create mode 100644 drivers/android/sw_sync.h
>>>>>   create mode 100644 drivers/android/sync.c
>>>>>   create mode 100644 drivers/android/sync.h
>>>>>   create mode 100644 drivers/android/sync_debug.c
>>>>>   create mode 100644 drivers/android/trace/sync.h
>>>>>   delete mode 100644 drivers/staging/android/sw_sync.c
>>>>>   delete mode 100644 drivers/staging/android/sw_sync.h
>>>>>   delete mode 100644 drivers/staging/android/sync.c
>>>>>   delete mode 100644 drivers/staging/android/sync.h
>>>>>   delete mode 100644 drivers/staging/android/sync_debug.c
>>>>>   delete mode 100644 drivers/staging/android/trace/sync.h
>>>>>   delete mode 100644 drivers/staging/android/uapi/sw_sync.h
>>>>>   delete mode 100644 drivers/staging/android/uapi/sync.h
>>>>>   create mode 100644 include/uapi/sync/Kbuild
>>>>>   create mode 100644 include/uapi/sync/sw_sync.h
>>>>>   create mode 100644 include/uapi/sync/sync.h
>>>>>
>>>>> diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
>>>>> index bdfc6c6..9edcd8f 100644
>>>>> --- a/drivers/android/Kconfig
>>>>> +++ b/drivers/android/Kconfig
>>>>> @@ -32,6 +32,34 @@ config ANDROID_BINDER_IPC_32BIT
>>>>>   	  Note that enabling this will break newer Android user-space.
>>>>> +config SYNC
>>>>> +	bool "Synchronization framework"
>>>>> +	default n
>>>>> +	select ANON_INODES
>>>>> +	select DMA_SHARED_BUFFER
>>>>> +	---help---
>>>>> +	  This option enables the framework for synchronization between multiple
>>>>> +	  drivers.  Sync implementations can take advantage of hardware
>>>>> +	  synchronization built into devices like GPUs.
>>>>> +
>>>>> +config SW_SYNC
>>>>> +	bool "Software synchronization objects"
>>>>> +	default n
>>>>> +	depends on SYNC
>>>>> +	---help---
>>>>> +	  A sync object driver that uses a 32bit counter to coordinate
>>>>> +	  synchronization.  Useful when there is no hardware primitive backing
>>>>> +	  the synchronization.
>>>>> +
>>>>> +config SW_SYNC_USER
>>>>> +	bool "Userspace API for SW_SYNC"
>>>>> +	default n
>>>>> +	depends on SW_SYNC
>>>>> +	---help---
>>>>> +	  Provides a user space API to the sw sync object.
>>>>> +	  *WARNING* improper use of this can result in deadlocking kernel
>>>>> +	  drivers from userspace.
>>>>> +
>>>>>   endif # if ANDROID
>>>> IIRC we wanted to drop the user ABI altogether?  I think we can de-stage
>>>> this even before we push the new ABI on the i915 side to expose the sync
>>>> points (since we'll need an open source userspace for that), and any
>>>> changes/cleanups can happen outside of staging.
>>> Just a heads-up: Gustavo Padovan from Collabora is working to de-stage all
>>> the syncpt stuff. Greg KH merged the TODO update for that work for 4.5,
>>> which covers consensus (including ack from Google's Greg Hackmann on the
>>> plan). Given that, I think it'd be best to freeload on that effort. But
>>> that means we need to push all the android/syncpt stuff down in the
>>> series. I hope that works, and there are no functional dependencies in the
>>> fence conversion/scheduler core?
>>>
>>> Thanks, Daniel
>> Do you have any idea on the timescale for their destaging? Is it definitely,
>> definitely happening or still just a 'some point in the future it would be
>> nice to...'? The Android driver certainly needs it and I believe Jesse's
>> bufferless work will require it too. So sooner is greatly preferable to
>> later.
> He's working on it, under a contract to make it happen. Afaik a bunch of
> the work is done already; the big bits left are cleaning up some of the
> internals and doing some review on the ABI & test coverage.
>
> >> As long as the rest of the fence conversion still goes in then it's not all
>> that much effort to extract the sync code. It was all previously '#if
>> CONFIG_SYNC' anyway so the system can definitely be made to work without it.
>> There are two main patches that add it in, one in the fence series and
>> another in the scheduler series. However, if you just drop those there might
>> well be a big bunch of merge conflicts to resolve in the subsequent patches.
> Rebasing to avoid the dependency would be good I think.
> -Daniel

Okay, I've got a bunch of changes to make right the way through anyway, 
as I forgot to run the style checker. Too used to our Android process, 
which does it automatically.
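
The style checker here is the kernel's scripts/checkpatch.pl; an
example run over a posted series:

$ ./scripts/checkpatch.pl --strict 00*.patch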

Will repost both the struct fence and the scheduler patches as new 
series with all the staging stuff left until the very end as an 
independent set.

Thanks,
John.


* Re: [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2015-12-17 17:43   ` Jesse Barnes
@ 2016-01-04 17:20     ` Jesse Barnes
  2016-01-04 20:57       ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Jesse Barnes @ 2016-01-04 17:20 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 12/17/2015 09:43 AM, Jesse Barnes wrote:
> On 12/11/2015 05:11 AM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that is
>> intended to keep track of work that is executed on hardware. I.e. it
>> solves the basic problem that the drivers 'struct
>> drm_i915_gem_request' is trying to address. The request structure does
>> quite a lot more than simply track the execution progress so is very
>> definitely still required. However, the basic completion status side
>> could be updated to use the ready made fence implementation and gain
>> all the advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into the
>> request. It replaces the explicit reference count with that of the
>> fence. It also replaces the 'is completed' test with the fence's
>> equivalent. Currently, that simply chains on to the original request
>> implementation. A future patch will improve this.
>>
>> v3: Updated after review comments by Tvrtko Ursulin. Added fence
>> context/seqno pair to the debugfs request info. Renamed fence 'driver
>> name' to just 'i915'. Removed BUG_ONs.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_debugfs.c     |  5 +--
>>  drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++-------------
>>  drivers/gpu/drm/i915/i915_gem.c         | 56 ++++++++++++++++++++++++++++++---
>>  drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>  drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>  6 files changed, 81 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 7415606..5b31186 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>>  			task = NULL;
>>  			if (req->pid)
>>  				task = pid_task(req->pid, PIDTYPE_PID);
>> -			seq_printf(m, "    %x @ %d: %s [%d]\n",
>> +			seq_printf(m, "    %x @ %d: %s [%d], fence = %u.%u\n",
>>  				   req->seqno,
>>  				   (int) (jiffies - req->emitted_jiffies),
>>  				   task ? task->comm : "<unknown>",
>> -				   task ? task->pid : -1);
>> +				   task ? task->pid : -1,
>> +				   req->fence.context, req->fence.seqno);
>>  			rcu_read_unlock();
>>  		}
>>  
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 436149e..aa5cba7 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -51,6 +51,7 @@
>>  #include <linux/kref.h>
>>  #include <linux/pm_qos.h>
>>  #include "intel_guc.h"
>> +#include <linux/fence.h>
>>  
>>  /* General customization:
>>   */
>> @@ -2174,7 +2175,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>   * initial reference taken using kref_init
>>   */
>>  struct drm_i915_gem_request {
>> -	struct kref ref;
>> +	/**
>> +	 * Underlying object for implementing the signal/wait stuff.
>> +	 * NB: Never call fence_later() or return this fence object to user
>> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
>> +	 * etc., there is no guarantee at all about the validity or
>> +	 * sequentiality of the fence's seqno! It is also unsafe to let
>> +	 * anything outside of the i915 driver get hold of the fence object
>> +	 * as the clean up when decrementing the reference count requires
>> +	 * holding the driver mutex lock.
>> +	 */
>> +	struct fence fence;
>>  
>>  	/** On Which ring this request was generated */
>>  	struct drm_i915_private *i915;
>> @@ -2251,7 +2262,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>  			   struct intel_context *ctx,
>>  			   struct drm_i915_gem_request **req_out);
>>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>> -void i915_gem_request_free(struct kref *req_ref);
>> +
>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> +					      bool lazy_coherency)
>> +{
>> +	return fence_is_signaled(&req->fence);
>> +}
>> +
>>  int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>  				   struct drm_file *file);
>>  
>> @@ -2271,7 +2288,7 @@ static inline struct drm_i915_gem_request *
>>  i915_gem_request_reference(struct drm_i915_gem_request *req)
>>  {
>>  	if (req)
>> -		kref_get(&req->ref);
>> +		fence_get(&req->fence);
>>  	return req;
>>  }
>>  
>> @@ -2279,7 +2296,7 @@ static inline void
>>  i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>  {
>>  	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> -	kref_put(&req->ref, i915_gem_request_free);
>> +	fence_put(&req->fence);
>>  }
>>  
>>  static inline void
>> @@ -2291,7 +2308,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>>  		return;
>>  
>>  	dev = req->ring->dev;
>> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
>> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>>  		mutex_unlock(&dev->struct_mutex);
>>  }
>>  
>> @@ -2308,12 +2325,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>  }
>>  
>>  /*
>> - * XXX: i915_gem_request_completed should be here but currently needs the
>> - * definition of i915_seqno_passed() which is below. It will be moved in
>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>> - */
>> -
>> -/*
>>   * A command that requires special handling by the command parser.
>>   */
>>  struct drm_i915_cmd_descriptor {
>> @@ -2916,18 +2927,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>  	return (int32_t)(seq1 - seq2) >= 0;
>>  }
>>  
>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> -					      bool lazy_coherency)
>> -{
>> -	u32 seqno;
>> -
>> -	BUG_ON(req == NULL);
>> -
>> -	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
>> -
>> -	return i915_seqno_passed(seqno, req->seqno);
>> -}
>> -
>>  int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>>  int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>  
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index e4056a3..a1b4dbd 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2617,12 +2617,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>  	}
>>  }
>>  
>> -void i915_gem_request_free(struct kref *req_ref)
>> +static void i915_gem_request_free(struct fence *req_fence)
>>  {
>> -	struct drm_i915_gem_request *req = container_of(req_ref,
>> -						 typeof(*req), ref);
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>>  	struct intel_context *ctx = req->ctx;
>>  
>> +	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> +
>>  	if (req->file_priv)
>>  		i915_gem_request_remove_from_client(req);
>>  
>> @@ -2638,6 +2640,45 @@ void i915_gem_request_free(struct kref *req_ref)
>>  	kmem_cache_free(req->i915->requests, req);
>>  }
>>  
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +{
>> +	/* Interrupt driven fences are not implemented yet.*/
>> +	WARN(true, "This should not be called!");
>> +	return true;
>> +}
>> +
>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +{
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>> +	u32 seqno;
>> +
>> +	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +
>> +	return i915_seqno_passed(seqno, req->seqno);
>> +}
>> +
>> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>> +{
>> +	return "i915";
>> +}
>> +
>> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>> +{
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>> +	return req->ring->name;
>> +}
>> +
>> +static const struct fence_ops i915_gem_request_fops = {
>> +	.enable_signaling	= i915_gem_request_enable_signaling,
>> +	.signaled		= i915_gem_request_is_completed,
>> +	.wait			= fence_default_wait,
>> +	.release		= i915_gem_request_free,
>> +	.get_driver_name	= i915_gem_request_get_driver_name,
>> +	.get_timeline_name	= i915_gem_request_get_timeline_name,
>> +};
>> +
>>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>  			   struct intel_context *ctx,
>>  			   struct drm_i915_gem_request **req_out)
>> @@ -2659,7 +2700,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>  	if (ret)
>>  		goto err;
>>  
>> -	kref_init(&req->ref);
>>  	req->i915 = dev_priv;
>>  	req->ring = ring;
>>  	req->ctx  = ctx;
>> @@ -2674,6 +2714,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>  		goto err;
>>  	}
>>  
>> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
>> +
>>  	/*
>>  	 * Reserve space in the ring buffer for all the commands required to
>>  	 * eventually emit this request. This is to guarantee that the
>> @@ -4723,7 +4765,7 @@ i915_gem_init_hw(struct drm_device *dev)
>>  {
>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>>  	struct intel_engine_cs *ring;
>> -	int ret, i, j;
>> +	int ret, i, j, fence_base;
>>  
>>  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>>  		return -EIO;
>> @@ -4793,12 +4835,16 @@ i915_gem_init_hw(struct drm_device *dev)
>>  	if (ret)
>>  		goto out;
>>  
>> +	fence_base = fence_context_alloc(I915_NUM_RINGS);
>> +
>>  	/* Now it is safe to go back round and do everything else: */
>>  	for_each_ring(ring, dev_priv, i) {
>>  		struct drm_i915_gem_request *req;
>>  
>>  		WARN_ON(!ring->default_context);
>>  
>> +		ring->fence_context = fence_base + i;
>> +
>>  		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>>  		if (ret) {
>>  			i915_gem_cleanup_ringbuffer(dev);
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 06180dc..b8c8f9b 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1920,6 +1920,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>>  	ring->dev = dev;
>>  	INIT_LIST_HEAD(&ring->active_list);
>>  	INIT_LIST_HEAD(&ring->request_list);
>> +	spin_lock_init(&ring->fence_lock);
>>  	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>>  	init_waitqueue_head(&ring->irq_queue);
>>  
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index c9b081f..f4a6403 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2158,6 +2158,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>  	INIT_LIST_HEAD(&ring->request_list);
>>  	INIT_LIST_HEAD(&ring->execlist_queue);
>>  	INIT_LIST_HEAD(&ring->buffers);
>> +	spin_lock_init(&ring->fence_lock);
>>  	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>>  	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
>>  
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 58b1976..4547645 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -348,6 +348,9 @@ struct  intel_engine_cs {
>>  	 * to encode the command length in the header).
>>  	 */
>>  	u32 (*get_cmd_length_mask)(u32 cmd_header);
>> +
>> +	unsigned fence_context;
>> +	spinlock_t fence_lock;
>>  };
>>  
>>  bool intel_ring_initialized(struct intel_engine_cs *ring);
>>
> 
> Chris has an equivalent patch that does a little more (interrupt-driven waits, custom i915 wait function, etc.). Can you review that instead, assuming it's sufficient?
> 
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=f062e706740d87befb8e7cd7ea337f98f0b24f52

OK, given that Chris's stuff is more ambitious and this has already been
out there a long time and seen a lot of review, I think we should go
ahead with this version first, and then rebase Chris's stuff on top.

So this one has my ack.

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2016-01-04 17:20     ` Jesse Barnes
@ 2016-01-04 20:57       ` Chris Wilson
  2016-01-04 21:16         ` Jesse Barnes
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-04 20:57 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jan 04, 2016 at 09:20:44AM -0800, Jesse Barnes wrote:
> So this one has my ack.

This series makes a number of fundamental mistakes in seqno-interrupt
handling, so no.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2016-01-04 20:57       ` Chris Wilson
@ 2016-01-04 21:16         ` Jesse Barnes
  2016-01-08 21:47           ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Jesse Barnes @ 2016-01-04 21:16 UTC (permalink / raw)
  To: Chris Wilson, John.C.Harrison, Intel-GFX

On 01/04/2016 12:57 PM, Chris Wilson wrote:
> On Mon, Jan 04, 2016 at 09:20:44AM -0800, Jesse Barnes wrote:
>> So this one has my ack.
> 
> This series makes a number of fundamental mistakes in seqno-interrupt
> handling, so no.

Well, unless you can enumerate the issues in enough detail for us to address them, we don't have much choice but to go ahead.  I know you've replied to a few of these threads in the past, but I don't see a current list of outstanding bugs aside from the one about modifying input params on the execbuf error path (though the code comment seems to indicate some care is being taken there at least, so it should be a small fix).

Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 0/7] Convert requests to use struct fence
  2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
                   ` (12 preceding siblings ...)
  2015-12-11 13:12 ` [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-01-08 18:47 ` John.C.Harrison
  2016-01-08 18:47   ` [PATCH 1/7] drm/i915: " John.C.Harrison
                     ` (7 more replies)
  13 siblings, 8 replies; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.
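
As a rough sketch of what this buys userland (assuming the Android-style
sync file descriptor integration referred to above, which is not part of
this series), waiting for a batch to complete becomes a plain poll() on
the fence's fd rather than an IOCTL into the driver:

#include <errno.h>
#include <poll.h>

/* Sketch only: 'fence_fd' is assumed to be a sync fd handed back by a
 * sync-enabled execbuffer call. A sync fd reports POLLIN once its
 * fence has signalled. */
static int wait_batch_complete(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -errno;	/* poll() itself failed */
	if (ret == 0)
		return -ETIME;	/* timed out, batch still running */
	return 0;		/* fence signalled, batch complete */
}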

This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.

An IGT test to exercise the fence support from userland is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real-world Linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.

Assorted changes based on review comments and style checker fixes.
The most significant change is fixing up the fake lost-interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
latency.

[Patches against drm-intel-nightly tree fetched 17/11/2015]

John Harrison (7):
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Add per context timelines to fence object
  drm/i915: Delay the freeing of requests until retire time
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

 drivers/gpu/drm/i915/i915_debugfs.c     |   7 +-
 drivers/gpu/drm/i915/i915_drv.h         |  67 ++---
 drivers/gpu/drm/i915/i915_gem.c         | 427 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.c |  16 +-
 drivers/gpu/drm/i915/i915_irq.c         |   2 +-
 drivers/gpu/drm/i915/i915_trace.h       |  14 +-
 drivers/gpu/drm/i915/intel_display.c    |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  13 +
 drivers/gpu/drm/i915/intel_pm.c         |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  12 +
 11 files changed, 496 insertions(+), 77 deletions(-)

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 1/7] drm/i915: Convert requests to use struct fence
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-08 21:59     ` Chris Wilson
  2016-01-08 18:47   ` [PATCH 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.
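
The shape of the change is the standard embedded-object pattern. A
minimal sketch (the real definitions are in the diff below;
to_request() is just illustrative shorthand for the open-coded
container_of() used there):

/* The fence is embedded in the request, so the fence_ops callbacks can
 * recover the enclosing request from a bare fence pointer. */
struct drm_i915_gem_request {
	struct fence fence;	/* replaces the old 'struct kref ref' */
	/* ... the rest of the driver's tracking state ... */
};

static inline struct drm_i915_gem_request *to_request(struct fence *f)
{
	return container_of(f, struct drm_i915_gem_request, fence);
}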

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

v5: Changed seqno format in debugfs to %x rather than %u as that is
apparently the preferred appearance. Line wrapped some long lines to
keep the style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  5 +--
 drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem.c         | 57 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 6 files changed, 82 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7415606..af41e5c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			task = NULL;
 			if (req->pid)
 				task = pid_task(req->pid, PIDTYPE_PID);
-			seq_printf(m, "    %x @ %d: %s [%d]\n",
+			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
 				   req->seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
-				   task ? task->pid : -1);
+				   task ? task->pid : -1,
+				   req->fence.context, req->fence.seqno);
 			rcu_read_unlock();
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 436149e..aa5cba7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -51,6 +51,7 @@
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
 #include "intel_guc.h"
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2174,7 +2175,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/**
+	 * Underlying object for implementing the signal/wait stuff.
+	 * NB: Never call fence_later() or return this fence object to user
+	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+	 * etc., there is no guarantee at all about the validity or
+	 * sequentiality of the fence's seqno! It is also unsafe to let
+	 * anything outside of the i915 driver get hold of the fence object
+	 * as the clean up when decrementing the reference count requires
+	 * holding the driver mutex lock.
+	 */
+	struct fence fence;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2251,7 +2262,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2271,7 +2288,7 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
@@ -2279,7 +2296,7 @@ static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void
@@ -2291,7 +2308,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
 		return;
 
 	dev = req->ring->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
 		mutex_unlock(&dev->struct_mutex);
 }
 
@@ -2308,12 +2325,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -2916,18 +2927,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	u32 seqno;
-
-	BUG_ON(req == NULL);
-
-	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
-	return i915_seqno_passed(seqno, req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e4056a3..1138990 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2617,12 +2617,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
 	struct intel_context *ctx = req->ctx;
 
+	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
+
 	if (req->file_priv)
 		i915_gem_request_remove_from_client(req);
 
@@ -2638,6 +2640,45 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet.*/
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	return req->ring->name;
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+};
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2659,7 +2700,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->ring = ring;
 	req->ctx  = ctx;
@@ -2674,6 +2714,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
+		   ring->fence_context, req->seqno);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -4723,7 +4766,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j;
+	int ret, i, j, fence_base;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4793,12 +4836,16 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (ret)
 		goto out;
 
+	fence_base = fence_context_alloc(I915_NUM_RINGS);
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
+		ring->fence_context = fence_base + i;
+
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06180dc..b8c8f9b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1920,6 +1920,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c9b081f..f4a6403 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,6 +2158,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	INIT_LIST_HEAD(&ring->buffers);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 58b1976..4547645 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,6 +348,9 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	unsigned fence_context;
+	spinlock_t fence_lock;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
  2016-01-08 18:47   ` [PATCH 1/7] drm/i915: " John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-11 22:43     ` Jesse Barnes
  2016-01-08 18:47   ` [PATCH 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 18 +++++++++---------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index af41e5c..b54d99e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   ring->get_seqno(ring, true),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index aa5cba7..caf7897 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2263,8 +2263,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1138990..93d2f32 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,7 +1165,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 	timeout = jiffies + 1;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return 0;
 
 		if (time_after_eq(jiffies, timeout))
@@ -1173,7 +1173,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 		cpu_relax_lowlatency();
 	}
-	if (i915_gem_request_completed(req, false))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	return -EAGAIN;
@@ -1217,7 +1217,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = timeout ?
@@ -1257,7 +1257,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2759,7 +2759,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2900,7 +2900,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2924,7 +2924,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_completed(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
@@ -3030,7 +3030,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3142,7 +3142,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a5dd528..510365e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11313,7 +11313,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index ebd6735..c207a3a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7170,7 +7170,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -7186,7 +7186,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
  2016-01-08 18:47   ` [PATCH 1/7] drm/i915: " John.C.Harrison
  2016-01-08 18:47   ` [PATCH 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-08 22:05     ` Chris Wilson
  2016-01-08 18:47   ` [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buf fence API.

To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds in a per context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.
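
A small sketch of the resulting guarantee (timeline_ordering_example()
is hypothetical; the allocator is the
i915_fence_timeline_get_next_seqno() added by this patch):

/* Each (context, ring) pair owns an independent timeline, so fence
 * seqnos only promise ordering within that one timeline. */
static void timeline_ordering_example(struct intel_context *ctx,
				      struct intel_engine_cs *ring)
{
	struct i915_fence_timeline *tl =
		&ctx->engine[ring->id].fence_timeline;
	unsigned a = i915_fence_timeline_get_next_seqno(tl);	/* e.g. 7 */
	unsigned b = i915_fence_timeline_get_next_seqno(tl);	/* then 8 */

	/* The fence carrying 'a' signals before the one carrying 'b';
	 * requests on any other context's timeline may complete in any
	 * order relative to these. */
}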

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

v5: Line wrapping and other white space fixes to keep style checker
happy.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 25 +++++++---
 drivers/gpu/drm/i915/i915_gem.c         | 83 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_context.c | 16 ++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 115 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index caf7897..7d6a7c0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -841,6 +841,15 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_fence_timeline {
+	char        name[32];
+	unsigned    fence_context;
+	unsigned    next;
+
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -885,6 +894,7 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
+		struct i915_fence_timeline fence_timeline;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
@@ -2177,13 +2187,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/**
 	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never call fence_later() or return this fence object to user
-	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
-	 * etc., there is no guarantee at all about the validity or
-	 * sequentiality of the fence's seqno! It is also unsafe to let
-	 * anything outside of the i915 driver get hold of the fence object
-	 * as the clean up when decrementing the reference count requires
-	 * holding the driver mutex lock.
+	 * NB: Never return this fence object to user land! It is unsafe to
+	 * let anything outside of the i915 driver get hold of the fence
+	 * object as the clean up when decrementing the reference count
+	 * requires holding the driver mutex lock.
 	 */
 	struct fence fence;
 
@@ -2263,6 +2270,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 93d2f32..9ce17a3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2665,9 +2665,35 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
 
 static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_fence,
-						 typeof(*req), fence);
-	return req->ring->name;
+	struct drm_i915_gem_request *req;
+	struct i915_fence_timeline *timeline;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	timeline = &req->ctx->engine[req->ring->id].fence_timeline;
+
+	return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence,
+						char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	/* Last signalled timeline value ??? */
+	snprintf(str, size, "? [%d]"/*, timeline->value*/,
+		 req->ring->get_seqno(req->ring, true));
+}
+
+static void i915_gem_request_fence_value_str(struct fence *req_fence,
+					     char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
@@ -2677,8 +2703,50 @@ static const struct fence_ops i915_gem_request_fops = {
 	.release		= i915_gem_request_free,
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.fence_value_str	= i915_gem_request_fence_value_str,
+	.timeline_value_str	= i915_gem_request_timeline_value_str,
 };
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring)
+{
+	struct i915_fence_timeline *timeline;
+
+	timeline = &ctx->engine[ring->id].fence_timeline;
+
+	if (timeline->ring)
+		return 0;
+
+	timeline->fence_context = fence_context_alloc(1);
+
+	/*
+	 * Start the timeline from seqno 0 as this is a special value
+	 * that is reserved for invalid sync points.
+	 */
+	timeline->next       = 1;
+	timeline->ctx        = ctx;
+	timeline->ring       = ring;
+
+	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
+		 timeline->fence_context, ring->name, ctx->user_handle);
+
+	return 0;
+}
+
+static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+	unsigned seqno;
+
+	seqno = timeline->next;
+
+	/* Reserve zero for invalid */
+	if (++timeline->next == 0)
+		timeline->next = 1;
+
+	return seqno;
+}
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2715,7 +2783,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	}
 
 	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
-		   ring->fence_context, req->seqno);
+		   ctx->engine[ring->id].fence_timeline.fence_context,
+		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
@@ -4766,7 +4835,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j, fence_base;
+	int ret, i, j;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4836,16 +4905,12 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (ret)
 		goto out;
 
-	fence_base = fence_context_alloc(I915_NUM_RINGS);
-
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
-		ring->fence_context = fence_base + i;
-
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 43b1c73..4570edd 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -266,7 +266,7 @@ i915_gem_create_context(struct drm_device *dev,
 {
 	const bool is_global_default_ctx = file_priv == NULL;
 	struct intel_context *ctx;
-	int ret = 0;
+	int i, ret = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
@@ -274,6 +274,20 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (!i915.enable_execlists) {
+		struct intel_engine_cs *ring;
+
+		/* Create a per context timeline for fences */
+		for_each_ring(ring, to_i915(dev), i) {
+			ret = i915_create_fence_timeline(dev, ctx, ring);
+			if (ret) {
+				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
+					  ring->name, ctx);
+				goto err_destroy;
+			}
+		}
+	}
+
 	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b8c8f9b..2b56651 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2489,6 +2489,14 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		goto error_ringbuf;
 	}
 
+	/* Create a per context timeline for fences */
+	ret = i915_create_fence_timeline(dev, ctx, ring);
+	if (ret) {
+		DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
+			  ring->name, ctx);
+		goto error_ringbuf;
+	}
+
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4547645..356b6a8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -349,7 +349,6 @@ struct  intel_engine_cs {
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
-	unsigned fence_context;
 	spinlock_t fence_lock;
 };
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
                     ` (2 preceding siblings ...)
  2016-01-08 18:47   ` [PATCH 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-08 22:08     ` Chris Wilson
  2016-01-08 18:47   ` [PATCH 5/7] drm/i915: Interrupt driven fences John.C.Harrison
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while all references were held internally to the driver. However, the
plan is to allow the underlying fence object (and hence the request
itself) to be returned to other drivers and to userland. External
users cannot be expected to acquire a driver private mutex lock.

Rather than attempt to disentangle the request structure from the
driver mutex lock, the decision was to defer the actual freeing until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed free list. The driver's
retire worker thread will then process the list and actually call the
free function on the requests.
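
Condensed, the two halves of the scheme look as follows (a sketch only;
the full versions are in the diff below):

/* Release side: runs on the final fence_put(), possibly without
 * struct_mutex held, so it may only queue the request. */
static void request_release_sketch(struct fence *req_fence)
{
	struct drm_i915_gem_request *req =
		container_of(req_fence, typeof(*req), fence);
	struct intel_engine_cs *ring = req->ring;

	spin_lock(&ring->delayed_free_lock);
	list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
	spin_unlock(&ring->delayed_free_lock);
	/* ... then kick the retire worker, which holds struct_mutex ... */
}

/* Drain side: called from the retire worker with struct_mutex held. */
static void drain_delayed_free_sketch(struct intel_engine_cs *ring)
{
	struct drm_i915_gem_request *req, *next;
	LIST_HEAD(free_list);

	spin_lock(&ring->delayed_free_lock);
	list_splice_init(&ring->delayed_free_list, &free_list);
	spin_unlock(&ring->delayed_free_lock);

	list_for_each_entry_safe(req, next, &free_list, delayed_free_link)
		i915_gem_request_free(req);	/* the real free */
}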

v2: New patch in series.

v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
to 'link' rather than 'list'. Update list processing to be more
efficient/safer with respect to spinlocks.

v4: Changed to use basic spinlocks rather than IRQ ones - missed
update from earlier feedback by Tvrtko.

v5: Improved a comment to keep the style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 22 +++-----------------
 drivers/gpu/drm/i915/i915_gem.c         | 37 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
 drivers/gpu/drm/i915/intel_pm.c         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
 7 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d6a7c0..fbf591f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2185,14 +2185,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	/**
-	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never return this fence object to user land! It is unsafe to
-	 * let anything outside of the i915 driver get hold of the fence
-	 * object as the clean up when decrementing the reference count
-	 * requires holding the driver mutex lock.
-	 */
+	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head delayed_free_link;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2305,21 +2300,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
 	if (!req)
 		return;
 
-	dev = req->ring->dev;
-	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9ce17a3..f42296e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2617,10 +2617,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+	struct intel_engine_cs *ring = req->ring;
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+
+	/*
+	 * Need to add the request to a deferred dereference list to be
+	 * processed at a mutex lock safe time.
+	 */
+	spin_lock(&ring->delayed_free_lock);
+	list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
+	spin_unlock(&ring->delayed_free_lock);
+
+	queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
 	struct intel_context *ctx = req->ctx;
 
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
@@ -2700,7 +2716,7 @@ static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
 	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
-	.release		= i915_gem_request_free,
+	.release		= i915_gem_request_release,
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.fence_value_str	= i915_gem_request_fence_value_str,
@@ -2955,6 +2971,9 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_request *req, *req_next;
+	LIST_HEAD(list_head);
+
 	WARN_ON(i915_verify_lists(ring->dev));
 
 	/* Retire requests first as we use it above for the early return.
@@ -2998,6 +3017,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	/* Really free any requests that were recently unreferenced */
+	spin_lock(&ring->delayed_free_lock);
+	list_splice_init(&ring->delayed_free_list, &list_head);
+	spin_unlock(&ring->delayed_free_lock);
+	list_for_each_entry_safe(req, req_next, &list_head, delayed_free_link) {
+		list_del(&req->delayed_free_link);
+		i915_gem_request_free(req);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
@@ -3188,7 +3216,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], reset_counter, true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  file->driver_priv);
-		i915_gem_request_unreference__unlocked(req[i]);
+		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
 
@@ -4183,7 +4211,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
-	i915_gem_request_unreference__unlocked(target);
+	i915_gem_request_unreference(target);
 
 	return ret;
 }
@@ -5040,6 +5068,7 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 510365e..9291a1d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11256,7 +11256,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
-		i915_gem_request_unreference__unlocked(mmio_flip->req);
+		i915_gem_request_unreference(mmio_flip->req);
 	}
 
 	intel_do_mmio_flip(mmio_flip);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2b56651..06a398a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1920,7 +1920,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
+	spin_lock_init(&ring->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c207a3a..e2d34a6 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7174,7 +7174,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(req);
+	i915_gem_request_unreference(req);
 	kfree(boost);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f4a6403..e5573e7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,7 +2158,9 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	INIT_LIST_HEAD(&ring->buffers);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
+	spin_lock_init(&ring->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 356b6a8..6c7a90a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -301,6 +301,13 @@ struct  intel_engine_cs {
 	 */
 	u32 last_submitted_seqno;
 
+	/*
+	 * Deferred free list to allow unreferencing requests from interrupt
+	 * contexts and from outside of the i915 driver.
+	 */
+	struct list_head delayed_free_list;
+	spinlock_t delayed_free_lock;
+
 	bool gpu_caches_dirty;
 
 	wait_queue_head_t irq_queue;
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 5/7] drm/i915: Interrupt driven fences
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
                     ` (3 preceding siblings ...)
  2016-01-08 18:47   ` [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-08 22:14     ` Chris Wilson
  2016-01-08 22:46     ` Chris Wilson
  2016-01-08 18:47   ` [PATCH 6/7] drm/i915: Updated request structure tracing John.C.Harrison
                     ` (2 subsequent siblings)
  7 siblings, 2 replies; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race-free if the interrupt is already enabled.

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.
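
In sketch form (condensed; the real i915_gem_request_enable_interrupt()
in this patch carries more error handling):

/* Arm the user interrupt, then immediately re-run the notify scan in
 * case the seqno advanced before the interrupt was enabled. */
void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
				       bool fence_locked)
{
	struct intel_engine_cs *ring = req->ring;

	if (!req->irq_enabled && ring->irq_get(ring))
		req->irq_enabled = true;

	/* Pick up a completion that raced with arming the interrupt. */
	i915_gem_request_notify(ring, fence_locked);
}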

Lastly, the ring clean up code has the possibility to cancel
outstanding requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulation and clean-up code in
multiple places. This also avoids any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signaling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.

v5: Made i915_gem_request_submit() static as it is only ever called
from one place.

Fixed up the low latency wait optimisation. The time delay between the
seqno value being written to memory and the driver's ISR running can be
significant, at least for the wait request micro-benchmark. This can
be greatly improved by explicitly checking for seqno updates in the
pre-wait busy poll loop. Also added some documentation comments to the
busy poll code.

Fixed up support for the faking of lost interrupts
(test_irq_rings/missed_irq_rings). That is, there is an IGT test that
tells the driver to lose interrupts deliberately and then checks that
everything still works as expected (albeit much slower).

Updates from review comments: use non IRQ-save spinlocking, early exit
on WARN and improved comments (Tvrtko Ursulin).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |   8 +
 drivers/gpu/drm/i915/i915_gem.c         | 256 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 248 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbf591f..acfe25f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2187,7 +2187,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head signal_link;
+	struct list_head unsignal_link;
 	struct list_head delayed_free_link;
+	bool cancelled;
+	bool irq_enabled;
+	bool signal_requested;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2264,6 +2269,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
 
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f42296e..96cafab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -40,6 +40,8 @@
 
 #define RQ_BUG_ON(expr)
 
+static void i915_gem_request_submit(struct drm_i915_gem_request *req);
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1156,16 +1158,32 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
 	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
+/*
+ * Super low latency implementation of request wait.
+ *
+ * This is used as a precursor to doing anything slow like waiting to be
+ * woken up by a signal, interrupt handler, etc. in the main wait request
+ * code. The idea is that most requests complete pretty quickly, so burning
+ * the CPU for a jiffy is actually more efficient than sleeping as that
+ * introduces significant latency.
+ */
 static int __i915_spin_request(struct drm_i915_gem_request *req)
 {
 	unsigned long timeout;
+	uint32_t seqno;
 
 	if (i915_gem_request_get_ring(req)->irq_refcount)
 		return -EBUSY;
 
 	timeout = jiffies + 1;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req))
+		/*
+		 * Explicitly check the seqno rather than waiting for the
+		 * user interrupt to work its way through the hardware and
+		 * software layers.
+		 */
+		seqno = req->ring->get_seqno(req->ring, false);
+		if (i915_seqno_passed(seqno, req->seqno))
 			return 0;
 
 		if (time_after_eq(jiffies, timeout))
@@ -1173,7 +1191,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 		cpu_relax_lowlatency();
 	}
-	if (i915_gem_request_completed(req))
+
+	seqno = req->ring->get_seqno(req->ring, false);
+	if (i915_seqno_passed(seqno, req->seqno))
 		return 0;
 
 	return -EAGAIN;
@@ -1205,8 +1225,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
+	uint32_t seqno;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before, now;
@@ -1214,9 +1233,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (list_empty(&req->list))
-		return 0;
-
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1231,15 +1247,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	before = ktime_get_raw_ns();
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req);
-	if (ret == 0)
-		goto out;
-
-	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
-		ret = -ENODEV;
-		goto out;
+	if (req->seqno) {
+		ret = __i915_spin_request(req);
+		if (ret == 0)
+			goto out;
 	}
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	fence_enable_sw_signaling(&req->fence);
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1262,6 +1280,19 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (req->seqno) {
+			/*
+			 * There is quite a lot of latency in the user interrupt
+			 * path. So do an explicit seqno check and potentially
+			 * remove all that delay.
+			 */
+			seqno = ring->get_seqno(ring, false);
+			if (i915_seqno_passed(seqno, req->seqno)) {
+				ret = 0;
+				break;
+			}
+		}
+
 		if (interruptible && signal_pending(current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -1288,8 +1319,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	if (!irq_test_in_progress)
-		ring->irq_put(ring);
 
 	finish_wait(&ring->irq_queue, &wait);
 
@@ -1297,6 +1326,18 @@ out:
 	now = ktime_get_raw_ns();
 	trace_i915_gem_request_wait_end(req);
 
+	if ((ret == 0) && (req->seqno)) {
+		seqno = ring->get_seqno(ring, false);
+		if (i915_seqno_passed(seqno, req->seqno) &&
+		    !i915_gem_request_completed(req)) {
+			/*
+			 * Make sure the request is marked as completed before
+			 * returning:
+			 */
+			i915_gem_request_notify(req->ring, false);
+		}
+	}
+
 	if (timeout) {
 		s64 tres = *timeout - (now - before);
 
@@ -1377,6 +1418,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/*
+	 * In case the request is still in the signal pending list,
+	 * e.g. due to being cancelled by TDR, preemption, etc.
+	 */
+	if (!list_empty(&request->signal_link)) {
+		/*
+		 * The request must be marked as cancelled and the underlying
+		 * fence as failed. NB: There is no explicit fence fail
+		 * API, there is only a manual poke and signal.
+		 */
+		request->cancelled = true;
+		/* How to propagate to any associated sync_fence??? */
+		request->fence.status = -EIO;
+		fence_signal_locked(&request->fence);
+	}
+
 	i915_gem_request_unreference(request);
 }
 
@@ -2535,6 +2592,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else {
@@ -2653,25 +2716,140 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
 		i915_gem_context_unreference(ctx);
 	}
 
+	if (req->irq_enabled)
+		req->ring->irq_put(req->ring);
+
 	kmem_cache_free(req->i915->requests, req);
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+static void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	/*
+	 * Always enable signal processing for the request's fence object
+	 * before that request is submitted to the hardware. Thus there is no
+	 * race condition whereby the interrupt could pop out before the
+	 * request has been added to the signal list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 */
+	i915_gem_request_reference(req);
+	spin_lock_irq(&req->ring->fence_lock);
+	WARN_ON(!list_empty(&req->signal_link));
+	list_add_tail(&req->signal_link, &req->ring->fence_signal_list);
+	spin_unlock_irq(&req->ring->fence_lock);
+
+	/*
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	 /* Have interrupts already been requested? */
+	 if (req->signal_requested)
+		i915_gem_request_enable_interrupt(req, false);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked)
+{
+	struct drm_i915_private *dev_priv = to_i915(req->ring->dev);
+	const bool irq_test_in_progress =
+		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
+						intel_ring_flag(req->ring);
+
+	if (req->irq_enabled)
+		return;
+
+	if (irq_test_in_progress)
+		return;
+
+	if (!WARN_ON(!req->ring->irq_get(req->ring)))
+		req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->ring, fence_locked);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	/*
+	 * No need to actually enable interrupt based processing until the
+	 * request has been submitted to the hardware. At which point
+	 * 'i915_gem_request_submit()' is called. So only really enable
+	 * signalling in there. Just set a flag to say that interrupts are
+	 * wanted when the request is eventually submitted. On the other hand
+	 * if the request has already been submitted then interrupts do need
+	 * to be enabled now.
+	 */
+
+	req->signal_requested = true;
+
+	if (!list_empty(&req->signal_link))
+		i915_gem_request_enable_interrupt(req, true);
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+	if (list_empty(&ring->fence_signal_list))
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&ring->fence_lock, flags);
 
-	return i915_seqno_passed(seqno, req->seqno);
+	seqno = ring->get_seqno(ring, false);
+
+	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				break;
+		}
+
+		/*
+		 * Start by removing the fence from the signal list otherwise
+		 * the retire code can run concurrently and get confused.
+		 */
+		list_del_init(&req->signal_link);
+
+		if (!req->cancelled)
+			fence_signal_locked(&req->fence);
+
+		if (req->irq_enabled) {
+			req->ring->irq_put(req->ring);
+			req->irq_enabled = false;
+		}
+
+		/* Can't unreference here because that might grab fence_lock */
+		list_add_tail(&req->unsignal_link, &ring->fence_unsignal_list);
+	}
+
+	if (!fence_locked)
+		spin_unlock_irqrestore(&ring->fence_lock, flags);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2714,7 +2892,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
 static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_release,
 	.get_driver_name	= i915_gem_request_get_driver_name,
@@ -2798,6 +2975,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	INIT_LIST_HEAD(&req->signal_link);
 	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
 		   ctx->engine[ring->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
@@ -2835,6 +3013,11 @@ void i915_gem_request_cancel(struct drm_i915_gem_request *req)
 {
 	intel_ring_reserved_space_cancel(req->ringbuf);
 
+	req->cancelled = true;
+	/* How to propagate to any associated sync_fence??? */
+	req->fence.status = -EINVAL;
+	fence_signal_locked(&req->fence);
+
 	i915_gem_request_unreference(req);
 }
 
@@ -2928,6 +3111,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_request_retire(request);
 	}
 
+	/*
+	 * Tidy up anything left over. This includes a call to
+	 * i915_gem_request_notify() which will make sure that any requests
+	 * that were on the signal pending list get also cleaned up.
+	 */
+	i915_gem_retire_requests_ring(ring);
+
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
 	 * all space when retiring the request (to prevent HEADs colliding
@@ -2976,6 +3166,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(ring, false);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -3017,6 +3214,15 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	/* Tidy up any requests that were recently signalled */
+	spin_lock_irq(&ring->fence_lock);
+	list_splice_init(&ring->fence_unsignal_list, &list_head);
+	spin_unlock_irq(&ring->fence_lock);
+	list_for_each_entry_safe(req, req_next, &list_head, unsignal_link) {
+		list_del(&req->unsignal_link);
+		i915_gem_request_unreference(req);
+	}
+
 	/* Really free any requests that were recently unreferenced */
 	spin_lock(&ring->delayed_free_lock);
 	list_splice_init(&ring->delayed_free_list, &list_head);
@@ -5068,6 +5274,8 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 68b094b..74f8552 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -981,6 +981,8 @@ static void notify_ring(struct intel_engine_cs *ring)
 
 	trace_i915_gem_request_notify(ring);
 
+	i915_gem_request_notify(ring, false);
+
 	wake_up_all(&ring->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06a398a..76fc245 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1920,6 +1920,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
 	spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e5573e7..1dec252 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2158,6 +2158,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
 	INIT_LIST_HEAD(&ring->buffers);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
 	spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6c7a90a..72f811e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -357,6 +357,8 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+	struct list_head fence_unsignal_list;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 6/7] drm/i915: Updated request structure tracing
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
                     ` (4 preceding siblings ...)
  2016-01-08 18:47   ` [PATCH 5/7] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-08 22:16     ` Chris Wilson
  2016-01-08 18:47   ` [PATCH 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
  2016-01-08 22:47   ` [PATCH 0/7] Convert requests to use struct fence Chris Wilson
  7 siblings, 1 reply; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

v5: Line wrapping to keep the style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  9 +++++++--
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 14 +++++++++-----
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 96cafab..ef03e4e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2816,13 +2816,16 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 	unsigned long flags;
 	u32 seqno;
 
-	if (list_empty(&ring->fence_signal_list))
+	if (list_empty(&ring->fence_signal_list)) {
+		trace_i915_gem_request_notify(ring, 0);
 		return;
+	}
 
 	if (!fence_locked)
 		spin_lock_irqsave(&ring->fence_lock, flags);
 
 	seqno = ring->get_seqno(ring, false);
+	trace_i915_gem_request_notify(ring, seqno);
 
 	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -2836,8 +2839,10 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 		 */
 		list_del_init(&req->signal_link);
 
-		if (!req->cancelled)
+		if (!req->cancelled) {
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
+		}
 
 		if (req->irq_enabled) {
 			req->ring->irq_put(req->ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 74f8552..d280e05 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -979,8 +979,6 @@ static void notify_ring(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	trace_i915_gem_request_notify(ring);
-
 	i915_gem_request_notify(ring, false);
 
 	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 04fe849..b3ae894 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -561,23 +561,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring),
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+	    TP_ARGS(ring, seqno),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->seqno = seqno;
+			   __entry->is_empty =
+					list_empty(&ring->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
                     ` (5 preceding siblings ...)
  2016-01-08 18:47   ` [PATCH 6/7] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-01-08 18:47   ` John.C.Harrison
  2016-01-08 22:47   ` [PATCH 0/7] Convert requests to use struct fence Chris Wilson
  7 siblings, 0 replies; 74+ messages in thread
From: John.C.Harrison @ 2016-01-08 18:47 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The notify function can be called many times without the seqno
changing. Many of these duplicate calls exist to prevent races arising
from the requirement that interrupts not be enabled until requested.
However, once interrupts are enabled the IRQ handler can be called
multiple times without the ring's seqno value changing. This patch
reduces the overhead of these extra calls by caching the last processed
seqno value and early exiting if it has not changed.

v3: New patch for series.

v5: Added comment about last_irq_seqno usage due to code review
feedback (Tvrtko Ursulin).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 21 ++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ef03e4e..e8ec49e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2490,6 +2490,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 
 		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
 			ring->semaphore.sync_seqno[j] = 0;
+
+		ring->last_irq_seqno = 0;
 	}
 
 	return 0;
@@ -2821,11 +2823,21 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 		return;
 	}
 
-	if (!fence_locked)
-		spin_lock_irqsave(&ring->fence_lock, flags);
-
+	/*
+	 * Check for a new seqno. If it hasn't actually changed then early
+	 * exit without even grabbing the spinlock. Note that this is safe
+	 * because any corruption of last_irq_seqno merely results in doing
+	 * the full processing when there is potentially no work to be done.
+	 * It can never lead to not processing work that does need to happen.
+	 */
 	seqno = ring->get_seqno(ring, false);
 	trace_i915_gem_request_notify(ring, seqno);
+	if (seqno == ring->last_irq_seqno)
+		return;
+	ring->last_irq_seqno = seqno;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&ring->fence_lock, flags);
 
 	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -3120,7 +3132,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * Tidy up anything left over. This includes a call to
 	 * i915_gem_request_notify() which will make sure that any requests
 	 * that were on the signal pending list get also cleaned up.
+	 * NB: The seqno cache must be cleared otherwise the notify call will
+	 * simply return immediately.
 	 */
+	ring->last_irq_seqno = 0;
 	i915_gem_retire_requests_ring(ring);
 
 	/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 72f811e..a103019 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -359,6 +359,7 @@ struct  intel_engine_cs {
 	spinlock_t fence_lock;
 	struct list_head fence_signal_list;
 	struct list_head fence_unsignal_list;
+	uint32_t last_irq_seqno;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2016-01-04 21:16         ` Jesse Barnes
@ 2016-01-08 21:47           ` Chris Wilson
  2016-01-08 21:55             ` Jesse Barnes
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 21:47 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jan 04, 2016 at 01:16:54PM -0800, Jesse Barnes wrote:
> On 01/04/2016 12:57 PM, Chris Wilson wrote:
> > On Mon, Jan 04, 2016 at 09:20:44AM -0800, Jesse Barnes wrote:
> >> So this one has my ack.
> > 
> > This series makes a number of fundamental mistakes in seqno-interrupt
> > handling, so no.
> 
> Well unless you can enumerate the issues in enough detail for us to address them, we don't have much choice but to go ahead.  I know you've replied to a few of these threads in the past, but I don't see a current list of outstanding bugs aside from the one about modifying input params on the execbuf error path (though the code comment seems to indicate some care is being taken there at least, so should be a small fix).

Other than the series addressing the reported bugs, which this is in
direct conflict with?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/13] drm/i915: Convert requests to use struct fence
  2016-01-08 21:47           ` Chris Wilson
@ 2016-01-08 21:55             ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2016-01-08 21:55 UTC (permalink / raw)
  To: Chris Wilson, John.C.Harrison, Intel-GFX

On 01/08/2016 01:47 PM, Chris Wilson wrote:
> On Mon, Jan 04, 2016 at 01:16:54PM -0800, Jesse Barnes wrote:
>> On 01/04/2016 12:57 PM, Chris Wilson wrote:
>>> On Mon, Jan 04, 2016 at 09:20:44AM -0800, Jesse Barnes wrote:
>>>> So this one has my ack.
>>>
>>> This series makes a number of fundamental mistakes in seqno-interrupt
>>> handling, so no.
>>
>> Well unless you can enumerate the issues in enough detail for us to address them, we don't have much choice but to go ahead.  I know you've replied to a few of these threads in the past, but I don't see a current list of outstanding bugs aside from the one about modifying input params on the execbuf error path (though the code comment seems to indicate some care is being taken there at least, so should be a small fix).
> 
> Other than the series addressing the reported bugs, which this is in
> direct conflict with?

Which patchset came first?  And yes, clearly enumerating the issues is
helpful regardless.  It doesn't really matter which came first though,
we've agreed to move forward with John's version since the scheduler has
been outstanding for so long, so your bug fixes will have to be rebased
on top of this work.  I hope that's acceptable, since I think we all
have the same ultimate goal here...

Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/7] drm/i915: Convert requests to use struct fence
  2016-01-08 18:47   ` [PATCH 1/7] drm/i915: " John.C.Harrison
@ 2016-01-08 21:59     ` Chris Wilson
  2016-01-11 19:03       ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 21:59 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:22PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There is a construct in the linux kernel called 'struct fence' that is
> intended to keep track of work that is executed on hardware. I.e. it
> solves the basic problem that the drivers 'struct
> drm_i915_gem_request' is trying to address. The request structure does
> quite a lot more than simply track the execution progress so is very
> definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain
> all the advantages that provides.
> 
> This patch makes the first step of integrating a struct fence into the
> request. It replaces the explicit reference count with that of the
> fence. It also replaces the 'is completed' test with the fence's
> equivalent. Currently, that simply chains on to the original request
> implementation. A future patch will improve this.

But this forces everyone to do the heavyweight polling until the request
is completed? The seqno is already CPU cacheable and with the exception
of interrupt polling, the question of whether a fence is complete can be
determined by just inspecting that value. Only one place (the
interrupt/signalling path) should ever be concerned about the
complication of how we emit the breadcrumb and interrupt from the ring.
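
That is, the cheap test is a single coherent read, something like
(a sketch only; the helper name here is made up):

	static inline bool request_done(struct drm_i915_gem_request *req)
	{
		/* One cacheable read of the status page, no locking. */
		u32 seqno = req->ring->get_seqno(req->ring, true);

		return i915_seqno_passed(seqno, req->seqno);
	}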
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-08 18:47   ` [PATCH 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-01-08 22:05     ` Chris Wilson
  2016-01-11 19:03       ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 22:05 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:24PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buff fence API.

That doesn't make any sense as they are not limited by a single
timeline.

> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.

You haven't added per-context breadcrumbs. What we need for being able
to execute requests from parallel timelines, but with requests within a
timeline being ordered, is a per-context page where we can emit the
per-context issued breadcrumb. Then instead of looking up the current
HW seqno in a global page, the request just looks at the current context
HW seqno in the context page, i.e. just
i915_seqno_passed(*req->p_context_seqno, req->seqno).

The retirement ordered request lists are moved from the engine to the
context and retirement cleanup is restricted to the context.
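
Roughly, with made-up field names:

	struct drm_i915_gem_request {
		u32 *p_context_seqno;	/* per-context HW seqno location */
		u32 seqno;		/* breadcrumb for this request */
		/* ... */
	};

	static inline bool request_completed(struct drm_i915_gem_request *req)
	{
		return i915_seqno_passed(READ_ONCE(*req->p_context_seqno),
					 req->seqno);
	}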
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time
  2016-01-08 18:47   ` [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2016-01-08 22:08     ` Chris Wilson
  2016-01-11 19:06       ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 22:08 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:25PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The request structure is reference counted. When the count reached
> zero, the request was immediately freed and all associated objects
> were unrefereced/unallocated. This meant that the driver mutex lock
> must be held at the point where the count reaches zero. This was fine
> while all references were held internally to the driver. However, the
> plan is to allow the underlying fence object (and hence the request
> itself) to be returned to other drivers and to userland. External
> users cannot be expected to acquire a driver private mutex lock.

It's a trivial issue to fix to enable freeing requests without holding the
struct_mutex. You don't need to even add any new lists, delayed freeing
mechanisms and whotnot.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/7] drm/i915: Interrupt driven fences
  2016-01-08 18:47   ` [PATCH 5/7] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-01-08 22:14     ` Chris Wilson
  2016-01-09  0:30       ` Chris Wilson
  2016-01-08 22:46     ` Chris Wilson
  1 sibling, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 22:14 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:26PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called everytime the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.

But struct fence already has support for that model, i.e.
fence_add_callback(). This looks to duplicate that code.
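
For reference, the stock mechanism is a per-waiter callback; a minimal
sketch of its use (everything except the fence API itself is made up):

	struct waiter {
		struct fence_cb cb;
		struct completion done;
	};

	static void waiter_wake(struct fence *fence, struct fence_cb *cb)
	{
		complete(&container_of(cb, struct waiter, cb)->done);
	}

	...

	struct waiter w;

	init_completion(&w.done);
	if (fence_add_callback(fence, &w.cb, waiter_wake))
		return;	/* already signalled, no callback will fire */
	wait_for_completion(&w.done);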

What exactly are you trying to improve?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 6/7] drm/i915: Updated request structure tracing
  2016-01-08 18:47   ` [PATCH 6/7] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-01-08 22:16     ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 22:16 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:27PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Added the '_complete' trace event which occurs when a fence/request is
> signaled as complete. Also moved the notify event from the IRQ handler
> code to inside the notify function itself.

No. It was actually from the previous patch, but we do not query the
current seqno by invoking a forcewake dance from inside the interrupt
handler.
We did that years ago and realised our mistake very quickly.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/7] drm/i915: Interrupt driven fences
  2016-01-08 18:47   ` [PATCH 5/7] drm/i915: Interrupt driven fences John.C.Harrison
  2016-01-08 22:14     ` Chris Wilson
@ 2016-01-08 22:46     ` Chris Wilson
  2016-01-11 19:10       ` John Harrison
  1 sibling, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 22:46 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:26PM +0000, John.C.Harrison@Intel.com wrote:
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
>  
> -	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +	if (list_empty(&ring->fence_signal_list))
> +		return;
> +
> +	if (!fence_locked)
> +		spin_lock_irqsave(&ring->fence_lock, flags);
>  
> -	return i915_seqno_passed(seqno, req->seqno);
> +	seqno = ring->get_seqno(ring, false);

We really don't want to do be doing the forcewake dance from inside the
interrupt handler. We made that mistake years ago.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] Convert requests to use struct fence
  2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
                     ` (6 preceding siblings ...)
  2016-01-08 18:47   ` [PATCH 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-01-08 22:47   ` Chris Wilson
  2016-01-11 19:15     ` John Harrison
  7 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-08 22:47 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jan 08, 2016 at 06:47:21PM +0000, John.C.Harrison@Intel.com wrote:
> [Patches against drm-intel-nightly tree fetched 17/11/2015]

Branch url?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/7] drm/i915: Interrupt driven fences
  2016-01-08 22:14     ` Chris Wilson
@ 2016-01-09  0:30       ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2016-01-09  0:30 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On Fri, Jan 08, 2016 at 10:14:16PM +0000, Chris Wilson wrote:
> On Fri, Jan 08, 2016 at 06:47:26PM +0000, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The intended usage model for struct fence is that the signalled status
> > should be set on demand rather than polled. That is, there should not
> > be a need for a 'signaled' function to be called everytime the status
> > is queried. Instead, 'something' should be done to enable a signal
> > callback from the hardware which will update the state directly. In
> > the case of requests, this is the seqno update interrupt. The idea is
> > that this callback will only be enabled on demand when something
> > actually tries to wait on the fence.
> 
> But struct fence already has support for that model, i.e.
> fence_add_callback(). This looks to duplicate that code.
> 
> What exactly are you trying to improve?

I was being dense and thought you were describing how you intended our
driver to function. (In the driver, the preference is definitely lazily
batched polling.) You could make it clearer that you are describing
how the external interface to struct fence operates.

"When signaling is enabled on a struct fence, the driver is expected to
call fence_signal() as soon as the fence is complete. To do this, we
unmask the user interrupt and then in the interrupt handler we check the
seqno and call fence_signal() on completed requests."

Then you can explain how you replaced the waitqueue_t with a list of all
requests, be they waited upon or not. And how you moved the heavyweight
seqno processing from out of process context into interrupt context.
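
Condensed, that comes down to something like (a sketch of the series'
notify path):

	/* Called from the user interrupt once signaling is enabled. */
	static void signal_completed(struct intel_engine_cs *ring)
	{
		struct drm_i915_gem_request *req, *next;
		u32 seqno = ring->get_seqno(ring, false);

		spin_lock(&ring->fence_lock);
		list_for_each_entry_safe(req, next,
					 &ring->fence_signal_list, signal_link) {
			if (!i915_seqno_passed(seqno, req->seqno))
				break;
			list_del_init(&req->signal_link);
			fence_signal_locked(&req->fence);
		}
		spin_unlock(&ring->fence_lock);
	}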
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-08 22:05     ` Chris Wilson
@ 2016-01-11 19:03       ` John Harrison
  2016-01-11 22:47         ` Jesse Barnes
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2016-01-11 19:03 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 08/01/2016 22:05, Chris Wilson wrote:
> On Fri, Jan 08, 2016 at 06:47:24PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The fence object used inside the request structure requires a sequence
>> number. Although this is not used by the i915 driver itself, it could
>> potentially be used by non-i915 code if the fence is passed outside of
>> the driver. This is the intention as it allows external kernel drivers
>> and user applications to wait on batch buffer completion
>> asynchronously via the dma-buff fence API.
> That doesn't make any sense as they are not limited by a single
> timeline.
I don't understand what you mean. Who is not limited by a single 
timeline?  The point is that the current seqno values cannot be used as 
there is no guarantee that they will increment globally once things like 
a scheduler and pre-emption arrive. Whereas, the fence internal 
implementation makes various assumptions about the linearity of the 
timeline. External users do not want to care about timelines or seqnos 
at all, they just want the fence API to work as documented.
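
The per-context timeline is essentially just a private, strictly
ordered counter per context. Roughly (a sketch whose names only
approximate the patch):

	struct i915_fence_timeline {
		unsigned fence_context;	/* from fence_context_alloc() */
		u32 next_seqno;
		spinlock_t lock;
	};

	static u32 timeline_get_next_seqno(struct i915_fence_timeline *tl)
	{
		u32 seqno;

		spin_lock(&tl->lock);
		seqno = ++tl->next_seqno;	/* monotonic per context */
		spin_unlock(&tl->lock);

		return seqno;
	}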

>
>> To ensure that such external users are not confused by strange things
>> happening with the seqno, this patch adds in a per context timeline
>> that can provide a guaranteed in-order seqno value for the fence. This
>> is safe because the scheduler will not re-order batch buffers within a
>> context - they are considered to be mutually dependent.
> You haven't added per-context breadcrumbs. What we need for being able
> to execute requests from parallel timelines, but with requests within a
> timeline being ordered, is a per-context page where we can emit the
> per-context issued breadcrumb. Then instead of looking up the current
> HW seqno in a global page, the request just looks at the current context
> HW seqno in the context page, i.e. just
> i915_seqno_passed(*req->p_context_seqno, req->seqno).
This patch is not attempting to implement per context seqno values. That 
can be done as future work. This patch is doing the simplest, least 
invasive implementation in order to make external fences work.

> The retirement ordered request lists are moved from the engine to the
> context and retirement cleanup is restricted to the context.
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/7] drm/i915: Convert requests to use struct fence
  2016-01-08 21:59     ` Chris Wilson
@ 2016-01-11 19:03       ` John Harrison
  2016-01-11 22:41         ` Jesse Barnes
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2016-01-11 19:03 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 08/01/2016 21:59, Chris Wilson wrote:
> On Fri, Jan 08, 2016 at 06:47:22PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that is
>> intended to keep track of work that is executed on hardware. I.e. it
>> solves the basic problem that the drivers 'struct
>> drm_i915_gem_request' is trying to address. The request structure does
>> quite a lot more than simply track the execution progress so is very
>> definitely still required. However, the basic completion status side
>> could be updated to use the ready made fence implementation and gain
>> all the advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into the
>> request. It replaces the explicit reference count with that of the
>> fence. It also replaces the 'is completed' test with the fence's
>> equivalent. Currently, that simply chains on to the original request
>> implementation. A future patch will improve this.
> But this forces everyone to do the heavyweight polling until the request
> is completed?
Not sure what you mean by heavyweight polling. And as described, this
is only an intermediate step.

> The seqno is already CPU cacheable and with the exception
> of interrupt polling, the question of whether a fence is complete can be
> determined by just inspecting that value. Only one place (the
> interrupt/signalling path) should ever be concerned about the
> complication of how we emit the breadcrumb and interrupt from the ring.
There is still only one piece of code that needs to worry about the
internal details. This change simply moves that around to make it
easier to become interrupt driven later on (which is required for the
scheduler).


> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time
  2016-01-08 22:08     ` Chris Wilson
@ 2016-01-11 19:06       ` John Harrison
  2016-01-25 11:52         ` Maarten Lankhorst
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2016-01-11 19:06 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 08/01/2016 22:08, Chris Wilson wrote:
> On Fri, Jan 08, 2016 at 06:47:25PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The request structure is reference counted. When the count reached
>> zero, the request was immediately freed and all associated objects
>> were unrefereced/unallocated. This meant that the driver mutex lock
>> must be held at the point where the count reaches zero. This was fine
>> while all references were held internally to the driver. However, the
>> plan is to allow the underlying fence object (and hence the request
>> itself) to be returned to other drivers and to userland. External
>> users cannot be expected to acquire a driver private mutex lock.
> It's a trivial issue to fix to enable freeing requests without holding the
> struct_mutex. You don't need to even add any new lists, delayed freeing
> mechanisms and whotnot.
> -Chris
>

As the driver stands, it is not trivial to free a request without 
holding the mutex. It does things like unpinning buffers, freeing up 
contexts (which is a whole other bundle of complication), releasing 
IRQs. It may be possible to re-organise things to make those operations 
safe to do without the mutex but it certainly does not look trivial!
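
Hence the delayed-free list in this series: the final unreference just
parks the request, and the heavyweight teardown happens at retire time,
under the mutex. In sketch form:

	static void i915_gem_request_release(struct fence *req_fence)
	{
		struct drm_i915_gem_request *req =
			container_of(req_fence, typeof(*req), fence);
		struct intel_engine_cs *ring = req->ring;

		/* No struct_mutex here: just queue for retire time. */
		spin_lock(&ring->delayed_free_lock);
		list_add_tail(&req->delayed_free_link,
			      &ring->delayed_free_list);
		spin_unlock(&ring->delayed_free_lock);
	}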

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/7] drm/i915: Interrupt driven fences
  2016-01-08 22:46     ` Chris Wilson
@ 2016-01-11 19:10       ` John Harrison
  2016-01-11 23:01         ` Jesse Barnes
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2016-01-11 19:10 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 08/01/2016 22:46, Chris Wilson wrote:
> On Fri, Jan 08, 2016 at 06:47:26PM +0000, John.C.Harrison@Intel.com wrote:
>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
>> +{
>> +	struct drm_i915_gem_request *req, *req_next;
>> +	unsigned long flags;
>>   	u32 seqno;
>>   
>> -	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +	if (list_empty(&ring->fence_signal_list))
>> +		return;
>> +
>> +	if (!fence_locked)
>> +		spin_lock_irqsave(&ring->fence_lock, flags);
>>   
>> -	return i915_seqno_passed(seqno, req->seqno);
>> +	seqno = ring->get_seqno(ring, false);
> We really don't want to do be doing the forcewake dance from inside the
> interrupt handler. We made that mistake years ago.
> -Chris
>
What forcewake dance? Nothing in the above code mentions force wake.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/7] Convert requests to use struct fence
  2016-01-08 22:47   ` [PATCH 0/7] Convert requests to use struct fence Chris Wilson
@ 2016-01-11 19:15     ` John Harrison
  0 siblings, 0 replies; 74+ messages in thread
From: John Harrison @ 2016-01-11 19:15 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 08/01/2016 22:47, Chris Wilson wrote:
> On Fri, Jan 08, 2016 at 06:47:21PM +0000, John.C.Harrison@Intel.com wrote:
>> [Patches against drm-intel-nightly tree fetched 17/11/2015]
> Branch url?

Not sure what you mean. The branch is 'drm-intel-nightly' from the DRM 
intel git repository (git://anongit.freedesktop.org/drm-intel/). How 
many versions of 'drm-intel-nightly' are there?


> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/7] drm/i915: Convert requests to use struct fence
  2016-01-11 19:03       ` John Harrison
@ 2016-01-11 22:41         ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2016-01-11 22:41 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, Intel-GFX

On 01/11/2016 11:03 AM, John Harrison wrote:
> On 08/01/2016 21:59, Chris Wilson wrote:
>> On Fri, Jan 08, 2016 at 06:47:22PM +0000, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> There is a construct in the linux kernel called 'struct fence' that is
>>> intended to keep track of work that is executed on hardware. I.e. it
>>> solves the basic problem that the drivers 'struct
>>> drm_i915_gem_request' is trying to address. The request structure does
>>> quite a lot more than simply track the execution progress so is very
>>> definitely still required. However, the basic completion status side
>>> could be updated to use the ready made fence implementation and gain
>>> all the advantages that provides.
>>>
>>> This patch makes the first step of integrating a struct fence into the
>>> request. It replaces the explicit reference count with that of the
>>> fence. It also replaces the 'is completed' test with the fence's
>>> equivalent. Currently, that simply chains on to the original request
>>> implementation. A future patch will improve this.
>> But this forces everyone to do the heavyweight polling until the request
>> is completed?
> Not sure what you mean by heavyweight polling. And as described, this is only an intermediate step.

Just the lazy_coherency removal maybe?  Chris?

Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 2/7] drm/i915: Removed now redudant parameter to i915_gem_request_completed()
  2016-01-08 18:47   ` [PATCH 2/7] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-01-11 22:43     ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2016-01-11 22:43 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 01/08/2016 10:47 AM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The change to the implementation of i915_gem_request_completed() means
> that the lazy coherency flag is no longer used. This can now be
> removed to simplify the interface.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h      |  3 +--
>  drivers/gpu/drm/i915/i915_gem.c      | 18 +++++++++---------
>  drivers/gpu/drm/i915/intel_display.c |  2 +-
>  drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
>  5 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index af41e5c..b54d99e 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>  					   i915_gem_request_get_seqno(work->flip_queued_req),
>  					   dev_priv->next_seqno,
>  					   ring->get_seqno(ring, true),
> -					   i915_gem_request_completed(work->flip_queued_req, true));
> +					   i915_gem_request_completed(work->flip_queued_req));
>  			} else
>  				seq_printf(m, "Flip not associated with any ring\n");
>  			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index aa5cba7..caf7897 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2263,8 +2263,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct drm_i915_gem_request **req_out);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>  
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
>  	return fence_is_signaled(&req->fence);
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1138990..93d2f32 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1165,7 +1165,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
>  
>  	timeout = jiffies + 1;
>  	while (!need_resched()) {
> -		if (i915_gem_request_completed(req, true))
> +		if (i915_gem_request_completed(req))
>  			return 0;
>  
>  		if (time_after_eq(jiffies, timeout))
> @@ -1173,7 +1173,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
>  
>  		cpu_relax_lowlatency();
>  	}
> -	if (i915_gem_request_completed(req, false))
> +	if (i915_gem_request_completed(req))
>  		return 0;
>  
>  	return -EAGAIN;
> @@ -1217,7 +1217,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	if (list_empty(&req->list))
>  		return 0;
>  
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>  		return 0;
>  
>  	timeout_expire = timeout ?
> @@ -1257,7 +1257,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> -		if (i915_gem_request_completed(req, false)) {
> +		if (i915_gem_request_completed(req)) {
>  			ret = 0;
>  			break;
>  		}
> @@ -2759,7 +2759,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
>  	struct drm_i915_gem_request *request;
>  
>  	list_for_each_entry(request, &ring->request_list, list) {
> -		if (i915_gem_request_completed(request, false))
> +		if (i915_gem_request_completed(request))
>  			continue;
>  
>  		return request;
> @@ -2900,7 +2900,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  					   struct drm_i915_gem_request,
>  					   list);
>  
> -		if (!i915_gem_request_completed(request, true))
> +		if (!i915_gem_request_completed(request))
>  			break;
>  
>  		i915_gem_request_retire(request);
> @@ -2924,7 +2924,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  	}
>  
>  	if (unlikely(ring->trace_irq_req &&
> -		     i915_gem_request_completed(ring->trace_irq_req, true))) {
> +		     i915_gem_request_completed(ring->trace_irq_req))) {
>  		ring->irq_put(ring);
>  		i915_gem_request_assign(&ring->trace_irq_req, NULL);
>  	}
> @@ -3030,7 +3030,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>  		if (list_empty(&req->list))
>  			goto retire;
>  
> -		if (i915_gem_request_completed(req, true)) {
> +		if (i915_gem_request_completed(req)) {
>  			__i915_gem_request_retire__upto(req);
>  retire:
>  			i915_gem_object_retire__read(obj, i);
> @@ -3142,7 +3142,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  	if (to == from)
>  		return 0;
>  
> -	if (i915_gem_request_completed(from_req, true))
> +	if (i915_gem_request_completed(from_req))
>  		return 0;
>  
>  	if (!i915_semaphore_is_enabled(obj->base.dev)) {
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index a5dd528..510365e 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11313,7 +11313,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
>  
>  	if (work->flip_ready_vblank == 0) {
>  		if (work->flip_queued_req &&
> -		    !i915_gem_request_completed(work->flip_queued_req, true))
> +		    !i915_gem_request_completed(work->flip_queued_req))
>  			return false;
>  
>  		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index ebd6735..c207a3a 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -7170,7 +7170,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
>  	struct request_boost *boost = container_of(work, struct request_boost, work);
>  	struct drm_i915_gem_request *req = boost->req;
>  
> -	if (!i915_gem_request_completed(req, true))
> +	if (!i915_gem_request_completed(req))
>  		gen6_rps_boost(to_i915(req->ring->dev), NULL,
>  			       req->emitted_jiffies);
>  
> @@ -7186,7 +7186,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
>  	if (req == NULL || INTEL_INFO(dev)->gen < 6)
>  		return;
>  
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>  		return;
>  
>  	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
> 

I'm sure we'll have optimizations on top once this whole thing lands, so this seems fine as an intermediate step (we'll want to do lots of benchmarking and analysis after the interrupt driven stuff lands anyway).

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
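
For context, the lazy_coherency flag can simply vanish because the
completion check is now delegated to the fence core. The upstream helper
is roughly this (paraphrased from include/linux/fence.h of the era;
treat it as a sketch rather than a verbatim copy):

  static inline bool fence_is_signaled(struct fence *fence)
  {
          if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
                  return true;

          /* Fall back to the driver's own check, if it provides one. */
          if (fence->ops->signaled && fence->ops->signaled(fence)) {
                  fence_signal(fence);
                  return true;
          }

          return false;
  }

Whatever coherency handling the driver still wants now lives behind the
fence's ->signaled() hook, so callers no longer pick a coherency mode at
each call site.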

* Re: [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-11 19:03       ` John Harrison
@ 2016-01-11 22:47         ` Jesse Barnes
  2016-01-11 22:58           ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Jesse Barnes @ 2016-01-11 22:47 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, Intel-GFX

On 01/11/2016 11:03 AM, John Harrison wrote:
> On 08/01/2016 22:05, Chris Wilson wrote:
>> On Fri, Jan 08, 2016 at 06:47:24PM +0000, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The fence object used inside the request structure requires a sequence
>>> number. Although this is not used by the i915 driver itself, it could
>>> potentially be used by non-i915 code if the fence is passed outside of
>>> the driver. This is the intention as it allows external kernel drivers
>>> and user applications to wait on batch buffer completion
>>> asynchronously via the dma-buf fence API.
>> That doesn't make any sense as they are not limited by a single
>> timeline.
> I don't understand what you mean. Who is not limited by a single timeline?  The point is that the current seqno values cannot be used as there is no guarantee that they will increment globally once things like a scheduler and pre-emption arrive. Whereas the fence's internal implementation makes various assumptions about the linearity of the timeline. External users do not want to care about timelines or seqnos at all; they just want the fence API to work as documented.
> 
>>
>>> To ensure that such external users are not confused by strange things
>>> happening with the seqno, this patch adds in a per context timeline
>>> that can provide a guaranteed in-order seqno value for the fence. This
>>> is safe because the scheduler will not re-order batch buffers within a
>>> context - they are considered to be mutually dependent.
>> You haven't added per-context breadcrumbs. What we need for being able
>> to execute requests from parallel timelines, but with requests within a
>> timeline being ordered, is a per-context page where we can emit the
>> per-context issued breadcrumb. Then instead of looking up the current
>> HW seqno in a global page, the request just looks at the current context
>> HW seqno in the context seq, just
>> i915_seqno_passed(*req->p_context_seqno, req->seqno).
> This patch is not attempting to implement per context seqno values. That can be done as future work. This patch is doing the simplest, least invasive implementation in order to make external fences work.

Right.  I think we want to move to per-context seqnos, but we don't have to do it before this work lands.  It should be easier to do it after the rest of these bits land in fact, since seqno handling will be well encapsulated aiui.

Jesse
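
Concretely, Chris's per-context breadcrumb check would boil down to
something like the following (a sketch only; p_context_seqno is a
hypothetical field from his description, not part of the posted series):

  /* Each context gets its own page of HW-written seqnos, so completion
   * becomes a comparison against that context's page rather than the
   * single global hardware status page. */
  static bool request_completed_per_ctx(struct drm_i915_gem_request *req)
  {
          return i915_seqno_passed(READ_ONCE(*req->p_context_seqno),
                                   req->seqno);
  }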


* Re: [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-11 22:47         ` Jesse Barnes
@ 2016-01-11 22:58           ` Chris Wilson
  2016-01-12 11:03             ` John Harrison
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2016-01-11 22:58 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Mon, Jan 11, 2016 at 02:47:33PM -0800, Jesse Barnes wrote:
> On 01/11/2016 11:03 AM, John Harrison wrote:
> > On 08/01/2016 22:05, Chris Wilson wrote:
> >> On Fri, Jan 08, 2016 at 06:47:24PM +0000, John.C.Harrison@Intel.com wrote:
> >>> From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>> The fence object used inside the request structure requires a sequence
> >>> number. Although this is not used by the i915 driver itself, it could
> >>> potentially be used by non-i915 code if the fence is passed outside of
> >>> the driver. This is the intention as it allows external kernel drivers
> >>> and user applications to wait on batch buffer completion
> >>> asynchronously via the dma-buf fence API.
> >> That doesn't make any sense as they are not limited by a single
> >> timeline.
> > I don't understand what you mean. Who is not limited by a single timeline?  The point is that the current seqno values cannot be used as there is no guarantee that they will increment globally once things like a scheduler and pre-emption arrive. Whereas the fence's internal implementation makes various assumptions about the linearity of the timeline. External users do not want to care about timelines or seqnos at all; they just want the fence API to work as documented.
> > 
> >>
> >>> To ensure that such external users are not confused by strange things
> >>> happening with the seqno, this patch adds in a per context timeline
> >>> that can provide a guaranteed in-order seqno value for the fence. This
> >>> is safe because the scheduler will not re-order batch buffers within a
> >>> context - they are considered to be mutually dependent.
> >> You haven't added per-context breadcrumbs. What we need for being able
> >> to execute requests from parallel timelines, but with requests within a
> >> timeline being ordered, is a per-context page where we can emit the
> >> per-context issued breadcrumb. Then instead of looking up the current
> >> HW seqno in a global page, the request just looks at the current context
> >> HW seqno in the context seq, just
> >> i915_seqno_passed(*req->p_context_seqno, req->seqno).
> > This patch is not attempting to implement per context seqno values. That can be done as future work. This patch is doing the simplest, least invasive implementation in order to make external fences work.
> 
> Right.  I think we want to move to per-context seqnos, but we don't have to do it before this work lands.  It should be easier to do it after the rest of these bits land in fact, since seqno handling will be well encapsulated aiui.

This patch is irrelevant then. I think it is actually worse because it
is encapsulating a design detail that is fundamentally wrong.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 5/7] drm/i915: Interrupt driven fences
  2016-01-11 19:10       ` John Harrison
@ 2016-01-11 23:01         ` Jesse Barnes
  0 siblings, 0 replies; 74+ messages in thread
From: Jesse Barnes @ 2016-01-11 23:01 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, Intel-GFX

On 01/11/2016 11:10 AM, John Harrison wrote:
> On 08/01/2016 22:46, Chris Wilson wrote:
>> On Fri, Jan 08, 2016 at 06:47:26PM +0000, John.C.Harrison@Intel.com wrote:
>>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
>>> +{
>>> +    struct drm_i915_gem_request *req, *req_next;
>>> +    unsigned long flags;
>>>       u32 seqno;
>>>   -    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>>> +    if (list_empty(&ring->fence_signal_list))
>>> +        return;
>>> +
>>> +    if (!fence_locked)
>>> +        spin_lock_irqsave(&ring->fence_lock, flags);
>>>   -    return i915_seqno_passed(seqno, req->seqno);
>>> +    seqno = ring->get_seqno(ring, false);
>> We really don't want to do be doing the forcewake dance from inside the
>> interrupt handler. We made that mistake years ago.
>> -Chris
>>
> What forcewake dance? Nothing in the above code mentions force wake.

get_seqno() w/o lazy_coherency set will do a POSTING_READ of the ring active head, which goes through our crazy read function and does forcewake.  So we may need something smarter here.

Jesse
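
For reference, the gen6+ get_seqno() path being described looks roughly
like this (simplified from intel_ringbuffer.c of the era; the
POSTING_READ is the part that can take the forcewake path):

  static u32 gen6_ring_get_seqno(struct intel_engine_cs *ring,
                                 bool lazy_coherency)
  {
          /* A non-lazy read first forces the seqno write to land by
           * reading a CS register; that MMIO read is what may trigger a
           * forcewake cycle, which is unsafe from the IRQ handler. */
          if (!lazy_coherency) {
                  struct drm_i915_private *dev_priv = ring->dev->dev_private;
                  POSTING_READ(RING_ACTHD(ring->mmio_base));
          }

          return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
  }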


* Re: [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-11 22:58           ` Chris Wilson
@ 2016-01-12 11:03             ` John Harrison
  2016-01-12 11:26               ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: John Harrison @ 2016-01-12 11:03 UTC (permalink / raw)
  To: Chris Wilson, Jesse Barnes, Intel-GFX

On 11/01/2016 22:58, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:47:33PM -0800, Jesse Barnes wrote:
>> On 01/11/2016 11:03 AM, John Harrison wrote:
>>> On 08/01/2016 22:05, Chris Wilson wrote:
>>>> On Fri, Jan 08, 2016 at 06:47:24PM +0000, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> The fence object used inside the request structure requires a sequence
>>>>> number. Although this is not used by the i915 driver itself, it could
>>>>> potentially be used by non-i915 code if the fence is passed outside of
>>>>> the driver. This is the intention as it allows external kernel drivers
>>>>> and user applications to wait on batch buffer completion
>>>>> asynchronously via the dma-buf fence API.
>>>> That doesn't make any sense as they are not limited by a single
>>>> timeline.
>>> I don't understand what you mean. Who is not limited by a single timeline?  The point is that the current seqno values cannot be used as there is no guarantee that they will increment globally once things like a scheduler and pre-emption arrive. Whereas the fence's internal implementation makes various assumptions about the linearity of the timeline. External users do not want to care about timelines or seqnos at all; they just want the fence API to work as documented.
>>>
>>>>> To ensure that such external users are not confused by strange things
>>>>> happening with the seqno, this patch adds in a per context timeline
>>>>> that can provide a guaranteed in-order seqno value for the fence. This
>>>>> is safe because the scheduler will not re-order batch buffers within a
>>>>> context - they are considered to be mutually dependent.
>>>> You haven't added per-context breadcrumbs. What we need for being able
>>>> to execute requests from parallel timelines, but with requests within a
>>>> timeline being ordered, is a per-context page where we can emit the
>>>> per-context issued breadcrumb. Then instead of looking up the current
>>>> HW seqno in a global page, the request just looks at the current context
>>>> HW seqno in the context seq, just
>>>> i915_seqno_passed(*req->p_context_seqno, req->seqno).
>>> This patch is not attempting to implement per context seqno values. That can be done as future work. This patch is doing the simplest, least invasive implementation in order to make external fences work.
>> Right.  I think we want to move to per-context seqnos, but we don't have to do it before this work lands.  It should be easier to do it after the rest of these bits land in fact, since seqno handling will be well encapsulated aiui.
> This patch is irrelevent then. I think it is actually worse because it
> is encapsulating a design detail that is fundamentally wrong.
> -Chris
>

Some kind of per-context timeline is required for the external use of 
i915 fences. Seqnos cannot be used without a lot of rework because they 
dance around with scheduler re-ordering and pre-emption - a low priority 
request could go through many different seqnos if it keeps getting 
pre-empted. We need to be able to use fences externally on Android at 
least, and with SVM it becomes vital for linux too. Therefore we need 
some solution. And this is much simpler than re-writing the whole of 
the driver's seqno management.
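
The per-context timeline being argued for here is small in practice;
roughly (a sketch with hypothetical names, using the struct fence API of
the time, not code lifted from the posted patch):

  /* One software timeline per GEM context: seqnos are only ordered
   * within a context, which is all struct fence requires since every
   * context gets its own fence context id. */
  struct i915_fence_timeline {
          unsigned        fence_context;  /* from fence_context_alloc(1) */
          u32             next_seqno;
          spinlock_t      lock;
  };

  static u32 i915_timeline_next_seqno(struct i915_fence_timeline *tl)
  {
          u32 seqno;

          spin_lock(&tl->lock);
          seqno = ++tl->next_seqno;       /* in-order within this context */
          spin_unlock(&tl->lock);
          return seqno;
  }

  /* Then, at request creation:
   * fence_init(&req->fence, &i915_fence_ops, &tl->lock,
   *            tl->fence_context, i915_timeline_next_seqno(tl));
   */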


* Re: [PATCH 3/7] drm/i915: Add per context timelines to fence object
  2016-01-12 11:03             ` John Harrison
@ 2016-01-12 11:26               ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2016-01-12 11:26 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Tue, Jan 12, 2016 at 11:03:08AM +0000, John Harrison wrote:
> On 11/01/2016 22:58, Chris Wilson wrote:
> >On Mon, Jan 11, 2016 at 02:47:33PM -0800, Jesse Barnes wrote:
> >>On 01/11/2016 11:03 AM, John Harrison wrote:
> >>>On 08/01/2016 22:05, Chris Wilson wrote:
> >>>>On Fri, Jan 08, 2016 at 06:47:24PM +0000, John.C.Harrison@Intel.com wrote:
> >>>>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>>>
> >>>>>The fence object used inside the request structure requires a sequence
> >>>>>number. Although this is not used by the i915 driver itself, it could
> >>>>>potentially be used by non-i915 code if the fence is passed outside of
> >>>>>the driver. This is the intention as it allows external kernel drivers
> >>>>>and user applications to wait on batch buffer completion
> >>>>>asynchronously via the dma-buf fence API.
> >>>>That doesn't make any sense as they are not limited by a single
> >>>>timeline.
> >>>I don't understand what you mean. Who is not limited by a single timeline?  The point is that the current seqno values cannot be used as there is no guarantee that they will increment globally once things like a scheduler and pre-emption arrive. Whereas the fence's internal implementation makes various assumptions about the linearity of the timeline. External users do not want to care about timelines or seqnos at all; they just want the fence API to work as documented.
> >>>
> >>>>>To ensure that such external users are not confused by strange things
> >>>>>happening with the seqno, this patch adds in a per context timeline
> >>>>>that can provide a guaranteed in-order seqno value for the fence. This
> >>>>>is safe because the scheduler will not re-order batch buffers within a
> >>>>>context - they are considered to be mutually dependent.
> >>>>You haven't added per-context breadcrumbs. What we need for being able
> >>>>to execute requests from parallel timelines, but with requests within a
> >>>>timeline being ordered, is a per-context page where we can emit the
> >>>>per-context issued breadcrumb. Then instead of looking up the current
> >>>>HW seqno in a global page, the request just looks at the current context
> >>>>HW seqno in the context seq, just
> >>>>i915_seqno_passed(*req->p_context_seqno, req->seqno).
> >>>This patch is not attempting to implement per context seqno values. That can be done as future work. This patch is doing the simplest, least invasive implementation in order to make external fences work.
> >>Right.  I think we want to move to per-context seqnos, but we don't have to do it before this work lands.  It should be easier to do it after the rest of these bits land in fact, since seqno handling will be well encapsulated aiui.
> >This patch is irrelevant then. I think it is actually worse because it
> >is encapsulating a design detail that is fundamentally wrong.
> >-Chris
> >
> 
> Some kind of per-context timeline is required for the external use
> of i915 fences. Seqnos cannot be used without a lot of rework
> because they dance around with scheduler re-ordering and pre-emption
> - a low priority request could go through many different seqnos if
> it keeps getting pre-empted. We need to be able to use fences
> externally on Android at least, and with SVM it becomes vital for
> linux too. Therefore we need some solution. And this is much simpler
> than re-writing the whole of the driver's seqno management.

Actually no. Per-context seqnos are trivial to implement, and allow for
request reordering between timelines with the seqno known a priori; that
includes priority handling and pre-emption, and struct fence of course
(since each context is a separate timeline).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time
  2016-01-11 19:06       ` John Harrison
@ 2016-01-25 11:52         ` Maarten Lankhorst
  2016-01-25 12:11           ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Maarten Lankhorst @ 2016-01-25 11:52 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, Intel-GFX

On 11-01-16 at 20:06, John Harrison wrote:
> On 08/01/2016 22:08, Chris Wilson wrote:
>> On Fri, Jan 08, 2016 at 06:47:25PM +0000, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The request structure is reference counted. When the count reached
>>> zero, the request was immediately freed and all associated objects
>>> were unreferenced/deallocated. This meant that the driver mutex lock
>>> must be held at the point where the count reaches zero. This was fine
>>> while all references were held internally to the driver. However, the
>>> plan is to allow the underlying fence object (and hence the request
>>> itself) to be returned to other drivers and to userland. External
>>> users cannot be expected to acquire a driver private mutex lock.
>> It's a trivial issue to fix to enable freeing requests without holding the
>> struct_mutex. You don't need to even add any new lists, delayed freeing
>> mechanisms and whatnot.
>> -Chris
>>
>
> As the driver stands, it is not trivial to free a request without holding the mutex. It does things like unpinning buffers, freeing up contexts (which is a whole other bundle of complication), releasing IRQs. It may be possible to re-organise things to make those operations safe to do without the mutex but it certainly does not look trivial!
Those things could be done as soon as the fence is signaled; doing it on free() is slightly too late.

~Maarten
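
For the record, the deferred-free scheme John describes can be sketched
like so (field names hypothetical, not taken from the posted patch):
dropping the last reference never tears anything down directly, it only
queues the request, and the heavyweight teardown runs later under
struct_mutex at retire time.

  static void i915_fence_release(struct fence *fence)
  {
          struct drm_i915_gem_request *req =
                  container_of(fence, typeof(*req), fence);
          struct intel_engine_cs *ring = req->ring;
          unsigned long flags;

          /* Safe from any context, no struct_mutex needed: just park
           * the request for the retire path to tear down properly. */
          spin_lock_irqsave(&ring->delayed_free_lock, flags);
          list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
          spin_unlock_irqrestore(&ring->delayed_free_lock, flags);
  }

Maarten's point stands against this sketch too: the unpinning could in
principle happen at signal time rather than waiting for the final unref.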

* Re: [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time
  2016-01-25 11:52         ` Maarten Lankhorst
@ 2016-01-25 12:11           ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2016-01-25 12:11 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: Intel-GFX

On Mon, Jan 25, 2016 at 12:52:46PM +0100, Maarten Lankhorst wrote:
> On 11-01-16 at 20:06, John Harrison wrote:
> > On 08/01/2016 22:08, Chris Wilson wrote:
> >> On Fri, Jan 08, 2016 at 06:47:25PM +0000, John.C.Harrison@Intel.com wrote:
> >>> From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>> The request structure is reference counted. When the count reached
> >>> zero, the request was immediately freed and all associated objects
> >>> were unreferenced/deallocated. This meant that the driver mutex lock
> >>> must be held at the point where the count reaches zero. This was fine
> >>> while all references were held internally to the driver. However, the
> >>> plan is to allow the underlying fence object (and hence the request
> >>> itself) to be returned to other drivers and to userland. External
> >>> users cannot be expected to acquire a driver private mutex lock.
> >> It's a trivial issue to fix to enable freeing requests without holding the
> >> struct_mutex. You don't need to even add any new lists, delayed freeing
> >> mechanisms and whatnot.
> >> -Chris
> >>
> >
> > As the driver stands, it is not trivial to free a request without holding the mutex. It does things like unpinning buffers, freeing up contexts (which is a whole other bundle of complication), releasing IRQs. It may be possible to re-organise things to make those operations safe to do without the mutex but it certainly does not look trivial!
> Those things could be done as soon as the fence is signaled; doing it on free() is slightly too late.

Most recently

http://patchwork.freedesktop.org/patch/69900/

The caveat is that you need to teach execlists about the lifetime of its
contexts, which is also quite simple (and in the process takes yet
another step towards making execlist less special).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

end of thread (last update: 2016-01-25 12:11 UTC)

Thread overview: 74+ messages
2015-12-11 13:11 [PATCH 00/13] Convert requests to use struct fence John.C.Harrison
2015-12-11 13:11 ` [PATCH 01/13] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
2015-12-17 17:32   ` [Intel-gfx] " Jesse Barnes
2015-12-11 13:11 ` [PATCH 02/13] staging/android/sync: add sync_fence_create_dma John.C.Harrison
2015-12-17 17:29   ` Jesse Barnes
2015-12-11 13:11 ` [PATCH 03/13] staging/android/sync: Move sync framework out of staging John.C.Harrison
2015-12-17 17:35   ` Jesse Barnes
2015-12-21 10:03     ` Daniel Vetter
2015-12-21 14:20       ` John Harrison
2015-12-21 15:46         ` Daniel Vetter
2015-12-22 12:14           ` John Harrison
2015-12-11 13:11 ` [PATCH 04/13] android/sync: Improved debug dump to dmesg John.C.Harrison
2015-12-17 17:36   ` Jesse Barnes
2015-12-11 13:11 ` [PATCH 05/13] drm/i915: Convert requests to use struct fence John.C.Harrison
2015-12-17 17:43   ` Jesse Barnes
2016-01-04 17:20     ` Jesse Barnes
2016-01-04 20:57       ` Chris Wilson
2016-01-04 21:16         ` Jesse Barnes
2016-01-08 21:47           ` Chris Wilson
2016-01-08 21:55             ` Jesse Barnes
2015-12-11 13:11 ` [PATCH 06/13] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
2015-12-11 13:11 ` [PATCH 07/13] drm/i915: Add per context timelines to fence object John.C.Harrison
2015-12-17 17:49   ` Jesse Barnes
2015-12-21 10:16     ` Chris Wilson
2015-12-11 13:11 ` [PATCH 08/13] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
2015-12-11 13:11 ` [PATCH 09/13] drm/i915: Interrupt driven fences John.C.Harrison
2015-12-11 15:30   ` John Harrison
2015-12-11 16:07     ` Tvrtko Ursulin
2015-12-11 13:11 ` [PATCH 10/13] drm/i915: Updated request structure tracing John.C.Harrison
2015-12-11 13:11 ` [PATCH 11/13] android/sync: Fix reversed sense of signaled fence John.C.Harrison
2015-12-11 15:57   ` Tvrtko Ursulin
2015-12-14 11:22     ` John Harrison
2015-12-14 12:37       ` Tvrtko Ursulin
2015-12-11 13:12 ` [PATCH 12/13] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
2015-12-11 15:29   ` Tvrtko Ursulin
2015-12-14 11:46     ` John Harrison
2015-12-14 12:23       ` Chris Wilson
2015-12-11 13:12 ` [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
2015-12-11 14:28   ` Tvrtko Ursulin
2015-12-14 11:58     ` John Harrison
2015-12-14 12:52       ` Tvrtko Ursulin
2015-12-11 14:55   ` Chris Wilson
2015-12-11 15:35     ` John Harrison
2015-12-11 16:07       ` Chris Wilson
2016-01-08 18:47 ` [PATCH 0/7] Convert requests to use struct fence John.C.Harrison
2016-01-08 18:47   ` [PATCH 1/7] drm/i915: " John.C.Harrison
2016-01-08 21:59     ` Chris Wilson
2016-01-11 19:03       ` John Harrison
2016-01-11 22:41         ` Jesse Barnes
2016-01-08 18:47   ` [PATCH 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
2016-01-11 22:43     ` Jesse Barnes
2016-01-08 18:47   ` [PATCH 3/7] drm/i915: Add per context timelines to fence object John.C.Harrison
2016-01-08 22:05     ` Chris Wilson
2016-01-11 19:03       ` John Harrison
2016-01-11 22:47         ` Jesse Barnes
2016-01-11 22:58           ` Chris Wilson
2016-01-12 11:03             ` John Harrison
2016-01-12 11:26               ` Chris Wilson
2016-01-08 18:47   ` [PATCH 4/7] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
2016-01-08 22:08     ` Chris Wilson
2016-01-11 19:06       ` John Harrison
2016-01-25 11:52         ` Maarten Lankhorst
2016-01-25 12:11           ` Chris Wilson
2016-01-08 18:47   ` [PATCH 5/7] drm/i915: Interrupt driven fences John.C.Harrison
2016-01-08 22:14     ` Chris Wilson
2016-01-09  0:30       ` Chris Wilson
2016-01-08 22:46     ` Chris Wilson
2016-01-11 19:10       ` John Harrison
2016-01-11 23:01         ` Jesse Barnes
2016-01-08 18:47   ` [PATCH 6/7] drm/i915: Updated request structure tracing John.C.Harrison
2016-01-08 22:16     ` Chris Wilson
2016-01-08 18:47   ` [PATCH 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
2016-01-08 22:47   ` [PATCH 0/7] Convert requests to use struct fence Chris Wilson
2016-01-11 19:15     ` John Harrison
