intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission
@ 2021-07-27 15:21 Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 1/7] include/drm-uapi: Add parallel context configuration uAPI Matthew Brost
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:21 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

IGT updates for GuC submission [1] and parallel submission (aka multi-bb
execbuf) [2]. This entails adding tests for parallel submission and
teaching IGTs to know of static priority mapping.

More IGTs likely need to be updated gem_ctx_persistence and i915_hangman
come to mind. Expect following series to address those tests.

v2:
 (CI)
  - Fix off by 1 error in reserved fields of drm_i915_engine_info

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/series/91840/
[2] https://patchwork.freedesktop.org/series/92789/

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Matthew Brost (7):
  include/drm-uapi: Add parallel context configuration uAPI
  include/drm-uapi: Add logical mapping uAPI
  lib/intel_ctx: Add support for parallel contexts to intel_ctx library
  i915/gem_exec_balancer: Test parallel execbuf
  include/drm-uapi: Add static priority mapping UAPI
  i915/gem_scheduler: Make gem_scheduler understand static priority
    mapping
  i915/gem_ctx_shared: Make gem_ctx_shared understand static priority
    mapping

 include/drm-uapi/i915_drm.h    | 145 +++++++++-
 lib/i915/gem_scheduler.c       |  13 +
 lib/i915/gem_scheduler.h       |   1 +
 lib/intel_ctx.c                |  28 +-
 lib/intel_ctx.h                |   2 +
 lib/intel_reg.h                |   5 +
 tests/i915/gem_ctx_shared.c    |   5 +-
 tests/i915/gem_exec_balancer.c | 487 +++++++++++++++++++++++++++++++++
 tests/i915/gem_exec_schedule.c |  47 ++--
 9 files changed, 709 insertions(+), 24 deletions(-)

-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 1/7] include/drm-uapi: Add parallel context configuration uAPI
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
@ 2021-07-27 15:21 ` Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 2/7] include/drm-uapi: Add logical mapping uAPI Matthew Brost
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:21 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 include/drm-uapi/i915_drm.h | 128 ++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index a1c0030c3..3c1aac348 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -1705,6 +1705,7 @@ struct drm_i915_gem_context_param {
  * Extensions:
  *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
  *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
+ *   i915_context_engines_parallel_submit (I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)
  */
 #define I915_CONTEXT_PARAM_ENGINES	0xa
 
@@ -1883,10 +1884,137 @@ struct i915_context_engines_bond {
 	struct i915_engine_class_instance engines[N__]; \
 } __attribute__((packed)) name__
 
+/**
+ * struct i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs. Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of same engine class and these
+ * contexts must be in logically contiguous order. Examples of the placement
+ * behavior described below. Lastly, the default is to not allow BBs to
+ * preempted mid BB rather insert coordinated preemption on all hardware
+ * contexts between each set of BBs. Flags may be added in the future to change
+ * both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block:: none
+ *
+ *	Example 1 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=1,
+ *		     engines=CS[0],CS[1])
+ *
+ *	Results in the following valid placement:
+ *	CS[0], CS[1]
+ *
+ *	Example 2 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ *	Results in the following valid placements:
+ *	CS[0], CS[1]
+ *	CS[2], CS[3]
+ *
+ *	This can also be thought of as 2 virtual engines described by 2-D array
+ *	in the engines the field with bonds placed between each index of the
+ *	virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ *	CS[3].
+ *	VE[0] = CS[0], CS[2]
+ *	VE[1] = CS[1], CS[3]
+ *
+ *	Example 3 pseudo code:
+ *	CS[X] = generic engine of same class, logical instance X
+ *	INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ *	set_engines(INVALID)
+ *	set_parallel(engine_index=0, width=2, num_siblings=2,
+ *		     engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ *	Results in the following valid and invalid placements:
+ *	CS[0], CS[1]
+ *	CS[1], CS[3] - Not logical contiguous, return -EINVAL
+ */
+struct i915_context_engines_parallel_submit {
+	/**
+	 * @base: base user extension.
+	 */
+	struct i915_user_extension base;
+
+	/**
+	 * @engine_index: slot for parallel engine
+	 */
+	__u16 engine_index;
+
+	/**
+	 * @width: number of contexts per parallel engine
+	 */
+	__u16 width;
+
+	/**
+	 * @num_siblings: number of siblings per context
+	 */
+	__u16 num_siblings;
+
+	/**
+	 * @mbz16: reserved for future use; must be zero
+	 */
+	__u16 mbz16;
+
+	/**
+	 * @flags: all undefined flags must be zero, currently not defined flags
+	 */
+	__u64 flags;
+
+	/**
+	 * @mbz64: reserved for future use; must be zero
+	 */
+	__u64 mbz64[3];
+
+	/**
+	 * @engines: 2-d array of engine instances to configure parallel engine
+	 *
+	 * length = width (i) * num_siblings (j)
+	 * index = j + i * num_siblings
+	 */
+	struct i915_engine_class_instance engines[0];
+
+} __packed;
+
+#define I915_DEFINE_CONTEXT_ENGINES_PARALLEL_SUBMIT(name__, N__) struct { \
+	struct i915_user_extension base; \
+	__u16 engine_index; \
+	__u16 width; \
+	__u16 num_siblings; \
+	__u16 mbz16; \
+	__u64 flags; \
+	__u64 mbz64[3]; \
+	struct i915_engine_class_instance engines[N__]; \
+} __attribute__((packed)) name__
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see i915_context_engines_load_balance */
 #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */
 	struct i915_engine_class_instance engines[0];
 } __attribute__((packed));
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 2/7] include/drm-uapi: Add logical mapping uAPI
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 1/7] include/drm-uapi: Add parallel context configuration uAPI Matthew Brost
@ 2021-07-27 15:21 ` Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 3/7] lib/intel_ctx: Add support for parallel contexts to intel_ctx library Matthew Brost
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:21 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

v2:
 (CI)
  - Fix off by 1 error in size of reserved fields

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 include/drm-uapi/i915_drm.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 3c1aac348..332c07e3d 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -2518,14 +2518,20 @@ struct drm_i915_engine_info {
 
 	/** @flags: Engine flags. */
 	__u64 flags;
+#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE		(1 << 0)
 
 	/** @capabilities: Capabilities of this engine. */
 	__u64 capabilities;
 #define I915_VIDEO_CLASS_CAPABILITY_HEVC		(1 << 0)
 #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC	(1 << 1)
 
+	/** @logical_instance: Logical instance of engine */
+	__u16 logical_instance;
+
 	/** @rsvd1: Reserved fields. */
-	__u64 rsvd1[4];
+	__u16 rsvd1[3];
+	/** @rsvd2: Reserved fields. */
+	__u64 rsvd2[3];
 };
 
 /**
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 3/7] lib/intel_ctx: Add support for parallel contexts to intel_ctx library
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 1/7] include/drm-uapi: Add parallel context configuration uAPI Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 2/7] include/drm-uapi: Add logical mapping uAPI Matthew Brost
@ 2021-07-27 15:21 ` Matthew Brost
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 4/7] i915/gem_exec_balancer: Test parallel execbuf Matthew Brost
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:21 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 lib/intel_ctx.c | 28 +++++++++++++++++++++++++++-
 lib/intel_ctx.h |  2 ++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/lib/intel_ctx.c b/lib/intel_ctx.c
index f28c15544..11ec6fca4 100644
--- a/lib/intel_ctx.c
+++ b/lib/intel_ctx.c
@@ -83,6 +83,7 @@ __context_create_cfg(int fd, const intel_ctx_cfg_t *cfg, uint32_t *ctx_id)
 {
 	uint64_t ext_root = 0;
 	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(balance, GEM_MAX_ENGINES);
+	I915_DEFINE_CONTEXT_ENGINES_PARALLEL_SUBMIT(parallel, GEM_MAX_ENGINES);
 	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, GEM_MAX_ENGINES);
 	struct drm_i915_gem_context_create_ext_setparam engines_param, vm_param;
 	struct drm_i915_gem_context_create_ext_setparam persist_param;
@@ -117,7 +118,29 @@ __context_create_cfg(int fd, const intel_ctx_cfg_t *cfg, uint32_t *ctx_id)
 		unsigned num_logical_engines;
 		memset(&engines, 0, sizeof(engines));
 
-		if (cfg->load_balance) {
+		if (cfg->parallel) {
+			memset(&parallel, 0, sizeof(parallel));
+
+			num_logical_engines = 1;
+
+			parallel.base.name =
+				I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT;
+
+			engines.engines[0].engine_class =
+				I915_ENGINE_CLASS_INVALID;
+			engines.engines[0].engine_instance =
+				I915_ENGINE_CLASS_INVALID_NONE;
+
+			parallel.num_siblings = cfg->num_engines;
+			parallel.width = cfg->width;
+			for (i = 0; i < cfg->num_engines * cfg->width; i++) {
+				igt_assert_eq(cfg->engines[0].engine_class,
+					      cfg->engines[i].engine_class);
+				parallel.engines[i] = cfg->engines[i];
+			}
+
+			engines.extensions = to_user_pointer(&parallel);
+		} else if (cfg->load_balance) {
 			memset(&balance, 0, sizeof(balance));
 
 			/* In this case, the first engine is the virtual
@@ -127,6 +150,9 @@ __context_create_cfg(int fd, const intel_ctx_cfg_t *cfg, uint32_t *ctx_id)
 			igt_assert(cfg->num_engines + 1 <= GEM_MAX_ENGINES);
 			num_logical_engines = cfg->num_engines + 1;
 
+			balance.base.name =
+				I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE;
+
 			engines.engines[0].engine_class =
 				I915_ENGINE_CLASS_INVALID;
 			engines.engines[0].engine_instance =
diff --git a/lib/intel_ctx.h b/lib/intel_ctx.h
index 9649f6d96..89c65fcd3 100644
--- a/lib/intel_ctx.h
+++ b/lib/intel_ctx.h
@@ -46,7 +46,9 @@ typedef struct intel_ctx_cfg {
 	uint32_t vm;
 	bool nopersist;
 	bool load_balance;
+	bool parallel;
 	unsigned int num_engines;
+	unsigned int width;
 	struct i915_engine_class_instance engines[GEM_MAX_ENGINES];
 } intel_ctx_cfg_t;
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 4/7] i915/gem_exec_balancer: Test parallel execbuf
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
                   ` (2 preceding siblings ...)
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 3/7] lib/intel_ctx: Add support for parallel contexts to intel_ctx library Matthew Brost
@ 2021-07-27 15:21 ` Matthew Brost
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 5/7] include/drm-uapi: Add static priority mapping UAPI Matthew Brost
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:21 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

Add basic parallel execbuf submission test which more or less just
submits the same BB in loop a which does an atomic increment to a memory
location. The memory location is checked at the end for the correct
value. Different sections use various IOCTL options (e.g. fences,
location of BBs, etc...).

In addition to above sections, an additional section ensure the ordering
of parallel submission by submitting a spinning batch to 1 individual
engine, submit a parallel execbuf to all engines instances within the
class, verify none on parallel execbuf make to hardware, release
spinner, and finally verify everything has completed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 lib/intel_reg.h                |   5 +
 tests/i915/gem_exec_balancer.c | 487 +++++++++++++++++++++++++++++++++
 2 files changed, 492 insertions(+)

diff --git a/lib/intel_reg.h b/lib/intel_reg.h
index ac1fc6cbc..146ac76c9 100644
--- a/lib/intel_reg.h
+++ b/lib/intel_reg.h
@@ -2593,6 +2593,11 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
 #define STATE3D_COLOR_FACTOR	((0x3<<29)|(0x1d<<24)|(0x01<<16))
 
+/* Atomics */
+#define MI_ATOMIC			((0x2f << 23) | 2)
+#define   MI_ATOMIC_INLINE_DATA         (1 << 18)
+#define   MI_ATOMIC_ADD                 (0x7 << 8)
+
 /* Batch */
 #define MI_BATCH_BUFFER		((0x30 << 23) | 1)
 #define MI_BATCH_BUFFER_START	(0x31 << 23)
diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
index 2f98950bb..053f1d1f7 100644
--- a/tests/i915/gem_exec_balancer.c
+++ b/tests/i915/gem_exec_balancer.c
@@ -25,6 +25,7 @@
 #include <sched.h>
 #include <sys/ioctl.h>
 #include <sys/signal.h>
+#include <poll.h>
 
 #include "i915/gem.h"
 #include "i915/gem_create.h"
@@ -56,6 +57,31 @@ static size_t sizeof_load_balance(int count)
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
+static int
+__i915_query(int fd, struct drm_i915_query *q)
+{
+	if (igt_ioctl(fd, DRM_IOCTL_I915_QUERY, q))
+		return -errno;
+
+	return 0;
+}
+
+static int
+__i915_query_items(int fd, struct drm_i915_query_item *items, uint32_t n_items)
+{
+	struct drm_i915_query q = {
+		.num_items = n_items,
+		.items_ptr = to_user_pointer(items),
+		};
+
+	return __i915_query(fd, &q);
+}
+
+#define i915_query_items(fd, items, n_items) do { \
+		igt_assert_eq(__i915_query_items(fd, items, n_items), 0); \
+		errno = 0; \
+	} while (0)
+
 static bool has_class_instance(int i915, uint16_t class, uint16_t instance)
 {
 	int fd;
@@ -2691,6 +2717,380 @@ static void nohangcheck(int i915)
 	close(params);
 }
 
+static void check_bo(int i915, uint32_t handle, unsigned int count, bool wait)
+{
+	uint32_t *map;
+
+	map = gem_mmap__cpu(i915, handle, 0, 4096, PROT_READ);
+	if (wait)
+		gem_set_domain(i915, handle, I915_GEM_DOMAIN_CPU,
+			       I915_GEM_DOMAIN_CPU);
+	igt_assert_eq(map[0], count);
+	munmap(map, 4096);
+}
+
+static struct drm_i915_query_engine_info *query_engine_info(int i915)
+{
+	struct drm_i915_query_engine_info *engines;
+	struct drm_i915_query_item item;
+
+#define QUERY_SIZE	0x4000
+	engines = malloc(QUERY_SIZE);
+	igt_assert(engines);
+
+	memset(engines, 0, QUERY_SIZE);
+	memset(&item, 0, sizeof(item));
+	item.query_id = DRM_I915_QUERY_ENGINE_INFO;
+	item.data_ptr = to_user_pointer(engines);
+	item.length = QUERY_SIZE;
+
+	i915_query_items(i915, &item, 1);
+	igt_assert(item.length >= 0);
+	igt_assert(item.length <= QUERY_SIZE);
+#undef QUERY_SIZE
+
+	return engines;
+}
+
+/* This function only works if siblings contains all instances of a class */
+static void logical_sort_siblings(int i915,
+				  struct i915_engine_class_instance *siblings,
+				  unsigned int count)
+{
+	struct i915_engine_class_instance *sorted;
+	struct drm_i915_query_engine_info *engines;
+	unsigned int i, j;
+
+	sorted = calloc(count, sizeof(*sorted));
+	igt_assert(sorted);
+
+	engines = query_engine_info(i915);
+
+	for (j = 0; j < count; ++j) {
+		for (i = 0; i < engines->num_engines; ++i) {
+			if (siblings[j].engine_class ==
+			    engines->engines[i].engine.engine_class &&
+			    siblings[j].engine_instance ==
+			    engines->engines[i].engine.engine_instance) {
+				uint16_t logical_instance =
+					engines->engines[i].logical_instance;
+
+				igt_assert(logical_instance < count);
+				igt_assert(!sorted[logical_instance].engine_class);
+				igt_assert(!sorted[logical_instance].engine_instance);
+
+				sorted[logical_instance] = siblings[j];
+				break;
+			}
+		}
+		igt_assert(i != engines->num_engines);
+	}
+
+	memcpy(siblings, sorted, sizeof(*sorted) * count);
+	free(sorted);
+	free(engines);
+}
+
+#define PARALLEL_BB_FIRST		(0x1 << 0)
+#define PARALLEL_OUT_FENCE		(0x1 << 1)
+#define PARALLEL_IN_FENCE		(0x1 << 2)
+#define PARALLEL_SUBMIT_FENCE		(0x1 << 3)
+#define PARALLEL_CONTEXTS		(0x1 << 4)
+#define PARALLEL_VIRTUAL		(0x1 << 5)
+
+static void parallel_thread(int i915, unsigned int flags,
+			    struct i915_engine_class_instance *siblings,
+			    unsigned int count, unsigned int bb_per_execbuf)
+{
+	const intel_ctx_t *ctx = NULL;
+	int n, i, j, fence = 0;
+	uint32_t batch[16];
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 obj[32];
+#define PARALLEL_BB_LOOP_COUNT	0x1000
+	const intel_ctx_t *ctxs[PARALLEL_BB_LOOP_COUNT];
+	uint32_t target_bo_idx = 0;
+	uint32_t first_bb_idx = 1;
+	intel_ctx_cfg_t cfg;
+
+	if (flags & PARALLEL_BB_FIRST) {
+		target_bo_idx = bb_per_execbuf;
+		first_bb_idx = 0;
+	}
+
+	memset(&cfg, 0, sizeof(cfg));
+	if (flags & PARALLEL_VIRTUAL) {
+		cfg.parallel = true;
+		cfg.num_engines = count / bb_per_execbuf;
+		cfg.width = bb_per_execbuf;
+
+		for (i = 0; i < cfg.width; ++i)
+			for (j = 0; j < cfg.num_engines; ++j)
+				memcpy(cfg.engines + i * cfg.num_engines + j,
+				       siblings + j * cfg.width + i,
+				       sizeof(*siblings));
+	} else {
+		cfg.parallel = true;
+		cfg.num_engines = 1;
+		cfg.width = count;
+		memcpy(cfg.engines, siblings, sizeof(*siblings) * count);
+	}
+	ctx = intel_ctx_create(i915, &cfg);
+
+	i = 0;
+	batch[i] = MI_ATOMIC | MI_ATOMIC_INLINE_DATA |
+		MI_ATOMIC_ADD;
+#define TARGET_BO_OFFSET	(0x1 << 16)
+	batch[++i] = TARGET_BO_OFFSET;
+	batch[++i] = 0;
+	batch[++i] = 1;
+	batch[++i] = MI_BATCH_BUFFER_END;
+
+	memset(obj, 0, sizeof(obj));
+	obj[target_bo_idx].offset = TARGET_BO_OFFSET;
+	obj[target_bo_idx].flags = EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+	obj[target_bo_idx].handle = gem_create(i915, 4096);
+
+	for (i = first_bb_idx; i < bb_per_execbuf + first_bb_idx; ++i) {
+		obj[i].handle = gem_create(i915, 4096);
+		gem_write(i915, obj[i].handle, 0, batch,
+			  sizeof(batch));
+	}
+
+	memset(&execbuf, 0, sizeof(execbuf));
+	execbuf.buffers_ptr = to_user_pointer(obj);
+	execbuf.buffer_count = bb_per_execbuf + 1;
+	execbuf.flags |= I915_EXEC_HANDLE_LUT;
+	if (flags & PARALLEL_BB_FIRST)
+		execbuf.flags |= I915_EXEC_BATCH_FIRST;
+	if (flags & PARALLEL_OUT_FENCE)
+		execbuf.flags |= I915_EXEC_FENCE_OUT;
+	execbuf.buffers_ptr = to_user_pointer(obj);
+	execbuf.rsvd1 = ctx->id;
+
+	for (n = 0; n < PARALLEL_BB_LOOP_COUNT; ++n) {
+		for (i = 0; i < count / bb_per_execbuf; ++i ) {
+			execbuf.flags &= ~0x3full;
+			execbuf.flags |= i;
+			gem_execbuf_wr(i915, &execbuf);
+
+			if (flags & PARALLEL_OUT_FENCE) {
+				igt_assert_eq(sync_fence_wait(execbuf.rsvd2 >> 32,
+							      1000), 0);
+				igt_assert_eq(sync_fence_status(execbuf.rsvd2 >> 32), 1);
+
+				if (fence)
+					close(fence);
+				fence = execbuf.rsvd2 >> 32;
+
+				if (flags & PARALLEL_SUBMIT_FENCE) {
+					execbuf.flags |=
+						I915_EXEC_FENCE_SUBMIT;
+					execbuf.rsvd2 >>= 32;
+				} else if (flags &  PARALLEL_IN_FENCE) {
+					execbuf.flags |=
+						I915_EXEC_FENCE_IN;
+					execbuf.rsvd2 >>= 32;
+				} else {
+					execbuf.rsvd2 = 0;
+				}
+			}
+
+			if (flags & PARALLEL_VIRTUAL)
+				break;
+		}
+
+		if (flags & PARALLEL_CONTEXTS) {
+			ctxs[n] = ctx;
+			ctx = intel_ctx_create(i915, &cfg);
+			execbuf.rsvd1 = ctx->id;
+		}
+	}
+	if (fence)
+		close(fence);
+
+	check_bo(i915, obj[target_bo_idx].handle, flags & PARALLEL_VIRTUAL ?
+		 bb_per_execbuf * PARALLEL_BB_LOOP_COUNT :
+		 count * PARALLEL_BB_LOOP_COUNT, true);
+
+	intel_ctx_destroy(i915, ctx);
+	for (i = 0; flags & PARALLEL_CONTEXTS &&
+	     i < PARALLEL_BB_LOOP_COUNT; ++i) {
+		intel_ctx_destroy(i915, ctxs[i]);
+	}
+	for (i = 0; i < bb_per_execbuf + 1; ++i)
+		gem_close(i915, obj[i].handle);
+}
+
+static void parallel(int i915, unsigned int flags)
+{
+	for (int class = 0; class < 32; class++) {
+		struct i915_engine_class_instance *siblings;
+		unsigned int count, bb_per_execbuf;
+
+		siblings = list_engines(i915, 1u << class, &count);
+		if (!siblings)
+			continue;
+
+		if (count < 2) {
+			free(siblings);
+			continue;
+		}
+
+		logical_sort_siblings(i915, siblings, count);
+		bb_per_execbuf = count;
+
+		parallel_thread(i915, flags, siblings,
+				count, bb_per_execbuf);
+
+		free(siblings);
+	}
+}
+
+static void parallel_balancer(int i915, unsigned int flags)
+{
+	for (int class = 0; class < 32; class++) {
+		struct i915_engine_class_instance *siblings;
+		unsigned int count;
+
+		siblings = list_engines(i915, 1u << class, &count);
+		if (!siblings)
+			continue;
+
+		if (count < 4) {
+			free(siblings);
+			continue;
+		}
+
+		logical_sort_siblings(i915, siblings, count);
+
+		for (unsigned int bb_per_execbuf = 2;;) {
+			igt_fork(child, count / bb_per_execbuf)
+				parallel_thread(i915,
+						flags | PARALLEL_VIRTUAL,
+						siblings,
+						count,
+						bb_per_execbuf);
+			igt_waitchildren();
+
+			if (count / ++bb_per_execbuf <= 1)
+				break;
+		}
+
+		free(siblings);
+	}
+}
+
+static bool fence_busy(int fence)
+{
+	return poll(&(struct pollfd){fence, POLLIN}, 1, 0) == 0;
+}
+
+static void parallel_ordering(int i915, unsigned int flags)
+{
+	for (int class = 0; class < 32; class++) {
+		const intel_ctx_t *ctx = NULL, *spin_ctx = NULL;
+		struct i915_engine_class_instance *siblings;
+		unsigned int count;
+		int i = 0, fence = 0;
+		uint32_t batch[16];
+		struct drm_i915_gem_execbuffer2 execbuf;
+		struct drm_i915_gem_exec_object2 obj[32];
+		igt_spin_t *spin;
+		intel_ctx_cfg_t cfg;
+
+		siblings = list_engines(i915, 1u << class, &count);
+		if (!siblings)
+			continue;
+
+		if (count < 2) {
+			free(siblings);
+			continue;
+		}
+
+		logical_sort_siblings(i915, siblings, count);
+
+		memset(&cfg, 0, sizeof(cfg));
+		cfg.parallel = true;
+		cfg.num_engines = 1;
+		cfg.width = count;
+		memcpy(cfg.engines, siblings, sizeof(*siblings) * count);
+
+		ctx = intel_ctx_create(i915, &cfg);
+
+		batch[i] = MI_ATOMIC | MI_ATOMIC_INLINE_DATA |
+			MI_ATOMIC_ADD;
+		batch[++i] = TARGET_BO_OFFSET;
+		batch[++i] = 0;
+		batch[++i] = 1;
+		batch[++i] = MI_BATCH_BUFFER_END;
+
+		memset(obj, 0, sizeof(obj));
+		obj[0].offset = TARGET_BO_OFFSET;
+		obj[0].flags = EXEC_OBJECT_PINNED | EXEC_OBJECT_WRITE;
+		obj[0].handle = gem_create(i915, 4096);
+
+		for (i = 1; i < count + 1; ++i) {
+			obj[i].handle = gem_create(i915, 4096);
+			gem_write(i915, obj[i].handle, 0, batch,
+				  sizeof(batch));
+		}
+
+		memset(&execbuf, 0, sizeof(execbuf));
+		execbuf.buffers_ptr = to_user_pointer(obj);
+		execbuf.buffer_count = count + 1;
+		execbuf.flags |= I915_EXEC_HANDLE_LUT;
+		execbuf.flags |= I915_EXEC_NO_RELOC;
+		execbuf.flags |= I915_EXEC_FENCE_OUT;
+		execbuf.buffers_ptr = to_user_pointer(obj);
+		execbuf.rsvd1 = ctx->id;
+
+		/* Block parallel submission */
+		spin_ctx = ctx_create_engines(i915, siblings, count);
+		spin = __igt_spin_new(i915,
+				      .ctx = spin_ctx,
+				      .engine = 0,
+				      .flags = IGT_SPIN_FENCE_OUT |
+				      IGT_SPIN_NO_PREEMPTION);
+
+		/* Wait for spinners to start */
+		usleep(5 * 10000);
+		igt_assert(fence_busy(spin->out_fence));
+
+		/* Submit parallel execbuf */
+		gem_execbuf_wr(i915, &execbuf);
+		fence = execbuf.rsvd2 >> 32;
+
+		/*
+		 * Wait long enough for timeslcing to kick in but not
+		 * preemption. Spinner + parallel execbuf should be
+		 * active.
+		 */
+		usleep(25 * 10000);
+		igt_assert(fence_busy(spin->out_fence));
+		igt_assert(fence_busy(fence));
+		check_bo(i915, obj[0].handle, 0, false);
+
+		/*
+		 * End spinner and wait for spinner + parallel execbuf
+		 * to compelte.
+		 */
+		igt_spin_end(spin);
+		igt_assert_eq(sync_fence_wait(fence, 1000), 0);
+		igt_assert_eq(sync_fence_status(fence), 1);
+		check_bo(i915, obj[0].handle, count, true);
+		close(fence);
+
+		/* Clean up */
+		intel_ctx_destroy(i915, ctx);
+		intel_ctx_destroy(i915, spin_ctx);
+		for (i = 0; i < count + 1; ++i)
+			gem_close(i915, obj[i].handle);
+		free(siblings);
+		igt_spin_free(i915, spin);
+	}
+}
+
 static bool has_persistence(int i915)
 {
 	struct drm_i915_gem_context_param p = {
@@ -2725,6 +3125,61 @@ static bool has_load_balancer(int i915)
 	return err == 0;
 }
 
+static bool has_logical_mapping(int i915)
+{
+	struct drm_i915_query_engine_info *engines;
+	unsigned int i;
+
+	engines = query_engine_info(i915);
+
+	for (i = 0; i < engines->num_engines; ++i)
+		if (!(engines->engines[i].flags &
+		     I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE)) {
+			free(engines);
+			return false;
+		}
+
+	free(engines);
+	return true;
+}
+
+static bool has_parallel_execbuf(int i915)
+{
+	intel_ctx_cfg_t cfg = {
+		.parallel = true,
+		.num_engines = 1,
+	};
+	const intel_ctx_t *ctx = NULL;
+	int err;
+
+	for (int class = 0; class < 32; class++) {
+		struct i915_engine_class_instance *siblings;
+		unsigned int count;
+
+		siblings = list_engines(i915, 1u << class, &count);
+		if (!siblings)
+			continue;
+
+		if (count < 2) {
+			free(siblings);
+			continue;
+		}
+
+		logical_sort_siblings(i915, siblings, count);
+
+		cfg.width = count;
+		memcpy(cfg.engines, siblings, sizeof(*siblings) * count);
+		free(siblings);
+
+		err = __intel_ctx_create(i915, &cfg, &ctx);
+		intel_ctx_destroy(i915, ctx);
+
+		return err == 0;
+	}
+
+	return false;
+}
+
 igt_main
 {
 	int i915 = -1;
@@ -2813,6 +3268,38 @@ igt_main
 		igt_stop_hang_detector();
 	}
 
+	igt_subtest_group {
+		igt_fixture {
+			igt_require(has_logical_mapping(i915));
+			igt_require(has_parallel_execbuf(i915));
+		}
+
+		igt_subtest("parallel-ordering")
+			parallel_ordering(i915, 0);
+
+		igt_subtest("parallel")
+			parallel(i915, 0);
+
+		igt_subtest("parallel-bb-first")
+			parallel(i915, PARALLEL_BB_FIRST);
+
+		igt_subtest("parallel-out-fence")
+			parallel(i915, PARALLEL_OUT_FENCE);
+
+		igt_subtest("parallel-keep-in-fence")
+			parallel(i915, PARALLEL_OUT_FENCE | PARALLEL_IN_FENCE);
+
+		igt_subtest("parallel-keep-submit-fence")
+			parallel(i915, PARALLEL_OUT_FENCE |
+				 PARALLEL_SUBMIT_FENCE);
+
+		igt_subtest("parallel-contexts")
+			parallel(i915, PARALLEL_CONTEXTS);
+
+		igt_subtest("parallel-balancer")
+			parallel_balancer(i915, 0);
+	}
+
 	igt_subtest_group {
 		igt_hang_t  hang;
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 5/7] include/drm-uapi: Add static priority mapping UAPI
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
                   ` (3 preceding siblings ...)
  2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 4/7] i915/gem_exec_balancer: Test parallel execbuf Matthew Brost
@ 2021-07-27 15:22 ` Matthew Brost
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping Matthew Brost
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 7/7] i915/gem_ctx_shared: Make gem_ctx_shared " Matthew Brost
  6 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:22 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 include/drm-uapi/i915_drm.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 332c07e3d..0c023a52d 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
+/*
+ * Indicates the 2k user priority levels are statically mapped into 3 buckets as
+ * follows:
+ *
+ * -1k to -1	Low priority
+ * 0		Normal priority
+ * 1 to 1k	Highest priority
+ */
+#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
                   ` (4 preceding siblings ...)
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 5/7] include/drm-uapi: Add static priority mapping UAPI Matthew Brost
@ 2021-07-27 15:22 ` Matthew Brost
  2021-07-29  1:51   ` [Intel-gfx] [igt-dev] " Daniele Ceraolo Spurio
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 7/7] i915/gem_ctx_shared: Make gem_ctx_shared " Matthew Brost
  6 siblings, 1 reply; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:22 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

The i915 currently has 2k visible priority levels which are currently
unique. This is changing to statically map these 2k levels into 3
buckets:

low: < 0
mid: 0
high: > 0

Update gem_scheduler to understand this. This entails updating promotion
test to use 3 levels that will map into different buckets and also
delete a racey check. Also skip any tests that rely on having more than
3 priority levels.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 lib/i915/gem_scheduler.c       | 13 ++++++++++
 lib/i915/gem_scheduler.h       |  1 +
 tests/i915/gem_exec_schedule.c | 47 ++++++++++++++++++++--------------
 3 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/lib/i915/gem_scheduler.c b/lib/i915/gem_scheduler.c
index cdddf42ad..bec2e485a 100644
--- a/lib/i915/gem_scheduler.c
+++ b/lib/i915/gem_scheduler.c
@@ -90,6 +90,19 @@ bool gem_scheduler_has_ctx_priority(int fd)
 		I915_SCHEDULER_CAP_PRIORITY;
 }
 
+/**
+ * gem_scheduler_has_ctx_priority:
+ * @fd: open i915 drm file descriptor
+ *
+ * Feature test macro to query whether the driver supports priority assigned
+ * from user space are statically mapping into 3 buckets.
+ */
+bool gem_scheduler_has_static_priority(int fd)
+{
+	return gem_scheduler_capability(fd) &
+		I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
+}
+
 /**
  * gem_scheduler_has_preemption:
  * @fd: open i915 drm file descriptor
diff --git a/lib/i915/gem_scheduler.h b/lib/i915/gem_scheduler.h
index d43e84bd2..b00804f70 100644
--- a/lib/i915/gem_scheduler.h
+++ b/lib/i915/gem_scheduler.h
@@ -29,6 +29,7 @@
 unsigned gem_scheduler_capability(int fd);
 bool gem_scheduler_enabled(int fd);
 bool gem_scheduler_has_ctx_priority(int fd);
+bool gem_scheduler_has_static_priority(int fd);
 bool gem_scheduler_has_preemption(int fd);
 bool gem_scheduler_has_semaphores(int fd);
 bool gem_scheduler_has_engine_busy_stats(int fd);
diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c
index e5fb45982..f03842478 100644
--- a/tests/i915/gem_exec_schedule.c
+++ b/tests/i915/gem_exec_schedule.c
@@ -1344,8 +1344,7 @@ static void reorder(int fd, const intel_ctx_cfg_t *cfg,
 static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
 {
 	IGT_CORK_FENCE(cork);
-	uint32_t result, dep;
-	uint32_t result_read, dep_read;
+	uint32_t result, dep, dep_read;
 	const intel_ctx_t *ctx[3];
 	int fence;
 
@@ -1353,10 +1352,10 @@ static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
 	gem_context_set_priority(fd, ctx[LO]->id, MIN_PRIO);
 
 	ctx[HI] = intel_ctx_create(fd, cfg);
-	gem_context_set_priority(fd, ctx[HI]->id, 0);
+	gem_context_set_priority(fd, ctx[HI]->id, MAX_PRIO);
 
 	ctx[NOISE] = intel_ctx_create(fd, cfg);
-	gem_context_set_priority(fd, ctx[NOISE]->id, MIN_PRIO/2);
+	gem_context_set_priority(fd, ctx[NOISE]->id, 0);
 
 	result = gem_create(fd, 4096);
 	dep = gem_create(fd, 4096);
@@ -1383,11 +1382,9 @@ static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
 	dep_read = __sync_read_u32(fd, dep, 0);
 	gem_close(fd, dep);
 
-	result_read = __sync_read_u32(fd, result, 0);
 	gem_close(fd, result);
 
 	igt_assert_eq_u32(dep_read, ctx[HI]->id);
-	igt_assert_eq_u32(result_read, ctx[NOISE]->id);
 
 	intel_ctx_destroy(fd, ctx[NOISE]);
 	intel_ctx_destroy(fd, ctx[LO]);
@@ -2963,19 +2960,25 @@ igt_main
 			test_each_engine_store("preempt-other-chain", fd, ctx, e)
 				preempt_other(fd, &ctx->cfg, e->flags, CHAIN);
 
-			test_each_engine_store("preempt-queue", fd, ctx, e)
-				preempt_queue(fd, &ctx->cfg, e->flags, 0);
+			test_each_engine_store("preempt-engines", fd, ctx, e)
+				preempt_engines(fd, e, 0);
 
-			test_each_engine_store("preempt-queue-chain", fd, ctx, e)
-				preempt_queue(fd, &ctx->cfg, e->flags, CHAIN);
-			test_each_engine_store("preempt-queue-contexts", fd, ctx, e)
-				preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS);
+			igt_subtest_group {
+				igt_fixture {
+					igt_require(!gem_scheduler_has_static_priority(fd));
+				}
 
-			test_each_engine_store("preempt-queue-contexts-chain", fd, ctx, e)
-				preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS | CHAIN);
+				test_each_engine_store("preempt-queue", fd, ctx, e)
+					preempt_queue(fd, &ctx->cfg, e->flags, 0);
 
-			test_each_engine_store("preempt-engines", fd, ctx, e)
-				preempt_engines(fd, e, 0);
+				test_each_engine_store("preempt-queue-chain", fd, ctx, e)
+					preempt_queue(fd, &ctx->cfg, e->flags, CHAIN);
+				test_each_engine_store("preempt-queue-contexts", fd, ctx, e)
+					preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS);
+
+				test_each_engine_store("preempt-queue-contexts-chain", fd, ctx, e)
+					preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS | CHAIN);
+			}
 
 			igt_subtest_group {
 				igt_hang_t hang;
@@ -3017,11 +3020,17 @@ igt_main
 		test_each_engine_store("wide", fd, ctx, e)
 			wide(fd, &ctx->cfg, e->flags);
 
-		test_each_engine_store("reorder-wide", fd, ctx, e)
-			reorder_wide(fd, &ctx->cfg, e->flags);
-
 		test_each_engine_store("smoketest", fd, ctx, e)
 			smoketest(fd, &ctx->cfg, e->flags, 5);
+
+		igt_subtest_group {
+			igt_fixture {
+				igt_require(!gem_scheduler_has_static_priority(fd));
+			}
+
+			test_each_engine_store("reorder-wide", fd, ctx, e)
+				reorder_wide(fd, &ctx->cfg, e->flags);
+		}
 	}
 
 	igt_subtest_group {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH i-g-t 7/7] i915/gem_ctx_shared: Make gem_ctx_shared understand static priority mapping
  2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
                   ` (5 preceding siblings ...)
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping Matthew Brost
@ 2021-07-27 15:22 ` Matthew Brost
  6 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 15:22 UTC (permalink / raw)
  To: igt-dev; +Cc: intel-gfx

The i915 currently has 2k visible priority levels which are currently
unique. This is changing to statically map these 2k levels into 3
buckets:

low: < 0
mid: 0
high: > 0

Update gem_scheduler to understand this. This entails updating promotion
test to use 3 levels that will map into different buckets and also
delete a racey check.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 tests/i915/gem_ctx_shared.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/i915/gem_ctx_shared.c b/tests/i915/gem_ctx_shared.c
index 4441e6eb7..0d95df8a5 100644
--- a/tests/i915/gem_ctx_shared.c
+++ b/tests/i915/gem_ctx_shared.c
@@ -771,10 +771,10 @@ static void promotion(int i915, const intel_ctx_cfg_t *cfg, unsigned ring)
 	gem_context_set_priority(i915, ctx[LO]->id, MIN_PRIO);
 
 	ctx[HI] = intel_ctx_create(i915, &q_cfg);
-	gem_context_set_priority(i915, ctx[HI]->id, 0);
+	gem_context_set_priority(i915, ctx[HI]->id, MAX_PRIO);
 
 	ctx[NOISE] = intel_ctx_create(i915, &q_cfg);
-	gem_context_set_priority(i915, ctx[NOISE]->id, MIN_PRIO/2);
+	gem_context_set_priority(i915, ctx[NOISE]->id, 0);
 
 	result = gem_create(i915, 4096);
 	dep = gem_create(i915, 4096);
@@ -811,7 +811,6 @@ static void promotion(int i915, const intel_ctx_cfg_t *cfg, unsigned ring)
 			I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
 	gem_close(i915, result);
 
-	igt_assert_eq_u32(ptr[0], ctx[NOISE]->id);
 	munmap(ptr, 4096);
 
 	intel_ctx_destroy(i915, ctx[NOISE]);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping
  2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping Matthew Brost
@ 2021-07-29  1:51   ` Daniele Ceraolo Spurio
  2021-07-29 23:35     ` Matthew Brost
  0 siblings, 1 reply; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-29  1:51 UTC (permalink / raw)
  To: Matthew Brost, igt-dev; +Cc: intel-gfx



On 7/27/2021 8:22 AM, Matthew Brost wrote:
> The i915 currently has 2k visible priority levels which are currently
> unique. This is changing to statically map these 2k levels into 3
> buckets:
>
> low: < 0
> mid: 0
> high: > 0
>
> Update gem_scheduler to understand this. This entails updating promotion
> test to use 3 levels that will map into different buckets and also
> delete a racey check. Also skip any tests that rely on having more than

Can you elaborate on why the check is racey? I'm sure you've already 
explained it to me in the past but can't spot it right now.

> 3 priority levels.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   lib/i915/gem_scheduler.c       | 13 ++++++++++
>   lib/i915/gem_scheduler.h       |  1 +
>   tests/i915/gem_exec_schedule.c | 47 ++++++++++++++++++++--------------
>   3 files changed, 42 insertions(+), 19 deletions(-)
>
> diff --git a/lib/i915/gem_scheduler.c b/lib/i915/gem_scheduler.c
> index cdddf42ad..bec2e485a 100644
> --- a/lib/i915/gem_scheduler.c
> +++ b/lib/i915/gem_scheduler.c
> @@ -90,6 +90,19 @@ bool gem_scheduler_has_ctx_priority(int fd)
>   		I915_SCHEDULER_CAP_PRIORITY;
>   }
>   
> +/**
> + * gem_scheduler_has_ctx_priority:

s/ctx/static/

LGTM apart from this.

Daniele

> + * @fd: open i915 drm file descriptor
> + *
> + * Feature test macro to query whether the driver supports priority assigned
> + * from user space are statically mapping into 3 buckets.
> + */
> +bool gem_scheduler_has_static_priority(int fd)
> +{
> +	return gem_scheduler_capability(fd) &
> +		I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
> +}
> +
>   /**
>    * gem_scheduler_has_preemption:
>    * @fd: open i915 drm file descriptor
> diff --git a/lib/i915/gem_scheduler.h b/lib/i915/gem_scheduler.h
> index d43e84bd2..b00804f70 100644
> --- a/lib/i915/gem_scheduler.h
> +++ b/lib/i915/gem_scheduler.h
> @@ -29,6 +29,7 @@
>   unsigned gem_scheduler_capability(int fd);
>   bool gem_scheduler_enabled(int fd);
>   bool gem_scheduler_has_ctx_priority(int fd);
> +bool gem_scheduler_has_static_priority(int fd);
>   bool gem_scheduler_has_preemption(int fd);
>   bool gem_scheduler_has_semaphores(int fd);
>   bool gem_scheduler_has_engine_busy_stats(int fd);
> diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c
> index e5fb45982..f03842478 100644
> --- a/tests/i915/gem_exec_schedule.c
> +++ b/tests/i915/gem_exec_schedule.c
> @@ -1344,8 +1344,7 @@ static void reorder(int fd, const intel_ctx_cfg_t *cfg,
>   static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
>   {
>   	IGT_CORK_FENCE(cork);
> -	uint32_t result, dep;
> -	uint32_t result_read, dep_read;
> +	uint32_t result, dep, dep_read;
>   	const intel_ctx_t *ctx[3];
>   	int fence;
>   
> @@ -1353,10 +1352,10 @@ static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
>   	gem_context_set_priority(fd, ctx[LO]->id, MIN_PRIO);
>   
>   	ctx[HI] = intel_ctx_create(fd, cfg);
> -	gem_context_set_priority(fd, ctx[HI]->id, 0);
> +	gem_context_set_priority(fd, ctx[HI]->id, MAX_PRIO);
>   
>   	ctx[NOISE] = intel_ctx_create(fd, cfg);
> -	gem_context_set_priority(fd, ctx[NOISE]->id, MIN_PRIO/2);
> +	gem_context_set_priority(fd, ctx[NOISE]->id, 0);
>   
>   	result = gem_create(fd, 4096);
>   	dep = gem_create(fd, 4096);
> @@ -1383,11 +1382,9 @@ static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
>   	dep_read = __sync_read_u32(fd, dep, 0);
>   	gem_close(fd, dep);
>   
> -	result_read = __sync_read_u32(fd, result, 0);
>   	gem_close(fd, result);
>   
>   	igt_assert_eq_u32(dep_read, ctx[HI]->id);
> -	igt_assert_eq_u32(result_read, ctx[NOISE]->id);
>   
>   	intel_ctx_destroy(fd, ctx[NOISE]);
>   	intel_ctx_destroy(fd, ctx[LO]);
> @@ -2963,19 +2960,25 @@ igt_main
>   			test_each_engine_store("preempt-other-chain", fd, ctx, e)
>   				preempt_other(fd, &ctx->cfg, e->flags, CHAIN);
>   
> -			test_each_engine_store("preempt-queue", fd, ctx, e)
> -				preempt_queue(fd, &ctx->cfg, e->flags, 0);
> +			test_each_engine_store("preempt-engines", fd, ctx, e)
> +				preempt_engines(fd, e, 0);
>   
> -			test_each_engine_store("preempt-queue-chain", fd, ctx, e)
> -				preempt_queue(fd, &ctx->cfg, e->flags, CHAIN);
> -			test_each_engine_store("preempt-queue-contexts", fd, ctx, e)
> -				preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS);
> +			igt_subtest_group {
> +				igt_fixture {
> +					igt_require(!gem_scheduler_has_static_priority(fd));
> +				}
>   
> -			test_each_engine_store("preempt-queue-contexts-chain", fd, ctx, e)
> -				preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS | CHAIN);
> +				test_each_engine_store("preempt-queue", fd, ctx, e)
> +					preempt_queue(fd, &ctx->cfg, e->flags, 0);
>   
> -			test_each_engine_store("preempt-engines", fd, ctx, e)
> -				preempt_engines(fd, e, 0);
> +				test_each_engine_store("preempt-queue-chain", fd, ctx, e)
> +					preempt_queue(fd, &ctx->cfg, e->flags, CHAIN);
> +				test_each_engine_store("preempt-queue-contexts", fd, ctx, e)
> +					preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS);
> +
> +				test_each_engine_store("preempt-queue-contexts-chain", fd, ctx, e)
> +					preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS | CHAIN);
> +			}
>   
>   			igt_subtest_group {
>   				igt_hang_t hang;
> @@ -3017,11 +3020,17 @@ igt_main
>   		test_each_engine_store("wide", fd, ctx, e)
>   			wide(fd, &ctx->cfg, e->flags);
>   
> -		test_each_engine_store("reorder-wide", fd, ctx, e)
> -			reorder_wide(fd, &ctx->cfg, e->flags);
> -
>   		test_each_engine_store("smoketest", fd, ctx, e)
>   			smoketest(fd, &ctx->cfg, e->flags, 5);
> +
> +		igt_subtest_group {
> +			igt_fixture {
> +				igt_require(!gem_scheduler_has_static_priority(fd));
> +			}
> +
> +			test_each_engine_store("reorder-wide", fd, ctx, e)
> +				reorder_wide(fd, &ctx->cfg, e->flags);
> +		}
>   	}
>   
>   	igt_subtest_group {

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping
  2021-07-29  1:51   ` [Intel-gfx] [igt-dev] " Daniele Ceraolo Spurio
@ 2021-07-29 23:35     ` Matthew Brost
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-29 23:35 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: igt-dev, intel-gfx

On Wed, Jul 28, 2021 at 06:51:29PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/27/2021 8:22 AM, Matthew Brost wrote:
> > The i915 currently has 2k visible priority levels which are currently
> > unique. This is changing to statically map these 2k levels into 3
> > buckets:
> > 
> > low: < 0
> > mid: 0
> > high: > 0
> > 
> > Update gem_scheduler to understand this. This entails updating promotion
> > test to use 3 levels that will map into different buckets and also
> > delete a racey check. Also skip any tests that rely on having more than
> 
> Can you elaborate on why the check is racey? I'm sure you've already
> explained it to me in the past but can't spot it right now.
> 

Basically NOISE can complete before the HIGH context gets submitted to
the GuC. That being said, removing this check is totally wrong - not
quite sure what I was thinking. The proper solution is to add a delay
(sleep) in unplug_show_queue between igt_cork_unplug & igt_spin_free.
Will respin this patch, likely spliting it into 2 (1 for the promotion
fix and 1 for the skips).

Matt

> > 3 priority levels.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   lib/i915/gem_scheduler.c       | 13 ++++++++++
> >   lib/i915/gem_scheduler.h       |  1 +
> >   tests/i915/gem_exec_schedule.c | 47 ++++++++++++++++++++--------------
> >   3 files changed, 42 insertions(+), 19 deletions(-)
> > 
> > diff --git a/lib/i915/gem_scheduler.c b/lib/i915/gem_scheduler.c
> > index cdddf42ad..bec2e485a 100644
> > --- a/lib/i915/gem_scheduler.c
> > +++ b/lib/i915/gem_scheduler.c
> > @@ -90,6 +90,19 @@ bool gem_scheduler_has_ctx_priority(int fd)
> >   		I915_SCHEDULER_CAP_PRIORITY;
> >   }
> > +/**
> > + * gem_scheduler_has_ctx_priority:
> 
> s/ctx/static/
> 
> LGTM apart from this.
> 
> Daniele
> 
> > + * @fd: open i915 drm file descriptor
> > + *
> > + * Feature test macro to query whether the driver supports priority assigned
> > + * from user space are statically mapping into 3 buckets.
> > + */
> > +bool gem_scheduler_has_static_priority(int fd)
> > +{
> > +	return gem_scheduler_capability(fd) &
> > +		I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
> > +}
> > +
> >   /**
> >    * gem_scheduler_has_preemption:
> >    * @fd: open i915 drm file descriptor
> > diff --git a/lib/i915/gem_scheduler.h b/lib/i915/gem_scheduler.h
> > index d43e84bd2..b00804f70 100644
> > --- a/lib/i915/gem_scheduler.h
> > +++ b/lib/i915/gem_scheduler.h
> > @@ -29,6 +29,7 @@
> >   unsigned gem_scheduler_capability(int fd);
> >   bool gem_scheduler_enabled(int fd);
> >   bool gem_scheduler_has_ctx_priority(int fd);
> > +bool gem_scheduler_has_static_priority(int fd);
> >   bool gem_scheduler_has_preemption(int fd);
> >   bool gem_scheduler_has_semaphores(int fd);
> >   bool gem_scheduler_has_engine_busy_stats(int fd);
> > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c
> > index e5fb45982..f03842478 100644
> > --- a/tests/i915/gem_exec_schedule.c
> > +++ b/tests/i915/gem_exec_schedule.c
> > @@ -1344,8 +1344,7 @@ static void reorder(int fd, const intel_ctx_cfg_t *cfg,
> >   static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
> >   {
> >   	IGT_CORK_FENCE(cork);
> > -	uint32_t result, dep;
> > -	uint32_t result_read, dep_read;
> > +	uint32_t result, dep, dep_read;
> >   	const intel_ctx_t *ctx[3];
> >   	int fence;
> > @@ -1353,10 +1352,10 @@ static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
> >   	gem_context_set_priority(fd, ctx[LO]->id, MIN_PRIO);
> >   	ctx[HI] = intel_ctx_create(fd, cfg);
> > -	gem_context_set_priority(fd, ctx[HI]->id, 0);
> > +	gem_context_set_priority(fd, ctx[HI]->id, MAX_PRIO);
> >   	ctx[NOISE] = intel_ctx_create(fd, cfg);
> > -	gem_context_set_priority(fd, ctx[NOISE]->id, MIN_PRIO/2);
> > +	gem_context_set_priority(fd, ctx[NOISE]->id, 0);
> >   	result = gem_create(fd, 4096);
> >   	dep = gem_create(fd, 4096);
> > @@ -1383,11 +1382,9 @@ static void promotion(int fd, const intel_ctx_cfg_t *cfg, unsigned ring)
> >   	dep_read = __sync_read_u32(fd, dep, 0);
> >   	gem_close(fd, dep);
> > -	result_read = __sync_read_u32(fd, result, 0);
> >   	gem_close(fd, result);
> >   	igt_assert_eq_u32(dep_read, ctx[HI]->id);
> > -	igt_assert_eq_u32(result_read, ctx[NOISE]->id);
> >   	intel_ctx_destroy(fd, ctx[NOISE]);
> >   	intel_ctx_destroy(fd, ctx[LO]);
> > @@ -2963,19 +2960,25 @@ igt_main
> >   			test_each_engine_store("preempt-other-chain", fd, ctx, e)
> >   				preempt_other(fd, &ctx->cfg, e->flags, CHAIN);
> > -			test_each_engine_store("preempt-queue", fd, ctx, e)
> > -				preempt_queue(fd, &ctx->cfg, e->flags, 0);
> > +			test_each_engine_store("preempt-engines", fd, ctx, e)
> > +				preempt_engines(fd, e, 0);
> > -			test_each_engine_store("preempt-queue-chain", fd, ctx, e)
> > -				preempt_queue(fd, &ctx->cfg, e->flags, CHAIN);
> > -			test_each_engine_store("preempt-queue-contexts", fd, ctx, e)
> > -				preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS);
> > +			igt_subtest_group {
> > +				igt_fixture {
> > +					igt_require(!gem_scheduler_has_static_priority(fd));
> > +				}
> > -			test_each_engine_store("preempt-queue-contexts-chain", fd, ctx, e)
> > -				preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS | CHAIN);
> > +				test_each_engine_store("preempt-queue", fd, ctx, e)
> > +					preempt_queue(fd, &ctx->cfg, e->flags, 0);
> > -			test_each_engine_store("preempt-engines", fd, ctx, e)
> > -				preempt_engines(fd, e, 0);
> > +				test_each_engine_store("preempt-queue-chain", fd, ctx, e)
> > +					preempt_queue(fd, &ctx->cfg, e->flags, CHAIN);
> > +				test_each_engine_store("preempt-queue-contexts", fd, ctx, e)
> > +					preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS);
> > +
> > +				test_each_engine_store("preempt-queue-contexts-chain", fd, ctx, e)
> > +					preempt_queue(fd, &ctx->cfg, e->flags, CONTEXTS | CHAIN);
> > +			}
> >   			igt_subtest_group {
> >   				igt_hang_t hang;
> > @@ -3017,11 +3020,17 @@ igt_main
> >   		test_each_engine_store("wide", fd, ctx, e)
> >   			wide(fd, &ctx->cfg, e->flags);
> > -		test_each_engine_store("reorder-wide", fd, ctx, e)
> > -			reorder_wide(fd, &ctx->cfg, e->flags);
> > -
> >   		test_each_engine_store("smoketest", fd, ctx, e)
> >   			smoketest(fd, &ctx->cfg, e->flags, 5);
> > +
> > +		igt_subtest_group {
> > +			igt_fixture {
> > +				igt_require(!gem_scheduler_has_static_priority(fd));
> > +			}
> > +
> > +			test_each_engine_store("reorder-wide", fd, ctx, e)
> > +				reorder_wide(fd, &ctx->cfg, e->flags);
> > +		}
> >   	}
> >   	igt_subtest_group {
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-29 23:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 15:21 [Intel-gfx] [PATCH i-g-t 0/7] Updates for GuC & parallel submission Matthew Brost
2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 1/7] include/drm-uapi: Add parallel context configuration uAPI Matthew Brost
2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 2/7] include/drm-uapi: Add logical mapping uAPI Matthew Brost
2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 3/7] lib/intel_ctx: Add support for parallel contexts to intel_ctx library Matthew Brost
2021-07-27 15:21 ` [Intel-gfx] [PATCH i-g-t 4/7] i915/gem_exec_balancer: Test parallel execbuf Matthew Brost
2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 5/7] include/drm-uapi: Add static priority mapping UAPI Matthew Brost
2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 6/7] i915/gem_scheduler: Make gem_scheduler understand static priority mapping Matthew Brost
2021-07-29  1:51   ` [Intel-gfx] [igt-dev] " Daniele Ceraolo Spurio
2021-07-29 23:35     ` Matthew Brost
2021-07-27 15:22 ` [Intel-gfx] [PATCH i-g-t 7/7] i915/gem_ctx_shared: Make gem_ctx_shared " Matthew Brost

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox