* [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements
@ 2020-06-17 16:01 Tvrtko Ursulin
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing Tvrtko Ursulin
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This series rips out legacy code, adds some new features and fixes bugs.

The first part is important because it lets the tool be refactored to
support Gen11 and Gen12 properly. Working set support is useful for
simulating buffer contention and similar scenarios.

Tvrtko Ursulin (10):
  gem_wsim: Rip out userspace balancing
  gem_wsim: Buffer objects working sets and complex dependencies
  gem_wsim: Show workload timing stats
  gem_wsim: Move BO allocation to a helper
  gem_wsim: Support random buffer sizes
  gem_wsim: Support scaling workload batch durations
  gem_wsim: Log max and active working set sizes in verbose mode
  gem_wsim: Snippet of a workload extracted from carchase
  gem_wsim: Implement device selection
  gem_wsim: Fix calibration handling

 benchmarks/Makefile.am                        |    2 +-
 benchmarks/Makefile.sources                   |    6 -
 benchmarks/ewma.h                             |   71 -
 benchmarks/gem_wsim.c                         | 2054 ++++++-----------
 benchmarks/ilog2.h                            |  104 -
 benchmarks/meson.build                        |    6 +-
 benchmarks/wsim/README                        |   63 +
 benchmarks/wsim/carchasepart.wsim             |  184 ++
 benchmarks/wsim/cloud-gaming-60fps.wsim       |   11 +
 benchmarks/wsim/composited-ui.wsim            |    7 +
 benchmarks/wsim/media-1080p-player.wsim       |    2 +
 benchmarks/wsim/media_1n2_480p.wsim           |   12 +-
 benchmarks/wsim/media_1n2_asy.wsim            |    8 +-
 benchmarks/wsim/media_1n3_480p.wsim           |   16 +-
 benchmarks/wsim/media_1n3_asy.wsim            |    8 +
 benchmarks/wsim/media_1n4_480p.wsim           |   20 +-
 benchmarks/wsim/media_1n4_asy.wsim            |   10 +
 benchmarks/wsim/media_1n5_480p.wsim           |   24 +-
 benchmarks/wsim/media_1n5_asy.wsim            |   12 +
 benchmarks/wsim/media_load_balance_17i7.wsim  |   10 +-
 benchmarks/wsim/media_load_balance_19.wsim    |    4 +-
 .../wsim/media_load_balance_4k12u7.wsim       |    2 +
 .../wsim/media_load_balance_fhd26u7.wsim      |   16 +-
 benchmarks/wsim/media_load_balance_hd01.wsim  |   34 +-
 .../wsim/media_load_balance_hd06mp2.wsim      |    6 +-
 benchmarks/wsim/media_load_balance_hd12.wsim  |    6 +-
 .../wsim/media_load_balance_hd17i4.wsim       |    8 +-
 benchmarks/wsim/media_mfe2_480p.wsim          |   12 +-
 benchmarks/wsim/media_mfe3_480p.wsim          |   18 +-
 benchmarks/wsim/media_mfe4_480p.wsim          |   24 +-
 benchmarks/wsim/media_nn_1080p.wsim           |    4 +
 benchmarks/wsim/media_nn_1080p_s1.wsim        |    4 +-
 benchmarks/wsim/media_nn_1080p_s2.wsim        |    2 +
 benchmarks/wsim/media_nn_1080p_s3.wsim        |    2 +
 benchmarks/wsim/media_nn_480p.wsim            |    4 +
 benchmarks/wsim/vcs_balanced.wsim             |   52 +-
 scripts/media-bench.pl                        |  736 ------
 37 files changed, 1171 insertions(+), 2393 deletions(-)
 delete mode 100644 benchmarks/ewma.h
 delete mode 100644 benchmarks/ilog2.h
 create mode 100644 benchmarks/wsim/carchasepart.wsim
 create mode 100644 benchmarks/wsim/cloud-gaming-60fps.wsim
 create mode 100644 benchmarks/wsim/composited-ui.wsim
 delete mode 100755 scripts/media-bench.pl

-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 16:07   ` Chris Wilson
  2020-06-18  7:14   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies Tvrtko Ursulin
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This tool started as a way to evaluate userspace load balancing options,
but we have since settled on doing the balancing in the kernel.

Going forward we will want to update the tool for new engine interfaces,
and all this legacy code would only be a distraction.

Rip out everything not related to explicit load balancing implemented via
context engine maps and adjust the workloads to use it.
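
As a rough illustration of what "load balancing via context engine maps"
means for engine selection, here is a minimal standalone sketch modelled on
the find_engine_in_map() helper touched by this patch. It assumes that slot 0
of a context's engine array is the load-balanced (virtual) engine and that
slots 1..N are the physical engines in map order. The names engine_id,
ctx_sketch and find_engine_slot are hypothetical stand-ins, not the tool's
real types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the tool's engine enumeration. */
enum engine_id {
	ENGINE_DEFAULT,
	ENGINE_RCS,
	ENGINE_BCS,
	ENGINE_VCS,	/* "any video engine", i.e. let the kernel balance */
	ENGINE_VCS1,
	ENGINE_VCS2,
	ENGINE_VECS,
};

/* Hypothetical stand-in for the per-context state kept after this patch. */
struct ctx_sketch {
	const enum engine_id *engine_map;	/* physical engines, in map order */
	size_t engine_map_count;
	bool load_balance;			/* was "wants_balance" before */
};

/*
 * Translate a workload step's engine into an index into the context's
 * engine array. Assumption: index 0 is the virtual (balanced) engine and
 * index i + 1 is the i-th entry of the engine map.
 */
static unsigned int
find_engine_slot(const struct ctx_sketch *ctx, enum engine_id engine)
{
	size_t i;

	for (i = 0; i < ctx->engine_map_count; i++) {
		if (ctx->engine_map[i] == engine)
			return (unsigned int)i + 1;
	}

	/* Not in the map: only valid if the context is load balanced. */
	assert(ctx->load_balance);
	return 0;
}
```

With a map of { VCS1, VCS2 } and load balancing enabled, a step naming VCS1
or VCS2 would resolve to slot 1 or 2, while a step naming plain VCS would
fall through to slot 0 and leave the placement to the kernel.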

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/Makefile.am                        |    2 +-
 benchmarks/Makefile.sources                   |    6 -
 benchmarks/ewma.h                             |   71 -
 benchmarks/gem_wsim.c                         | 1362 +----------------
 benchmarks/ilog2.h                            |  104 --
 benchmarks/meson.build                        |    6 +-
 benchmarks/wsim/media-1080p-player.wsim       |    2 +
 benchmarks/wsim/media_1n2_480p.wsim           |   12 +-
 benchmarks/wsim/media_1n2_asy.wsim            |    8 +-
 benchmarks/wsim/media_1n3_480p.wsim           |   16 +-
 benchmarks/wsim/media_1n3_asy.wsim            |    8 +
 benchmarks/wsim/media_1n4_480p.wsim           |   20 +-
 benchmarks/wsim/media_1n4_asy.wsim            |   10 +
 benchmarks/wsim/media_1n5_480p.wsim           |   24 +-
 benchmarks/wsim/media_1n5_asy.wsim            |   12 +
 benchmarks/wsim/media_load_balance_17i7.wsim  |   10 +-
 benchmarks/wsim/media_load_balance_19.wsim    |    4 +-
 .../wsim/media_load_balance_4k12u7.wsim       |    2 +
 .../wsim/media_load_balance_fhd26u7.wsim      |   16 +-
 benchmarks/wsim/media_load_balance_hd01.wsim  |   34 +-
 .../wsim/media_load_balance_hd06mp2.wsim      |    6 +-
 benchmarks/wsim/media_load_balance_hd12.wsim  |    6 +-
 .../wsim/media_load_balance_hd17i4.wsim       |    8 +-
 benchmarks/wsim/media_mfe2_480p.wsim          |   12 +-
 benchmarks/wsim/media_mfe3_480p.wsim          |   18 +-
 benchmarks/wsim/media_mfe4_480p.wsim          |   24 +-
 benchmarks/wsim/media_nn_1080p.wsim           |    4 +
 benchmarks/wsim/media_nn_1080p_s1.wsim        |    4 +-
 benchmarks/wsim/media_nn_1080p_s2.wsim        |    2 +
 benchmarks/wsim/media_nn_1080p_s3.wsim        |    2 +
 benchmarks/wsim/media_nn_480p.wsim            |    4 +
 benchmarks/wsim/vcs_balanced.wsim             |   52 +-
 scripts/media-bench.pl                        |  736 ---------
 33 files changed, 300 insertions(+), 2307 deletions(-)
 delete mode 100644 benchmarks/ewma.h
 delete mode 100644 benchmarks/ilog2.h
 delete mode 100755 scripts/media-bench.pl

diff --git a/benchmarks/Makefile.am b/benchmarks/Makefile.am
index 1f05adf31527..45b923ebbae3 100644
--- a/benchmarks/Makefile.am
+++ b/benchmarks/Makefile.am
@@ -25,4 +25,4 @@ gem_latency_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_latency_LDADD = $(LDADD) -lpthread
 gem_syslatency_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_syslatency_LDADD = $(LDADD) -lpthread
-gem_wsim_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la -lpthread
+gem_wsim_LDADD = $(LDADD) -lpthread
diff --git a/benchmarks/Makefile.sources b/benchmarks/Makefile.sources
index ee045fb309ad..dae3cdda4cf7 100644
--- a/benchmarks/Makefile.sources
+++ b/benchmarks/Makefile.sources
@@ -19,12 +19,6 @@ benchmarks_prog_list =			\
 	vgem_mmap			\
 	$(NULL)
 
-gem_wsim_SOURCES =                      \
-	gem_wsim.c                      \
-	ewma.h                          \
-	ilog2.h                         \
-	$(NULL)
-
 LIBDRM_INTEL_BENCHMARKS =		\
 	intel_upload_blit_large		\
 	intel_upload_blit_large_gtt	\
diff --git a/benchmarks/ewma.h b/benchmarks/ewma.h
deleted file mode 100644
index 8711004ed992..000000000000
--- a/benchmarks/ewma.h
+++ /dev/null
@@ -1,71 +0,0 @@
-#ifndef EWMA_H
-#define EWMA_H
-
-#include <ilog2.h>
-
-#define BUILD_BUG_ON(expr)
-#define BUILD_BUG_ON_NOT_POWER_OF_2(expr)
-
-/*
- * Exponentially weighted moving average (EWMA)
- *
- * This implements a fixed-precision EWMA algorithm, with both the
- * precision and fall-off coefficient determined at compile-time
- * and built into the generated helper funtions.
- *
- * The first argument to the macro is the name that will be used
- * for the struct and helper functions.
- *
- * The second argument, the precision, expresses how many bits are
- * used for the fractional part of the fixed-precision values.
- *
- * The third argument, the weight reciprocal, determines how the
- * new values will be weighed vs. the old state, new values will
- * get weight 1/weight_rcp and old values 1-1/weight_rcp. Note
- * that this parameter must be a power of two for efficiency.
- */
-
-#define DECLARE_EWMA(T, name, _precision, _weight_rcp)			\
-	struct ewma_##name {						\
-		T internal;					\
-	};								\
-	static inline void ewma_##name##_init(struct ewma_##name *e)	\
-	{								\
-		BUILD_BUG_ON(!__builtin_constant_p(_precision));	\
-		BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp));	\
-		/*							\
-		 * Even if you want to feed it just 0/1 you should have	\
-		 * some bits for the non-fractional part...		\
-		 */							\
-		BUILD_BUG_ON((_precision) > 30);			\
-		BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);		\
-		e->internal = 0;					\
-	}								\
-	static inline T							\
-	ewma_##name##_read(struct ewma_##name *e)			\
-	{								\
-		BUILD_BUG_ON(!__builtin_constant_p(_precision));	\
-		BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp));	\
-		BUILD_BUG_ON((_precision) > 30);			\
-		BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);		\
-		return e->internal >> (_precision);			\
-	}								\
-	static inline void ewma_##name##_add(struct ewma_##name *e,	\
-					     T val)			\
-	{								\
-		const T weight_rcp = ilog2(_weight_rcp);		\
-		const T precision = _precision;				\
-		T internal = e->internal;				\
-									\
-		BUILD_BUG_ON(!__builtin_constant_p(_precision));	\
-		BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp));	\
-		BUILD_BUG_ON((_precision) > 30);			\
-		BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);		\
-									\
-		e->internal = internal ?				\
-			(((internal << weight_rcp) - internal) +	\
-				(val << precision)) >> weight_rcp :	\
-			(val << precision);				\
-	}
-
-#endif /* EWMA_H */
diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index ad4edb936920..02fe8f5a5e69 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -55,7 +55,6 @@
 #include "sw_sync.h"
 #include "i915/gem_mman.h"
 
-#include "ewma.h"
 #include "i915/gem_engine_topology.h"
 
 enum intel_engine_id {
@@ -154,21 +153,12 @@ struct w_step
 
 	struct drm_i915_gem_execbuffer2 eb;
 	struct drm_i915_gem_exec_object2 *obj;
-	struct drm_i915_gem_relocation_entry reloc[5];
+	struct drm_i915_gem_relocation_entry reloc[1];
 	unsigned long bb_sz;
 	uint32_t bb_handle;
-	uint32_t *seqno_value;
-	uint32_t *seqno_address;
-	uint32_t *rt0_value;
-	uint32_t *rt0_address;
-	uint32_t *rt1_address;
-	uint32_t *latch_value;
-	uint32_t *latch_address;
 	uint32_t *recursive_bb_start;
 };
 
-DECLARE_EWMA(uint64_t, rt, 4, 2)
-
 struct ctx {
 	uint32_t id;
 	int priority;
@@ -176,9 +166,7 @@ struct ctx {
 	enum intel_engine_id *engine_map;
 	unsigned int bond_count;
 	struct bond *bonds;
-	bool targets_instance;
-	bool wants_balance;
-	unsigned int static_vcs;
+	bool load_balance;
 	uint64_t sseu;
 };
 
@@ -194,13 +182,11 @@ struct workload
 	pthread_t thread;
 	bool run;
 	bool background;
-	const struct workload_balancer *balancer;
 	unsigned int repeat;
 	unsigned int flags;
 	bool print_stats;
 
 	uint32_t bb_prng;
-	uint32_t prng;
 
 	struct timespec repeat_start;
 
@@ -210,73 +196,25 @@ struct workload
 	int sync_timeline;
 	uint32_t sync_seqno;
 
-	uint32_t seqno[NUM_ENGINES];
-	struct drm_i915_gem_exec_object2 status_object[2];
-	uint32_t *status_page;
-	uint32_t *status_cs;
-	unsigned int vcs_rr;
-
-	unsigned long qd_sum[NUM_ENGINES];
-	unsigned long nr_bb[NUM_ENGINES];
-
 	struct igt_list_head requests[NUM_ENGINES];
 	unsigned int nrequest[NUM_ENGINES];
-
-	struct workload *global_wrk;
-	const struct workload_balancer *global_balancer;
-	pthread_mutex_t mutex;
-
-	union {
-		struct rtavg {
-			struct ewma_rt avg[NUM_ENGINES];
-			uint32_t last[NUM_ENGINES];
-		} rt;
-	};
-
-	struct busy_balancer {
-		int fd;
-		bool first;
-		unsigned int num_engines;
-		unsigned int engine_map[NUM_ENGINES];
-		uint64_t t_prev;
-		uint64_t prev[NUM_ENGINES];
-		double busy[NUM_ENGINES];
-	} busy_balancer;
 };
 
-struct intel_mmio_data mmio_data;
 static const unsigned int nop_calibration_us = 1000;
 static bool has_nop_calibration = false;
 static bool sequential = true;
 
 static unsigned int master_prng;
 
-static unsigned int context_vcs_rr;
-
 static int verbose = 1;
 static int fd;
 static struct drm_i915_gem_context_param_sseu device_sseu = {
 	.slice_mask = -1 /* Force read on first use. */
 };
 
-#define SWAPVCS		(1<<0)
-#define SEQNO		(1<<1)
-#define BALANCE		(1<<2)
-#define RT		(1<<3)
-#define VCS2REMAP	(1<<4)
-#define INITVCSRR	(1<<5)
-#define SYNCEDCLIENTS	(1<<6)
-#define HEARTBEAT	(1<<7)
-#define GLOBAL_BALANCE	(1<<8)
-#define DEPSYNC		(1<<9)
-#define I915		(1<<10)
-#define SSEU		(1<<11)
-
-#define SEQNO_IDX(engine) ((engine) * 16)
-#define SEQNO_OFFSET(engine) (SEQNO_IDX(engine) * sizeof(uint32_t))
-
-#define RCS_TIMESTAMP (0x2000 + 0x358)
-#define REG(x) (volatile uint32_t *)((volatile char *)igt_global_mmio + x)
+#define SYNCEDCLIENTS	(1<<1)
+#define DEPSYNC		(1<<2)
+#define SSEU		(1<<3)
 
 static const char *ring_str_map[NUM_ENGINES] = {
 	[DEFAULT] = "DEFAULT",
@@ -578,26 +516,6 @@ static unsigned int num_engines_in_class(enum intel_engine_id class)
 	return count;
 }
 
-static void
-fill_engines_class(struct i915_engine_class_instance *ci,
-		   enum intel_engine_id class)
-{
-	unsigned int i, j = 0;
-
-	igt_assert(class == VCS);
-
-	query_engines();
-
-	for (i = 0; i < __num_engines; i++) {
-		if (__engines[i].engine_class != I915_ENGINE_CLASS_VIDEO)
-			continue;
-
-		ci[j].engine_class = __engines[i].engine_class;
-		ci[j].engine_instance = __engines[i].engine_instance;
-		j++;
-	}
-}
-
 static void
 fill_engines_id_class(enum intel_engine_id *list,
 		      enum intel_engine_id class)
@@ -744,7 +662,6 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 	char *_token, *token, *tctx = NULL, *tstart = desc;
 	char *field, *fctx = NULL, *fstart;
 	struct w_step step, *steps = NULL;
-	bool bcs_used = false;
 	unsigned int valid;
 	int i, j, tmp;
 
@@ -962,9 +879,6 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			valid++;
 
 			step.engine = i;
-
-			if (step.engine == BCS)
-				bcs_used = true;
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx))) {
@@ -1089,9 +1003,6 @@ add_step:
 		}
 	}
 
-	if (bcs_used && (flags & VCS2REMAP) && verbose)
-		printf("BCS usage in workload with VCS2 remapping enabled!\n");
-
 	return wrk;
 }
 
@@ -1147,7 +1058,7 @@ static unsigned int get_duration(struct workload *wrk, struct w_step *w)
 static struct ctx *
 __get_ctx(struct workload *wrk, const struct w_step *w)
 {
-	return &wrk->ctx_list[w->context * 2];
+	return &wrk->ctx_list[w->context];
 }
 
 static unsigned long
@@ -1211,13 +1122,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 	unsigned int r = 0;
 	uint32_t *ptr, *cs;
 
-	igt_assert(((flags & RT) && (flags & SEQNO)) || !(flags & RT));
-
 	batch_start -= sizeof(uint32_t); /* bbend */
-	if (flags & SEQNO)
-		batch_start -= 4 * sizeof(uint32_t);
-	if (flags & RT)
-		batch_start -= 12 * sizeof(uint32_t);
 
 	if (w->unbound_duration)
 		batch_start -= 4 * sizeof(uint32_t); /* MI_ARB_CHK + MI_BATCH_BUFFER_START */
@@ -1242,49 +1147,6 @@ terminate_bb(struct w_step *w, unsigned int flags)
 		*cs++ = 0;
 	}
 
-	if (flags & SEQNO) {
-		w->reloc[r++].offset = batch_start + sizeof(uint32_t);
-		batch_start += 4 * sizeof(uint32_t);
-
-		*cs++ = MI_STORE_DWORD_IMM;
-		w->seqno_address = cs;
-		*cs++ = 0;
-		*cs++ = 0;
-		w->seqno_value = cs;
-		*cs++ = 0;
-	}
-
-	if (flags & RT) {
-		w->reloc[r++].offset = batch_start + sizeof(uint32_t);
-		batch_start += 4 * sizeof(uint32_t);
-
-		*cs++ = MI_STORE_DWORD_IMM;
-		w->rt0_address = cs;
-		*cs++ = 0;
-		*cs++ = 0;
-		w->rt0_value = cs;
-		*cs++ = 0;
-
-		w->reloc[r++].offset = batch_start + 2 * sizeof(uint32_t);
-		batch_start += 4 * sizeof(uint32_t);
-
-		*cs++ = 0x24 << 23 | 2; /* MI_STORE_REG_MEM */
-		*cs++ = RCS_TIMESTAMP;
-		w->rt1_address = cs;
-		*cs++ = 0;
-		*cs++ = 0;
-
-		w->reloc[r++].offset = batch_start + sizeof(uint32_t);
-		batch_start += 4 * sizeof(uint32_t);
-
-		*cs++ = MI_STORE_DWORD_IMM;
-		w->latch_address = cs;
-		*cs++ = 0;
-		*cs++ = 0;
-		w->latch_value = cs;
-		*cs++ = 0;
-	}
-
 	*cs = bbe;
 
 	return r;
@@ -1305,13 +1167,7 @@ eb_set_engine(struct drm_i915_gem_execbuffer2 *eb,
 	      enum intel_engine_id engine,
 	      unsigned int flags)
 {
-	if (engine == VCS2 && (flags & VCS2REMAP))
-		engine = BCS;
-
-	if ((flags & I915) && engine == VCS)
-		eb->flags = 0;
-	else
-		eb->flags = eb_engine_map[engine];
+	eb->flags = eb_engine_map[engine];
 }
 
 static unsigned int
@@ -1324,7 +1180,7 @@ find_engine_in_map(struct ctx *ctx, enum intel_engine_id engine)
 			return i + 1;
 	}
 
-	igt_assert(ctx->wants_balance);
+	igt_assert(ctx->load_balance);
 	return 0;
 }
 
@@ -1347,24 +1203,10 @@ eb_update_flags(struct workload *wrk, struct w_step *w,
 		w->eb.flags |= I915_EXEC_FENCE_OUT;
 }
 
-static struct drm_i915_gem_exec_object2 *
-get_status_objects(struct workload *wrk)
-{
-	if (wrk->flags & GLOBAL_BALANCE)
-		return wrk->global_wrk->status_object;
-	else
-		return wrk->status_object;
-}
-
 static uint32_t
 get_ctxid(struct workload *wrk, struct w_step *w)
 {
-	struct ctx *ctx = __get_ctx(wrk, w);
-
-	if (ctx->targets_instance && ctx->wants_balance && w->engine == VCS)
-		return wrk->ctx_list[w->context * 2 + 1].id;
-	else
-		return wrk->ctx_list[w->context * 2].id;
+	return wrk->ctx_list[w->context].id;
 }
 
 static void
@@ -1372,7 +1214,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 {
 	enum intel_engine_id engine = w->engine;
 	unsigned int j = 0;
-	unsigned int nr_obj = 3 + w->data_deps.nr;
+	unsigned int nr_obj = 2 + w->data_deps.nr;
 	unsigned int i;
 
 	w->obj = calloc(nr_obj, sizeof(*w->obj));
@@ -1383,11 +1225,6 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 	j++;
 	igt_assert(j < nr_obj);
 
-	if (flags & SEQNO) {
-		w->obj[j++] = get_status_objects(wrk)[0];
-		igt_assert(j < nr_obj);
-	}
-
 	for (i = 0; i < w->data_deps.nr; i++) {
 		igt_assert(w->data_deps.list[i] <= 0);
 		if (w->data_deps.list[i]) {
@@ -1414,21 +1251,15 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 	w->obj[j].relocation_count = terminate_bb(w, flags);
 
 	if (w->obj[j].relocation_count) {
+		igt_assert(w->unbound_duration);
 		w->obj[j].relocs_ptr = to_user_pointer(&w->reloc);
-		for (i = 0; i < w->obj[j].relocation_count; i++)
-			w->reloc[i].target_handle = 1;
-		if (w->unbound_duration)
-			w->reloc[0].target_handle = j;
+		w->reloc[0].target_handle = j;
 	}
 
 	w->eb.buffers_ptr = to_user_pointer(w->obj);
 	w->eb.buffer_count = j + 1;
 	w->eb.rsvd1 = get_ctxid(wrk, w);
 
-	if (flags & SWAPVCS && engine == VCS1)
-		engine = VCS2;
-	else if (flags & SWAPVCS && engine == VCS2)
-		engine = VCS1;
 	eb_update_flags(wrk, w, engine, flags);
 #ifdef DEBUG
 	printf("%u: %u:|", w->idx, w->eb.buffer_count);
@@ -1528,7 +1359,7 @@ set_ctx_sseu(struct ctx *ctx, uint64_t slice_mask)
 	if (slice_mask == -1)
 		slice_mask = device_sseu.slice_mask;
 
-	if (ctx->engine_map && ctx->wants_balance) {
+	if (ctx->engine_map && ctx->load_balance) {
 		sseu.flags = I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX;
 		sseu.engine.engine_class = I915_ENGINE_CLASS_INVALID;
 		sseu.engine.engine_instance = 0;
@@ -1569,48 +1400,20 @@ static size_t sizeof_engines_bond(int count)
 static int
 prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 {
-	unsigned int ctx_vcs;
+	uint32_t share_vm = 0;
 	int max_ctx = -1;
 	struct w_step *w;
 	int i, j;
 
 	wrk->id = id;
-	wrk->prng = rand();
 	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
 	wrk->run = true;
 
-	ctx_vcs =  0;
-	if (flags & INITVCSRR)
-		ctx_vcs = id & 1;
-	wrk->vcs_rr = ctx_vcs;
-
-	if (flags & GLOBAL_BALANCE) {
-		int ret = pthread_mutex_init(&wrk->mutex, NULL);
-		igt_assert(ret == 0);
-	}
-
-	if (flags & SEQNO) {
-		if (!(flags & GLOBAL_BALANCE) || id == 0) {
-			uint32_t handle;
-
-			handle = gem_create(fd, 4096);
-			gem_set_caching(fd, handle, I915_CACHING_CACHED);
-			wrk->status_object[0].handle = handle;
-			wrk->status_page = gem_mmap__cpu(fd, handle, 0, 4096,
-							 PROT_READ);
-
-			handle = gem_create(fd, 4096);
-			wrk->status_object[1].handle = handle;
-			wrk->status_cs = gem_mmap__wc(fd, handle,
-						      0, 4096, PROT_WRITE);
-		}
-	}
-
 	/*
 	 * Pre-scan workload steps to allocate context list storage.
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		int ctx = w->context * 2 + 1; /* Odd slots are special. */
+		int ctx = w->context + 1;
 		int delta;
 
 		w->wrk = wrk;
@@ -1630,27 +1433,16 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	}
 
 	/*
-	 * Identify if contexts target specific engine instances and if they
-	 * want to be balanced.
-	 *
 	 * Transfer over engine map configuration from the workload step.
 	 */
-	for (j = 0; j < wrk->nr_ctxs; j += 2) {
+	for (j = 0; j < wrk->nr_ctxs; j++) {
 		struct ctx *ctx = &wrk->ctx_list[j];
 
-		bool targets = false;
-		bool balance = false;
-
 		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-			if (w->context != (j / 2))
+			if (w->context != j)
 				continue;
 
-			if (w->type == BATCH) {
-				if (w->engine == VCS)
-					balance = true;
-				else
-					targets = true;
-			} else if (w->type == ENGINE_MAP) {
+			if (w->type == ENGINE_MAP) {
 				ctx->engine_map = w->engine_map;
 				ctx->engine_map_count = w->engine_map_count;
 			} else if (w->type == LOAD_BALANCE) {
@@ -1658,9 +1450,9 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 					wsim_err("Load balancing needs an engine map!\n");
 					return 1;
 				}
-				ctx->wants_balance = w->load_balance;
+				ctx->load_balance = w->load_balance;
 			} else if (w->type == BOND) {
-				if (!ctx->wants_balance) {
+				if (!ctx->load_balance) {
 					wsim_err("Engine bonds need load balancing engine map!\n");
 					return 1;
 				}
@@ -1675,133 +1467,53 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 					w->bond_master;
 			}
 		}
-
-		wrk->ctx_list[j].targets_instance = targets;
-		if (flags & I915)
-			wrk->ctx_list[j].wants_balance |= balance;
-	}
-
-	/*
-	 * Ensure VCS is not allowed with engine map contexts.
-	 */
-	for (j = 0; j < wrk->nr_ctxs; j += 2) {
-		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-			if (w->context != (j / 2))
-				continue;
-
-			if (w->type != BATCH)
-				continue;
-
-			if (wrk->ctx_list[j].engine_map &&
-			    !wrk->ctx_list[j].wants_balance &&
-			    (w->engine == VCS || w->engine == DEFAULT)) {
-				wsim_err("Batches targetting engine maps must use explicit engines!\n");
-				return -1;
-			}
-		}
 	}
 
-
 	/*
 	 * Create and configure contexts.
 	 */
-	for (i = 0; i < wrk->nr_ctxs; i += 2) {
+	for (i = 0; i < wrk->nr_ctxs; i++) {
+		struct drm_i915_gem_context_create_ext_setparam ext = {
+			.base.name = I915_CONTEXT_CREATE_EXT_SETPARAM,
+			.param.param = I915_CONTEXT_PARAM_VM,
+		};
+		struct drm_i915_gem_context_create_ext args = { };
 		struct ctx *ctx = &wrk->ctx_list[i];
-		uint32_t ctx_id, share_vm = 0;
+		uint32_t ctx_id;
 
-		if (ctx->id)
-			continue;
+		igt_assert(!ctx->id);
 
-		if ((flags & I915) || ctx->engine_map) {
-			struct drm_i915_gem_context_create_ext_setparam ext = {
-				.base.name = I915_CONTEXT_CREATE_EXT_SETPARAM,
-				.param.param = I915_CONTEXT_PARAM_VM,
+		/* Find existing context to share ppgtt with. */
+		for (j = 0; !share_vm && j < wrk->nr_ctxs; j++) {
+			struct drm_i915_gem_context_param param = {
+				.param = I915_CONTEXT_PARAM_VM,
+				.ctx_id = wrk->ctx_list[j].id,
 			};
-			struct drm_i915_gem_context_create_ext args = { };
-
-			/* Find existing context to share ppgtt with. */
-			for (j = 0; j < wrk->nr_ctxs; j++) {
-				struct drm_i915_gem_context_param param = {
-					.param = I915_CONTEXT_PARAM_VM,
-				};
-
-				if (!wrk->ctx_list[j].id)
-					continue;
 
-				param.ctx_id = wrk->ctx_list[j].id;
-
-				gem_context_get_param(fd, &param);
-				igt_assert(param.value);
-
-				share_vm = param.value;
-
-				ext.param.value = share_vm;
-				args.flags =
-				    I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS;
-				args.extensions = to_user_pointer(&ext);
-				break;
-			}
-
-			if ((!ctx->engine_map && !ctx->targets_instance) ||
-			    (ctx->engine_map && ctx->wants_balance))
-				args.flags |=
-				     I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE;
-
-			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT,
-				 &args);
+			if (!param.ctx_id)
+				continue;
 
-			ctx_id = args.ctx_id;
-		} else {
-			struct drm_i915_gem_context_create args = {};
+			gem_context_get_param(fd, &param);
+			igt_assert(param.value);
+			share_vm = param.value;
+			break;
+		}
 
-			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
-			ctx_id = args.ctx_id;
+		if (share_vm) {
+			ext.param.value = share_vm;
+			args.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS;
+			args.extensions = to_user_pointer(&ext);
 		}
 
-		igt_assert(ctx_id);
+		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &args);
+		igt_assert(args.ctx_id);
+
+		ctx_id = args.ctx_id;
 		ctx->id = ctx_id;
 		ctx->sseu = device_sseu.slice_mask;
 
-		if (flags & GLOBAL_BALANCE) {
-			ctx->static_vcs = context_vcs_rr;
-			context_vcs_rr ^= 1;
-		} else {
-			ctx->static_vcs = ctx_vcs;
-			ctx_vcs ^= 1;
-		}
-
 		__configure_context(ctx_id, wrk->prio);
 
-		/*
-		 * Do we need a separate context to satisfy this workloads which
-		 * both want to target specific engines and be balanced by i915?
-		 */
-		if ((flags & I915) && ctx->wants_balance &&
-		    ctx->targets_instance && !ctx->engine_map) {
-			struct drm_i915_gem_context_create_ext_setparam ext = {
-				.base.name = I915_CONTEXT_CREATE_EXT_SETPARAM,
-				.param.param = I915_CONTEXT_PARAM_VM,
-				.param.value = share_vm,
-			};
-			struct drm_i915_gem_context_create_ext args = {
-				.extensions = to_user_pointer(&ext),
-				.flags =
-				    I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS |
-				    I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE,
-			};
-
-			igt_assert(share_vm);
-
-			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT,
-				 &args);
-
-			igt_assert(args.ctx_id);
-			ctx_id = args.ctx_id;
-			wrk->ctx_list[i + 1].id = args.ctx_id;
-
-			__configure_context(ctx_id, wrk->prio);
-		}
-
 		if (ctx->engine_map) {
 			struct i915_context_param_engines *set_engines =
 				alloca0(sizeof_param_engines(ctx->engine_map_count + 1));
@@ -1815,7 +1527,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			};
 			struct i915_context_engines_bond *last = NULL;
 
-			if (ctx->wants_balance) {
+			if (ctx->load_balance) {
 				set_engines->extensions =
 					to_user_pointer(load_balance);
 
@@ -1869,34 +1581,6 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			}
 			load_balance->base.next_extension = to_user_pointer(last);
 
-			gem_context_set_param(fd, &param);
-		} else if (ctx->wants_balance) {
-			const unsigned int count = num_engines_in_class(VCS);
-			struct i915_context_engines_load_balance *load_balance =
-				alloca0(sizeof_load_balance(count));
-			struct i915_context_param_engines *set_engines =
-				alloca0(sizeof_param_engines(count + 1));
-			struct drm_i915_gem_context_param param = {
-				.ctx_id = ctx_id,
-				.param = I915_CONTEXT_PARAM_ENGINES,
-				.size = sizeof_param_engines(count + 1),
-				.value = to_user_pointer(set_engines),
-			};
-
-			set_engines->extensions = to_user_pointer(load_balance);
-
-			set_engines->engines[0].engine_class =
-				I915_ENGINE_CLASS_INVALID;
-			set_engines->engines[0].engine_instance =
-				I915_ENGINE_CLASS_INVALID_NONE;
-			fill_engines_class(&set_engines->engines[1], VCS);
-
-			load_balance->base.name =
-				I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE;
-			load_balance->num_siblings = count;
-
-			fill_engines_class(&load_balance->engines[0], VCS);
-
 			gem_context_set_param(fd, &param);
 		}
 
@@ -1904,11 +1588,11 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			/* Set to slice 0 only, one slice. */
 			ctx->sseu = set_ctx_sseu(ctx, 1);
 		}
-
-		if (share_vm)
-			vm_destroy(fd, share_vm);
 	}
 
+	if (share_vm)
+		vm_destroy(fd, share_vm);
+
 	/* Record default preemption. */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == BATCH)
@@ -1954,16 +1638,10 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	 * Allocate batch buffers.
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		unsigned int _flags = flags;
-		enum intel_engine_id engine = w->engine;
-
 		if (w->type != BATCH)
 			continue;
 
-		if (engine == VCS)
-			_flags &= ~SWAPVCS;
-
-		alloc_step_batch(wrk, w, _flags);
+		alloc_step_batch(wrk, w, flags);
 	}
 
 	return 0;
@@ -1980,602 +1658,6 @@ static int elapsed_us(const struct timespec *start, const struct timespec *end)
 	return elapsed(start, end) * 1e6;
 }
 
-static enum intel_engine_id get_vcs_engine(unsigned int n)
-{
-	const enum intel_engine_id vcs_engines[2] = { VCS1, VCS2 };
-
-	igt_assert(n < ARRAY_SIZE(vcs_engines));
-
-	return vcs_engines[n];
-}
-
-static uint32_t new_seqno(struct workload *wrk, enum intel_engine_id engine)
-{
-	uint32_t seqno;
-	int ret;
-
-	if (wrk->flags & GLOBAL_BALANCE) {
-		igt_assert(wrk->global_wrk);
-		wrk = wrk->global_wrk;
-
-		ret = pthread_mutex_lock(&wrk->mutex);
-		igt_assert(ret == 0);
-	}
-
-	seqno = ++wrk->seqno[engine];
-
-	if (wrk->flags & GLOBAL_BALANCE) {
-		ret = pthread_mutex_unlock(&wrk->mutex);
-		igt_assert(ret == 0);
-	}
-
-	return seqno;
-}
-
-static uint32_t
-current_seqno(struct workload *wrk, enum intel_engine_id engine)
-{
-	if (wrk->flags & GLOBAL_BALANCE)
-		return wrk->global_wrk->seqno[engine];
-	else
-		return wrk->seqno[engine];
-}
-
-static uint32_t
-read_status_page(struct workload *wrk, unsigned int idx)
-{
-	if (wrk->flags & GLOBAL_BALANCE)
-		return READ_ONCE(wrk->global_wrk->status_page[idx]);
-	else
-		return READ_ONCE(wrk->status_page[idx]);
-}
-
-static uint32_t
-current_gpu_seqno(struct workload *wrk, enum intel_engine_id engine)
-{
-       return read_status_page(wrk, SEQNO_IDX(engine));
-}
-
-struct workload_balancer {
-	unsigned int id;
-	const char *name;
-	const char *desc;
-	unsigned int flags;
-	unsigned int min_gen;
-
-	int (*init)(const struct workload_balancer *balancer,
-		    struct workload *wrk);
-	unsigned int (*get_qd)(const struct workload_balancer *balancer,
-			       struct workload *wrk,
-			       enum intel_engine_id engine);
-	enum intel_engine_id (*balance)(const struct workload_balancer *balancer,
-					struct workload *wrk, struct w_step *w);
-};
-
-static enum intel_engine_id
-rr_balance(const struct workload_balancer *balancer,
-	   struct workload *wrk, struct w_step *w)
-{
-	unsigned int engine;
-
-	engine = get_vcs_engine(wrk->vcs_rr);
-	wrk->vcs_rr ^= 1;
-
-	return engine;
-}
-
-static enum intel_engine_id
-rand_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w)
-{
-	return get_vcs_engine(hars_petruska_f54_1_random(&wrk->prng) & 1);
-}
-
-static unsigned int
-get_qd_depth(const struct workload_balancer *balancer,
-	     struct workload *wrk, enum intel_engine_id engine)
-{
-	return current_seqno(wrk, engine) - current_gpu_seqno(wrk, engine);
-}
-
-static enum intel_engine_id
-__qd_select_engine(struct workload *wrk, const unsigned long *qd, bool random)
-{
-	unsigned int n;
-
-	if (qd[VCS1] < qd[VCS2])
-		n = 0;
-	else if (qd[VCS1] > qd[VCS2])
-		n = 1;
-	else if (random)
-		n = hars_petruska_f54_1_random(&wrk->prng) & 1;
-	else
-		n = wrk->vcs_rr;
-	wrk->vcs_rr = n ^ 1;
-
-	return get_vcs_engine(n);
-}
-
-static enum intel_engine_id
-__qd_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w, bool random)
-{
-	enum intel_engine_id engine;
-	unsigned long qd[NUM_ENGINES];
-
-	igt_assert(w->engine == VCS);
-
-	qd[VCS1] = balancer->get_qd(balancer, wrk, VCS1);
-	wrk->qd_sum[VCS1] += qd[VCS1];
-
-	qd[VCS2] = balancer->get_qd(balancer, wrk, VCS2);
-	wrk->qd_sum[VCS2] += qd[VCS2];
-
-	engine = __qd_select_engine(wrk, qd, random);
-
-#ifdef DEBUG
-	printf("qd_balance[%u]: 1:%ld 2:%ld rr:%u = %u\t(%u - %u) (%u - %u)\n",
-	       wrk->id, qd[VCS1], qd[VCS2], wrk->vcs_rr, engine,
-	       current_seqno(wrk, VCS1), current_gpu_seqno(wrk, VCS1),
-	       current_seqno(wrk, VCS2), current_gpu_seqno(wrk, VCS2));
-#endif
-	return engine;
-}
-
-static enum intel_engine_id
-qd_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w)
-{
-	return __qd_balance(balancer, wrk, w, false);
-}
-
-static enum intel_engine_id
-qdr_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w)
-{
-	return __qd_balance(balancer, wrk, w, true);
-}
-
-static enum intel_engine_id
-qdavg_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w)
-{
-	unsigned long qd[NUM_ENGINES];
-	unsigned int engine;
-
-	igt_assert(w->engine == VCS);
-
-	for (engine = VCS1; engine <= VCS2; engine++) {
-		qd[engine] = balancer->get_qd(balancer, wrk, engine);
-		wrk->qd_sum[engine] += qd[engine];
-
-		ewma_rt_add(&wrk->rt.avg[engine], qd[engine]);
-		qd[engine] = ewma_rt_read(&wrk->rt.avg[engine]);
-	}
-
-	engine = __qd_select_engine(wrk, qd, false);
-#ifdef DEBUG
-	printf("qdavg_balance[%u]: 1:%ld 2:%ld rr:%u = %u\t(%u - %u) (%u - %u)\n",
-	       wrk->id, qd[VCS1], qd[VCS2], wrk->vcs_rr, engine,
-	       current_seqno(wrk, VCS1), current_gpu_seqno(wrk, VCS1),
-	       current_seqno(wrk, VCS2), current_gpu_seqno(wrk, VCS2));
-#endif
-	return engine;
-}
-
-static enum intel_engine_id
-__rt_select_engine(struct workload *wrk, unsigned long *qd, bool random)
-{
-	qd[VCS1] >>= 10;
-	qd[VCS2] >>= 10;
-
-	return __qd_select_engine(wrk, qd, random);
-}
-
-struct rt_depth {
-	uint32_t seqno;
-	uint32_t submitted;
-	uint32_t completed;
-};
-
-static void get_rt_depth(struct workload *wrk,
-			 unsigned int engine,
-			 struct rt_depth *rt)
-{
-	const unsigned int idx = SEQNO_IDX(engine);
-	uint32_t latch;
-
-	do {
-		latch = read_status_page(wrk, idx + 3);
-		rt->submitted = read_status_page(wrk, idx + 1);
-		rt->completed = read_status_page(wrk, idx + 2);
-		rt->seqno = read_status_page(wrk, idx);
-	} while (latch != rt->seqno);
-}
-
-static enum intel_engine_id
-__rt_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w, bool random)
-{
-	unsigned long qd[NUM_ENGINES];
-	unsigned int engine;
-
-	igt_assert(w->engine == VCS);
-
-	/* Estimate the "speed" of the most recent batch
-	 *    (finish time - submit time)
-	 * and use that as an approximate for the total remaining time for
-	 * all batches on that engine, plus the time we expect this batch to
-	 * take. We try to keep the total balanced between the engines.
-	 */
-	for (engine = VCS1; engine <= VCS2; engine++) {
-		struct rt_depth rt;
-
-		get_rt_depth(wrk, engine, &rt);
-		qd[engine] = current_seqno(wrk, engine) - rt.seqno;
-		wrk->qd_sum[engine] += qd[engine];
-		qd[engine] = (qd[engine] + 1) * (rt.completed - rt.submitted);
-#ifdef DEBUG
-		printf("rt[0] = %d (%d - %d) x %d (%d - %d) = %ld\n",
-		       current_seqno(wrk, engine) - rt.seqno,
-		       current_seqno(wrk, engine), rt.seqno,
-		       rt.completed - rt.submitted,
-		       rt.completed, rt.submitted,
-		       qd[engine]);
-#endif
-	}
-
-	return __rt_select_engine(wrk, qd, random);
-}
-
-static enum intel_engine_id
-rt_balance(const struct workload_balancer *balancer,
-	   struct workload *wrk, struct w_step *w)
-{
-
-	return __rt_balance(balancer, wrk, w, false);
-}
-
-static enum intel_engine_id
-rtr_balance(const struct workload_balancer *balancer,
-	   struct workload *wrk, struct w_step *w)
-{
-	return __rt_balance(balancer, wrk, w, true);
-}
-
-static enum intel_engine_id
-rtavg_balance(const struct workload_balancer *balancer,
-	   struct workload *wrk, struct w_step *w)
-{
-	unsigned long qd[NUM_ENGINES];
-	unsigned int engine;
-
-	igt_assert(w->engine == VCS);
-
-	/* Estimate the average "speed" of the most recent batches
-	 *    (finish time - submit time)
-	 * and use that as an approximate for the total remaining time for
-	 * all batches on that engine plus the time we expect to execute in.
-	 * We try to keep the total remaining balanced between the engines.
-	 */
-	for (engine = VCS1; engine <= VCS2; engine++) {
-		struct rt_depth rt;
-
-		get_rt_depth(wrk, engine, &rt);
-		if (rt.seqno != wrk->rt.last[engine]) {
-			igt_assert((long)(rt.completed - rt.submitted) > 0);
-			ewma_rt_add(&wrk->rt.avg[engine],
-				    rt.completed - rt.submitted);
-			wrk->rt.last[engine] = rt.seqno;
-		}
-		qd[engine] = current_seqno(wrk, engine) - rt.seqno;
-		wrk->qd_sum[engine] += qd[engine];
-		qd[engine] =
-			(qd[engine] + 1) * ewma_rt_read(&wrk->rt.avg[engine]);
-
-#ifdef DEBUG
-		printf("rtavg[%d] = %d (%d - %d) x %ld (%d) = %ld\n",
-		       engine,
-		       current_seqno(wrk, engine) - rt.seqno,
-		       current_seqno(wrk, engine), rt.seqno,
-		       ewma_rt_read(&wrk->rt.avg[engine]),
-		       rt.completed - rt.submitted,
-		       qd[engine]);
-#endif
-	}
-
-	return __rt_select_engine(wrk, qd, false);
-}
-
-static enum intel_engine_id
-context_balance(const struct workload_balancer *balancer,
-		struct workload *wrk, struct w_step *w)
-{
-	return get_vcs_engine(__get_ctx(wrk, w)->static_vcs);
-}
-
-static unsigned int
-get_engine_busy(const struct workload_balancer *balancer,
-		struct workload *wrk, enum intel_engine_id engine)
-{
-	struct busy_balancer *bb = &wrk->busy_balancer;
-
-	if (engine == VCS2 && (wrk->flags & VCS2REMAP))
-		engine = BCS;
-
-	return bb->busy[bb->engine_map[engine]];
-}
-
-static void
-get_pmu_stats(const struct workload_balancer *b, struct workload *wrk)
-{
-	struct busy_balancer *bb = &wrk->busy_balancer;
-	uint64_t val[7];
-	unsigned int i;
-
-	igt_assert_eq(read(bb->fd, val, sizeof(val)),
-		      (2 + bb->num_engines) * sizeof(uint64_t));
-
-	if (!bb->first) {
-		for (i = 0; i < bb->num_engines; i++) {
-			double d;
-
-			d = (val[2 + i] - bb->prev[i]) * 100;
-			d /= val[1] - bb->t_prev;
-			bb->busy[i] = d;
-		}
-	}
-
-	for (i = 0; i < bb->num_engines; i++)
-		bb->prev[i] = val[2 + i];
-
-	bb->t_prev = val[1];
-	bb->first = false;
-}
-
-static enum intel_engine_id
-busy_avg_balance(const struct workload_balancer *balancer,
-		 struct workload *wrk, struct w_step *w)
-{
-	get_pmu_stats(balancer, wrk);
-
-	return qdavg_balance(balancer, wrk, w);
-}
-
-static enum intel_engine_id
-busy_balance(const struct workload_balancer *balancer,
-	     struct workload *wrk, struct w_step *w)
-{
-	get_pmu_stats(balancer, wrk);
-
-	return qd_balance(balancer, wrk, w);
-}
-
-static int
-busy_init(const struct workload_balancer *balancer, struct workload *wrk)
-{
-	struct busy_balancer *bb = &wrk->busy_balancer;
-	struct engine_desc {
-		unsigned class, inst;
-		enum intel_engine_id id;
-	} *d, engines[] = {
-		{ I915_ENGINE_CLASS_RENDER, 0, RCS },
-		{ I915_ENGINE_CLASS_COPY, 0, BCS },
-		{ I915_ENGINE_CLASS_VIDEO, 0, VCS1 },
-		{ I915_ENGINE_CLASS_VIDEO, 1, VCS2 },
-		{ I915_ENGINE_CLASS_VIDEO_ENHANCE, 0, VECS },
-		{ 0, 0, VCS }
-	};
-
-	bb->num_engines = 0;
-	bb->first = true;
-	bb->fd = -1;
-
-	for (d = &engines[0]; d->id != VCS; d++) {
-		int pfd;
-
-		pfd = perf_igfx_open_group(I915_PMU_ENGINE_BUSY(d->class,
-								d->inst),
-					   bb->fd);
-		if (pfd < 0) {
-			if (d->id != VCS2)
-				return -(10 + bb->num_engines);
-			else
-				continue;
-		}
-
-		if (bb->num_engines == 0)
-			bb->fd = pfd;
-
-		bb->engine_map[d->id] = bb->num_engines++;
-	}
-
-	if (bb->num_engines < 5 && !(wrk->flags & VCS2REMAP))
-		return -1;
-
-	return 0;
-}
-
-static const struct workload_balancer all_balancers[] = {
-	{
-		.id = 0,
-		.name = "rr",
-		.desc = "Simple round-robin.",
-		.balance = rr_balance,
-	},
-	{
-		.id = 6,
-		.name = "rand",
-		.desc = "Random selection.",
-		.balance = rand_balance,
-	},
-	{
-		.id = 1,
-		.name = "qd",
-		.desc = "Queue depth estimation with round-robin on equal depth.",
-		.flags = SEQNO,
-		.min_gen = 8,
-		.get_qd = get_qd_depth,
-		.balance = qd_balance,
-	},
-	{
-		.id = 5,
-		.name = "qdr",
-		.desc = "Queue depth estimation with random selection on equal depth.",
-		.flags = SEQNO,
-		.min_gen = 8,
-		.get_qd = get_qd_depth,
-		.balance = qdr_balance,
-	},
-	{
-		.id = 7,
-		.name = "qdavg",
-		.desc = "Like qd, but using an average queue depth estimator.",
-		.flags = SEQNO,
-		.min_gen = 8,
-		.get_qd = get_qd_depth,
-		.balance = qdavg_balance,
-	},
-	{
-		.id = 2,
-		.name = "rt",
-		.desc = "Queue depth plus last runtime estimation.",
-		.flags = SEQNO | RT,
-		.min_gen = 8,
-		.get_qd = get_qd_depth,
-		.balance = rt_balance,
-	},
-	{
-		.id = 3,
-		.name = "rtr",
-		.desc = "Like rt but with random engine selection on equal depth.",
-		.flags = SEQNO | RT,
-		.min_gen = 8,
-		.get_qd = get_qd_depth,
-		.balance = rtr_balance,
-	},
-	{
-		.id = 4,
-		.name = "rtavg",
-		.desc = "Improved version rt tracking average execution speed per engine.",
-		.flags = SEQNO | RT,
-		.min_gen = 8,
-		.get_qd = get_qd_depth,
-		.balance = rtavg_balance,
-	},
-	{
-		.id = 8,
-		.name = "context",
-		.desc = "Static round-robin VCS assignment at context creation.",
-		.balance = context_balance,
-	},
-	{
-		.id = 9,
-		.name = "busy",
-		.desc = "Engine busyness based balancing.",
-		.init = busy_init,
-		.get_qd = get_engine_busy,
-		.balance = busy_balance,
-	},
-	{
-		.id = 10,
-		.name = "busy-avg",
-		.desc = "Average engine busyness based balancing.",
-		.init = busy_init,
-		.get_qd = get_engine_busy,
-		.balance = busy_avg_balance,
-	},
-	{
-		.id = 11,
-		.name = "i915",
-		.desc = "i915 balancing.",
-		.flags = I915,
-	},
-};
-
-static unsigned int
-global_get_qd(const struct workload_balancer *balancer,
-	      struct workload *wrk, enum intel_engine_id engine)
-{
-	igt_assert(wrk->global_wrk);
-	igt_assert(wrk->global_balancer);
-
-	return wrk->global_balancer->get_qd(wrk->global_balancer,
-					    wrk->global_wrk, engine);
-}
-
-static enum intel_engine_id
-global_balance(const struct workload_balancer *balancer,
-	       struct workload *wrk, struct w_step *w)
-{
-	enum intel_engine_id engine;
-	int ret;
-
-	igt_assert(wrk->global_wrk);
-	igt_assert(wrk->global_balancer);
-
-	wrk = wrk->global_wrk;
-
-	ret = pthread_mutex_lock(&wrk->mutex);
-	igt_assert(ret == 0);
-
-	engine = wrk->global_balancer->balance(wrk->global_balancer, wrk, w);
-
-	ret = pthread_mutex_unlock(&wrk->mutex);
-	igt_assert(ret == 0);
-
-	return engine;
-}
-
-static const struct workload_balancer global_balancer = {
-		.id = ~0,
-		.name = "global",
-		.desc = "Global balancer",
-		.get_qd = global_get_qd,
-		.balance = global_balance,
-	};
-
-static void
-update_bb_seqno(struct w_step *w, enum intel_engine_id engine, uint32_t seqno)
-{
-	gem_set_domain(fd, w->bb_handle,
-		       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
-
-	w->reloc[0].delta = SEQNO_OFFSET(engine);
-
-	*w->seqno_value = seqno;
-	*w->seqno_address = w->reloc[0].presumed_offset + w->reloc[0].delta;
-
-	/* If not using NO_RELOC, force the relocations */
-	if (!(w->eb.flags & I915_EXEC_NO_RELOC))
-		w->reloc[0].presumed_offset = -1;
-}
-
-static void
-update_bb_rt(struct w_step *w, enum intel_engine_id engine, uint32_t seqno)
-{
-	gem_set_domain(fd, w->bb_handle,
-		       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
-
-	w->reloc[1].delta = SEQNO_OFFSET(engine) + sizeof(uint32_t);
-	w->reloc[2].delta = SEQNO_OFFSET(engine) + 2 * sizeof(uint32_t);
-	w->reloc[3].delta = SEQNO_OFFSET(engine) + 3 * sizeof(uint32_t);
-
-	*w->latch_value = seqno;
-	*w->latch_address = w->reloc[3].presumed_offset + w->reloc[3].delta;
-
-	*w->rt0_value = *REG(RCS_TIMESTAMP);
-	*w->rt0_address = w->reloc[1].presumed_offset + w->reloc[1].delta;
-	*w->rt1_address = w->reloc[2].presumed_offset + w->reloc[2].delta;
-
-	/* If not using NO_RELOC, force the relocations */
-	if (!(w->eb.flags & I915_EXEC_NO_RELOC)) {
-		w->reloc[1].presumed_offset = -1;
-		w->reloc[2].presumed_offset = -1;
-		w->reloc[3].presumed_offset = -1;
-	}
-}
-
 static void
 update_bb_start(struct w_step *w)
 {
@@ -2606,123 +1688,13 @@ static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 	gem_sync(fd, wrk->steps[target].obj[0].handle);
 }
 
-static uint32_t *get_status_cs(struct workload *wrk)
-{
-	return wrk->status_cs;
-}
-
-#define INIT_CLOCKS 0x1
-#define INIT_ALL (INIT_CLOCKS)
-static void init_status_page(struct workload *wrk, unsigned int flags)
-{
-	struct drm_i915_gem_relocation_entry reloc[4] = {};
-	struct drm_i915_gem_exec_object2 *status_object =
-						get_status_objects(wrk);
-	struct drm_i915_gem_execbuffer2 eb = {
-		.buffer_count = ARRAY_SIZE(wrk->status_object),
-		.buffers_ptr = to_user_pointer(status_object)
-	};
-	uint32_t *base = get_status_cs(wrk);
-
-	/* Want to make sure that the balancer has a reasonable view of
-	 * the background busyness of each engine. To do that we occasionally
-	 * send a dummy batch down the pipeline.
-	 */
-
-	if (!base)
-		return;
-
-	gem_set_domain(fd, status_object[1].handle,
-		       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
-
-	status_object[1].relocs_ptr = to_user_pointer(reloc);
-	status_object[1].relocation_count = 2;
-	if (flags & INIT_CLOCKS)
-		status_object[1].relocation_count += 2;
-
-	for (int engine = 0; engine < NUM_ENGINES; engine++) {
-		struct drm_i915_gem_relocation_entry *r = reloc;
-		uint64_t presumed_offset = status_object[0].offset;
-		uint32_t offset = engine * 128;
-		uint32_t *cs = base + offset / sizeof(*cs);
-		uint64_t addr;
-
-		r->offset = offset + sizeof(uint32_t);
-		r->delta = SEQNO_OFFSET(engine);
-		r->presumed_offset = presumed_offset;
-		addr = presumed_offset + r->delta;
-		r++;
-		*cs++ = MI_STORE_DWORD_IMM;
-		*cs++ = addr;
-		*cs++ = addr >> 32;
-		*cs++ = new_seqno(wrk, engine);
-		offset += 4 * sizeof(uint32_t);
-
-		/* When we are busy, we can just reuse the last set of timings.
-		 * If we have been idle for a while, we want to resample the
-		 * latency on each engine (to measure external load).
-		 */
-		if (flags & INIT_CLOCKS) {
-			r->offset = offset + sizeof(uint32_t);
-			r->delta = SEQNO_OFFSET(engine) + sizeof(uint32_t);
-			r->presumed_offset = presumed_offset;
-			addr = presumed_offset + r->delta;
-			r++;
-			*cs++ = MI_STORE_DWORD_IMM;
-			*cs++ = addr;
-			*cs++ = addr >> 32;
-			*cs++ = *REG(RCS_TIMESTAMP);
-			offset += 4 * sizeof(uint32_t);
-
-			r->offset = offset + 2 * sizeof(uint32_t);
-			r->delta = SEQNO_OFFSET(engine) + 2*sizeof(uint32_t);
-			r->presumed_offset = presumed_offset;
-			addr = presumed_offset + r->delta;
-			r++;
-			*cs++ = 0x24 << 23 | 2; /* MI_STORE_REG_MEM */
-			*cs++ = RCS_TIMESTAMP;
-			*cs++ = addr;
-			*cs++ = addr >> 32;
-			offset += 4 * sizeof(uint32_t);
-		}
-
-		r->offset = offset + sizeof(uint32_t);
-		r->delta = SEQNO_OFFSET(engine) + 3*sizeof(uint32_t);
-		r->presumed_offset = presumed_offset;
-		addr = presumed_offset + r->delta;
-		r++;
-		*cs++ = MI_STORE_DWORD_IMM;
-		*cs++ = addr;
-		*cs++ = addr >> 32;
-		*cs++ = current_seqno(wrk, engine);
-		offset += 4 * sizeof(uint32_t);
-
-		*cs++ = MI_BATCH_BUFFER_END;
-
-		eb_set_engine(&eb, engine, wrk->flags);
-		eb.flags |= I915_EXEC_HANDLE_LUT;
-		eb.flags |= I915_EXEC_NO_RELOC;
-
-		eb.batch_start_offset = 128 * engine;
-
-		gem_execbuf(fd, &eb);
-	}
-}
-
 static void
 do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
       unsigned int flags)
 {
-	uint32_t seqno = new_seqno(wrk, engine);
 	unsigned int i;
 
 	eb_update_flags(wrk, w, engine, flags);
-
-	if (flags & SEQNO)
-		update_bb_seqno(w, engine, seqno);
-	if (flags & RT)
-		update_bb_rt(w, engine, seqno);
-
 	update_bb_start(w);
 
 	w->eb.batch_start_offset =
@@ -2758,9 +1730,8 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 	}
 }
 
-static bool sync_deps(struct workload *wrk, struct w_step *w)
+static void sync_deps(struct workload *wrk, struct w_step *w)
 {
-	bool synced = false;
 	unsigned int i;
 
 	for (i = 0; i < w->data_deps.nr; i++) {
@@ -2777,11 +1748,7 @@ static bool sync_deps(struct workload *wrk, struct w_step *w)
 		igt_assert(wrk->steps[dep_idx].type == BATCH);
 
 		gem_sync(fd, wrk->steps[dep_idx].obj[0].handle);
-
-		synced = true;
 	}
-
-	return synced;
 }
 
 static void *run_workload(void *data)
@@ -2789,7 +1756,6 @@ static void *run_workload(void *data)
 	struct workload *wrk = (struct workload *)data;
 	struct timespec t_start, t_end;
 	struct w_step *w;
-	bool last_sync = false;
 	int throttle = -1;
 	int qd_throttle = -1;
 	int count;
@@ -2797,7 +1763,6 @@ static void *run_workload(void *data)
 
 	clock_gettime(CLOCK_MONOTONIC, &t_start);
 
-	init_status_page(wrk, INIT_ALL);
 	for (count = 0; wrk->run && (wrk->background || count < wrk->repeat);
 	     count++) {
 		unsigned int cur_seqno = wrk->sync_seqno;
@@ -2898,21 +1863,8 @@ static void *run_workload(void *data)
 
 			igt_assert(w->type == BATCH);
 
-			if ((wrk->flags & DEPSYNC) && engine == VCS)
-				last_sync = sync_deps(wrk, w);
-
-			if (last_sync && (wrk->flags & HEARTBEAT))
-				init_status_page(wrk, 0);
-
-			last_sync = false;
-
-			wrk->nr_bb[engine]++;
-			if (engine == VCS && wrk->balancer &&
-			    wrk->balancer->balance) {
-				engine = wrk->balancer->balance(wrk->balancer,
-								wrk, w);
-				wrk->nr_bb[engine]++;
-			}
+			if (wrk->flags & DEPSYNC)
+				sync_deps(wrk, w);
 
 			if (throttle > 0)
 				w_sync_to(wrk, w, i - throttle);
@@ -2930,10 +1882,8 @@ static void *run_workload(void *data)
 			if (!wrk->run)
 				break;
 
-			if (w->sync) {
+			if (w->sync)
 				gem_sync(fd, w->obj[0].handle);
-				last_sync = true;
-			}
 
 			if (qd_throttle > 0) {
 				while (wrk->nrequest[engine] > qd_throttle) {
@@ -2943,7 +1893,6 @@ static void *run_workload(void *data)
 								 s, rq_link);
 
 					gem_sync(fd, s->obj[0].handle);
-					last_sync = true;
 
 					s->request = -1;
 					igt_list_del(&s->rq_link);
@@ -2986,13 +1935,6 @@ static void *run_workload(void *data)
 		printf("%c%u: %.3fs elapsed (%d cycles, %.3f workloads/s).",
 		       wrk->background ? ' ' : '*', wrk->id,
 		       t, count, count / t);
-		if (wrk->balancer)
-			printf(" %lu (%lu + %lu) total VCS batches.",
-			       wrk->nr_bb[VCS], wrk->nr_bb[VCS1], wrk->nr_bb[VCS2]);
-		if (wrk->balancer && wrk->balancer->get_qd)
-			printf(" Average queue depths %.3f, %.3f.",
-			       (double)wrk->qd_sum[VCS1] / wrk->nr_bb[VCS],
-			       (double)wrk->qd_sum[VCS2] / wrk->nr_bb[VCS]);
 		putchar('\n');
 	}
 
@@ -3114,8 +2056,6 @@ calibrate_engines(void)
 
 static void print_help(void)
 {
-	unsigned int i;
-
 	puts(
 "Usage: gem_wsim [OPTIONS]\n"
 "\n"
@@ -3145,32 +2085,11 @@ static void print_help(void)
 "  -a <desc|path>    Append a workload to all other workloads.\n"
 "  -r <n>            How many times to emit the workload.\n"
 "  -c <n>            Fork N clients emitting the workload simultaneously.\n"
-"  -x                Swap VCS1 and VCS2 engines in every other client.\n"
-"  -b <n>            Load balancing to use.\n"
-"                    Available load balancers are:"
-	);
-
-	for (i = 0; i < ARRAY_SIZE(all_balancers); i++) {
-		igt_assert(all_balancers[i].desc);
-		printf(
-"                       %s (%u): %s\n",
-		       all_balancers[i].name, all_balancers[i].id,
-		       all_balancers[i].desc);
-	}
-	puts(
-"                     Balancers can be specified either as names or as their id\n"
-"                     number as listed above.\n"
-"  -2                 Remap VCS2 to BCS.\n"
-"  -R                 Round-robin initial VCS assignment per client.\n"
-"  -H                 Send heartbeat on synchronisation points with seqno based\n"
-"                     balancers. Gives better engine busyness view in some cases.\n"
-"  -s                 Turn on small SSEU config for the next workload on the\n"
-"                     command line. Subsequent -s switches it off.\n"
-"  -S                 Synchronize the sequence of random batch durations between\n"
-"                     clients.\n"
-"  -G                 Global load balancing - a single load balancer will be shared\n"
-"                     between all clients and there will be a single seqno domain.\n"
-"  -d                 Sync between data dependencies in userspace."
+"  -s                Turn on small SSEU config for the next workload on the\n"
+"                    command line. Subsequent -s switches it off.\n"
+"  -S                Synchronize the sequence of random batch durations between\n"
+"                    clients.\n"
+"  -d                Sync between data dependencies in userspace."
 	);
 }
 
@@ -3218,62 +2137,6 @@ add_workload_arg(struct w_arg *w_args, unsigned int nr_args, char *w_arg,
 	return w_args;
 }
 
-static int find_balancer_by_name(char *name)
-{
-	unsigned int i;
-
-	for (i = 0; i < ARRAY_SIZE(all_balancers); i++) {
-		if (!strcasecmp(name, all_balancers[i].name))
-			return all_balancers[i].id;
-	}
-
-	return -1;
-}
-
-static const struct workload_balancer *find_balancer_by_id(unsigned int id)
-{
-	unsigned int i;
-
-	for (i = 0; i < ARRAY_SIZE(all_balancers); i++) {
-		if (id == all_balancers[i].id)
-			return &all_balancers[i];
-	}
-
-	return NULL;
-}
-
-static void init_clocks(void)
-{
-	struct timespec t_start, t_end;
-	uint32_t rcs_start, rcs_end;
-	double overhead, t;
-
-	if (verbose <= 1)
-		return;
-
-	clock_gettime(CLOCK_MONOTONIC, &t_start);
-	for (int i = 0; i < 100; i++)
-		rcs_start = *REG(RCS_TIMESTAMP);
-	clock_gettime(CLOCK_MONOTONIC, &t_end);
-	overhead = 2 * elapsed(&t_start, &t_end) / 100;
-
-	clock_gettime(CLOCK_MONOTONIC, &t_start);
-	for (int i = 0; i < 100; i++)
-		clock_gettime(CLOCK_MONOTONIC, &t_end);
-	clock_gettime(CLOCK_MONOTONIC, &t_end);
-	overhead += elapsed(&t_start, &t_end) / 100;
-
-	clock_gettime(CLOCK_MONOTONIC, &t_start);
-	rcs_start = *REG(RCS_TIMESTAMP);
-	usleep(100);
-	rcs_end = *REG(RCS_TIMESTAMP);
-	clock_gettime(CLOCK_MONOTONIC, &t_end);
-
-	t = elapsed(&t_start, &t_end) - overhead;
-	printf("%d cycles in %.1fus, i.e. 1024 cycles takes %1.fus\n",
-	       rcs_end - rcs_start, 1e6*t, 1024e6 * t / (rcs_end - rcs_start));
-}
-
 int main(int argc, char **argv)
 {
 	unsigned int repeat = 1;
@@ -3287,9 +2150,7 @@ int main(int argc, char **argv)
 	char *append_workload_arg = NULL;
 	struct w_arg *w_args = NULL;
 	unsigned int tolerance_pct = 1;
-	const struct workload_balancer *balancer = NULL;
 	int exitcode = EXIT_FAILURE;
-	char *endptr = NULL;
 	int prio = 0;
 	double t;
 	int i, c;
@@ -3304,17 +2165,13 @@ int main(int argc, char **argv)
 	 * This minimizes the gap in engine utilization tracking when observed
 	 * via external tools like trace.pl.
 	 */
-	fd = __drm_open_driver(DRIVER_INTEL);
+	fd = __drm_open_driver_render(DRIVER_INTEL);
 	igt_require(fd);
 
-	intel_register_access_init(&mmio_data, intel_get_pci_device(), false, fd);
-
-	init_clocks();
-
 	master_prng = time(NULL);
 
 	while ((c = getopt(argc, argv,
-			   "Thqv2RsSHxGdc:n:r:w:W:a:t:b:p:I:")) != -1) {
+			   "ThqvsSdc:n:r:w:W:a:t:p:I:")) != -1) {
 		switch (c) {
 		case 'W':
 			if (master_workload >= 0) {
@@ -3413,52 +2270,15 @@ int main(int argc, char **argv)
 		case 'v':
 			verbose++;
 			break;
-		case 'x':
-			flags |= SWAPVCS;
-			break;
-		case '2':
-			flags |= VCS2REMAP;
-			break;
-		case 'R':
-			flags |= INITVCSRR;
-			break;
 		case 'S':
 			flags |= SYNCEDCLIENTS;
 			break;
 		case 's':
 			flags ^= SSEU;
 			break;
-		case 'H':
-			flags |= HEARTBEAT;
-			break;
-		case 'G':
-			flags |= GLOBAL_BALANCE;
-			break;
 		case 'd':
 			flags |= DEPSYNC;
 			break;
-		case 'b':
-			i = find_balancer_by_name(optarg);
-			if (i < 0) {
-				i = strtol(optarg, &endptr, 0);
-				if (endptr && *endptr)
-					i = -1;
-			}
-
-			if (i >= 0) {
-				balancer = find_balancer_by_id(i);
-				if (balancer) {
-					igt_assert(intel_gen(intel_get_drm_devid(fd)) >= balancer->min_gen);
-					flags |= BALANCE | balancer->flags;
-				}
-			}
-
-			if (!balancer) {
-				wsim_err("Unknown balancing mode '%s'!\n",
-					 optarg);
-				goto err;
-			}
-			break;
 		case 'I':
 			master_prng = strtol(optarg, NULL, 0);
 			break;
@@ -3470,16 +2290,6 @@ int main(int argc, char **argv)
 		}
 	}
 
-	if ((flags & HEARTBEAT) && !(flags & SEQNO)) {
-		wsim_err("Heartbeat needs a seqno based balancer!\n");
-		goto err;
-	}
-
-	if ((flags & VCS2REMAP) && (flags & I915)) {
-		wsim_err("VCS remapping not supported with i915 balancing!\n");
-		goto err;
-	}
-
 	if (!has_nop_calibration) {
 		if (verbose > 1) {
 			printf("Calibrating nop delays with %u%% tolerance...\n",
@@ -3519,11 +2329,6 @@ int main(int argc, char **argv)
 		goto err;
 	}
 
-	if ((flags & GLOBAL_BALANCE) && !balancer) {
-		wsim_err("Balancer not specified in global balancing mode!\n");
-		goto err;
-	}
-
 	if (append_workload_arg) {
 		append_workload_arg = load_workload_descriptor(append_workload_arg);
 		if (!append_workload_arg) {
@@ -3566,19 +2371,6 @@ int main(int argc, char **argv)
 		printf("Random seed is %u.\n", master_prng);
 		print_engine_calibrations();
 		printf("%u client%s.\n", clients, clients > 1 ? "s" : "");
-		if (flags & SWAPVCS)
-			printf("Swapping VCS rings between clients.\n");
-		if (flags & GLOBAL_BALANCE) {
-			if (flags & I915) {
-				printf("Ignoring global balancing with i915!\n");
-				flags &= ~GLOBAL_BALANCE;
-			} else {
-				printf("Using %s balancer in global mode.\n",
-				       balancer->name);
-			}
-		} else if (balancer) {
-			printf("Using %s balancer.\n", balancer->name);
-		}
 	}
 
 	srand(master_prng);
@@ -3591,41 +2383,18 @@ int main(int argc, char **argv)
 	igt_assert(w);
 
 	for (i = 0; i < clients; i++) {
-		unsigned int flags_ = flags;
-
 		w[i] = clone_workload(wrk[nr_w_args > 1 ? i : 0]);
 
-		if (flags & SWAPVCS && i & 1)
-			flags_ &= ~SWAPVCS;
-
-		if ((flags & GLOBAL_BALANCE) && !(flags & I915)) {
-			w[i]->balancer = &global_balancer;
-			w[i]->global_wrk = w[0];
-			w[i]->global_balancer = balancer;
-		} else {
-			w[i]->balancer = balancer;
-		}
-
 		w[i]->flags = flags;
 		w[i]->repeat = repeat;
 		w[i]->background = master_workload >= 0 && i != master_workload;
 		w[i]->print_stats = verbose > 1 ||
 				    (verbose > 0 && master_workload == i);
 
-		if (prepare_workload(i, w[i], flags_)) {
+		if (prepare_workload(i, w[i], flags)) {
 			wsim_err("Failed to prepare workload %u!\n", i);
 			goto err;
 		}
-
-
-		if (balancer && balancer->init) {
-			int ret = balancer->init(balancer, w[i]);
-			if (ret) {
-				wsim_err("Failed to initialize balancing! (%u=%d)\n",
-					 i, ret);
-				goto err;
-			}
-		}
 	}
 
 	clock_gettime(CLOCK_MONOTONIC, &t_start);
@@ -3670,6 +2439,5 @@ int main(int argc, char **argv)
 out:
 	exitcode = EXIT_SUCCESS;
 err:
-	intel_register_access_fini(&mmio_data);
 	return exitcode;
 }
diff --git a/benchmarks/ilog2.h b/benchmarks/ilog2.h
deleted file mode 100644
index 596d7c23e0d1..000000000000
--- a/benchmarks/ilog2.h
+++ /dev/null
@@ -1,104 +0,0 @@
-#ifndef ILOG2_H
-#define ILOG2_H
-
-#include <stdint.h>
-
-static inline int fls(int x)
-{
-        int r = -1;
-        asm("bsrl %1,%0" : "=r" (r) : "rm" (x), "0" (-1));
-        return r + 1;
-}
-
-static inline int fls64(__u64 x)
-{
-        int r = -1;
-        asm("bsrq %1,%q0" : "+r" (r) : "rm" (x));
-        return r + 1;
-}
-
-static inline __attribute__((const))
-int __ilog2_u32(uint32_t n)
-{
-	return fls(n) - 1;
-}
-
-static inline __attribute__((const))
-int __ilog2_u64(uint64_t n)
-{
-	return fls64(n) - 1;
-}
-
-#define ilog2(n)				\
-(						\
-	__builtin_constant_p(n) ? (		\
-		(n) < 2 ? 0 :			\
-		(n) & (1ULL << 63) ? 63 :	\
-		(n) & (1ULL << 62) ? 62 :	\
-		(n) & (1ULL << 61) ? 61 :	\
-		(n) & (1ULL << 60) ? 60 :	\
-		(n) & (1ULL << 59) ? 59 :	\
-		(n) & (1ULL << 58) ? 58 :	\
-		(n) & (1ULL << 57) ? 57 :	\
-		(n) & (1ULL << 56) ? 56 :	\
-		(n) & (1ULL << 55) ? 55 :	\
-		(n) & (1ULL << 54) ? 54 :	\
-		(n) & (1ULL << 53) ? 53 :	\
-		(n) & (1ULL << 52) ? 52 :	\
-		(n) & (1ULL << 51) ? 51 :	\
-		(n) & (1ULL << 50) ? 50 :	\
-		(n) & (1ULL << 49) ? 49 :	\
-		(n) & (1ULL << 48) ? 48 :	\
-		(n) & (1ULL << 47) ? 47 :	\
-		(n) & (1ULL << 46) ? 46 :	\
-		(n) & (1ULL << 45) ? 45 :	\
-		(n) & (1ULL << 44) ? 44 :	\
-		(n) & (1ULL << 43) ? 43 :	\
-		(n) & (1ULL << 42) ? 42 :	\
-		(n) & (1ULL << 41) ? 41 :	\
-		(n) & (1ULL << 40) ? 40 :	\
-		(n) & (1ULL << 39) ? 39 :	\
-		(n) & (1ULL << 38) ? 38 :	\
-		(n) & (1ULL << 37) ? 37 :	\
-		(n) & (1ULL << 36) ? 36 :	\
-		(n) & (1ULL << 35) ? 35 :	\
-		(n) & (1ULL << 34) ? 34 :	\
-		(n) & (1ULL << 33) ? 33 :	\
-		(n) & (1ULL << 32) ? 32 :	\
-		(n) & (1ULL << 31) ? 31 :	\
-		(n) & (1ULL << 30) ? 30 :	\
-		(n) & (1ULL << 29) ? 29 :	\
-		(n) & (1ULL << 28) ? 28 :	\
-		(n) & (1ULL << 27) ? 27 :	\
-		(n) & (1ULL << 26) ? 26 :	\
-		(n) & (1ULL << 25) ? 25 :	\
-		(n) & (1ULL << 24) ? 24 :	\
-		(n) & (1ULL << 23) ? 23 :	\
-		(n) & (1ULL << 22) ? 22 :	\
-		(n) & (1ULL << 21) ? 21 :	\
-		(n) & (1ULL << 20) ? 20 :	\
-		(n) & (1ULL << 19) ? 19 :	\
-		(n) & (1ULL << 18) ? 18 :	\
-		(n) & (1ULL << 17) ? 17 :	\
-		(n) & (1ULL << 16) ? 16 :	\
-		(n) & (1ULL << 15) ? 15 :	\
-		(n) & (1ULL << 14) ? 14 :	\
-		(n) & (1ULL << 13) ? 13 :	\
-		(n) & (1ULL << 12) ? 12 :	\
-		(n) & (1ULL << 11) ? 11 :	\
-		(n) & (1ULL << 10) ? 10 :	\
-		(n) & (1ULL <<  9) ?  9 :	\
-		(n) & (1ULL <<  8) ?  8 :	\
-		(n) & (1ULL <<  7) ?  7 :	\
-		(n) & (1ULL <<  6) ?  6 :	\
-		(n) & (1ULL <<  5) ?  5 :	\
-		(n) & (1ULL <<  4) ?  4 :	\
-		(n) & (1ULL <<  3) ?  3 :	\
-		(n) & (1ULL <<  2) ?  2 :	\
-		1 ) :				\
-	(sizeof(n) <= 4) ?			\
-	__ilog2_u32(n) :			\
-	__ilog2_u64(n)				\
- )
-
-#endif /* ILOG2_H */
diff --git a/benchmarks/meson.build b/benchmarks/meson.build
index ef93193b70dd..c70e1aac79c6 100644
--- a/benchmarks/meson.build
+++ b/benchmarks/meson.build
@@ -11,6 +11,7 @@ benchmark_progs = [
 	'gem_prw',
 	'gem_set_domain',
 	'gem_syslatency',
+	'gem_wsim',
 	'kms_vblank',
 	'prime_lookup',
 	'vgem_mmap',
@@ -34,8 +35,3 @@ foreach prog : benchmark_progs
 		   install_dir : benchmarksdir,
 		   dependencies : igt_deps)
 endforeach
-
-executable('gem_wsim', 'gem_wsim.c',
-	   install : true,
-	   install_dir : benchmarksdir,
-	   dependencies : igt_deps + [ lib_igt_perf ])
diff --git a/benchmarks/wsim/media-1080p-player.wsim b/benchmarks/wsim/media-1080p-player.wsim
index bcbb0cfd2ad3..c87e1aee4f5d 100644
--- a/benchmarks/wsim/media-1080p-player.wsim
+++ b/benchmarks/wsim/media-1080p-player.wsim
@@ -1,3 +1,5 @@
+M.1.VCS
+B.1
 1.VCS.5000-10000.0.0
 2.RCS.1000-2000.-1.0
 P.3.1
diff --git a/benchmarks/wsim/media_1n2_480p.wsim b/benchmarks/wsim/media_1n2_480p.wsim
index 11a4da6bfae8..3ce15ebc3d71 100644
--- a/benchmarks/wsim/media_1n2_480p.wsim
+++ b/benchmarks/wsim/media_1n2_480p.wsim
@@ -1,9 +1,15 @@
-1.VCS.12000-15000.0.0
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+10.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
 3.RCS.10000-12000.0.0
-3.VCS.2500-3500.-1.0
+11.VCS.2500-3500.-1.0
 4.RCS.1000-2200.-5.0
 5.RCS.1000-1400.-1.0
 5.RCS.10000-12000.0.0
-5.VCS.2500-3500.-1.1
+12.VCS.2500-3500.-1.1
diff --git a/benchmarks/wsim/media_1n2_asy.wsim b/benchmarks/wsim/media_1n2_asy.wsim
index 58c99ca1122c..f9943eb62e8a 100644
--- a/benchmarks/wsim/media_1n2_asy.wsim
+++ b/benchmarks/wsim/media_1n2_asy.wsim
@@ -1,9 +1,11 @@
-1.VCS.12000-15000.0.0
+M.10.VCS
+B.10
+10.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
 3.RCS.10000-12000.0.0
-3.VCS.2500-3500.-1.0
+11.VCS.2500-3500.-1.0
 4.RCS.400-800.-5.0
 5.RCS.500-700.-1.0
 5.RCS.5000-6000.0.0
-5.VCS.1200-1500.-1.1
+12.VCS.1200-1500.-1.1
diff --git a/benchmarks/wsim/media_1n3_480p.wsim b/benchmarks/wsim/media_1n3_480p.wsim
index c724ab28a1f4..4f585fa8a8e0 100644
--- a/benchmarks/wsim/media_1n3_480p.wsim
+++ b/benchmarks/wsim/media_1n3_480p.wsim
@@ -1,13 +1,21 @@
-1.VCS.12000-15000.0.0
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+M.13.VCS
+B.13
+10.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
 3.RCS.10000-12000.0.0
-3.VCS.2500-3500.-1.0
+11.VCS.2500-3500.-1.0
 4.RCS.1000-2200.-5.0
 5.RCS.1000-1400.-1.0
 5.RCS.10000-12000.0.0
-5.VCS.2500-3500.-1.0
+12.VCS.2500-3500.-1.0
 6.RCS.1000-2200.-9.0
 7.RCS.1000-1400.-1.0
 7.RCS.10000-12000.0.0
-7.VCS.2500-3500.-1.1
+13.VCS.2500-3500.-1.1
diff --git a/benchmarks/wsim/media_1n3_asy.wsim b/benchmarks/wsim/media_1n3_asy.wsim
index c7588328e3f1..dce7789ec1d8 100644
--- a/benchmarks/wsim/media_1n3_asy.wsim
+++ b/benchmarks/wsim/media_1n3_asy.wsim
@@ -1,3 +1,11 @@
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+M.13.VCS
+B.13
 1.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
diff --git a/benchmarks/wsim/media_1n4_480p.wsim b/benchmarks/wsim/media_1n4_480p.wsim
index e67fefc3bf17..06fa9adef5eb 100644
--- a/benchmarks/wsim/media_1n4_480p.wsim
+++ b/benchmarks/wsim/media_1n4_480p.wsim
@@ -1,17 +1,27 @@
-1.VCS.12000-15000.0.0
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+M.13.VCS
+B.13
+M.14.VCS
+B.14
+10.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
 3.RCS.10000-12000.0.0
-3.VCS.2500-3500.-1.0
+11.VCS.2500-3500.-1.0
 4.RCS.1000-2200.-5.0
 5.RCS.1000-1400.-1.0
 5.RCS.10000-12000.0.0
-5.VCS.2500-3500.-1.0
+12.VCS.2500-3500.-1.0
 6.RCS.1000-2200.-9.0
 7.RCS.1000-1400.-1.0
 7.RCS.10000-12000.0.0
-7.VCS.2500-3500.-1.0
+13.VCS.2500-3500.-1.0
 8.RCS.1000-2200.-13.0
 9.RCS.1000-1400.-1.0
 9.RCS.10000-12000.0.0
-9.VCS.2500-3500.-1.1
+14.VCS.2500-3500.-1.1
diff --git a/benchmarks/wsim/media_1n4_asy.wsim b/benchmarks/wsim/media_1n4_asy.wsim
index ede4fd7a2205..6dc6b652e903 100644
--- a/benchmarks/wsim/media_1n4_asy.wsim
+++ b/benchmarks/wsim/media_1n4_asy.wsim
@@ -1,3 +1,13 @@
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+M.13.VCS
+B.13
+M.14.VCS
+B.14
 1.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
diff --git a/benchmarks/wsim/media_1n5_480p.wsim b/benchmarks/wsim/media_1n5_480p.wsim
index 9e43b9845430..3467a386887a 100644
--- a/benchmarks/wsim/media_1n5_480p.wsim
+++ b/benchmarks/wsim/media_1n5_480p.wsim
@@ -1,21 +1,33 @@
-1.VCS.12000-15000.0.0
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+M.13.VCS
+B.13
+M.14.VCS
+B.14
+M.15.VCS
+B.15
+10.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
 3.RCS.10000-12000.0.0
-3.VCS.2500-3500.-1.0
+11.VCS.2500-3500.-1.0
 4.RCS.1000-2200.-5.0
 5.RCS.1000-1400.-1.0
 5.RCS.10000-12000.0.0
-5.VCS.2500-3500.-1.0
+12.VCS.2500-3500.-1.0
 6.RCS.1000-2200.-9.0
 7.RCS.1000-1400.-1.0
 7.RCS.10000-12000.0.0
-7.VCS.2500-3500.-1.0
+13.VCS.2500-3500.-1.0
 8.RCS.1000-2200.-13.0
 9.RCS.1000-1400.-1.0
 9.RCS.10000-12000.0.0
-9.VCS.2500-3500.-1.0
+14.VCS.2500-3500.-1.0
 10.RCS.1000-2200.-17.0
 11.RCS.1000-1400.-1.0
 11.RCS.10000-12000.0.0
-11.VCS.2500-3500.-1.1
+15.VCS.2500-3500.-1.1
diff --git a/benchmarks/wsim/media_1n5_asy.wsim b/benchmarks/wsim/media_1n5_asy.wsim
index 78bb4a86dbca..4b205457a8d4 100644
--- a/benchmarks/wsim/media_1n5_asy.wsim
+++ b/benchmarks/wsim/media_1n5_asy.wsim
@@ -1,3 +1,15 @@
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
+M.13.VCS
+B.13
+M.14.VCS
+B.14
+M.15.VCS
+B.15
 1.VCS.12000-15000.0.0
 2.RCS.2000-3000.-1.0
 3.RCS.500-900.-1.0
diff --git a/benchmarks/wsim/media_load_balance_17i7.wsim b/benchmarks/wsim/media_load_balance_17i7.wsim
index 0830a3231ea9..bcb1ab2f04fa 100644
--- a/benchmarks/wsim/media_load_balance_17i7.wsim
+++ b/benchmarks/wsim/media_load_balance_17i7.wsim
@@ -1,7 +1,9 @@
+M.1.VCS
+B.1
 1.VCS.2800-3200.0.1
-1.RCS.900-1100.-1.0
-1.RCS.3600-3800.0.0
-1.RCS.900-1100.-2.0
+2.RCS.900-1100.-1.0
+2.RCS.3600-3800.0.0
+2.RCS.900-1100.-2.0
 1.VCS.2200-2400.-2.0
-1.RCS.4500-4900.-1.0
+2.RCS.4500-4900.-1.0
 1.VCS.500-700.-1.1
diff --git a/benchmarks/wsim/media_load_balance_19.wsim b/benchmarks/wsim/media_load_balance_19.wsim
index 03890776fda3..88cd34fb6898 100644
--- a/benchmarks/wsim/media_load_balance_19.wsim
+++ b/benchmarks/wsim/media_load_balance_19.wsim
@@ -1,3 +1,5 @@
+M.1.VCS
+B.1
 0.VECS.1400-1500.0.0
 0.RCS.1000-1500.-1.0
 s.-2
@@ -5,6 +7,6 @@ s.-2
 1.VCS.1300-1400.0.1
 0.VECS.1400-1500.0.0
 0.RCS.100-300.-1.1
-1.RCS.1300-1500.0.0
+2.RCS.1300-1500.-3.0
 1.VCS.100-300.-1.1
 1.VCS.900-1400.0.1
diff --git a/benchmarks/wsim/media_load_balance_4k12u7.wsim b/benchmarks/wsim/media_load_balance_4k12u7.wsim
index ff10425b6bec..a417bb18e121 100644
--- a/benchmarks/wsim/media_load_balance_4k12u7.wsim
+++ b/benchmarks/wsim/media_load_balance_4k12u7.wsim
@@ -1,3 +1,5 @@
+M.1.VCS
+B.1
 1.VCS.4000-6000.0.0
 2.RCS.400-800.-1.0
 3.RCS.1900-2200.-1.0
diff --git a/benchmarks/wsim/media_load_balance_fhd26u7.wsim b/benchmarks/wsim/media_load_balance_fhd26u7.wsim
index 56114ddc48c2..4c8225e1fe13 100644
--- a/benchmarks/wsim/media_load_balance_fhd26u7.wsim
+++ b/benchmarks/wsim/media_load_balance_fhd26u7.wsim
@@ -1,25 +1,27 @@
+M.3.VCS
+B.3
 1.VCS1.1200-1800.0.0
 1.VCS1.1900-2100.0.0
 2.RCS.1500-2000.-1.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
 1.VCS1.1900-2100.-1.0
 2.RCS.1500-2000.-1.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
 1.VCS1.1900-2100.-1.0
 2.RCS.200-400.-1.0
 2.RCS.1500-2000.0.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
 1.VCS1.1900-2100.-1.0
 2.RCS.1500-2000.-1.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
 1.VCS1.1900-2100.-1.0
 2.RCS.200-400.-1.0
 2.RCS.1500-2000.0.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
 1.VCS1.1900-2100.-1.0
 2.RCS.1500-2000.-1.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
 1.VCS1.1900-2100.-1.0
 2.RCS.1500-2000.-1.0
 2.RCS.1500-2000.0.0
-2.VCS.1400-1800.-1.1
+3.VCS.1400-1800.-1.1
diff --git a/benchmarks/wsim/media_load_balance_hd01.wsim b/benchmarks/wsim/media_load_balance_hd01.wsim
index 862931521c90..8e7e9d90e435 100644
--- a/benchmarks/wsim/media_load_balance_hd01.wsim
+++ b/benchmarks/wsim/media_load_balance_hd01.wsim
@@ -1,23 +1,27 @@
+M.1.VCS
+B.1
+M.3.VCS
+B.3
 1.VCS.1400-1900.0.0
-1.RCS.1200-1600.-1.0
-1.RCS.1000-1400.-1.0
-2.VCS.800-1000.-1.0
+2.RCS.1200-1600.-1.0
+2.RCS.1000-1400.-1.0
+3.VCS.800-1000.-1.0
 1.VCS.1400-1900.-4.0
-1.RCS.1200-1600.-1.0
-1.RCS.1000-1400.-1.0
-2.VCS.800-1000.-1.0
+2.RCS.1200-1600.-1.0
+2.RCS.1000-1400.-1.0
+3.VCS.800-1000.-1.0
 1.VCS.1400-1900.-4.0
-1.RCS.1200-1600.-1.0
-1.RCS.1000-1400.-1.0
-2.VCS.800-1000.-1.0
+2.RCS.1200-1600.-1.0
+2.RCS.1000-1400.-1.0
+3.VCS.800-1000.-1.0
 1.VCS.1400-1900.-4.0
-1.RCS.1200-1600.-1.0
-1.RCS.1000-1400.-1.0
-2.VCS.800-1000.-1.0
+2.RCS.1200-1600.-1.0
+2.RCS.1000-1400.-1.0
+3.VCS.800-1000.-1.0
 1.VCS.1400-1900.-4.0
-1.RCS.1200-1600.-1.0
-1.RCS.1000-1400.-1.0
-2.VCS.800-1000.-1.0
+2.RCS.1200-1600.-1.0
+2.RCS.1000-1400.-1.0
+3.VCS.800-1000.-1.0
 s.-17
 s.-14
 s.-11
diff --git a/benchmarks/wsim/media_load_balance_hd06mp2.wsim b/benchmarks/wsim/media_load_balance_hd06mp2.wsim
index 1e1fc003c755..cfe985019a7b 100644
--- a/benchmarks/wsim/media_load_balance_hd06mp2.wsim
+++ b/benchmarks/wsim/media_load_balance_hd06mp2.wsim
@@ -1,4 +1,8 @@
+M.1.VCS
+B.1
+M.4.VCS
+B.4
 1.VCS.900-1700.0.0
 2.RCS.100-400.-1.0
 3.RCS.800-900.-1.0
-3.VCS.100-200.-1.1
+4.VCS.100-200.-1.1
diff --git a/benchmarks/wsim/media_load_balance_hd12.wsim b/benchmarks/wsim/media_load_balance_hd12.wsim
index 8f3b41ca5ab6..684e6b511762 100644
--- a/benchmarks/wsim/media_load_balance_hd12.wsim
+++ b/benchmarks/wsim/media_load_balance_hd12.wsim
@@ -1,4 +1,8 @@
+M.1.VCS
+B.1
+M.4.VCS
+B.4
 1.VCS.850-1300.0.0
 2.RCS.50-250.-1.0
 3.RCS.400-800.-1.0
-3.VCS.100-200.-1.1
+4.VCS.100-200.-1.1
diff --git a/benchmarks/wsim/media_load_balance_hd17i4.wsim b/benchmarks/wsim/media_load_balance_hd17i4.wsim
index b6195b605bf7..1430f18df033 100644
--- a/benchmarks/wsim/media_load_balance_hd17i4.wsim
+++ b/benchmarks/wsim/media_load_balance_hd17i4.wsim
@@ -1,7 +1,11 @@
+M.1.VCS
+B.1
+M.3.VCS
+B.3
 1.VCS.900-1400.0.0
 2.RCS.200-300.-1.0
 2.RCS.1000-2000.0.0
 2.RCS.1000-2000.0.0
-2.VCS.800-1000.-1.0
-1.RCS.2800-3100.-1.0
+3.VCS.800-1000.-1.0
+4.RCS.2800-3100.-1.0
 1.VCS.800-1000.-1.1
diff --git a/benchmarks/wsim/media_mfe2_480p.wsim b/benchmarks/wsim/media_mfe2_480p.wsim
index 18bc756f1b55..00ef5c3a7574 100644
--- a/benchmarks/wsim/media_mfe2_480p.wsim
+++ b/benchmarks/wsim/media_mfe2_480p.wsim
@@ -1,3 +1,11 @@
+M.1.VCS
+B.1
+M.4.VCS
+B.4
+M.7.VCS
+B.7
+M.8.VCS
+B.8
 1.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.800-1600.-1.0
@@ -5,5 +13,5 @@
 5.RCS.1000-2200.-1.0
 6.RCS.800-1600.-1.0
 6.RCS.10000-12000.-4.0
-6.VCS.2500-3500.-1.0
-3.VCS.2500-3500.-2.1
+7.VCS.2500-3500.-1.0
+8.VCS.2500-3500.-2.1
diff --git a/benchmarks/wsim/media_mfe3_480p.wsim b/benchmarks/wsim/media_mfe3_480p.wsim
index e12a2e6ac29d..3ac4db0eb8ec 100644
--- a/benchmarks/wsim/media_mfe3_480p.wsim
+++ b/benchmarks/wsim/media_mfe3_480p.wsim
@@ -1,3 +1,15 @@
+M.1.VCS
+B.1
+M.4.VCS
+B.4
+M.7.VCS
+B.7
+M.10.VCS
+B.10
+M.11.VCS
+B.11
+M.12.VCS
+B.12
 1.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.800-1600.-1.0
@@ -8,6 +20,6 @@
 8.RCS.1000-2200.-1.0
 9.RCS.800-1600.-1.0
 9.RCS.10000-12000.-7/-4.0
-9.VCS.2500-3500.-1.0
-3.VCS.2500-3500.-2.0
-6.VCS.2500-3500.-3.1
+10.VCS.2500-3500.-1.0
+11.VCS.2500-3500.-2.0
+12.VCS.2500-3500.-3.1
diff --git a/benchmarks/wsim/media_mfe4_480p.wsim b/benchmarks/wsim/media_mfe4_480p.wsim
index 75d4f67ea4fb..7f6831569908 100644
--- a/benchmarks/wsim/media_mfe4_480p.wsim
+++ b/benchmarks/wsim/media_mfe4_480p.wsim
@@ -1,3 +1,19 @@
+M.1.VCS
+B.1
+M.4.VCS
+B.4
+M.7.VCS
+B.7
+M.10.VCS
+B.10
+M.13.VCS
+B.13
+M.14.VCS
+B.14
+M.15.VCS
+B.15
+M.16.VCS
+B.16
 1.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.800-1600.-1.0
@@ -11,7 +27,7 @@
 11.RCS.1000-2200.-1.0
 12.RCS.800-1600.-1.0
 12.RCS.10000-12000.-4/-7/-10.0
-12.VCS.2500-3500.-1.0
-3.VCS.2500-3500.-2.0
-6.VCS.2500-3500.-3.0
-9.VCS.2500-3500.-4.1
+13.VCS.2500-3500.-1.0
+14.VCS.2500-3500.-2.0
+15.VCS.2500-3500.-3.0
+16.VCS.2500-3500.-4.1
diff --git a/benchmarks/wsim/media_nn_1080p.wsim b/benchmarks/wsim/media_nn_1080p.wsim
index f9a3ca1b9963..88c5c772202c 100644
--- a/benchmarks/wsim/media_nn_1080p.wsim
+++ b/benchmarks/wsim/media_nn_1080p.wsim
@@ -1,3 +1,7 @@
+M.1.VCS
+B.1
+M.3.VCS
+B.3
 1.VCS.13000-17000.0.0
 2.RCS.2000-4000.-1.0
 3.RCS.3000-5000.-1.0
diff --git a/benchmarks/wsim/media_nn_1080p_s1.wsim b/benchmarks/wsim/media_nn_1080p_s1.wsim
index 4fa6ca653000..5b47d2a3c7ec 100644
--- a/benchmarks/wsim/media_nn_1080p_s1.wsim
+++ b/benchmarks/wsim/media_nn_1080p_s1.wsim
@@ -1,3 +1,5 @@
+M.4.VCS
+B.4
 f
 1.VCS1.6500-8000.f-1.0
 1.VCS2.6500-8000.f-2.0
@@ -5,4 +7,4 @@ a.-3
 2.RCS.2000-4000.-2/-3.0
 3.RCS.3000-5000.-1.0
 3.RCS.23000-27000.0.0
-3.VCS.16000-20000.-1.1
+4.VCS.16000-20000.-1.1
diff --git a/benchmarks/wsim/media_nn_1080p_s2.wsim b/benchmarks/wsim/media_nn_1080p_s2.wsim
index 68f0acdfb842..e3678b396b42 100644
--- a/benchmarks/wsim/media_nn_1080p_s2.wsim
+++ b/benchmarks/wsim/media_nn_1080p_s2.wsim
@@ -1,3 +1,5 @@
+M.1.VCS
+B.1
 1.VCS.13000-17000.0.0
 2.RCS.2000-4000.-1.0
 3.RCS.3000-5000.-1.0
diff --git a/benchmarks/wsim/media_nn_1080p_s3.wsim b/benchmarks/wsim/media_nn_1080p_s3.wsim
index 12368da83dca..ee3b675de9e5 100644
--- a/benchmarks/wsim/media_nn_1080p_s3.wsim
+++ b/benchmarks/wsim/media_nn_1080p_s3.wsim
@@ -1,3 +1,5 @@
+M.1.VCS
+B.1
 1.VCS.13000-17000.0.0
 2.RCS.2000-4000.-1.0
 3.RCS.3000-5000.-1.0
diff --git a/benchmarks/wsim/media_nn_480p.wsim b/benchmarks/wsim/media_nn_480p.wsim
index ab64a4569d71..73fc643dc9e5 100644
--- a/benchmarks/wsim/media_nn_480p.wsim
+++ b/benchmarks/wsim/media_nn_480p.wsim
@@ -1,3 +1,7 @@
+M.1.VCS
+B.1
+M.3.VCS
+B.3
 1.VCS.12000-15000.0.0
 2.RCS.1000-2200.-1.0
 3.RCS.1000-1400.-1.0
diff --git a/benchmarks/wsim/vcs_balanced.wsim b/benchmarks/wsim/vcs_balanced.wsim
index e8958b8f7f43..78d953fb7551 100644
--- a/benchmarks/wsim/vcs_balanced.wsim
+++ b/benchmarks/wsim/vcs_balanced.wsim
@@ -1,26 +1,28 @@
 q.5
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
-0.VCS.500-2000.0.0
+M.1.VCS
+B.1
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
+1.VCS.500-2000.0.0
diff --git a/scripts/media-bench.pl b/scripts/media-bench.pl
deleted file mode 100755
index 1cd8205ff07c..000000000000
--- a/scripts/media-bench.pl
+++ /dev/null
@@ -1,736 +0,0 @@
-#! /usr/bin/perl
-#
-# Copyright © 2017 Intel Corporation
-#
-# Permission is hereby granted, free of charge, to any person obtaining a
-# copy of this software and associated documentation files (the "Software"),
-# to deal in the Software without restriction, including without limitation
-# the rights to use, copy, modify, merge, publish, distribute, sublicense,
-# and/or sell copies of the Software, and to permit persons to whom the
-# Software is furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice (including the next
-# paragraph) shall be included in all copies or substantial portions of the
-# Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
-# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
-# IN THE SOFTWARE.
-#
-
-use strict;
-use warnings;
-use 5.010;
-
-use Getopt::Std;
-
-chomp(my $igt_root = `pwd -P`);
-my $wsim = "$igt_root/benchmarks/gem_wsim";
-my $wrk_root = "$igt_root/benchmarks/wsim";
-my $tracepl = "$igt_root/scripts/trace.pl";
-my $tolerance = 0.01;
-my $client_target_s = 10;
-my $idle_tolerance_pct = 2.0;
-my $verbose = 0;
-my $gt2 = 0;
-my $show_cmds = 0;
-my $realtime_target = 0;
-my $wps_target = 0;
-my $wps_target_param = 0;
-my $multi_mode = 0;
-my @multi_workloads;
-my $w_direct;
-my $balancer;
-my $nop;
-my %opts;
-
-my @balancers = ( 'rr', 'rand', 'qd', 'qdr', 'qdavg', 'rt', 'rtr', 'rtavg',
-		  'context', 'busy', 'busy-avg', 'i915' );
-my %bal_skip_H = ( 'rr' => 1, 'rand' => 1, 'context' => 1, , 'busy' => 1,
-		   'busy-avg' => 1, 'i915' => 1 );
-my %bal_skip_R = ( 'i915' => 1 );
-my %bal_skip_G = ( 'i915' => 1 );
-
-my @workloads = (
-	'media_load_balance_17i7.wsim',
-	'media_load_balance_19.wsim',
-	'media_load_balance_4k12u7.wsim',
-	'media_load_balance_fhd26u7.wsim',
-	'media_load_balance_hd01.wsim',
-	'media_load_balance_hd06mp2.wsim',
-	'media_load_balance_hd12.wsim',
-	'media_load_balance_hd17i4.wsim',
-	'media_1n2_480p.wsim',
-	'media_1n3_480p.wsim',
-	'media_1n4_480p.wsim',
-	'media_1n5_480p.wsim',
-	'media_1n2_asy.wsim',
-	'media_1n3_asy.wsim',
-	'media_1n4_asy.wsim',
-	'media_1n5_asy.wsim',
-	'media_mfe2_480p.wsim',
-	'media_mfe3_480p.wsim',
-	'media_mfe4_480p.wsim',
-	'media_nn_1080p.wsim',
-	'media_nn_480p.wsim',
-    );
-
-sub show_cmd
-{
-	my ($cmd) = @_;
-
-	say "\n+++ $cmd" if $show_cmds;
-}
-
-sub calibrate_nop
-{
-	my ($delay, $nop);
-	my $cmd = "$wsim";
-
-	show_cmd($cmd);
-	open WSIM, "$cmd |" or die;
-	while (<WSIM>) {
-		chomp;
-		if (/Nop calibration for (\d+)us delay is (\d+)./) {
-			$delay = $1;
-			$nop = $2;
-		}
-
-	}
-	close WSIM;
-
-	die unless $nop;
-
-	return $nop
-}
-
-sub can_balance_workload
-{
-	my ($wrk) = @_;
-	my $res = 0;
-
-	open WRK, "$wrk_root/$wrk" or die;
-	while (<WRK>) {
-		chomp;
-		if (/\.VCS\./) {
-			$res = 1;
-			last;
-		}
-	}
-	close WRK;
-
-	return $res;
-}
-
-sub add_wps_arg
-{
-	my (@args) = @_;
-	my $period;
-
-	return @args if $realtime_target <= 0;
-
-	$period = int(1000000 / $realtime_target);
-	push @args, '-a';
-	push @args, 'p.$period';
-
-	return @args;
-}
-
-sub run_workload
-{
-	my (@args) = @_;
-	my ($time, $wps, $cmd);
-	my @ret;
-
-	@args = add_wps_arg(@args);
-	push @args, '-2' if $gt2;
-
-	unshift @args, $wsim;
-	$cmd = join ' ', @args;
-	show_cmd($cmd);
-
-	open WSIM, "$cmd |" or die;
-	while (<WSIM>) {
-		chomp;
-		if (/^(\d+\.\d+)s elapsed \((\d+\.?\d+) workloads\/s\)$/) {
-			$time = $1;
-			$wps = $2;
-		} elsif (/(\d+)\: \d+\.\d+s elapsed \(\d+ cycles, (\d+\.?\d+) workloads\/s\)/) {
-			$ret[$1] = $2;
-		}
-	}
-	close WSIM;
-
-	return ($time, $wps, \@ret);
-}
-
-sub dump_cmd
-{
-	my ($cmd, $file) = @_;
-
-	show_cmd("$cmd > $file");
-
-	open FOUT, '>', $file or die;
-	open TIN, "$cmd |" or die;
-	while (<TIN>) {
-		print FOUT $_;
-	}
-	close TIN;
-	close FOUT;
-}
-
-sub trace_workload
-{
-	my ($wrk, $b, $r, $c) = @_;
-	my @args = ($tracepl, '--trace', $wsim, '-q', '-n', $nop, '-r', $r, '-c', $c);
-	my $min_batches = 16 + $r * $c / 2;
-	my @skip_engine;
-	my %engines;
-	my ($cmd, $file);
-
-	push @args, '-2' if $gt2;
-
-	unless ($b eq '<none>') {
-		push @args, '-R';
-		push @args, split /\s+/, $b;
-	}
-
-	if (defined $w_direct) {
-		push @args, split /\s+/, $wrk;
-	} else {
-		push @args, '-w';
-		push @args, $wrk_root . '/' . $wrk;
-	}
-
-	show_cmd(join ' ', @args);
-	if (-e 'perf.data') {
-		unlink 'perf.data' or die;
-	}
-	system(@args) == 0 or die;
-
-	$cmd = "perf script | $tracepl";
-	show_cmd($cmd);
-	open CMD, "$cmd |" or die;
-	while (<CMD>) {
-		chomp;
-		if (/Ring(\S+): (\d+) batches.*?(\d+\.?\d+)% idle,/) {
-			if ($2 >= $min_batches) {
-				$engines{$1} = $3;
-			} else {
-				push @skip_engine, $1;
-			}
-		} elsif (/GPU: (\d+\.?\d+)% idle/) {
-			$engines{'gpu'} = $1;
-		}
-	}
-	close CMD;
-
-	$wrk =~ s/$wrk_root//g;
-	$wrk =~ s/\.wsim//g;
-	$wrk =~ s/-w/W/g;
-	$wrk =~ s/[ -]/_/g;
-	$wrk =~ s/\//-/g;
-	$b =~ s/[ <>]/_/g;
-	$file = "${wrk}_${b}_-r${r}_-c${c}";
-
-	dump_cmd('perf script', "${file}.trace");
-
-	$cmd = "perf script | $tracepl --html -x ctxsave -s -c ";
-	$cmd .= join ' ', map("-i $_", @skip_engine);
-
-	dump_cmd($cmd, "${file}.html");
-
-	return \%engines;
-}
-
-sub calibrate_workload
-{
-	my ($wrk) = @_;
-	my $tol = $tolerance;
-	my $loops = 0;
-	my $error;
-	my $r;
-
-	$r = $realtime_target > 0 ? $realtime_target * $client_target_s : 23;
-	for (;;) {
-		my @args = ('-n', $nop, '-r', $r);
-		my ($time, $wps);
-
-		if (defined $w_direct) {
-			push @args, split /\s+/, $wrk;
-		} else {
-			push @args, '-w';
-			push @args, $wrk_root . '/' . $wrk;
-		}
-
-		($time, $wps) = run_workload(@args);
-
-		$wps = $r / $time if $w_direct;
-		$error = abs($time - $client_target_s) / $client_target_s;
-
-		last if $error <= $tol;
-
-		$r = int($wps * $client_target_s);
-		$loops = $loops + 1;
-		if ($loops >= 3) {
-			$tol = $tol * (1.2 + ($tol));
-			$loops = 0;
-		}
-		last if $tol > 0.2;
-	}
-
-	return ($r, $error);
-}
-
-sub find_saturation_point
-{
-	my ($wrk, $rr, $verbose, @args) = @_;
-	my ($last_wps, $c, $swps, $wwps);
-	my $target = $realtime_target > 0 ? $realtime_target : $wps_target;
-	my $r = $rr;
-	my $wcnt;
-	my $maxc;
-	my $max = 0;
-
-	push @args, '-v' if $multi_mode and $w_direct;
-
-	if (defined $w_direct) {
-		push @args, split /\s+/, $wrk;
-		$wcnt = () = $wrk =~ /-[wW]/gi;
-
-	} else {
-		push @args, '-w';
-		push @args, $wrk_root . '/' . $wrk;
-		$wcnt = 1;
-	}
-
-	for ($c = 1; ; $c = $c + 1) {
-		my ($time, $wps);
-		my @args_ = (@args, ('-r', $r, '-c', $c));
-
-		($time, $wps, $wwps) = run_workload(@args_);
-
-		say "        $c clients is $wps wps." if $verbose;
-
-		if ($c > 1) {
-			my $delta;
-
-			if ($target <= 0) {
-				if ($wps > $max) {
-					$max = $wps;
-					$maxc = $c;
-				}
-				$delta = ($wps - $last_wps) / $last_wps;
-				if ($delta > 0) {
-					last if $delta < $tolerance;
-				} else {
-					$delta = ($wps - $max) / $max;
-					last if abs($delta) >= $tolerance;
-				}
-			} else {
-				$delta = ($wps / $c - $target) / $target;
-				last if $delta < 0 and abs($delta) >= $tolerance;
-			}
-			$r = int($rr * ($client_target_s / $time));
-		} elsif ($c == 1) {
-			$swps = $wps;
-			return ($c, $wps, $swps, $wwps) if $wcnt > 1 or
-							   $multi_mode or
-							   ($wps_target_param < 0 and
-							    $wps_target == 0);
-		}
-
-		$last_wps = $wps;
-	}
-
-	if ($target <= 0) {
-		return ($maxc, $max, $swps, $wwps);
-	} else {
-		return ($c - 1, $last_wps, $swps, $wwps);
-	}
-}
-
-getopts('hv2xmn:b:W:B:r:t:i:R:T:w:', \%opts);
-
-if (defined $opts{'h'}) {
-	print <<ENDHELP;
-Supported options:
-
-  -h          Help text.
-  -v          Be verbose.
-  -x          Show external commands.
-  -2          Run gem_wsim in GT2 mode.
-  -n num      Nop calibration.
-  -b str      Balancer to pre-select.
-              Skips balancer auto-selection.
-              Passed straight the gem_wsim so use like -b "-b qd -R"
-  -W a,b,c    Override the default list of workloads.
-  -B a,b,c    Override the default list of balancers.
-  -r sec      Target workload duration.
-  -t pct      Calibration tolerance.
-  -i pct      Engine idleness tolerance.
-  -R wps      Run workloads in the real-time mode at wps rate.
-  -T wps      Calibrate up to wps/client target instead of GPU saturation.
-              Negative values set the target based on the single client
-              performance where target = single-client-wps / -N.
-  -w str      Pass-through to gem_wsim. Overrides normal workload selection.
-  -m          Multi-workload mode. All selected workloads will be run in
-              parallel and overal score will be relative to when run
-              individually.
-ENDHELP
-	exit 0;
-}
-
-$verbose = 1 if defined $opts{'v'};
-$gt2 = 1 if defined $opts{'2'};
-$show_cmds = 1 if defined $opts{'x'};
-$multi_mode = 1 if defined $opts{'m'};
-if (defined $opts{'b'}) {
-	die unless substr($opts{'b'}, 0, 2) eq '-b';
-	$balancer = $opts{'b'};
-}
-if (defined $opts{'B'}) {
-	@balancers = split /,/, $opts{'B'};
-} else {
-	unshift @balancers, '';
-}
-@workloads = split /,/, $opts{'W'} if defined $opts{'W'};
-$client_target_s = $opts{'r'} if defined $opts{'r'};
-$tolerance = $opts{'t'} / 100.0 if defined $opts{'t'};
-$idle_tolerance_pct = $opts{'i'} if defined $opts{'i'};
-$realtime_target = $opts{'R'} if defined $opts{'R'};
-$wps_target = $opts{'T'} if defined $opts{'T'};
-$wps_target_param = $wps_target;
-$w_direct = $opts{'w'} if defined $opts{'w'};
-
-if ($multi_mode) {
-	die if $w_direct; # Not supported
-	@multi_workloads = @workloads;
-}
-
-@workloads = ($w_direct) if defined $w_direct;
-
-say "Workloads:";
-print map { "  $_\n" } @workloads;
-print "Balancers: ";
-say map { "$_," } @balancers;
-say "Target workload duration is ${client_target_s}s.";
-say "Calibration tolerance is $tolerance.";
-say "Real-time mode at ${realtime_target} wps." if $realtime_target > 0;
-say "Wps target is ${wps_target} wps." if $wps_target > 0;
-say "Multi-workload mode." if $multi_mode;
-$nop = $opts{'n'};
-$nop = calibrate_nop() unless $nop;
-say "Nop calibration is $nop.";
-
-goto VERIFY if defined $balancer;
-
-my (%best_bal, %best_bid);
-my %results;
-my %scores;
-my %wscores;
-my %cscores;
-my %cwscores;
-my %mscores;
-my %mwscores;
-
-sub add_points
-{
-	my ($wps, $scores, $wscores) = @_;
-	my ($min, $max, $spread);
-	my @sorted;
-
-	@sorted = sort { $b <=> $a } values %{$wps};
-	$max = $sorted[0];
-	$min = $sorted[-1];
-	$spread = $max - $min;
-	die if $spread < 0;
-
-	foreach my $w (keys %{$wps}) {
-		my ($score, $wscore);
-
-		unless (exists $scores->{$w}) {
-			$scores->{$w} = 0;
-			$wscores->{$w} = 0;
-		}
-
-		$score = $wps->{$w} / $max;
-		$scores->{$w} = $scores->{$w} + $score;
-		$wscore = $score * $spread / $max;
-		$wscores->{$w} = $wscores->{$w} + $wscore;
-	}
-}
-
-my @saturation_workloads = $multi_mode ? @multi_workloads : @workloads;
-my %allwps;
-my $widx = 0;
-
-push @saturation_workloads, '-w ' . join ' -w ', map("$wrk_root/$_", @workloads)
-     if $multi_mode;
-
-foreach my $wrk (@saturation_workloads) {
-	my @args = ( "-n $nop");
-	my ($r, $error, $should_b, $best);
-	my (%wps, %cwps, %mwps);
-	my @sorted;
-	my $range;
-
-	$w_direct = $wrk if $multi_mode and $widx == $#saturation_workloads;
-
-	$should_b = 1;
-	$should_b = can_balance_workload($wrk) unless defined $w_direct;
-
-	print "\nEvaluating '$wrk'...";
-
-	($r, $error) = calibrate_workload($wrk);
-	say " ${client_target_s}s is $r workloads. (error=$error)";
-
-	say "  Finding saturation points for '$wrk'...";
-
-	BAL: foreach my $bal (@balancers) {
-		GBAL: foreach my $G ('', '-G', '-d', '-G -d') {
-			foreach my $H ('', '-H') {
-				my @xargs;
-				my ($w, $c, $s, $bwwps);
-				my $bid;
-
-				if ($bal ne '') {
-					next GBAL if $G =~ '-G' and exists $bal_skip_G{$bal};
-
-					push @xargs, "-b $bal";
-					push @xargs, '-R' unless exists $bal_skip_R{$bal};
-					push @xargs, $G if $G ne '';
-					push @xargs, $H if $H ne '';
-					$bid = join ' ', @xargs;
-					print "    $bal balancer ('$bid'): ";
-				} else {
-					$bid = '<none>';
-					print "    No balancing: ";
-				}
-
-				$wps_target = 0 if $wps_target_param < 0;
-
-				($c, $w, $s, $bwwps) =
-					find_saturation_point($wrk, $r, 0,
-							      (@args, @xargs));
-
-				if ($wps_target_param < 0) {
-					$wps_target = $s / -$wps_target_param;
-
-					($c, $w, $s, $bwwps) =
-						find_saturation_point($wrk, $r,
-								      0,
-								      (@args,
-								       @xargs));
-				}
-
-				if ($multi_mode and $w_direct) {
-					my $widx;
-
-					die unless scalar(@multi_workloads) ==
-						   scalar(@{$bwwps});
-					die unless scalar(@multi_workloads) ==
-						   scalar(keys %allwps);
-
-					# Total of all workload wps from the
-					# mixed run.
-					$w = 0;
-					foreach $widx (0..$#{$bwwps}) {
-						$w += $bwwps->[$widx];
-					}
-
-					# Total of all workload wps from when
-					# ran individually with the best
-					# balancer.
-					my $tot = 0;
-					foreach my $wrk (@multi_workloads) {
-						$tot += $allwps{$wrk}->{$best_bid{$wrk}};
-					}
-
-					# Normalize mixed sum with sum of
-					# individual runs.
-					$w *= 100;
-					$w /= $tot;
-
-					# Second metric is average of each
-					# workload wps normalized by their
-					# individual run performance with the
-					# best balancer.
-					$s = 0;
-					$widx = 0;
-					foreach my $wrk (@multi_workloads) {
-						$s += 100 * $bwwps->[$widx] /
-						      $allwps{$wrk}->{$best_bid{$wrk}};
-						$widx++;
-					}
-					$s /= scalar(@multi_workloads);
-
-					say sprintf('Aggregate (normalized) %.2f%%; fairness %.2f%%',
-						    $w, $s);
-				} else {
-					$allwps{$wrk} = \%wps;
-				}
-
-				$wps{$bid} = $w;
-				$cwps{$bid} = $s;
-
-				if ($realtime_target > 0 || $wps_target_param > 0) {
-					$mwps{$bid} = $w * $c;
-				} else {
-					$mwps{$bid} = $w + $s;
-				}
-
-				say "$c clients ($w wps, $s wps single client, score=$mwps{$bid})."
-				    unless $multi_mode and $w_direct;
-
-				last BAL unless $should_b;
-				next BAL if $bal eq '';
-				next GBAL if exists $bal_skip_H{$bal};
-			}
-		}
-	}
-
-	$widx++;
-
-	@sorted = sort { $mwps{$b} <=> $mwps{$a} } keys %mwps;
-	$best_bid{$wrk} = $sorted[0];
-	@sorted = sort { $b <=> $a } values %mwps;
-	$range = 1 - $sorted[-1] / $sorted[0];
-	$best_bal{$wrk} = $sorted[0];
-
-	next if $multi_mode and not $w_direct;
-
-	say "  Best balancer is '$best_bid{$wrk}' (range=$range).";
-
-
-	$results{$wrk} = \%mwps;
-
-	add_points(\%wps, \%scores, \%wscores);
-	add_points(\%mwps, \%mscores, \%mwscores);
-	add_points(\%cwps, \%cscores, \%cwscores);
-}
-
-sub dump_scoreboard
-{
-	my ($n, $h) = @_;
-	my ($i, $str, $balancer);
-	my ($max, $range);
-	my @sorted;
-
-	@sorted = sort { $b <=> $a } values %{$h};
-	$max = $sorted[0];
-	$range = 1 - $sorted[-1] / $max;
-	$str = "$n rank (range=$range):";
-	say "\n$str";
-	say '=' x length($str);
-	$i = 1;
-	foreach my $w (sort { $h->{$b} <=> $h->{$a} } keys %{$h}) {
-		my $score;
-
-		$balancer = $w if $i == 1;
-		$score = $h->{$w} / $max;
-
-		say "  $i: '$w' ($score)";
-
-		$i = $i + 1;
-	}
-
-	return $balancer;
-}
-
-dump_scoreboard($multi_mode ? 'Throughput' : 'Total wps', \%scores);
-dump_scoreboard('Total weighted wps', \%wscores) unless $multi_mode;
-dump_scoreboard($multi_mode ? 'Fairness' : 'Per client wps', \%cscores);
-dump_scoreboard('Per client weighted wps', \%cwscores) unless $multi_mode;
-$balancer = dump_scoreboard($multi_mode ? 'Combined' : 'Combined wps', \%mscores);
-$balancer = dump_scoreboard('Combined weighted wps', \%mwscores) unless $multi_mode;
-
-VERIFY:
-
-my %problem_wrk;
-
-die unless defined $balancer;
-
-say "\nBalancer is '$balancer'.";
-say "Idleness tolerance is $idle_tolerance_pct%.";
-
-if ($multi_mode) {
-	$w_direct = '-w ' . join ' -w ', map("$wrk_root/$_", @workloads);
-	@workloads = ($w_direct);
-}
-
-foreach my $wrk (@workloads) {
-	my @args = ( "-n $nop" );
-	my ($r, $error, $c, $wps, $swps);
-	my $saturated = 0;
-	my $result = 'Pass';
-	my $vcs2 = $gt2 ? '1:0' : '2:1';
-	my %problem;
-	my $engines;
-
-	next if not defined $w_direct and not can_balance_workload($wrk);
-
-	push @args, $balancer unless $balancer eq '<none>';
-
-	if (scalar(keys %results)) {
-		$r = $results{$wrk}->{$balancer} / $best_bal{$wrk} * 100.0;
-	} else {
-		$r = '---';
-	}
-	say "  \nProfiling '$wrk' ($r% of best)...";
-
-	($r, $error) = calibrate_workload($wrk);
-	say "      ${client_target_s}s is $r workloads. (error=$error)";
-
-	($c, $wps, $swps) = find_saturation_point($wrk, $r, $verbose, @args);
-	say "      Saturation at $c clients ($wps workloads/s).";
-	push @args, "-c $c";
-
-	$engines = trace_workload($wrk, $balancer, $r, $c);
-
-	foreach my $key (keys %{$engines}) {
-		next if $key eq 'gpu';
-		$saturated = $saturated + 1
-			     if $engines->{$key} < $idle_tolerance_pct;
-	}
-
-	if ($saturated == 0) {
-		# Not a single saturated engine
-		$result = 'FAIL';
-	} elsif (not exists $engines->{'2:0'} or not exists $engines->{$vcs2}) {
-		# VCS1 and VCS2 not present in a balancing workload
-		$result = 'FAIL';
-	} elsif ($saturated == 1 and
-		 ($engines->{'2:0'} < $idle_tolerance_pct or
-		  $engines->{$vcs2} < $idle_tolerance_pct)) {
-		# Only one VCS saturated
-		$result = 'WARN';
-	}
-
-	$result = 'WARN' if $engines->{'gpu'} > $idle_tolerance_pct;
-
-	if ($result ne 'Pass') {
-		$problem{'c'} = $c;
-		$problem{'r'} = $r;
-		$problem{'stats'} = $engines;
-		$problem_wrk{$wrk} = \%problem;
-	}
-
-	print "    $result [";
-	print map " $_: $engines->{$_}%,", sort keys %{$engines};
-	say " ]";
-}
-
-say "\nProblematic workloads were:" if scalar(keys %problem_wrk) > 0;
-foreach my $wrk (sort keys %problem_wrk) {
-	my $problem = $problem_wrk{$wrk};
-
-	print "   $wrk -c $problem->{'c'} -r $problem->{'r'} [";
-	print map " $_: $problem->{'stats'}->{$_}%,",
-	      sort keys %{$problem->{'stats'}};
-	say " ]";
-}
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 16:57   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats Tvrtko Ursulin
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add support for defining buffer object working sets and targeting them as
data dependencies. For more information please see the README file.
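
As an illustration only, the working set dependency tokens described in
the README ("r1-0", "w2-3", "r1-0-9") could be parsed along the lines
below. This is a minimal sketch and not the code in this patch: struct
ws_dep, parse_uint() and parse_ws_dep() are hypothetical names, and the
1000000 upper bound is an arbitrary assumption. Unlike the atoi() based
parser it rejects trailing garbage and ranges where <to> is smaller than
<from>.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type, not part of gem_wsim. */
struct ws_dep {
	bool write;	/* true for 'w', false for 'r' */
	int set;	/* working set id */
	int from;	/* first object index */
	int to;		/* last object index, equals from for one object */
};

/* Parse a non-negative decimal number and advance *s past it; -1 on error. */
static int parse_uint(const char **s)
{
	char *end;
	long v;

	if (!isdigit((unsigned char)**s))
		return -1;

	v = strtol(*s, &end, 10);
	if (v > 1000000)	/* arbitrary sanity limit for this sketch */
		return -1;

	*s = end;

	return (int)v;
}

/* Parse "[rw]<set>-<from>[-<to>]"; returns 0 on success, -1 on error. */
static int parse_ws_dep(const char *str, struct ws_dep *dep)
{
	if (*str != 'r' && *str != 'w')
		return -1;
	dep->write = *str++ == 'w';

	dep->set = parse_uint(&str);
	if (dep->set < 0 || *str++ != '-')
		return -1;

	dep->from = parse_uint(&str);
	if (dep->from < 0)
		return -1;

	if (*str == '-') {
		str++;
		dep->to = parse_uint(&str);
		if (dep->to < dep->from)	/* also catches parse errors */
			return -1;
	} else {
		dep->to = dep->from;
	}

	return *str ? -1 : 0;
}
```

A caller would then add one dependency entry per index in [from, to],
with the write flag and working set id copied from the parsed token.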

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c                   | 453 +++++++++++++++++++++---
 benchmarks/wsim/README                  |  59 +++
 benchmarks/wsim/cloud-gaming-60fps.wsim |  11 +
 benchmarks/wsim/composited-ui.wsim      |   7 +
 4 files changed, 476 insertions(+), 54 deletions(-)
 create mode 100644 benchmarks/wsim/cloud-gaming-60fps.wsim
 create mode 100644 benchmarks/wsim/composited-ui.wsim

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 02fe8f5a5e69..9e5bfe6a36d4 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -88,14 +88,21 @@ enum w_type
 	LOAD_BALANCE,
 	BOND,
 	TERMINATE,
-	SSEU
+	SSEU,
+	WORKINGSET,
+};
+
+struct dep_entry {
+	int target;
+	bool write;
+	int working_set; /* -1 = step dependency, >= 0 working set id */
 };
 
 struct deps
 {
 	int nr;
 	bool submit_fence;
-	int *list;
+	struct dep_entry *list;
 };
 
 struct w_arg {
@@ -110,6 +117,14 @@ struct bond {
 	enum intel_engine_id master;
 };
 
+struct working_set {
+	int id;
+	bool shared;
+	unsigned int nr;
+	uint32_t *handles;
+	unsigned long *sizes;
+};
+
 struct workload;
 
 struct w_step
@@ -143,6 +158,7 @@ struct w_step
 			enum intel_engine_id bond_master;
 		};
 		int sseu;
+		struct working_set working_set;
 	};
 
 	/* Implementation details */
@@ -193,6 +209,9 @@ struct workload
 	unsigned int nr_ctxs;
 	struct ctx *ctx_list;
 
+	struct working_set **working_sets; /* array indexed by set id */
+	int max_working_set_id;
+
 	int sync_timeline;
 	uint32_t sync_seqno;
 
@@ -281,11 +300,120 @@ print_engine_calibrations(void)
 	printf("\n");
 }
 
+static void add_dep(struct deps *deps, struct dep_entry entry)
+{
+	deps->list = realloc(deps->list, sizeof(*deps->list) * (deps->nr + 1));
+	igt_assert(deps->list);
+
+	deps->list[deps->nr++] = entry;
+}
+
+static int
+parse_working_set_deps(struct workload *wrk,
+		       struct deps *deps,
+		       struct dep_entry _entry,
+		       char *str)
+{
+	/*
+	 * 1 - target handle index in the specified working set.
+	 * 2-4 - range
+	 */
+	struct dep_entry entry = _entry;
+	char *s;
+
+	s = index(str, '-');
+	if (s) {
+		int from, to;
+
+		from = atoi(str);
+		if (from < 0)
+			return -1;
+
+		to = atoi(++s);
+		if (to <= 0)
+			return -1;
+
+		for (entry.target = from; entry.target <= to; entry.target++)
+			add_dep(deps, entry);
+	} else {
+		entry.target = atoi(str);
+		if (entry.target < 0)
+			return -1;
+
+		add_dep(deps, entry);
+	}
+
+	return 0;
+}
+
+static int
+parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
+{
+	struct dep_entry entry = { .working_set = -1 };
+	bool submit_fence = false;
+	char *s;
+
+	switch (str[0]) {
+	case '-':
+		if (str[1] < '0' || str[1] > '9')
+			return -1;
+
+		entry.target = atoi(str);
+		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
+			return -1;
+
+		add_dep(&w->data_deps, entry);
+
+		break;
+	case 's':
+		submit_fence = true;
+		/* Fall-through. */
+	case 'f':
+		/* Multiple fences not yet supported. */
+		igt_assert_eq(w->fence_deps.nr, 0);
+
+		entry.target = atoi(++str);
+		if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
+			return -1;
+
+		add_dep(&w->fence_deps, entry);
+
+		w->fence_deps.submit_fence = submit_fence;
+		break;
+	case 'w':
+		entry.write = true;
+		/* Fall-through. */
+	case 'r':
+		/*
+		 * [rw]N-<str>
+		 * r1-<str> or w2-<str>, where N is working set id.
+		 */
+		s = index(++str, '-');
+		if (!s)
+			return -1;
+
+		entry.working_set = atoi(str);
+
+		if (parse_working_set_deps(w->wrk, &w->data_deps, entry, ++s))
+			return -1;
+
+		break;
+	default:
+		return -1;
+	};
+
+	return 0;
+}
+
 static int
 parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 {
 	char *desc = strdup(_desc);
 	char *token, *tctx = NULL, *tstart = desc;
+	int ret = 0;
+
+	if (!strcmp(_desc, "0"))
+		goto out;
 
 	igt_assert(desc);
 	igt_assert(!w->data_deps.nr && w->data_deps.nr == w->fence_deps.nr);
@@ -293,47 +421,17 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 		   w->data_deps.list == w->fence_deps.list);
 
 	while ((token = strtok_r(tstart, "/", &tctx)) != NULL) {
-		bool submit_fence = false;
-		char *str = token;
-		struct deps *deps;
-		int dep;
-
 		tstart = NULL;
 
-		if (str[0] == '-' || (str[0] >= '0' && str[0] <= '9')) {
-			deps = &w->data_deps;
-		} else {
-			if (str[0] == 's')
-				submit_fence = true;
-			else if (str[0] != 'f')
-				return -1;
-
-			deps = &w->fence_deps;
-			str++;
-		}
-
-		dep = atoi(str);
-		if (dep > 0 || ((int)nr_steps + dep) < 0) {
-			if (deps->list)
-				free(deps->list);
-			return -1;
-		}
-
-		if (dep < 0) {
-			deps->nr++;
-			/* Multiple fences not yet supported. */
-			igt_assert(deps->nr == 1 || deps != &w->fence_deps);
-			deps->list = realloc(deps->list,
-					     sizeof(*deps->list) * deps->nr);
-			igt_assert(deps->list);
-			deps->list[deps->nr - 1] = dep;
-			deps->submit_fence = submit_fence;
-		}
+		ret = parse_dependency(nr_steps, w, token);
+		if (ret)
+			break;
 	}
 
+out:
 	free(desc);
 
-	return 0;
+	return ret;
 }
 
 static void __attribute__((format(printf, 1, 2)))
@@ -624,6 +722,88 @@ static int parse_engine_map(struct w_step *step, const char *_str)
 	return 0;
 }
 
+static unsigned long parse_size(char *str)
+{
+	const unsigned int len = strlen(str);
+	unsigned int mult = 1;
+
+	if (len == 0)
+		return 0;
+
+	switch (str[len - 1]) {
+	case 'g':
+	case 'G':
+		mult *= 1024;
+		/* Fall-through. */
+	case 'm':
+	case 'M':
+		mult *= 1024;
+		/* Fall-through. */
+	case 'k':
+	case 'K':
+		mult *= 1024;
+
+		str[len - 1] = 0;
+	}
+
+	return atol(str) * mult;
+}
+
+static int add_buffers(struct working_set *set, char *str)
+{
+	/*
+	 * 4096
+	 * 4k
+	 * 4m
+	 * 4g
+	 * 10n4k - 10 4k batches
+	 */
+	unsigned long *sizes, size;
+	unsigned int add, i;
+	char *n;
+
+	n = index(str, 'n');
+	if (n) {
+		*n = 0;
+		add = atoi(str);
+		if (!add)
+			return -1;
+		str = ++n;
+	} else {
+		add = 1;
+	}
+
+	size = parse_size(str);
+	if (!size)
+		return -1;
+
+	sizes = realloc(set->sizes, (set->nr + add) * sizeof(*sizes));
+	if (!sizes)
+		return -1;
+
+	for (i = 0; i < add; i++)
+		sizes[set->nr + i] = size;
+
+	set->nr += add;
+	set->sizes = sizes;
+
+	return 0;
+}
+
+static int parse_working_set(struct working_set *set, char *str)
+{
+	char *token, *tctx = NULL, *tstart = str;
+
+	while ((token = strtok_r(tstart, "/", &tctx))) {
+		tstart = NULL;
+
+		if (add_buffers(set, token))
+			return -1;
+	}
+
+	return 0;
+}
+
 static uint64_t engine_list_mask(const char *_str)
 {
 	uint64_t mask = 0;
@@ -644,6 +824,8 @@ static uint64_t engine_list_mask(const char *_str)
 	return mask;
 }
 
+static void allocate_working_set(struct working_set *set);
+
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
 	if ((field = strtok_r(fstart, ".", &fctx))) { \
 		tmp = atoi(field); \
@@ -661,7 +843,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 	char *desc = strdup(arg->desc);
 	char *_token, *token, *tctx = NULL, *tstart = desc;
 	char *field, *fctx = NULL, *fstart;
-	struct w_step step, *steps = NULL;
+	struct w_step step, *w, *steps = NULL;
 	unsigned int valid;
 	int i, j, tmp;
 
@@ -851,6 +1033,28 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 				step.type = BOND;
 				goto add_step;
+			} else if (!strcmp(field, "w") || !strcmp(field, "W")) {
+				unsigned int nr = 0;
+
+				step.working_set.shared = field[0] == 'W';
+
+				while ((field = strtok_r(fstart, ".", &fctx))) {
+					tmp = atoi(field);
+					if (nr == 0) {
+						step.working_set.id = tmp;
+					} else {
+						tmp = parse_working_set(&step.working_set,
+									field);
+						check_arg(tmp < 0,
+							  "Invalid working set at step %u!\n",
+							  nr_steps);
+					}
+
+					nr++;
+				}
+
+				step.type = WORKINGSET;
+				goto add_step;
 			}
 
 			if (!field) {
@@ -975,6 +1179,8 @@ add_step:
 	wrk->steps = steps;
 	wrk->prio = arg->prio;
 	wrk->sseu = arg->sseu;
+	wrk->max_working_set_id = -1;
+	wrk->working_sets = NULL;
 
 	free(desc);
 
@@ -984,7 +1190,7 @@ add_step:
 	 */
 	for (i = 0; i < nr_steps; i++) {
 		for (j = 0; j < steps[i].fence_deps.nr; j++) {
-			tmp = steps[i].idx + steps[i].fence_deps.list[j];
+			tmp = steps[i].idx + steps[i].fence_deps.list[j].target;
 			check_arg(tmp < 0 || tmp >= i ||
 				  (steps[tmp].type != BATCH &&
 				   steps[tmp].type != SW_FENCE),
@@ -1003,6 +1209,51 @@ add_step:
 		}
 	}
 
+	/*
+	 * Check no duplicate working set ids.
+	 */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		struct w_step *w2;
+
+		if (w->type != WORKINGSET)
+			continue;
+
+		for (j = 0, w2 = wrk->steps; j < wrk->nr_steps; w2++, j++) {
+			if (j == i)
+				continue;
+			if (w2->type != WORKINGSET)
+				continue;
+
+			check_arg(w->working_set.id == w2->working_set.id,
+				  "Duplicate working set id at %u!\n", j);
+		}
+	}
+
+	/*
+	 * Allocate shared working sets.
+	 */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET && w->working_set.shared)
+			allocate_working_set(&w->working_set);
+	}
+
+	wrk->max_working_set_id = -1;
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET &&
+		    w->working_set.shared &&
+		    w->working_set.id > wrk->max_working_set_id)
+			wrk->max_working_set_id = w->working_set.id;
+	}
+
+	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
+				   sizeof(*wrk->working_sets));
+	igt_assert(wrk->working_sets);
+
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET && w->working_set.shared)
+			wrk->working_sets[w->working_set.id] = &w->working_set;
+	}
+
 	return wrk;
 }
 
@@ -1024,6 +1275,18 @@ clone_workload(struct workload *_wrk)
 
 	memcpy(wrk->steps, _wrk->steps, sizeof(struct w_step) * wrk->nr_steps);
 
+	wrk->max_working_set_id = _wrk->max_working_set_id;
+	if (wrk->max_working_set_id >= 0) {
+		wrk->working_sets = calloc(wrk->max_working_set_id + 1,
+					sizeof(*wrk->working_sets));
+		igt_assert(wrk->working_sets);
+
+		memcpy(wrk->working_sets,
+		       _wrk->working_sets,
+		       (wrk->max_working_set_id + 1) *
+		       sizeof(*wrk->working_sets));
+	}
+
 	/* Check if we need a sw sync timeline. */
 	for (i = 0; i < wrk->nr_steps; i++) {
 		if (wrk->steps[i].type == SW_FENCE) {
@@ -1226,17 +1489,36 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 	igt_assert(j < nr_obj);
 
 	for (i = 0; i < w->data_deps.nr; i++) {
-		igt_assert(w->data_deps.list[i] <= 0);
-		if (w->data_deps.list[i]) {
-			int dep_idx = w->idx + w->data_deps.list[i];
+		struct dep_entry *entry = &w->data_deps.list[i];
+		uint32_t dep_handle;
+
+		if (entry->working_set == -1) {
+			int dep_idx = w->idx + entry->target;
 
+			igt_assert(entry->target <= 0);
 			igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 			igt_assert(wrk->steps[dep_idx].type == BATCH);
 
-			w->obj[j].handle = wrk->steps[dep_idx].obj[0].handle;
-			j++;
-			igt_assert(j < nr_obj);
+			dep_handle = wrk->steps[dep_idx].obj[0].handle;
+		} else {
+			struct working_set *set;
+
+			igt_assert(entry->working_set <=
+				   wrk->max_working_set_id);
+
+			set = wrk->working_sets[entry->working_set];
+
+			igt_assert(set->nr);
+			igt_assert(entry->target < set->nr);
+			igt_assert(set->sizes[entry->target]);
+
+			dep_handle = set->handles[entry->target];
 		}
+
+		w->obj[j].flags = entry->write ? EXEC_OBJECT_WRITE : 0;
+		w->obj[j].handle = dep_handle;
+		j++;
+		igt_assert(j < nr_obj);
 	}
 
 	if (w->unbound_duration)
@@ -1395,11 +1677,23 @@ static size_t sizeof_engines_bond(int count)
 			engines[count]);
 }
 
+static void allocate_working_set(struct working_set *set)
+{
+	unsigned int i;
+
+	set->handles = calloc(set->nr, sizeof(*set->handles));
+	igt_assert(set->handles);
+
+	for (i = 0; i < set->nr; i++)
+		set->handles[i] = gem_create(fd, set->sizes[i]);
+}
+
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
 
 static int
 prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 {
+	struct working_set **sets;
 	uint32_t share_vm = 0;
 	int max_ctx = -1;
 	struct w_step *w;
@@ -1634,6 +1928,51 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		}
 	}
 
+	/*
+	 * Allocate working sets.
+	 */
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET && !w->working_set.shared)
+			allocate_working_set(&w->working_set);
+	}
+
+	/*
+	 * Map of working set ids.
+	 */
+	wrk->max_working_set_id = -1;
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type == WORKINGSET &&
+		    w->working_set.id > wrk->max_working_set_id)
+			wrk->max_working_set_id = w->working_set.id;
+	}
+
+	sets = wrk->working_sets;
+	wrk->working_sets = calloc(wrk->max_working_set_id + 1,
+				   sizeof(*wrk->working_sets));
+	igt_assert(wrk->working_sets);
+
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		struct working_set *set;
+
+		if (w->type != WORKINGSET)
+			continue;
+
+		if (!w->working_set.shared) {
+			set = &w->working_set;
+		} else {
+			igt_assert(sets);
+
+			set = sets[w->working_set.id];
+			igt_assert(set->shared);
+			igt_assert(set->sizes);
+		}
+
+		wrk->working_sets[w->working_set.id] = set;
+	}
+
+	if (sets)
+		free(sets);
+
 	/*
 	 * Allocate batch buffers.
 	 */
@@ -1704,7 +2043,7 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 		      2 * sizeof(uint32_t));
 
 	for (i = 0; i < w->fence_deps.nr; i++) {
-		int tgt = w->idx + w->fence_deps.list[i];
+		int tgt = w->idx + w->fence_deps.list[i].target;
 
 		/* TODO: fence merging needed to support multiple inputs */
 		igt_assert(i == 0);
@@ -1735,14 +2074,18 @@ static void sync_deps(struct workload *wrk, struct w_step *w)
 	unsigned int i;
 
 	for (i = 0; i < w->data_deps.nr; i++) {
+		struct dep_entry *entry = &w->data_deps.list[i];
 		int dep_idx;
 
-		igt_assert(w->data_deps.list[i] <= 0);
+		if (entry->working_set == -1)
+			continue;
+
+		igt_assert(entry->target <= 0);
 
-		if (!w->data_deps.list[i])
+		if (!entry->target)
 			continue;
 
-		dep_idx = w->idx + w->data_deps.list[i];
+		dep_idx = w->idx + entry->target;
 
 		igt_assert(dep_idx >= 0 && dep_idx < w->idx);
 		igt_assert(wrk->steps[dep_idx].type == BATCH);
@@ -1842,11 +2185,6 @@ static void *run_workload(void *data)
 					MI_BATCH_BUFFER_END;
 				__sync_synchronize();
 				continue;
-			} else if (w->type == PREEMPTION ||
-				   w->type == ENGINE_MAP ||
-				   w->type == LOAD_BALANCE ||
-				   w->type == BOND) {
-				continue;
 			} else if (w->type == SSEU) {
 				if (w->sseu != wrk->ctx_list[w->context * 2].sseu) {
 					wrk->ctx_list[w->context * 2].sseu =
@@ -1854,6 +2192,13 @@ static void *run_workload(void *data)
 							     w->sseu);
 				}
 				continue;
+			} else if (w->type == PREEMPTION ||
+				   w->type == ENGINE_MAP ||
+				   w->type == LOAD_BALANCE ||
+				   w->type == BOND ||
+				   w->type == WORKINGSET) {
+				/* No action for these at execution time. */
+				continue;
 			}
 
 			if (do_sleep || w->type == PERIOD) {
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 9f770217f075..3d9143226740 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -8,6 +8,7 @@ M.<uint>.<str>[|<str>]...
 P|S|X.<uint>.<int>
 d|p|s|t|q|a|T.<int>,...
 b.<uint>.<str>[|<str>].<str>
+w|W.<uint>.<str>[/<str>]...
 f
 
 For duration a range can be given from which a random value will be picked
@@ -32,6 +33,8 @@ Additional workload steps are also supported:
  'P' - Context priority.
  'S' - Context SSEU configuration.
  'T' - Terminate an infinite batch.
+ 'w' - Working set. (See Working sets section.)
+ 'W' - Shared working set.
  'X' - Context preemption control.
 
 Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
@@ -275,3 +278,59 @@ for the render engine.
 Slice mask of -1 has a special meaning of "all slices". Otherwise any integer
 can be specifying as the slice mask, but beware any apart from 1 and -1 can make
 the workload not portable between different GPUs.
+
+Working sets
+------------
+
+When used plainly workload steps can create implicit data dependencies by
+relatively referencing other workload steps of a batch buffer type. The fourth
+field contains the relative data dependency. For example:
+
+  1.RCS.1000.0.0
+  1.BCS.1000.-1.0
+
+This means the second batch buffer will be marked as having a read data
+dependency on the first one. (The shared buffer is always marked as written to
+by the dependency target buffer.) This will cause a serialization between the
+two batch buffers.
+
+Working sets are used where more complex data dependencies are required. Each
+working set has an id, a list of buffers, and can either be local to the
+workload or shared within the cloned workloads (-c command line option).
+
+Lower-case 'w' command defines a local working set while upper-case 'W' defines
+a shared version. Syntax is as follows:
+
+  w.<id>.<size>[/<size>]...
+
+For size a byte size can be given, or suffix 'k', 'm' or 'g' can be used (case
+insensitive). Prefix in the format of "<int>n<size>" can also be given to create
+multiple objects of the same size.
+
+Examples:
+
+  w.1.4k - Working set 1 with a single 4KiB object in it.
+  W.2.2M/32768 - Working set 2 with one 2MiB and one 32768 byte object.
+  w.3.10n4k/2n20000 - Working set 3 with ten 4KiB and two 20000 byte objects.
+
+Working set objects can be referenced as data dependency targets using the new
+'r'/'w' syntax. Simple example:
+
+  w.1.4k
+  W.2.1m
+  1.RCS.1000.r1-0/w2-0.0
+  1.BCS.1000.r2-0.0
+
+In this example the RCS batch is reading from working set 1 object 0 and writing
+to working set 2 object 0. BCS batch is reading from working set 2 object 0.
+
+Because working set 2 is of a shared type, should two instances of the same
+workload be executed (-c 2) then the 1MiB buffer would be shared and written
+and read by both clients creating a serialization point.
+
+Apart from single objects, ranges can also be given as dependencies:
+
+  w.1.10n4k
+  1.RCS.1000.r1-0-9.0
+
+Here the RCS batch has a read dependency on working set 1 objects 0 to 9.
diff --git a/benchmarks/wsim/cloud-gaming-60fps.wsim b/benchmarks/wsim/cloud-gaming-60fps.wsim
new file mode 100644
index 000000000000..9e48bbc2f617
--- /dev/null
+++ b/benchmarks/wsim/cloud-gaming-60fps.wsim
@@ -0,0 +1,11 @@
+w.1.10n8m
+w.2.3n16m
+1.RCS.500-1500.r1-0-4/w2-0.0
+1.RCS.500-1500.r1-5-9/w2-1.0
+1.RCS.500-1500.r2-0-1/w2-2.0
+M.2.VCS
+B.2
+3.RCS.500-1500.r2-2.0
+2.DEFAULT.2000-4000.-1.0
+4.VCS1.250-750.-1.1
+p.16667
diff --git a/benchmarks/wsim/composited-ui.wsim b/benchmarks/wsim/composited-ui.wsim
new file mode 100644
index 000000000000..4164f8bf7393
--- /dev/null
+++ b/benchmarks/wsim/composited-ui.wsim
@@ -0,0 +1,7 @@
+w.1.10n8m/3n16m
+W.2.16m
+1.RCS.200-600.r1-0-4/w1-10.0
+1.RCS.200-600.r1-5-9/w1-11.0
+1.RCS.400-800.r1-10-11/w1-12.0
+3.BCS.200-800.r1-12/w2-0.1
+p.16667
-- 
2.20.1
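
For reference, a stricter take on the size strings documented in the
README ("4096", "4k", "2M", "4g") might look like the sketch below. It is
not the parse_size() from this patch: parse_size_checked() is a
hypothetical name, and the choice to return 0 for any malformed or
overflowing input is an assumption of the sketch. It uses a 64-bit
result so that 'g' sizes do not depend on the width of unsigned long,
and it leaves the input string unmodified.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse "<number>[k|m|g]" (suffix is case insensitive) into bytes.
 * Returns 0 on malformed input or overflow; a zero size is invalid anyway.
 */
static unsigned long long parse_size_checked(const char *str)
{
	unsigned long long val, mult = 1;
	char *end;

	/* strtoull() would accept leading whitespace and signs; we do not. */
	if (*str < '0' || *str > '9')
		return 0;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return 0;

	switch (*end) {
	case 'g':
	case 'G':
		mult <<= 10;
		/* Fall-through. */
	case 'm':
	case 'M':
		mult <<= 10;
		/* Fall-through. */
	case 'k':
	case 'K':
		mult <<= 10;
		end++;
		break;
	default:
		break;
	}

	if (*end)			/* trailing garbage, e.g. "4x" */
		return 0;
	if (val > ULLONG_MAX / mult)	/* multiplication would overflow */
		return 0;

	return val * mult;
}
```

With this, "4k" yields 4096 and "4g" yields 4294967296, while "4x", "k"
and the empty string are all rejected.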


* [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing Tvrtko Ursulin
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 16:58   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 04/10] gem_wsim: Move BO allocation to a helper Tvrtko Ursulin
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Show average/min/max workload iteration and dropped period stats when 'p'
command is used.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 9e5bfe6a36d4..60982cb73ba7 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -2101,7 +2101,8 @@ static void *run_workload(void *data)
 	struct w_step *w;
 	int throttle = -1;
 	int qd_throttle = -1;
-	int count;
+	int count, missed = 0;
+	unsigned long time_tot = 0, time_min = ULONG_MAX, time_max = 0;
 	int i;
 
 	clock_gettime(CLOCK_MONOTONIC, &t_start);
@@ -2121,12 +2122,19 @@ static void *run_workload(void *data)
 				do_sleep = w->delay;
 			} else if (w->type == PERIOD) {
 				struct timespec now;
+				int elapsed;
 
 				clock_gettime(CLOCK_MONOTONIC, &now);
-				do_sleep = w->period -
-					   elapsed_us(&wrk->repeat_start, &now);
+				elapsed = elapsed_us(&wrk->repeat_start, &now);
+				do_sleep = w->period - elapsed;
+				time_tot += elapsed;
+				if (elapsed < time_min)
+					time_min = elapsed;
+				if (elapsed > time_max)
+					time_max = elapsed;
 				if (do_sleep < 0) {
-					if (verbose > 1)
+					missed++;
+					if (verbose > 2)
 						printf("%u: Dropped period @ %u/%u (%dus late)!\n",
 						       wrk->id, count, i, do_sleep);
 					continue;
@@ -2280,6 +2288,9 @@ static void *run_workload(void *data)
 		printf("%c%u: %.3fs elapsed (%d cycles, %.3f workloads/s).",
 		       wrk->background ? ' ' : '*', wrk->id,
 		       t, count, count / t);
+		if (time_tot)
+			printf(" Time avg/min/max=%lu/%lu/%luus; %d missed.",
+			       time_tot / count, time_min, time_max, missed);
 		putchar('\n');
 	}
 
-- 
2.20.1
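
The bookkeeping added above can be summarised by the stand-alone sketch
below. It is illustrative only: struct period_stats and the
period_stats_*() helpers are hypothetical, since gem_wsim keeps these
values as local variables in run_workload(). The sketch also averages
over the number of recorded periods rather than over the workload cycle
count, which is an assumption that matches workloads with a single 'p'
step per iteration.

```c
#include <limits.h>

/* Hypothetical accumulator for per-period timing. */
struct period_stats {
	unsigned long total_us;
	unsigned long min_us;
	unsigned long max_us;
	unsigned int samples;
	unsigned int missed;
};

static void period_stats_init(struct period_stats *s)
{
	s->total_us = 0;
	s->min_us = ULONG_MAX;
	s->max_us = 0;
	s->samples = 0;
	s->missed = 0;
}

/*
 * Record one iteration which took elapsed_us against a budget of period_us.
 * Returns the remaining time to sleep, or 0 if there is none (a period is
 * counted as missed only when the budget was exceeded).
 */
static unsigned long period_stats_add(struct period_stats *s,
				      unsigned long elapsed_us,
				      unsigned long period_us)
{
	s->total_us += elapsed_us;
	s->samples++;

	if (elapsed_us < s->min_us)
		s->min_us = elapsed_us;
	if (elapsed_us > s->max_us)
		s->max_us = elapsed_us;

	if (elapsed_us > period_us) {
		s->missed++;
		return 0;
	}

	return period_us - elapsed_us;
}

static unsigned long period_stats_avg(const struct period_stats *s)
{
	return s->samples ? s->total_us / s->samples : 0;
}
```

Keeping every quantity unsigned avoids comparing a signed elapsed time
against an unsigned minimum, at the cost of handling the "late" case
explicitly instead of through a negative sleep value.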


* [Intel-gfx] [PATCH i-g-t 04/10] gem_wsim: Move BO allocation to a helper
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes Tvrtko Ursulin
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 60982cb73ba7..5893de38a98e 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1472,6 +1472,11 @@ get_ctxid(struct workload *wrk, struct w_step *w)
 	return wrk->ctx_list[w->context].id;
 }
 
+static uint32_t alloc_bo(int i915, unsigned long size)
+{
+	return gem_create(i915, size);
+}
+
 static void
 alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 {
@@ -1483,7 +1488,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 	w->obj = calloc(nr_obj, sizeof(*w->obj));
 	igt_assert(w->obj);
 
-	w->obj[j].handle = gem_create(fd, 4096);
+	w->obj[j].handle = alloc_bo(fd, 4096);
 	w->obj[j].flags = EXEC_OBJECT_WRITE;
 	j++;
 	igt_assert(j < nr_obj);
@@ -1528,7 +1533,8 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 	else
 		w->bb_sz = get_bb_sz(w, w->duration.max);
 
-	w->bb_handle = w->obj[j].handle = gem_create(fd, w->bb_sz + (w->unbound_duration ? 4096 : 0));
+	w->bb_handle = w->obj[j].handle =
+		alloc_bo(fd, w->bb_sz + (w->unbound_duration ? 4096 : 0));
 	init_bb(w, flags);
 	w->obj[j].relocation_count = terminate_bb(w, flags);
 
@@ -1685,7 +1691,7 @@ static void allocate_working_set(struct working_set *set)
 	igt_assert(set->handles);
 
 	for (i = 0; i < set->nr; i++)
-		set->handles[i] = gem_create(fd, set->sizes[i]);
+		set->handles[i] = alloc_bo(fd, set->sizes[i]);
 }
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
@@ -2323,7 +2329,7 @@ static unsigned long calibrate_nop(unsigned int tolerance_pct, struct intel_exec
 	do {
 		struct timespec t_start;
 
-		obj.handle = gem_create(fd, size);
+		obj.handle = alloc_bo(fd, size);
 		gem_write(fd, obj.handle, size - sizeof(bbe), &bbe,
 			  sizeof(bbe));
 		gem_execbuf(fd, &eb);
-- 
2.20.1


* [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 04/10] gem_wsim: Move BO allocation to a helper Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 16:31   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations Tvrtko Ursulin
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

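Allow working set buffer sizes to be given as a <min>-<max> range, from
which a random size is picked when the working set is allocated. See
README for more details.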

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 71 +++++++++++++++++++++++++++++++++---------
 benchmarks/wsim/README |  4 +++
 2 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 5893de38a98e..c1405596c46a 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -117,12 +117,18 @@ struct bond {
 	enum intel_engine_id master;
 };
 
+struct work_buffer_size {
+	unsigned long size;
+	unsigned long min;
+	unsigned long max;
+};
+
 struct working_set {
 	int id;
 	bool shared;
 	unsigned int nr;
 	uint32_t *handles;
-	unsigned long *sizes;
+	struct work_buffer_size *sizes;
 };
 
 struct workload;
@@ -203,6 +209,7 @@ struct workload
 	bool print_stats;
 
 	uint32_t bb_prng;
+	uint32_t bo_prng;
 
 	struct timespec repeat_start;
 
@@ -757,10 +764,12 @@ static int add_buffers(struct working_set *set, char *str)
 	 * 4m
 	 * 4g
 	 * 10n4k - 10 4k batches
+	 * 4096-16k - random size in range
 	 */
-	unsigned long *sizes, size;
+	struct work_buffer_size *sizes;
+	unsigned long min_sz, max_sz;
+	char *n, *max = NULL;
 	unsigned int add, i;
-	char *n;
 
 	n = index(str, 'n');
 	if (n) {
@@ -773,16 +782,34 @@ static int add_buffers(struct working_set *set, char *str)
 		add = 1;
 	}
 
-	size = parse_size(str);
-	if (!size)
+	n = index(str, '-');
+	if (n) {
+		*n = 0;
+		max = ++n;
+	}
+
+	min_sz = parse_size(str);
+	if (!min_sz)
 		return -1;
 
+	if (max) {
+		max_sz = parse_size(max);
+		if (!max_sz)
+			return -1;
+	} else {
+		max_sz = min_sz;
+	}
+
 	sizes = realloc(set->sizes, (set->nr + add) * sizeof(*sizes));
 	if (!sizes)
 		return -1;
 
-	for (i = 0; i < add; i++)
-		sizes[set->nr + i] = size;
+	for (i = 0; i < add; i++) {
+		struct work_buffer_size *sz = &sizes[set->nr + i];
+		sz->min = min_sz;
+		sz->max = max_sz;
+		sz->size = 0;
+	}
 
 	set->nr += add;
 	set->sizes = sizes;
@@ -824,7 +851,7 @@ static uint64_t engine_list_mask(const char *_str)
 	return mask;
 }
 
-static void allocate_working_set(struct working_set *set);
+static void allocate_working_set(struct workload *wrk, struct working_set *set);
 
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
 	if ((field = strtok_r(fstart, ".", &fctx))) { \
@@ -1177,10 +1204,12 @@ add_step:
 
 	wrk->nr_steps = nr_steps;
 	wrk->steps = steps;
+	wrk->flags = flags;
 	wrk->prio = arg->prio;
 	wrk->sseu = arg->sseu;
 	wrk->max_working_set_id = -1;
 	wrk->working_sets = NULL;
+	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
 
 	free(desc);
 
@@ -1234,7 +1263,7 @@ add_step:
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == WORKINGSET && w->working_set.shared)
-			allocate_working_set(&w->working_set);
+			allocate_working_set(wrk, &w->working_set);
 	}
 
 	wrk->max_working_set_id = -1;
@@ -1267,6 +1296,7 @@ clone_workload(struct workload *_wrk)
 	igt_assert(wrk);
 	memset(wrk, 0, sizeof(*wrk));
 
+	wrk->flags = _wrk->flags;
 	wrk->prio = _wrk->prio;
 	wrk->sseu = _wrk->sseu;
 	wrk->nr_steps = _wrk->nr_steps;
@@ -1515,7 +1545,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 
 			igt_assert(set->nr);
 			igt_assert(entry->target < set->nr);
-			igt_assert(set->sizes[entry->target]);
+			igt_assert(set->sizes[entry->target].size);
 
 			dep_handle = set->handles[entry->target];
 		}
@@ -1683,15 +1713,27 @@ static size_t sizeof_engines_bond(int count)
 			engines[count]);
 }
 
-static void allocate_working_set(struct working_set *set)
+static unsigned long
+get_buffer_size(struct workload *wrk, const struct work_buffer_size *sz)
+{
+	if (sz->min == sz->max)
+		return sz->min;
+	else
+		return sz->min + hars_petruska_f54_1_random(&wrk->bo_prng) %
+		       (sz->max + 1 - sz->min);
+}
+
+static void allocate_working_set(struct workload *wrk, struct working_set *set)
 {
 	unsigned int i;
 
 	set->handles = calloc(set->nr, sizeof(*set->handles));
 	igt_assert(set->handles);
 
-	for (i = 0; i < set->nr; i++)
-		set->handles[i] = alloc_bo(fd, set->sizes[i]);
+	for (i = 0; i < set->nr; i++) {
+		set->sizes[i].size = get_buffer_size(wrk, &set->sizes[i]);
+		set->handles[i] = alloc_bo(fd, set->sizes[i].size);
+	}
 }
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
@@ -1707,6 +1749,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 
 	wrk->id = id;
 	wrk->bb_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
+	wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
 	wrk->run = true;
 
 	/*
@@ -1939,7 +1982,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == WORKINGSET && !w->working_set.shared)
-			allocate_working_set(&w->working_set);
+			allocate_working_set(wrk, &w->working_set);
 	}
 
 	/*
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 3d9143226740..8c71f2fe6579 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -307,11 +307,15 @@ For size a byte size can be given, or suffix 'k', 'm' or 'g' can be used (case
 insensitive). Prefix in the format of "<int>n<size>" can also be given to create
 multiple objects of the same size.
 
+Ranges can also be specified using the <min>-<max> syntax.
+
 Examples:
 
   w.1.4k - Working set 1 with a single 4KiB object in it.
   W.2.2M/32768 - Working set 2 with one 2MiB and one 32768 byte object.
   w.3.10n4k/2n20000 - Working set 3 with ten 4KiB and two 20000 byte objects.
+  w.4.4n4k-1m - Working set 4 with four objects of random size between 4KiB and
+		1MiB.
 
 Working set objects can be referenced as data dependency targets using the new
 'r'/'w' syntax. Simple example:
-- 
2.20.1
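
Note for readers skimming the diff: the size selection added by this patch can be sketched in isolation. The snippet below is only an illustration, not the gem_wsim code. `struct size_range`, `next_random()` and `pick_size()` are hypothetical names, and a small xorshift generator stands in for IGT's `hars_petruska_f54_1_random()`. It shows the inclusive [min, max] draw and nothing else.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct work_buffer_size in the patch. */
struct size_range {
	unsigned long min;
	unsigned long max;
};

/* xorshift32: illustrative PRNG only, not the generator gem_wsim uses. */
static uint32_t next_random(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;

	return x;
}

/* Return a size in [min, max], both ends inclusive. */
static unsigned long pick_size(uint32_t *state, const struct size_range *r)
{
	assert(r->min <= r->max);
	assert(*state != 0); /* xorshift must not be seeded with zero */

	if (r->min == r->max)
		return r->min; /* fixed size, PRNG state is left untouched */

	/* The "+ 1" is what makes the upper bound reachable. */
	return r->min + next_random(state) % (r->max + 1 - r->min);
}
```

With a range of { 4096, 1048576 }, as in the `w.4.4n4k-1m` README example, every draw lands between 4KiB and 1MiB inclusive. A plain modulo reduction like this one is slightly biased towards smaller values, which is probably acceptable for a workload simulator.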

* [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 16:22   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 07/10] gem_wsim: Log max and active working set sizes in verbose mode Tvrtko Ursulin
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

-f <float> on the command line can be used to scale batch buffer durations
in all parsed workloads.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)
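
Side note, not part of the commit: the diff below reads the factor with `atof()`, which cannot report malformed input. A stricter parser could look roughly like the sketch here. `parse_scale()` is a hypothetical helper, and rejecting zero, negative and non-finite factors is an assumption of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a scale factor from a command line argument.
 * Returns false, leaving *scale untouched, for junk, trailing characters,
 * out-of-range values, NaN, infinity, zero and negative numbers.
 */
static bool parse_scale(const char *arg, double *scale)
{
	char *end;
	double v;

	assert(arg && scale);

	errno = 0;
	v = strtod(arg, &end);
	if (errno || end == arg || *end != '\0')
		return false;

	/* !(v > 0.0) also catches NaN; v > DBL_MAX catches infinity. */
	if (!(v > 0.0) || v > DBL_MAX)
		return false;

	*scale = v;
	return true;
}
```

In a getopt loop this would be called as `parse_scale(optarg, &scale_arg)` in the 'f' case, with an error exit when it returns false.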

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index c1405596c46a..025385a144b8 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -41,6 +41,7 @@
 #include <assert.h>
 #include <limits.h>
 #include <pthread.h>
+#include <math.h>
 
 #include "intel_chipset.h"
 #include "intel_reg.h"
@@ -853,6 +854,11 @@ static uint64_t engine_list_mask(const char *_str)
 
 static void allocate_working_set(struct workload *wrk, struct working_set *set);
 
+static long __duration(long dur, double scale)
+{
+	return round(scale * dur);
+}
+
 #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
 	if ((field = strtok_r(fstart, ".", &fctx))) { \
 		tmp = atoi(field); \
@@ -863,7 +869,8 @@ static void allocate_working_set(struct workload *wrk, struct working_set *set);
 	} \
 
 static struct workload *
-parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
+parse_workload(struct w_arg *arg, unsigned int flags, double scale_dur,
+	       struct workload *app_w)
 {
 	struct workload *wrk;
 	unsigned int nr_steps = 0;
@@ -1129,7 +1136,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 					  tmpl == LONG_MAX,
 					  "Invalid duration at step %u!\n",
 					  nr_steps);
-				step.duration.min = tmpl;
+				step.duration.min = __duration(tmpl, scale_dur);
 
 				if (sep && *sep == '-') {
 					tmpl = strtol(sep + 1, NULL, 10);
@@ -1139,7 +1146,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 						tmpl == LONG_MAX,
 						"Invalid duration range at step %u!\n",
 						nr_steps);
-					step.duration.max = tmpl;
+					step.duration.max = __duration(tmpl,
+								       scale_dur);
 				} else {
 					step.duration.max = step.duration.min;
 				}
@@ -2494,7 +2502,8 @@ static void print_help(void)
 "                    command line. Subsequent -s switches it off.\n"
 "  -S                Synchronize the sequence of random batch durations between\n"
 "                    clients.\n"
-"  -d                Sync between data dependencies in userspace."
+"  -d                Sync between data dependencies in userspace.\n"
+"  -f <scale>        Scale factor for batch durations."
 	);
 }
 
@@ -2556,6 +2565,7 @@ int main(int argc, char **argv)
 	struct w_arg *w_args = NULL;
 	unsigned int tolerance_pct = 1;
 	int exitcode = EXIT_FAILURE;
+	double scale_arg = 1.0f;
 	int prio = 0;
 	double t;
 	int i, c;
@@ -2576,7 +2586,7 @@ int main(int argc, char **argv)
 	master_prng = time(NULL);
 
 	while ((c = getopt(argc, argv,
-			   "ThqvsSdc:n:r:w:W:a:t:p:I:")) != -1) {
+			   "ThqvsSdc:n:r:w:W:a:t:p:I:f:")) != -1) {
 		switch (c) {
 		case 'W':
 			if (master_workload >= 0) {
@@ -2687,6 +2697,9 @@ int main(int argc, char **argv)
 		case 'I':
 			master_prng = strtol(optarg, NULL, 0);
 			break;
+		case 'f':
+			scale_arg = atof(optarg);
+			break;
 		case 'h':
 			print_help();
 			goto out;
@@ -2744,7 +2757,7 @@ int main(int argc, char **argv)
 
 	if (append_workload_arg) {
 		struct w_arg arg = { NULL, append_workload_arg, 0 };
-		app_w = parse_workload(&arg, flags, NULL);
+		app_w = parse_workload(&arg, flags, scale_arg, NULL);
 		if (!app_w) {
 			wsim_err("Failed to parse append workload!\n");
 			goto err;
@@ -2762,7 +2775,7 @@ int main(int argc, char **argv)
 			goto err;
 		}
 
-		wrk[i] = parse_workload(&w_args[i], flags, app_w);
+		wrk[i] = parse_workload(&w_args[i], flags, scale_arg, app_w);
 		if (!wrk[i]) {
 			wsim_err("Failed to parse workload %u!\n", i);
 			goto err;
-- 
2.20.1
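
For illustration, the scaling applied to both ends of a duration range can be reproduced standalone. This is a sketch under assumptions: `struct duration_range`, `scale_duration()` and `scale_range()` are hypothetical names, and the rounding is written out by hand so the example does not need libm, whereas the patch calls `round()`.

```c
#include <assert.h>

/* Hypothetical stand-in for the min/max duration pair of a batch step. */
struct duration_range {
	long min;
	long max;
};

/*
 * Scale a duration and round half away from zero, mirroring what
 * round() does for the value ranges of interest here.
 */
static long scale_duration(long dur, double scale)
{
	double v = scale * (double)dur;

	return (long)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Apply the same factor to both ends so the range keeps its shape. */
static struct duration_range
scale_range(struct duration_range in, double scale)
{
	struct duration_range out;

	assert(in.min <= in.max);
	assert(scale >= 0.0);

	out.min = scale_duration(in.min, scale);
	out.max = scale_duration(in.max, scale);

	return out;
}
```

Scaling { 1000, 2000 } by 1.5 gives { 1500, 3000 }. Very short durations combined with a small factor can round down to zero, so a caller may want to check for that.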

* [Intel-gfx] [PATCH i-g-t 07/10] gem_wsim: Log max and active working set sizes in verbose mode
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 17:07   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase Tvrtko Ursulin
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

It is useful to know how much memory a workload is allocating.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 100 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 95 insertions(+), 5 deletions(-)
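
Side note, not part of the commit: for fixed-size working sets the total that verbose mode reports can be cross-checked by hand from the spec string. The sketch below does that arithmetic. `parse_size()` and `working_set_bytes()` are hypothetical helpers, not gem_wsim functions, and they follow the README syntax (optional "<int>n" prefix, optional k/m/g suffix, '/' between entries). Ranges are rejected because their size is only known after the random draw, and overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse "<bytes>" with an optional k/m/g suffix (case insensitive). */
static bool parse_size(const char *s, unsigned long *size, const char **end)
{
	unsigned long v;
	char *e;

	if (!isdigit((unsigned char)*s))
		return false;

	v = strtoul(s, &e, 10);
	switch (tolower((unsigned char)*e)) {
	case 'g':
		v <<= 10; /* fall through */
	case 'm':
		v <<= 10; /* fall through */
	case 'k':
		v <<= 10;
		e++; /* consume the single suffix character */
		break;
	default:
		break;
	}

	*size = v;
	*end = e;

	return true;
}

/* Sum the bytes described by a spec such as "10n4k/2n20000". */
static bool working_set_bytes(const char *spec, unsigned long *total)
{
	unsigned long sum = 0;

	assert(spec && total);

	while (*spec) {
		unsigned long count = 1, size;
		const char *p;
		char *e;

		/* Optional "<int>n" object count prefix. */
		if (isdigit((unsigned char)*spec)) {
			unsigned long c = strtoul(spec, &e, 10);

			if (*e == 'n') {
				count = c;
				spec = e + 1;
			}
		}

		if (!parse_size(spec, &size, &p))
			return false;
		if (*p != '\0' && *p != '/')
			return false; /* ranges and junk are not handled */

		sum += count * size;
		spec = (*p == '/') ? p + 1 : p;
	}

	*total = sum;

	return true;
}
```

For the README example "10n4k/2n20000" this yields 10 * 4096 + 2 * 20000 = 80960 bytes.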

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 025385a144b8..96ee923fb699 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -852,7 +852,8 @@ static uint64_t engine_list_mask(const char *_str)
 	return mask;
 }
 
-static void allocate_working_set(struct workload *wrk, struct working_set *set);
+static unsigned long
+allocate_working_set(struct workload *wrk, struct working_set *set);
 
 static long __duration(long dur, double scale)
 {
@@ -1270,8 +1271,14 @@ add_step:
 	 * Allocate shared working sets.
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if (w->type == WORKINGSET && w->working_set.shared)
-			allocate_working_set(wrk, &w->working_set);
+		if (w->type == WORKINGSET && w->working_set.shared) {
+			unsigned long total =
+				allocate_working_set(wrk, &w->working_set);
+
+			if (verbose > 1)
+				printf("%u: %lu bytes in shared working set %u\n",
+				       wrk->id, total, w->working_set.id);
+		}
 	}
 
 	wrk->max_working_set_id = -1;
@@ -1731,8 +1738,10 @@ get_buffer_size(struct workload *wrk, const struct work_buffer_size *sz)
 		       (sz->max + 1 - sz->min);
 }
 
-static void allocate_working_set(struct workload *wrk, struct working_set *set)
+static unsigned long
+allocate_working_set(struct workload *wrk, struct working_set *set)
 {
+	unsigned long total = 0;
 	unsigned int i;
 
 	set->handles = calloc(set->nr, sizeof(*set->handles));
@@ -1741,7 +1750,82 @@ static void allocate_working_set(struct workload *wrk, struct working_set *set)
 	for (i = 0; i < set->nr; i++) {
 		set->sizes[i].size = get_buffer_size(wrk, &set->sizes[i]);
 		set->handles[i] = alloc_bo(fd, set->sizes[i].size);
+		total += set->sizes[i].size;
+	}
+
+	return total;
+}
+
+static bool
+find_dep(struct dep_entry *deps, unsigned int nr, struct dep_entry dep)
+{
+	unsigned int i;
+
+	for (i = 0; i < nr; i++) {
+		if (deps[i].working_set == dep.working_set &&
+		    deps[i].target == dep.target)
+			return true;
 	}
+
+	return false;
+}
+
+static void measure_active_set(struct workload *wrk)
+{
+	unsigned long total = 0, batch_sizes = 0;
+	struct dep_entry *deps = NULL;
+	unsigned int nr = 0, i, j;
+	struct w_step *w;
+
+	if (verbose < 3)
+		return;
+
+	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+		if (w->type != BATCH)
+			continue;
+
+		batch_sizes += w->bb_sz;
+
+		for (j = 0; j < w->data_deps.nr; j++) {
+			struct dep_entry *dep = &w->data_deps.list[j];
+			struct dep_entry _dep = *dep;
+
+			if (dep->working_set == -1 && dep->target < 0) {
+				int idx = w->idx + dep->target;
+
+				igt_assert(idx >= 0 && idx < w->idx);
+				igt_assert(wrk->steps[idx].type == BATCH);
+
+				_dep.target = wrk->steps[idx].obj[0].handle;
+			}
+
+			if (!find_dep(deps, nr, _dep)) {
+				if (dep->working_set == -1) {
+					total += 4096;
+				} else {
+					struct working_set *set;
+
+					igt_assert(dep->working_set <=
+						   wrk->max_working_set_id);
+
+					set = wrk->working_sets[dep->working_set];
+					igt_assert(set->nr);
+					igt_assert(dep->target < set->nr);
+					igt_assert(set->sizes[dep->target].size);
+
+					total += set->sizes[dep->target].size;
+				}
+
+				deps = realloc(deps, (nr + 1) * sizeof(*deps));
+				deps[nr++] = _dep;
+			}
+		}
+	}
+
+	free(deps);
+
+	printf("%u: %lu bytes active working set in %u buffers. %lu in batch buffers.\n",
+	       wrk->id, total, nr, batch_sizes);
 }
 
 #define alloca0(sz) ({ size_t sz__ = (sz); memset(alloca(sz__), 0, sz__); })
@@ -1750,6 +1834,7 @@ static int
 prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 {
 	struct working_set **sets;
+	unsigned long total = 0;
 	uint32_t share_vm = 0;
 	int max_ctx = -1;
 	struct w_step *w;
@@ -1990,9 +2075,12 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		if (w->type == WORKINGSET && !w->working_set.shared)
-			allocate_working_set(wrk, &w->working_set);
+			total += allocate_working_set(wrk, &w->working_set);
 	}
 
+	if (verbose > 2)
+		printf("%u: %lu bytes in working sets.\n", wrk->id, total);
+
 	/*
 	 * Map of working set ids.
 	 */
@@ -2040,6 +2128,8 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		alloc_step_batch(wrk, w, flags);
 	}
 
+	measure_active_set(wrk);
+
 	return 0;
 }
 
-- 
2.20.1
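
To illustrate the idea behind measure_active_set() above: each buffer counts once towards the active set, however many batches reference it. The sketch below is a simplified model and not the gem_wsim code. `struct access` and `active_set_bytes()` are hypothetical, and each access carries its own size instead of looking it up in a working set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one buffer reference made by some batch. */
struct access {
	int working_set;    /* working set id */
	int target;         /* object index within the set */
	unsigned long size; /* object size in bytes */
};

/*
 * Sum the sizes of the distinct (working_set, target) pairs.
 * Quadratic, which is fine for a one-off report at prepare time.
 */
static unsigned long
active_set_bytes(const struct access *acc, size_t nr, size_t *nr_unique)
{
	unsigned long total = 0;
	size_t unique = 0, i, j;

	assert(acc || nr == 0);

	for (i = 0; i < nr; i++) {
		/* Look for an earlier reference to the same buffer. */
		for (j = 0; j < i; j++) {
			if (acc[j].working_set == acc[i].working_set &&
			    acc[j].target == acc[i].target)
				break;
		}

		if (j < i)
			continue; /* already counted */

		total += acc[i].size;
		unique++;
	}

	if (nr_unique)
		*nr_unique = unique;

	return total;
}
```

Four references to three distinct buffers of 20480, 1310720 and 81920 bytes give 1413120 bytes in 3 buffers, with the repeated reference contributing nothing.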

* [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 07/10] gem_wsim: Log max and active working set sizes in verbose mode Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 17:45   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 09/10] gem_wsim: Implement device selection Tvrtko Ursulin
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 10/10] gem_wsim: Fix calibration handling Tvrtko Ursulin
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Some frames from the middle of a demo with corresponding buffers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/wsim/carchasepart.wsim | 184 ++++++++++++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 benchmarks/wsim/carchasepart.wsim
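
Side note, not part of the commit: the batch steps in the file below carry long '/'-separated lists of data dependencies, where a token such as "r3-80" reads object 80 of working set 3 and "w27-1" writes object 1 of working set 27. A rough sketch of decoding such a list follows. `struct dep_ref`, `parse_dep_token()` and `parse_dep_list()` are hypothetical and cover only this 'r'/'w' form, not every dependency syntax gem_wsim accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one data dependency token. */
struct dep_ref {
	bool write;      /* 'w' token, otherwise 'r' */
	int working_set; /* working set id */
	int target;      /* object index within that set */
};

/* Parse one token such as "r3-80"; trailing characters are rejected. */
static bool parse_dep_token(const char *tok, struct dep_ref *out)
{
	char mode, tail;
	int set, target;

	assert(tok && out);

	/* Exactly three conversions means nothing followed the index. */
	if (sscanf(tok, "%c%d-%d%c", &mode, &set, &target, &tail) != 3)
		return false;
	if ((mode != 'r' && mode != 'w') || set < 0 || target < 0)
		return false;

	out->write = mode == 'w';
	out->working_set = set;
	out->target = target;

	return true;
}

/* Parse a '/'-separated list; returns the entry count or -1 on error. */
static int parse_dep_list(const char *list, struct dep_ref *out, int max)
{
	int n = 0;

	while (*list) {
		const char *end = strchr(list, '/');
		size_t len = end ? (size_t)(end - list) : strlen(list);
		char tok[32];

		if (len == 0 || len >= sizeof(tok) || n == max)
			return -1;

		memcpy(tok, list, len);
		tok[len] = '\0';

		if (!parse_dep_token(tok, &out[n++]))
			return -1;

		list += len;
		if (*list == '/')
			list++;
	}

	return n;
}
```

Feeding it "r0-58/w27-1/r3-18" produces three entries, the second of which is a write to object 1 of working set 27.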

diff --git a/benchmarks/wsim/carchasepart.wsim b/benchmarks/wsim/carchasepart.wsim
new file mode 100644
index 000000000000..4407d0ef47dc
--- /dev/null
+++ b/benchmarks/wsim/carchasepart.wsim
@@ -0,0 +1,184 @@
+w.0.118n8192
+w.1.69n12288
+w.10.145n131072
+w.11.1n163840
+w.12.3n196608
+w.13.2n229376
+w.14.2n262144
+w.15.7n327680
+w.16.2n393216
+w.17.9n458752
+w.18.30n524288
+w.19.1n655360
+w.2.74n16384
+w.20.2n917504
+w.21.1n1048576
+w.22.33n1310720
+w.23.1n1572864
+w.24.24n1835008
+w.25.117n2097152
+w.26.1n2621440
+w.27.2n3670016
+w.28.4n4194304
+w.29.3n6291456
+w.3.123n20480
+w.30.1n7340032
+w.31.1n8388608
+w.32.20n10485760
+w.33.4n12582912
+w.34.3n14680064
+w.35.1n25165824
+w.4.19n24576
+w.5.2n32768
+w.6.2n40960
+w.7.4n49152
+w.8.2n65536
+w.9.9n81920
+1.RCS.5736.r0-58/r3-80/r22-31/r3-41/r3-42/r9-4/w27-1/r3-18/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-4/r3-41/r3-42/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5950.r9-4/r33-2/r3-6/r22-31/r3-42/w0-103/w0-106/r3-41/r2-54.0
+1.RCS.14562.r0-58/r3-80/r22-31/r3-41/r3-42/r9-4/w32-18/r3-18/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-106/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6494.r3-80/r22-31/r3-41/r3-42/r9-4/w9-5/r0-71/r0-58/w32-19/r3-18/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-60/r22-20/r25-99/r22-16/r21-0/r22-21/r0-97/r22-27/r0-98/r22-4/r0-84/r3-47/w28-2/r0-107/w8-0/r0-48.0
+d.36394
+2.RCS.2207.r9-4/r28-2/r3-42/r22-31/w8-0/r2-47/r20-0/w22-23/r3-47/r3-95/r0-58/w22-10/r3-80.0
+d.17275
+1.RCS.1000.r3-47/w27-0/r0-58/r3-80/r22-31/r3-42/r9-4/w34-0/r3-18/r3-41/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-67/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-45/r3-110/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r24-23/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r3-49/r3-103/r22-6/r3-68/r3-112/r22-29/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-67/r3-37/r25-0/r22-7/r25-59/r25-71/r25-101/r25-75/r25-20/r25-91/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r18-19/r18-26/r18-21/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r22-21/r25-22/r3-29/r25-93/r18-2/r18-14/r18-3/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r0-107/w19-0.0
+d.2101
+1.RCS.32763.r9-4/r3-18/r3-47/r22-3/r22-31/w19-0/r3-98/r33-3/r3-6/r32-3/r3-78/r3-80/r3-73/r0-107/r0-58/r3-110/w31-0/w33-2/r32-19/r3-114/w32-4/w29-1/w32-8/r9-5/r32-17/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.1788
+2.RCS.33594.r9-4/r0-26/r3-110/w7-2/r3-47/r0-109/r0-75/w6-1/r0-29/w7-0/r0-68/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5573.r0-58/r3-80/r22-31/r3-47/r3-110/r9-4/w27-1/r3-18/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/r3-60/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-4/r3-60/r3-110/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5826.r9-4/r33-2/r3-6/r22-31/r3-110/w0-103/w0-22/r3-60/r2-54.0
+1.RCS.14378.r0-58/r3-80/r22-31/r3-60/r3-110/r9-4/w32-18/r3-18/r3-47/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-22/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6218.r3-80/r22-31/r3-60/r3-110/r9-4/w9-5/r0-53/r0-58/w32-19/r3-18/r3-47/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-115/r22-20/r25-99/r22-16/r21-0/r22-21/r0-35/r22-27/r0-19/r22-4/r0-92/w28-2/r0-107/r3-0/w8-0/r0-76.0
+d.36049
+2.RCS.2085.r9-4/r28-2/r3-0/r22-31/w8-0/r2-9/r20-0/w22-23/r3-60/r3-95/r0-58/w22-10/r3-80.0
+d.12310
+1.RCS.1000.r3-60/w27-0/r0-58/r3-80/r22-31/r3-0/r9-4/w34-0/r3-18/r3-47/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-88/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-71/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r24-23/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r3-49/r3-103/r22-6/r3-68/r3-112/r22-29/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-67/r3-37/r25-0/r22-7/r25-59/r25-71/r25-101/r25-75/r25-20/r25-91/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r18-19/r18-26/r18-21/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r22-21/r25-22/r3-29/r25-93/r18-2/r18-14/r18-3/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r0-107/w19-0.0
+d.3238
+1.RCS.32454.r9-4/r3-18/r3-60/r22-3/r22-31/w19-0/r3-98/r33-3/r3-6/r32-3/r3-78/r3-80/r3-73/r0-107/r0-58/r3-0/w31-0/w33-2/r32-19/r3-114/w32-4/w29-1/r3-74/w32-8/r9-5/r32-17/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.3160
+2.RCS.33268.r9-4/r0-60/r3-0/w7-2/r3-60/r0-109/r3-74/r0-97/w6-1/r0-98/w7-0/r0-84/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5675.r0-58/r3-80/r22-31/r3-74/r3-0/r9-4/w27-1/r3-18/r3-60/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/w32-14/r32-0/r33-2/w32-16.0
+d.1148
+2.RCS.1000.r9-4/r3-74/r3-0/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5911.r9-4/r33-2/r3-6/r22-31/r3-0/w0-103/w0-48/r3-74/r2-54.0
+1.RCS.13981.r0-58/r3-80/r22-31/r3-74/r3-0/r9-4/w32-18/r3-18/r3-60/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-48/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6208.r3-80/r22-31/r3-74/r3-0/r9-4/w9-5/r0-67/r0-58/w32-19/r3-18/r3-60/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-45/r22-20/r25-99/r22-16/r21-0/r0-64/r3-42/r22-21/r22-27/r0-42/r22-4/r0-52/w28-2/r0-107/w8-0/r0-87.0
+d.41148
+2.RCS.2070.r9-4/r28-2/r3-42/r22-31/w8-0/r2-47/r20-0/w22-23/r3-74/r3-95/r0-58/w22-10/r3-80.0
+d.18190
+1.RCS.1000.r3-74/w27-0/r0-58/r3-80/r22-31/r3-42/r9-4/w34-0/r3-18/r3-60/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-106/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-53/r3-31/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r24-23/r3-49/r3-103/r3-96/r22-6/r22-14/r3-87/r3-108/r3-26/r22-5/r22-29/r3-68/r3-112/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-67/r3-37/r25-0/r25-59/r25-71/r25-101/r25-75/r25-20/r25-91/r22-7/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r22-21/r25-22/r3-29/r25-93/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r0-107/w19-0.0
+d.1071
+1.RCS.31421.r0-58/r3-80/r22-31/r3-31/r3-42/r9-4/w31-0/r3-18/r22-3/r3-98/r19-0/r3-6/w33-2/r32-19/r3-114/r0-107/w32-4/r3-78/w29-1/w32-8/r9-5/r32-17/r32-3/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.1795
+2.RCS.32222.r9-4/r0-115/r3-42/w7-2/r3-74/r0-109/r3-31/r0-35/w6-1/r0-19/w7-0/r0-92/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5663.r0-58/r3-80/r22-31/r3-31/r3-42/r9-4/w27-1/r3-18/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/r3-110/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-4/r3-31/r3-110/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5878.r9-4/r33-2/r3-6/r22-31/r3-110/w0-103/w0-76/r3-31/r2-54.0
+1.RCS.14059.r0-58/r3-80/r22-31/r3-31/r3-110/r9-4/w32-18/r3-18/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-76/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6168.r3-80/r22-31/r3-31/r3-110/r9-4/w9-5/r0-88/r0-58/w32-19/r3-18/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-71/r22-20/r25-99/r22-16/r21-0/r0-26/r22-21/r0-75/r3-20/r22-4/r22-27/r0-29/w28-2/r0-107/w8-0/r0-68.0
+d.32708
+2.RCS.2145.r9-4/r28-2/r3-110/r22-31/w8-0/r2-9/r20-0/w22-23/r3-20/r3-95/r0-58/w22-10/r3-80.0
+d.11843
+1.RCS.1000.r3-20/w27-0/r0-58/r3-80/r22-31/r3-110/r9-4/w34-0/r3-18/r3-31/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-22/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-67/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r22-29/r3-68/r3-112/r3-103/r3-49/r22-6/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-59/r25-71/r25-101/r25-67/r3-37/r25-0/r22-7/r25-75/r25-20/r25-91/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r18-19/r18-26/r18-21/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r9-2/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r22-21/r25-82/r25-77/r25-33/r25-22/r3-29/r25-93/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r0-107/w19-0.0
+d.3354
+1.RCS.31662.r0-58/r3-80/r22-31/r3-20/r3-110/r9-2/w31-0/r3-18/r22-3/r3-98/r19-0/r3-6/w33-2/r32-19/r3-114/r0-107/w32-4/r3-78/w29-1/w32-8/r9-5/r32-17/r32-3/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.2763
+2.RCS.32465.r0-58/r9-2/r0-45/r3-110/w7-2/r3-20/r0-109/r0-64/w6-1/r0-42/w7-0/r0-52/w7-1/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5493.r0-58/r3-80/r22-31/r3-20/r3-110/r9-2/w27-1/r3-18/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/r3-0/w28-1/r3-93/r3-41/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-2/r3-41/r3-0/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5680.r9-2/r33-2/r3-6/r22-31/r3-0/w0-103/w0-87/r3-41/r2-54.0
+1.RCS.13742.r0-58/r3-80/r22-31/r3-41/r3-0/r9-2/w32-18/r3-18/r3-20/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-87/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.5993.r3-80/r22-31/r3-41/r3-0/r9-2/w9-5/r0-106/r0-58/w32-19/r3-18/r3-20/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-53/r22-20/r25-99/r22-16/r21-0/r0-60/r22-21/r0-97/r22-27/r0-98/r22-4/w28-2/r0-107/w8-0/r0-84.0
+d.37477
+2.RCS.1875.r9-2/r28-2/r3-0/r22-31/w8-0/r2-47/r20-0/w22-23/r3-41/r3-95/r0-58/w22-10/r3-80.0
+d.12907
+1.RCS.1000.r3-41/w27-0/r0-58/r3-80/r22-31/r3-0/r9-2/w34-0/r3-18/r3-20/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-48/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-88/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r22-29/r3-68/r3-112/r3-103/r3-49/r3-12/r22-6/r3-85/r25-104/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-59/r25-71/r25-101/r25-67/r3-37/r25-0/r25-75/r25-20/r25-91/r22-7/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r22-21/r25-82/r25-77/r25-33/r25-22/r3-29/r25-93/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0.0
+d.3000
+1.RCS.33546.r9-2/r3-18/r3-41/r22-3/r22-31/w31-0/r3-98/r19-0/r3-6/w33-2/r32-19/r3-0/r3-80/r3-114/r0-107/r0-58/w32-4/r3-47/r3-78/r3-42/w29-1/w32-8/r9-5/r32-17/r32-3/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.3250
+2.RCS.34417.r9-2/r0-71/r3-42/w7-2/r3-41/r0-109/r3-47/r0-26/w6-1/r0-75/w7-0/r0-29/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5575.r0-58/r3-80/r22-31/r3-47/r3-42/r9-2/w27-1/r3-18/r3-41/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-2/r3-47/r3-42/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5743.r9-2/r33-2/r3-6/r22-31/r3-42/w0-103/w0-68/r3-47/r2-54.0
+1.RCS.13966.r0-58/r3-80/r22-31/r3-47/r3-42/r9-2/w32-18/r3-18/r3-41/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-68/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6056.r3-80/r22-31/r3-47/r3-42/r9-2/w9-5/r0-22/r0-58/w32-19/r3-18/r3-41/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-67/r22-20/r25-99/r22-16/r21-0/r0-115/r22-21/r0-35/r22-27/r0-19/w28-2/r0-107/w8-0/r0-92.0
+d.36302
+2.RCS.1862.r9-2/r28-2/r3-42/r22-31/w8-0/r2-9/r20-0/w22-23/r3-47/r3-95/r0-58/w22-10/r3-80.0
+d.15703
+1.RCS.1000.r3-47/w27-0/r0-58/r3-80/r22-31/r3-42/r9-2/w34-0/r3-18/r3-41/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-76/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-106/r3-60/w32-17/r3-27/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r22-29/r3-68/r3-112/r3-103/r3-49/r3-12/r22-6/r3-85/r25-104/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-59/r25-71/r25-101/r25-67/r3-37/r25-0/r25-75/r25-20/r25-91/r22-7/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r22-21/r25-82/r25-77/r25-33/r25-22/r3-29/r25-93/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0.0
+d.2669
+1.RCS.33805.r9-2/r3-18/r3-60/r22-3/r22-31/w31-0/r3-98/r19-0/r3-6/w33-2/r32-19/r3-27/r3-80/r3-114/r0-107/r0-58/w32-4/r3-78/w29-1/w32-8/r9-5/r32-17/r32-3/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.2564
+2.RCS.34680.r9-2/r0-53/r3-27/w7-2/r3-47/r0-109/r3-60/r0-60/w6-1/r0-97/w7-0/r0-98/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5597.r0-58/r3-80/r22-31/r3-60/r3-27/r9-2/w27-1/r3-18/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-2/r3-60/r3-27/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5780.r9-2/r33-2/r3-6/r22-31/r3-27/w0-103/w0-84/r3-60/r2-54.0
+1.RCS.14008.r0-58/r3-80/r22-31/r3-60/r3-27/r9-2/w32-18/r3-18/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-84/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6132.r3-80/r22-31/r3-60/r3-27/r9-2/w9-5/r0-48/r0-58/w32-19/r3-18/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-88/r22-20/r25-99/r22-16/r21-0/r0-45/r22-21/r0-64/r3-74/r22-27/r0-42/w28-2/r0-107/w8-0/r0-52.0
+d.37446
+2.RCS.1900.r9-2/r28-2/r3-27/r22-31/w8-0/r2-47/r20-0/w22-23/r3-74/r3-95/r0-58/w22-10/r3-80.0
+d.18811
+1.RCS.1000.r3-74/w27-0/r0-58/r3-80/r22-31/r3-27/r9-2/w34-0/r3-18/r3-60/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-87/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r3-110/r0-22/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r22-29/r3-68/r3-112/r3-103/r3-49/r3-12/r22-6/r3-85/r25-104/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-59/r25-71/r25-101/r25-67/r3-37/r25-0/r25-75/r25-20/r25-91/r22-7/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r22-21/r25-82/r25-77/r25-33/r25-22/r3-29/r25-93/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0.0
+d.1865
+1.RCS.34308.r9-2/r3-18/r3-74/r22-3/r22-31/w31-0/r3-98/r19-0/r3-6/w33-2/r32-19/r3-110/r3-80/r3-114/r0-107/r0-58/w32-4/r3-78/w29-1/w32-8/r9-5/r32-17/r32-3/r32-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-92/r34-2/r22-12/r22-1/w32-0/r3-75/r3-118/r3-70/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r3-8/r4-5/r3-85/r3-115/r3-55/r3-13/r3-59/r22-2.0
+d.1284
+2.RCS.35212.r9-2/r0-67/r3-110/w7-2/r3-74/r0-109/r0-115/w6-1/r0-35/w7-0/r0-19/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5686.r0-58/r3-80/r22-31/r3-74/r3-110/r9-2/w27-1/r3-18/r22-1/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-31/r3-93/w32-14/r32-0/r33-2/w32-16.0
+2.RCS.1000.r9-2/r3-31/r3-110/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5791.r9-2/r33-2/r3-6/r22-31/r3-110/w0-103/w0-92/r3-31/r2-54.0
+1.RCS.14049.r0-58/r3-80/r22-31/r3-31/r3-110/r9-2/w32-18/r3-18/r3-74/r22-1/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-92/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6085.r3-80/r22-31/r3-31/r3-110/r9-2/w9-5/r0-76/r0-58/w32-19/r3-18/r3-74/r22-1/r3-78/r3-88/r22-5/r22-14/r22-6/r22-29/r22-7/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-106/r22-20/r25-99/r22-16/r21-0/r0-71/r0-26/r22-27/r0-75/w28-2/r0-107/r3-0/w8-0/r0-29.0
+d.34977
+2.RCS.1794.r9-2/r28-2/r3-0/r22-31/w8-0/r2-9/r20-0/w22-23/r3-31/r3-95/r0-58/w22-10/r3-80.0
+d.11049
+1.RCS.32894.r3-31/w27-0/r0-58/r3-80/r22-31/r3-0/r9-2/w34-0/r3-18/r3-74/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-68/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-48/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r22-29/r3-68/r3-112/r3-103/r3-49/r3-12/r22-6/r3-85/r25-104/r22-28/r25-14/r25-44/r25-19/r3-67/r18-17/r4-5/r18-4/r3-2/r3-117/r3-33/r22-2/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r22-7/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r25-105/r25-61/r25-11/r25-75/r25-20/r25-91/r25-51/r25-64/r25-70/r25-55/r25-66/r25-24/r25-111/r3-66/r25-68/r25-86/r25-26/r22-21/r25-82/r25-77/r25-33/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0/r32-19/r3-20/r3-114/w32-4/w29-1/w32-8/r35-0/r34-1/r3-92/w32-0.0
+d.5753
+2.RCS.33634.r9-2/r0-88/r3-0/w7-2/r3-31/r0-109/r3-20/r0-45/w6-1/r0-64/w7-0/r0-42/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+d.1325
+1.RCS.6527.r9-2/r3-18/r3-31/r3-20/r22-1/r22-31/r9-5/r3-78/w32-0/r3-98/r3-75/r3-104/r3-118/r3-6/r3-70/r30-0/r35-0/r34-1/r34-0/r32-19/r3-0/r3-1/r3-113/w33-2/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/r0-58/r3-80/w27-1/r0-32/r32-8/r0-107/w28-0/r3-91/w28-1/r3-93/w32-14/w32-16.0
+d.2261
+2.RCS.1000.r9-2/r3-20/r3-0/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.6788.r9-2/r33-2/r3-6/r22-31/r3-0/w0-103/w0-52/r3-20/r2-54.0
+1.RCS.13974.r0-58/r3-80/r22-31/r3-20/r3-0/r9-2/w32-18/r3-18/r3-31/r22-14/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-52/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6098.r3-80/r22-31/r3-20/r3-0/r9-2/w9-5/r0-87/r0-58/w32-19/r3-18/r3-31/r22-14/r3-78/r3-42/r3-88/r22-5/r22-6/r22-29/r22-7/r22-1/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-22/r22-20/r25-99/r22-16/r21-0/r0-53/r0-60/r0-97/w28-2/r0-107/w8-0/r0-98.0
+d.39039
+2.RCS.1753.r9-2/r28-2/r3-42/r22-31/w8-0/r2-47/r20-0/w22-23/r3-20/r3-95/r0-58/w22-10/r3-80.0
+d.14507
+1.RCS.31161.r3-20/w27-0/r0-58/r3-80/r22-31/r3-42/r9-2/w34-0/r3-18/r3-31/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-84/r25-40/r3-90/r22-20/r0-76/r3-41/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-87/r3-108/r3-26/r3-96/r22-29/r22-14/r3-68/r3-112/r3-103/r22-5/r3-49/r3-12/r22-6/r3-85/r25-14/r3-6/r25-104/r22-28/r25-44/r25-19/r3-67/r18-17/r4-5/r18-4/r25-111/r3-66/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r22-7/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r3-2/r3-117/r3-33/r22-2/r25-51/r25-64/r25-70/r25-105/r25-61/r25-11/r25-75/r25-20/r25-91/r25-55/r25-66/r25-24/r25-68/r25-86/r25-26/r22-21/r25-82/r25-77/r25-33/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-99/r25-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0/r32-19/r3-114/w32-4/w29-1/w32-8/r35-0/r34-1/r3-92.0
+d.3932
+2.RCS.31979.r9-2/r0-106/r3-42/w7-2/r3-20/r0-109/r3-41/r0-71/w6-1/r0-26/w7-0/r0-75/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+d.1568
+1.RCS.6364.r0-58/r3-80/r22-31/r3-41/r3-42/r9-2/w32-0/r3-18/r22-1/r9-5/r3-78/r3-98/r3-75/r3-104/r3-118/r3-6/r3-70/r30-0/r35-0/r34-1/r34-0/r32-19/r3-1/r3-113/w33-2/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/w27-1/r0-32/r32-8/r0-107/w28-0/r3-91/w28-1/r3-93/r3-27/w32-14/w32-16.0
+2.RCS.1000.r9-2/r3-41/r3-27/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.6617.r9-2/r33-2/r3-6/r22-31/r3-27/w0-103/w0-29/r3-41/r2-54.0
+1.RCS.13317.r0-58/r3-80/r22-31/r3-41/r3-27/r9-2/w32-18/r3-18/r22-14/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-29/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6237.r3-80/r22-31/r3-41/r3-27/r9-2/w9-5/r0-68/r0-58/w32-19/r3-18/r22-14/r3-78/r3-88/r22-5/r22-6/r22-29/r22-7/r22-1/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-48/r22-20/r25-99/r22-16/r21-0/r0-67/r3-47/r0-115/r0-35/w28-2/r0-107/w8-0/r0-19.0
+d.38304
+2.RCS.1901.r9-2/r28-2/r3-27/r22-31/w8-0/r2-9/r20-0/w22-23/r3-47/r3-95/r0-58/w22-10/r3-80.0
+d.20018
+1.RCS.31362.r3-47/w27-0/r0-58/r3-80/r22-31/r3-27/r9-2/w34-0/r3-18/r3-41/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-92/r25-40/r3-90/r22-20/r0-87/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-87/r3-108/r3-26/r3-96/r22-29/r22-14/r3-68/r3-112/r3-103/r22-5/r3-49/r3-12/r22-6/r3-85/r25-14/r3-6/r25-104/r22-28/r24-23/r25-44/r25-19/r3-67/r18-17/r4-5/r18-4/r25-111/r3-66/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r22-7/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r3-2/r3-117/r3-33/r22-2/r25-51/r25-64/r25-70/r25-105/r25-61/r25-11/r25-75/r25-20/r25-91/r25-55/r25-66/r25-24/r25-68/r25-86/r25-26/r22-21/r25-82/r25-77/r25-33/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-99/r25-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0/r32-19/r3-114/w32-4/w29-1/w32-8/r35-0/r34-1/r3-92.0
+d.1960
+2.RCS.32169.r9-2/r0-22/r3-27/w7-2/r3-47/r0-109/r0-53/w6-1/r0-60/w7-0/r0-97/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.6329.r0-58/r3-80/r22-31/r3-47/r3-27/r9-2/w32-0/r3-18/r22-1/r9-5/r3-78/r3-98/r3-75/r3-104/r3-118/r3-6/r3-70/r30-0/r35-0/r34-1/r34-0/r32-19/r3-1/r3-113/w33-2/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7/w27-1/r0-32/r32-8/r0-107/w28-0/r3-91/r3-110/w28-1/r3-60/r3-93/w32-14/w32-16.0
+2.RCS.1000.r9-2/r3-60/r3-110/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.6597.r9-2/r33-2/r3-6/r22-31/r3-110/w0-103/w0-98/r3-60/r2-54.0
+1.RCS.13180.r0-58/r3-80/r22-31/r3-60/r3-110/r9-2/w32-18/r3-18/r3-47/r22-14/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-98/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6083.r3-80/r22-31/r3-60/r3-110/r9-2/w9-5/r0-84/r0-58/w32-19/r3-18/r3-47/r22-14/r3-78/r3-88/r22-5/r22-6/r22-29/r22-7/r22-1/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-76/r22-20/r9-4/r25-99/r22-16/r21-0/r0-88/r0-45/r0-64/w28-2/r0-107/w8-0/r0-42.0
+d.31011
+2.RCS.1752.r0-58/r9-4/r28-2/r3-110/r22-31/w8-0/r2-47/r20-0/w22-23/r3-60/r3-95/w22-10/r3-80.0
+d.8789
+1.RCS.1000.r3-60/w27-0/r0-58/r3-80/r22-31/r3-110/r9-4/w34-0/r3-18/r3-47/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-52/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-68/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-14/r3-12/r3-87/r3-108/r3-26/r3-96/r22-29/r22-14/r3-68/r3-112/r3-103/r22-5/r3-49/r22-6/r25-104/r22-28/r24-23/r25-44/r25-19/r3-67/r18-17/r4-5/r18-4/r25-111/r3-66/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r22-7/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r18-19/r18-26/r18-21/r3-2/r3-117/r3-33/r22-2/r25-51/r25-64/r25-70/r25-105/r25-61/r25-11/r25-75/r25-20/r25-91/r25-55/r25-66/r25-24/r25-68/r25-86/r25-26/r22-21/r25-82/r25-77/r25-33/r22-10/r25-110/r25-62/r25-72/r22-9/r8-0/r18-23/r18-7/r18-11/r3-73/r25-92/r25-41/w33-3/r0-107/w19-0/w31-0/r3-74/r32-19/r3-0/r3-114/w32-4/w29-1/w32-8.0
+d.3470
+1.RCS.32141.r9-4/r3-60/r3-18/r3-74/r22-3/r22-31/r9-5/w32-8/r3-98/r32-17/r3-6/r32-3/r32-1/w33-2/r29-1/r30-0/r3-104/r35-0/r34-1/r34-0/r3-78/r3-80/r3-92/r0-107/r34-2/r22-12/r22-1/r0-58/r3-0/w32-0/r3-75/r3-118/r3-70/r32-19/r3-1/r3-113/r22-29/r22-14/r3-109/r3-28/r3-79/r22-7.0
+d.2303
+2.RCS.33018.r9-4/r0-48/r3-0/w7-2/r3-60/r0-109/r3-74/r0-67/w6-1/r0-115/w7-0/r0-35/w7-1/r0-58/r32-8/r3-6/r22-31/w0-2/r3-113/r9-5/w0-32.0
+1.RCS.5660.r0-58/r3-80/r22-31/r3-74/r3-0/r9-4/w27-1/r3-18/r3-60/r22-14/r9-5/r0-32/r3-6/r32-8/r3-78/r3-113/r0-107/w28-0/r3-98/r3-91/w28-1/r3-93/w32-14/r32-0/r33-2/w32-16.0
+d.1494
+2.RCS.1000.r9-4/r3-74/r3-0/r32-16/r3-93/w4-16/r0-3/w3-122/r3-91/r0-58.0
+2.RCS.5873.r9-4/r33-2/r3-6/r22-31/r3-0/w0-103/w0-19/r3-74/r2-54.0
+d.1151
+1.RCS.12578.r0-58/r3-80/r22-31/r3-74/r3-0/r9-4/w32-18/r3-18/r3-60/r22-14/r9-5/r3-98/r32-14/r3-93/r33-2/r3-6/r32-16/r3-122/r3-106/r3-78/r2-54/r0-107/w32-7/w32-11/w32-2/r33-0/r25-116/r0-19/r16-0/r28-3/r0-3/r0-111/r22-32/r10-58/r4-13/r4-3/r4-1.0
+1.RCS.6159.r3-80/r22-31/r3-74/r3-0/r9-4/w9-5/r0-92/r0-58/w32-19/r3-18/r3-60/r22-14/r3-78/r3-88/r22-5/r22-6/r22-29/r22-7/r22-1/r22-12/r22-18/r22-28/r22-3/r22-0/r3-75/r3-104/w3-64/r0-87/r22-20/r25-99/r22-16/r21-0/r0-106/r0-71/r0-26/w28-2/r0-107/w8-0/r0-75.0
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [Intel-gfx] [PATCH i-g-t 09/10] gem_wsim: Implement device selection
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  2020-06-17 17:09   ` Chris Wilson
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 10/10] gem_wsim: Fix calibration handling Tvrtko Ursulin
  9 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

New command line options -L and -D <device> can respectively be used to
list and select a GPU when more than one is present.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 62 +++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 17 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 96ee923fb699..ca07b670bd42 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -43,6 +43,7 @@
 #include <pthread.h>
 #include <math.h>
 
+#include "igt_device_scan.h"
 #include "intel_chipset.h"
 #include "intel_reg.h"
 #include "drm.h"
@@ -2593,7 +2594,9 @@ static void print_help(void)
 "  -S                Synchronize the sequence of random batch durations between\n"
 "                    clients.\n"
 "  -d                Sync between data dependencies in userspace.\n"
-"  -f <scale>        Scale factor for batch durations."
+"  -f <scale>        Scale factor for batch durations.\n"
+"  -L                List GPUs.\n"
+"  -D <gpu>          One of the GPUs from -L."
 	);
 }
 
@@ -2654,30 +2657,31 @@ int main(int argc, char **argv)
 	char *append_workload_arg = NULL;
 	struct w_arg *w_args = NULL;
 	unsigned int tolerance_pct = 1;
+	enum igt_devices_print_type printtype = IGT_PRINT_SIMPLE;
+	bool list_devices_arg = false;
 	int exitcode = EXIT_FAILURE;
+	struct igt_device_card card;
 	double scale_arg = 1.0f;
+	char *device_arg = NULL;
 	int prio = 0;
 	double t;
-	int i, c;
+	int i, c, ret;
 	char *subopts, *value;
 	int raw_number = 0;
 	long calib_val;
 	int eng;
 
-	/*
-	 * Open the device via the low-level API so we can do the GPU quiesce
-	 * manually as close as possible in time to the start of the workload.
-	 * This minimizes the gap in engine utilization tracking when observed
-	 * via external tools like trace.pl.
-	 */
-	fd = __drm_open_driver_render(DRIVER_INTEL);
-	igt_require(fd);
-
 	master_prng = time(NULL);
 
 	while ((c = getopt(argc, argv,
-			   "ThqvsSdc:n:r:w:W:a:t:p:I:f:")) != -1) {
+			   "LThqvsSdc:n:r:w:W:a:t:p:I:f:D:")) != -1) {
 		switch (c) {
+		case 'L':
+			list_devices_arg = true;
+			break;
+		case 'D':
+			device_arg = strdup(optarg);
+			break;
 		case 'W':
 			if (master_workload >= 0) {
 				wsim_err("Only one master workload can be given!\n");
@@ -2798,6 +2802,33 @@ int main(int argc, char **argv)
 		}
 	}
 
+
+	igt_devices_scan(false);
+
+	if (list_devices_arg) {
+		igt_devices_print(printtype);
+		return EXIT_SUCCESS;
+	}
+
+	if (device_arg) {
+		ret = igt_device_card_match(device_arg, &card);
+		if (!ret) {
+			wsim_err("Requested device %s not found!\n", device_arg);
+			free(device_arg);
+
+			return EXIT_FAILURE;
+		}
+		free(device_arg);
+	} else {
+		igt_device_find_first_i915_discrete_card(&card);
+	}
+
+	if (card.render[0])
+		fd = igt_open_render(&card);
+	else
+		fd = __drm_open_driver_render(DRIVER_INTEL);
+	igt_require(fd);
+
 	if (!has_nop_calibration) {
 		if (verbose > 1) {
 			printf("Calibrating nop delays with %u%% tolerance...\n",
@@ -2908,15 +2939,12 @@ int main(int argc, char **argv)
 	clock_gettime(CLOCK_MONOTONIC, &t_start);
 
 	for (i = 0; i < clients; i++) {
-		int ret;
-
 		ret = pthread_create(&w[i]->thread, NULL, run_workload, w[i]);
 		igt_assert_eq(ret, 0);
 	}
 
 	if (master_workload >= 0) {
-		int ret = pthread_join(w[master_workload]->thread, NULL);
-
+		ret = pthread_join(w[master_workload]->thread, NULL);
 		igt_assert(ret == 0);
 
 		for (i = 0; i < clients; i++)
@@ -2925,7 +2953,7 @@ int main(int argc, char **argv)
 
 	for (i = 0; i < clients; i++) {
 		if (master_workload != i) {
-			int ret = pthread_join(w[i]->thread, NULL);
+			ret = pthread_join(w[i]->thread, NULL);
 			igt_assert(ret == 0);
 		}
 	}
-- 
2.20.1
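
As a rough illustration of the option handling this patch adds, here is a minimal standalone sketch of parsing -L and -D <gpu> with getopt(). The struct dev_opts and parse_dev_opts() names are hypothetical and exist only for this example. The device scanning and matching calls from igt_device_scan.h that the patch uses are deliberately left out, so this covers the command line side only.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical container for the two new options; not part of gem_wsim. */
struct dev_opts {
	bool list_devices;	/* -L given */
	char *device;		/* copy of the -D argument, or NULL */
};

/*
 * Parse -L and -D <gpu>. Returns 0 on success, -1 on an unknown option,
 * a missing -D argument or an allocation failure. On success the caller
 * owns opts->device and must free() it.
 */
static int parse_dev_opts(int argc, char **argv, struct dev_opts *opts)
{
	int c;

	assert(argv && opts);

	opts->list_devices = false;
	opts->device = NULL;

	optind = 1;	/* restart scanning from the first argument */
	opterr = 0;	/* report errors through the return value only */

	while ((c = getopt(argc, argv, "LD:")) != -1) {
		switch (c) {
		case 'L':
			opts->list_devices = true;
			break;
		case 'D':
			free(opts->device);	/* the last -D wins */
			opts->device = strdup(optarg);
			if (!opts->device)
				return -1;
			break;
		default:
			free(opts->device);
			opts->device = NULL;
			return -1;
		}
	}

	return 0;
}
```

A caller would then list devices and exit when list_devices is set, or try to match the device string when it is non-NULL and fall back to a default device otherwise, which mirrors the order of checks in the hunk above.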

* [Intel-gfx] [PATCH i-g-t 10/10] gem_wsim: Fix calibration handling
  2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 09/10] gem_wsim: Implement device selection Tvrtko Ursulin
@ 2020-06-17 16:01 ` Tvrtko Ursulin
  9 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-17 16:01 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The intended use case is that a run without arguments prints out the
calibrations, which can simply be copied and pasted into the -n argument,
and things should just work.

There are two problems to solve. First, if the printout loop shows zero
calibrations (engine not present), they are later rejected. Second, a
missing calibration should only be an error if it actually needs to be
used (engine present).

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 20 ++------------------
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index ca07b670bd42..3bbb8fcbe17c 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -296,8 +296,8 @@ print_engine_calibrations(void)
 
 	printf("Nop calibration for %uus delay is: ", nop_calibration_us);
 	for (int i = 0; i < NUM_ENGINES; i++) {
-		/* skip DEFAULT and VCS engines */
-		if (i != DEFAULT && i != VCS) {
+		/* skip engines not present and DEFAULT and VCS */
+		if (i != DEFAULT && i != VCS && engine_calib_map[i]) {
 			if (first_entry) {
 				printf("%s=%lu", ring_str_map[i], engine_calib_map[i]);
 				first_entry = false;
@@ -2840,22 +2840,6 @@ int main(int argc, char **argv)
 		if (verbose)
 			print_engine_calibrations();
 		goto out;
-	} else {
-		bool missing = false;
-
-		for (i = 0; i < NUM_ENGINES; i++) {
-			if (i == VCS)
-				continue;
-
-			if (!engine_calib_map[i]) {
-				wsim_err("Missing calibration for '%s'!\n",
-					 ring_str_map[i]);
-				missing = true;
-			}
-		}
-
-		if (missing)
-			goto err;
 	}
 
 	if (!nr_w_args) {
-- 
2.20.1
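
The print loop change can be illustrated in isolation. The sketch below is a stand-in, not the gem_wsim code: the engine enum, its ordering and format_calibrations() are made up for the example, and the comma separated NAME=value layout is an assumption based on the printf format visible in the hunk above. It shows how skipping zero entries yields a string that only names engines which are present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed engine list, loosely modelled on gem_wsim; order is illustrative. */
enum engine_id { DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS, NUM_ENGINES };

static const char *ring_str_map[NUM_ENGINES] = {
	[DEFAULT] = "DEFAULT", [RCS] = "RCS", [BCS] = "BCS", [VCS] = "VCS",
	[VCS1] = "VCS1", [VCS2] = "VCS2", [VECS] = "VECS",
};

/*
 * Format calibrations as "RCS=123,BCS=456", skipping DEFAULT, VCS and any
 * engine whose calibration is zero (engine not present).
 * Returns the number of characters written, or -1 if buf is too small
 * (in which case buf holds a truncated string).
 */
static int format_calibrations(const unsigned long calib[NUM_ENGINES],
			       char *buf, size_t len)
{
	size_t off = 0;
	int i;

	assert(calib && buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < NUM_ENGINES; i++) {
		int n;

		if (i == DEFAULT || i == VCS || !calib[i])
			continue;

		/* Prefix every entry except the first with a comma. */
		n = snprintf(buf + off, len - off, "%s%s=%lu",
			     off ? "," : "", ring_str_map[i], calib[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	return (int)off;
}
```

With only RCS and VCS1 calibrated, the output names just those two engines, so pasting it back in does not introduce zero entries for absent engines.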

* Re: [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing Tvrtko Ursulin
@ 2020-06-17 16:07   ` Chris Wilson
  2020-06-18  7:14   ` Chris Wilson
  1 sibling, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 16:07 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:11)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Evaluation of userspace load balancing options was how this tool started,
> but we have since settled on doing it in the kernel.
> 
> Tomorrow we will want to update the tool for new engine interfaces and all
> this legacy code will just be a distraction.
> 
> Rip out everything not related to explicit load balancing implemented via
> context engine maps and adjust the workloads to use it.

I'm still using busy & contexts to ground i915. Any chance of a
reprieve? At least for context with -R?

Or shall I just keep a pristine copy?
-Chris

* Re: [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations Tvrtko Ursulin
@ 2020-06-17 16:22   ` Chris Wilson
  2020-06-18  8:01     ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 16:22 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:16)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> -f <float> on the command line can be used to scale batch buffer durations
> in all parsed workloads.

But not the period?
-Chris
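
To make the question concrete, here is a minimal sketch of what scaling batch durations but not periods looks like. It assumes a simplified step model: struct step, enum step_type and scale_batches() are hypothetical names for this example and do not match gem_wsim's w_step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified step description; gem_wsim's w_step is richer. */
enum step_type { STEP_BATCH, STEP_PERIOD };

struct step {
	enum step_type type;
	unsigned int min_us;	/* batch: minimum duration, period: length */
	unsigned int max_us;	/* batch: maximum duration, unused for period */
};

/* Scale the duration range of batch steps only; periods are left as-is. */
static void scale_batches(struct step *steps, size_t nr, double scale)
{
	size_t i;

	assert(steps || !nr);
	assert(scale > 0.0);

	for (i = 0; i < nr; i++) {
		if (steps[i].type != STEP_BATCH)
			continue;

		steps[i].min_us = (unsigned int)(steps[i].min_us * scale);
		steps[i].max_us = (unsigned int)(steps[i].max_us * scale);
	}
}
```

Under these assumptions a scale of 0.5 halves the work per frame while the frame period stays fixed, so the simulated duty cycle changes rather than the whole workload being sped up.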

* Re: [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes Tvrtko Ursulin
@ 2020-06-17 16:31   ` Chris Wilson
  2020-06-18  8:06     ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 16:31 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:15)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> See README for more details.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  benchmarks/gem_wsim.c  | 71 +++++++++++++++++++++++++++++++++---------
>  benchmarks/wsim/README |  4 +++
>  2 files changed, 61 insertions(+), 14 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 5893de38a98e..c1405596c46a 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -117,12 +117,18 @@ struct bond {
>         enum intel_engine_id master;
>  };
>  
> +struct work_buffer_size {
> +       unsigned long size;
> +       unsigned long min;
> +       unsigned long max;
> +};
> +
>  struct working_set {
>         int id;
>         bool shared;
>         unsigned int nr;
>         uint32_t *handles;
> -       unsigned long *sizes;
> +       struct work_buffer_size *sizes;
>  };
>  
>  struct workload;
> @@ -203,6 +209,7 @@ struct workload
>         bool print_stats;
>  
>         uint32_t bb_prng;
> +       uint32_t bo_prng;
>  
>         struct timespec repeat_start;
>  
> @@ -757,10 +764,12 @@ static int add_buffers(struct working_set *set, char *str)
>          * 4m
>          * 4g
>          * 10n4k - 10 4k batches
> +        * 4096-16k - random size in range
>          */
> -       unsigned long *sizes, size;
> +       struct work_buffer_size *sizes;
> +       unsigned long min_sz, max_sz;
> +       char *n, *max = NULL;
>         unsigned int add, i;
> -       char *n;
>  
>         n = index(str, 'n');
>         if (n) {
> @@ -773,16 +782,34 @@ static int add_buffers(struct working_set *set, char *str)
>                 add = 1;
>         }
>  
> -       size = parse_size(str);
> -       if (!size)
> +       n = index(str, '-');
> +       if (n) {
> +               *n = 0;
> +               max = ++n;
> +       }
> +
> +       min_sz = parse_size(str);
> +       if (!min_sz)
>                 return -1;
>  
> +       if (max) {
> +               max_sz = parse_size(max);
> +               if (!max_sz)
> +                       return -1;
> +       } else {
> +               max_sz = min_sz;
> +       }
> +
>         sizes = realloc(set->sizes, (set->nr + add) * sizeof(*sizes));
>         if (!sizes)
>                 return -1;
>  
> -       for (i = 0; i < add; i++)
> -               sizes[set->nr + i] = size;
> +       for (i = 0; i < add; i++) {
> +               struct work_buffer_size *sz = &sizes[set->nr + i];
> +               sz->min = min_sz;
> +               sz->max = max_sz;
> +               sz->size = 0;
> +       }
>  
>         set->nr += add;
>         set->sizes = sizes;
> @@ -824,7 +851,7 @@ static uint64_t engine_list_mask(const char *_str)
>         return mask;
>  }
>  
> -static void allocate_working_set(struct working_set *set);
> +static void allocate_working_set(struct workload *wrk, struct working_set *set);
>  
>  #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>         if ((field = strtok_r(fstart, ".", &fctx))) { \
> @@ -1177,10 +1204,12 @@ add_step:
>  
>         wrk->nr_steps = nr_steps;
>         wrk->steps = steps;
> +       wrk->flags = flags;
>         wrk->prio = arg->prio;
>         wrk->sseu = arg->sseu;
>         wrk->max_working_set_id = -1;
>         wrk->working_sets = NULL;
> +       wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
>  
>         free(desc);
>  
> @@ -1234,7 +1263,7 @@ add_step:
>          */
>         for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>                 if (w->type == WORKINGSET && w->working_set.shared)
> -                       allocate_working_set(&w->working_set);
> +                       allocate_working_set(wrk, &w->working_set);
>         }
>  
>         wrk->max_working_set_id = -1;
> @@ -1267,6 +1296,7 @@ clone_workload(struct workload *_wrk)
>         igt_assert(wrk);
>         memset(wrk, 0, sizeof(*wrk));
>  
> +       wrk->flags = _wrk->flags;
>         wrk->prio = _wrk->prio;
>         wrk->sseu = _wrk->sseu;
>         wrk->nr_steps = _wrk->nr_steps;

wrk->flags wasn't introduced in this patch, so why do we need to copy
them now?

I see wrk->bo_prng = flags & SYNC above, but I haven't seen them used
again later. They used to carry the balancer info and were set up in
main(). Am I mistaken in thinking they are still being set in main()
as well?
-Chris
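
For reference, the buffer size grammar listed in the add_buffers() comment ("4096", "4k", "10n4k", "4096-16k") can be parsed without modifying the input string. The following is a minimal sketch under stated assumptions, not the patch's implementation: struct buf_spec, parse_size_n() and parse_buf_spec() are hypothetical names, and rejecting a range whose maximum is below its minimum is an extra check that the patch does not make.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the patch stores min/max per buffer instead. */
struct buf_spec {
	unsigned long count;
	unsigned long min;
	unsigned long max;
};

/* Parse "<digits>[kKmMgG]" from the first n bytes of s; 0 means error. */
static unsigned long parse_size_n(const char *s, size_t n)
{
	char tmp[32], *endp;
	unsigned long val, mult = 1;

	if (n == 0 || n >= sizeof(tmp) || s[0] < '0' || s[0] > '9')
		return 0;	/* empty, too long, or signed input */

	memcpy(tmp, s, n);
	tmp[n] = '\0';

	errno = 0;
	val = strtoul(tmp, &endp, 10);
	if (errno)
		return 0;

	switch (*endp) {
	case 'g': case 'G':
		mult *= 1024;
		/* Fall-through. */
	case 'm': case 'M':
		mult *= 1024;
		/* Fall-through. */
	case 'k': case 'K':
		mult *= 1024;
		endp++;
		break;
	}

	if (*endp || (mult > 1 && val > ULONG_MAX / mult))
		return 0;	/* trailing junk or overflow */

	return val * mult;
}

/* Parse "[<count>n]<size>[-<size>]"; returns 0 on success, -1 on error. */
static int parse_buf_spec(const char *str, struct buf_spec *spec)
{
	const char *n, *dash;

	assert(str && spec);

	spec->count = 1;
	n = strchr(str, 'n');
	if (n) {
		char *endp;

		if (str[0] < '0' || str[0] > '9')
			return -1;
		errno = 0;
		spec->count = strtoul(str, &endp, 10);
		if (errno || endp != n || !spec->count)
			return -1;
		str = n + 1;
	}

	dash = strchr(str, '-');
	if (dash) {
		spec->min = parse_size_n(str, (size_t)(dash - str));
		spec->max = parse_size_n(dash + 1, strlen(dash + 1));
	} else {
		spec->min = spec->max = parse_size_n(str, strlen(str));
	}

	if (!spec->min || !spec->max || spec->max < spec->min)
		return -1;

	return 0;
}
```

Using strtoul() with an end pointer instead of atoi()/atol() is a design choice made here so that negative numbers, trailing junk and overflow are reported as errors rather than silently accepted.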

* Re: [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies Tvrtko Ursulin
@ 2020-06-17 16:57   ` Chris Wilson
  2020-06-18  9:05     ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 16:57 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:12)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Add support for defining buffer object working sets and targeting them as
> data dependencies. For more information please see the README file.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  benchmarks/gem_wsim.c                   | 453 +++++++++++++++++++++---
>  benchmarks/wsim/README                  |  59 +++
>  benchmarks/wsim/cloud-gaming-60fps.wsim |  11 +
>  benchmarks/wsim/composited-ui.wsim      |   7 +
>  4 files changed, 476 insertions(+), 54 deletions(-)
>  create mode 100644 benchmarks/wsim/cloud-gaming-60fps.wsim
>  create mode 100644 benchmarks/wsim/composited-ui.wsim
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 02fe8f5a5e69..9e5bfe6a36d4 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -88,14 +88,21 @@ enum w_type
>         LOAD_BALANCE,
>         BOND,
>         TERMINATE,
> -       SSEU
> +       SSEU,
> +       WORKINGSET,
> +};
> +
> +struct dep_entry {
> +       int target;
> +       bool write;
> +       int working_set; /* -1 = step dependency, >= 0 working set id */
>  };
>  
>  struct deps
>  {
>         int nr;
>         bool submit_fence;
> -       int *list;
> +       struct dep_entry *list;
>  };
>  
>  struct w_arg {
> @@ -110,6 +117,14 @@ struct bond {
>         enum intel_engine_id master;
>  };
>  
> +struct working_set {
> +       int id;
> +       bool shared;
> +       unsigned int nr;
> +       uint32_t *handles;
> +       unsigned long *sizes;
> +};
> +
>  struct workload;
>  
>  struct w_step
> @@ -143,6 +158,7 @@ struct w_step
>                         enum intel_engine_id bond_master;
>                 };
>                 int sseu;
> +               struct working_set working_set;
>         };
>  
>         /* Implementation details */
> @@ -193,6 +209,9 @@ struct workload
>         unsigned int nr_ctxs;
>         struct ctx *ctx_list;
>  
> +       struct working_set **working_sets; /* array indexed by set id */
> +       int max_working_set_id;
> +
>         int sync_timeline;
>         uint32_t sync_seqno;
>  
> @@ -281,11 +300,120 @@ print_engine_calibrations(void)
>         printf("\n");
>  }
>  
> +static void add_dep(struct deps *deps, struct dep_entry entry)
> +{
> +       deps->list = realloc(deps->list, sizeof(*deps->list) * (deps->nr + 1));
> +       igt_assert(deps->list);
> +
> +       deps->list[deps->nr++] = entry;
> +}
> +
> +
> +static int
> +parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
> +{
> +       struct dep_entry entry = { .working_set = -1 };
> +       bool submit_fence = false;
> +       char *s;
> +
> +       switch (str[0]) {
> +       case '-':
> +               if (str[1] < '0' || str[1] > '9')
> +                       return -1;
> +
> +               entry.target = atoi(str);
> +               if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
> +                       return -1;

add_dep for N steps ago, using a handle.

> +
> +               add_dep(&w->data_deps, entry);
> +
> +               break;
> +       case 's':
> +               submit_fence = true;
> +               /* Fall-through. */
> +       case 'f':
> +               /* Multiple fences not yet supported. */
> +               igt_assert_eq(w->fence_deps.nr, 0);
> +
> +               entry.target = atoi(++str);
> +               if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
> +                       return -1;
> +
> +               add_dep(&w->fence_deps, entry);
> +
> +               w->fence_deps.submit_fence = submit_fence;

add_dep for N steps ago, using the out-fence from that step
[A post processing steps adds emit_fence to the earlier steps.]

> +               break;
> +       case 'w':
> +               entry.write = true;

Got confused for a moment as I was expecting the submit_fence
fallthrough pattern.
> +               /* Fall-through. */
> +       case 'r':
> +               /*
> +                * [rw]N-<str>
> +                * r1-<str> or w2-<str>, where N is working set id.
> +                */
> +               s = index(++str, '-');
> +               if (!s)
> +                       return -1;
> +
> +               entry.working_set = atoi(str);

if (entry.working_set < 0)
	return -1;

> +
> +               if (parse_working_set_deps(w->wrk, &w->data_deps, entry, ++s))
> +                       return -1;

The new one...

> +static int
> +parse_working_set_deps(struct workload *wrk,
> +                      struct deps *deps,
> +                      struct dep_entry _entry,
> +                      char *str)
> +{
> +       /*
> +        * 1 - target handle index in the specified working set.
> +        * 2-4 - range
> +        */
> +       struct dep_entry entry = _entry;
> +       char *s;
> +
> +       s = index(str, '-');
> +       if (s) {
> +               int from, to;
> +
> +               from = atoi(str);
> +               if (from < 0)
> +                       return -1;
> +
> +               to = atoi(++s);
> +               if (to <= 0)
> +                       return -1;

if to < from, we add nothing. Is that worth the error?

> +
> +               for (entry.target = from; entry.target <= to; entry.target++)
> +                       add_dep(deps, entry);
> +       } else {
> +               entry.target = atoi(str);
> +               if (entry.target < 0)
> +                       return -1;
> +
> +               add_dep(deps, entry);


> +       }
> +
> +       return 0;
> +}
> +
> +               break;
> +       default:
> +               return -1;
> +       };
> +
> +       return 0;
> +}
> +
>  static int
>  parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
>  {
>         char *desc = strdup(_desc);
>         char *token, *tctx = NULL, *tstart = desc;
> +       int ret = 0;
> +
> +       if (!strcmp(_desc, "0"))
> +               goto out;

Hang on, what is this special case?

>  
>         igt_assert(desc);
>         igt_assert(!w->data_deps.nr && w->data_deps.nr == w->fence_deps.nr);
>  static void __attribute__((format(printf, 1, 2)))
> @@ -624,6 +722,88 @@ static int parse_engine_map(struct w_step *step, const char *_str)
>         return 0;
>  }
>  
> +static unsigned long parse_size(char *str)
> +{
/* "1234567890[gGmMkK]" */
> +       const unsigned int len = strlen(str);
> +       unsigned int mult = 1;
> +
> +       if (len == 0)
> +               return 0;
> +
> +       switch (str[len - 1]) {

T? P? E? Let's plan ahead! :)

> +       case 'g':
> +       case 'G':
> +               mult *= 1024;
> +               /* Fall-through. */
> +       case 'm':
> +       case 'M':
> +               mult *= 1024;
> +               /* Fall-through. */
> +       case 'k':
> +       case 'K':
> +               mult *= 1024;
> +
> +               str[len - 1] = 0;
> +       }
> +
> +       return atol(str) * mult;

Negatives?

> +}
> +
> +static int add_buffers(struct working_set *set, char *str)
> +{
> +       /*
> +        * 4096
> +        * 4k
> +        * 4m
> +        * 4g
> +        * 10n4k - 10 4k batches
> +        */
> +       unsigned long *sizes, size;
> +       unsigned int add, i;
> +       char *n;
> +
> +       n = index(str, 'n');
> +       if (n) {
> +               *n = 0;
> +               add = atoi(str);
> +               if (!add)
> +                       return -1;

if (add <= 0) [int add goes without saying then]

> +               str = ++n;
> +       } else {
> +               add = 1;
> +       }
> +
> +       size = parse_size(str);
> +       if (!size)
> +               return -1;
> +
> +       sizes = realloc(set->sizes, (set->nr + add) * sizeof(*sizes));
> +       if (!sizes)
> +               return -1;
> +
> +       for (i = 0; i < add; i++)
> +               sizes[set->nr + i] = size;
> +
> +       set->nr += add;
> +       set->sizes = sizes;
> +
> +       return 0;
> +}

> @@ -1003,6 +1209,51 @@ add_step:
>                 }
>         }
>  
> +       /*
> +        * Check no duplicate working set ids.
> +        */
> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +               struct w_step *w2;
> +
> +               if (w->type != WORKINGSET)
> +                       continue;
> +
> +               for (j = 0, w2 = wrk->steps; j < wrk->nr_steps; w2++, j++) {
> +                       if (j == i)
> +                               continue;
> +                       if (w2->type != WORKINGSET)
> +                               continue;
> +
> +                       check_arg(w->working_set.id == w2->working_set.id,
> +                                 "Duplicate working set id at %u!\n", j);
> +               }
> +       }
> +
> +       /*
> +        * Allocate shared working sets.
> +        */
> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +               if (w->type == WORKINGSET && w->working_set.shared)
> +                       allocate_working_set(&w->working_set);
> +       }
> +
> +       wrk->max_working_set_id = -1;
> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +               if (w->type == WORKINGSET &&
> +                   w->working_set.shared &&
> +                   w->working_set.id > wrk->max_working_set_id)
> +                       wrk->max_working_set_id = w->working_set.id;
> +       }
> +
> +       wrk->working_sets = calloc(wrk->max_working_set_id + 1,
> +                                  sizeof(*wrk->working_sets));
> +       igt_assert(wrk->working_sets);
> +
> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +               if (w->type == WORKINGSET && w->working_set.shared)
> +                       wrk->working_sets[w->working_set.id] = &w->working_set;
> +       }

Ok, sharing works by reusing the same set of handles within the process.

Is there room in the parser namespace for dmabuf sharing?
-Chris
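
As a sketch of the stricter `if (add <= 0)` check suggested above for the
count prefix in add_buffers(): strtol() can reject negative, zero and
malformed counts in one place. parse_count() is a hypothetical helper, not
part of the patch, and it assumes the string has already been cut at the
'n' separator as the quoted code does.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not from the patch: parse the decimal count that
 * precedes 'n' in a spec such as "10n4k" (the caller has already replaced
 * 'n' with a terminating NUL). Returns the count, or -1 if it is missing,
 * malformed, non-positive or does not fit in an int.
 */
static int parse_count(const char *str)
{
	char *end;
	long val;

	assert(str);

	errno = 0;
	val = strtol(str, &end, 10);
	if (end == str || *end != '\0')	/* no digits, or trailing junk */
		return -1;
	if (errno == ERANGE || val <= 0 || val > INT_MAX)
		return -1;

	return (int)val;
}
```

With this, "10" yields 10, while "0", "-3", "4x" and an empty string are
all rejected, so the result can be kept in a plain int.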
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats Tvrtko Ursulin
@ 2020-06-17 16:58   ` Chris Wilson
  2020-06-18  7:46     ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 16:58 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:13)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Show average/min/max workload iteration and dropped period stats when 'p'
> command is used.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  benchmarks/gem_wsim.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 9e5bfe6a36d4..60982cb73ba7 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -2101,7 +2101,8 @@ static void *run_workload(void *data)
>         struct w_step *w;
>         int throttle = -1;
>         int qd_throttle = -1;
> -       int count;
> +       int count, missed = 0;
> +       unsigned long time_tot = 0, time_min = ULONG_MAX, time_max = 0;
>         int i;
>  
>         clock_gettime(CLOCK_MONOTONIC, &t_start);
> @@ -2121,12 +2122,19 @@ static void *run_workload(void *data)
>                                 do_sleep = w->delay;
>                         } else if (w->type == PERIOD) {
>                                 struct timespec now;
> +                               int elapsed;
>  
>                                 clock_gettime(CLOCK_MONOTONIC, &now);
> -                               do_sleep = w->period -
> -                                          elapsed_us(&wrk->repeat_start, &now);
> +                               elapsed = elapsed_us(&wrk->repeat_start, &now);
> +                               do_sleep = w->period - elapsed;
> +                               time_tot += elapsed;
> +                               if (elapsed < time_min)
> +                                       time_min = elapsed;
> +                               if (elapsed > time_max)
> +                                       time_max = elapsed;

Keep the running average?

>                                 if (do_sleep < 0) {
> -                                       if (verbose > 1)
> +                                       missed++;
> +                                       if (verbose > 2)
>                                                 printf("%u: Dropped period @ %u/%u (%dus late)!\n",
>                                                        wrk->id, count, i, do_sleep);
>                                         continue;
> @@ -2280,6 +2288,9 @@ static void *run_workload(void *data)
>                 printf("%c%u: %.3fs elapsed (%d cycles, %.3f workloads/s).",
>                        wrk->background ? ' ' : '*', wrk->id,
>                        t, count, count / t);
> +               if (time_tot)
> +                       printf(" Time avg/min/max=%lu/%lu/%luus; %u missed.",
> +                              time_tot / count, time_min, time_max, missed);
>                 putchar('\n');
>         }
>  
> -- 
> 2.20.1
> 

* Re: [Intel-gfx] [PATCH i-g-t 07/10] gem_wsim: Log max and active working set sizes in verbose mode
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 07/10] gem_wsim: Log max and active working set sizes in verbose mode Tvrtko Ursulin
@ 2020-06-17 17:07   ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 17:07 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:17)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> It is useful to know how much memory a workload is allocating.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  benchmarks/gem_wsim.c | 100 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 95 insertions(+), 5 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 025385a144b8..96ee923fb699 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -852,7 +852,8 @@ static uint64_t engine_list_mask(const char *_str)
>         return mask;
>  }
>  
> -static void allocate_working_set(struct workload *wrk, struct working_set *set);
> +static unsigned long
> +allocate_working_set(struct workload *wrk, struct working_set *set);
>  
>  static long __duration(long dur, double scale)
>  {
> @@ -1270,8 +1271,14 @@ add_step:
>          * Allocate shared working sets.
>          */
>         for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> -               if (w->type == WORKINGSET && w->working_set.shared)
> -                       allocate_working_set(wrk, &w->working_set);
> +               if (w->type == WORKINGSET && w->working_set.shared) {
> +                       unsigned long total =
> +                               allocate_working_set(wrk, &w->working_set);
> +
> +                       if (verbose > 1)
> +                               printf("%u: %lu bytes in shared working set %u\n",
> +                                      wrk->id, total, w->working_set.id);
> +               }
>         }

The overall total might be nice, although that doesn't reflect usage so
it might be misleading as to what the active RSS is at any time.
  
>         wrk->max_working_set_id = -1;
> @@ -1731,8 +1738,10 @@ get_buffer_size(struct workload *wrk, const struct work_buffer_size *sz)
>                        (sz->max + 1 - sz->min);
>  }
>  
> -static void allocate_working_set(struct workload *wrk, struct working_set *set)
> +static unsigned long
> +allocate_working_set(struct workload *wrk, struct working_set *set)
>  {
> +       unsigned long total = 0;
>         unsigned int i;
>  
>         set->handles = calloc(set->nr, sizeof(*set->handles));
> @@ -1741,7 +1750,82 @@ static void allocate_working_set(struct workload *wrk, struct working_set *set)
>         for (i = 0; i < set->nr; i++) {
>                 set->sizes[i].size = get_buffer_size(wrk, &set->sizes[i]);
>                 set->handles[i] = alloc_bo(fd, set->sizes[i].size);
> +               total += set->sizes[i].size;
> +       }
> +
> +       return total;
> +}
> +
> +static bool
> +find_dep(struct dep_entry *deps, unsigned int nr, struct dep_entry dep)
> +{
> +       unsigned int i;
> +
> +       for (i = 0; i < nr; i++) {
> +               if (deps[i].working_set == dep.working_set &&
> +                   deps[i].target == dep.target)
> +                       return true;
>         }
> +
> +       return false;
> +}
> +
> +static void measure_active_set(struct workload *wrk)
> +{
> +       unsigned long total = 0, batch_sizes = 0;
> +       struct dep_entry *deps = NULL;
> +       unsigned int nr = 0, i, j;
> +       struct w_step *w;
> +
> +       if (verbose < 3)
> +               return;
> +
> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
> +               if (w->type != BATCH)
> +                       continue;
> +
> +               batch_sizes += w->bb_sz;
> +
> +               for (j = 0; j < w->data_deps.nr; j++) {
> +                       struct dep_entry *dep = &w->data_deps.list[j];
> +                       struct dep_entry _dep = *dep;
> +
> +                       if (dep->working_set == -1 && dep->target < 0) {
> +                               int idx = w->idx + dep->target;
> +
> +                               igt_assert(idx >= 0 && idx < w->idx);
> +                               igt_assert(wrk->steps[idx].type == BATCH);
> +
> +                               _dep.target = wrk->steps[idx].obj[0].handle;
> +                       }
> +
> +                       if (!find_dep(deps, nr, _dep)) {
> +                               if (dep->working_set == -1) {
> +                                       total += 4096;
> +                               } else {
> +                                       struct working_set *set;
> +
> +                                       igt_assert(dep->working_set <=
> +                                                  wrk->max_working_set_id);
> +
> +                                       set = wrk->working_sets[dep->working_set];
> +                                       igt_assert(set->nr);
> +                                       igt_assert(dep->target < set->nr);
> +                                       igt_assert(set->sizes[dep->target].size);
> +
> +                                       total += set->sizes[dep->target].size;
> +                               }
> +
> +                               deps = realloc(deps, (nr + 1) * sizeof(*deps));
> +                               deps[nr++] = *dep;
> +                       }
> +               }
> +       }

So a sum of all the unique handles used by all the steps.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
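
The "sum of all the unique handles" idea can be sketched in isolation as
below. This is only an illustration under simplified assumptions: struct
buf_ref and both functions are hypothetical stand-ins, not the patch's
dep_entry handling, and a buffer is identified by a (set, index) pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer reference: (set id, index) plus size. */
struct buf_ref {
	int set;
	int index;
	unsigned long size;
};

static bool seen_before(const struct buf_ref *seen, unsigned int nr,
			const struct buf_ref *ref)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		if (seen[i].set == ref->set && seen[i].index == ref->index)
			return true;
	}

	return false;
}

/*
 * Sum the sizes of the distinct buffers in refs[0..nr). Returns 0 if
 * the scratch array cannot be allocated.
 */
static unsigned long unique_total(const struct buf_ref *refs, unsigned int nr)
{
	struct buf_ref *seen;
	unsigned long total = 0;
	unsigned int nr_seen = 0, i;

	assert(refs || !nr);
	if (!nr)
		return 0;

	seen = malloc(nr * sizeof(*seen));
	if (!seen)
		return 0;

	for (i = 0; i < nr; i++) {
		if (seen_before(seen, nr_seen, &refs[i]))
			continue;

		/* Record exactly the key that the lookup compares against. */
		seen[nr_seen++] = refs[i];
		total += refs[i].size;
	}

	free(seen);

	return total;
}
```

The linear lookup is quadratic overall, which should be fine for the
handful of dependencies a workload step carries.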

* Re: [Intel-gfx] [PATCH i-g-t 09/10] gem_wsim: Implement device selection
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 09/10] gem_wsim: Implement device selection Tvrtko Ursulin
@ 2020-06-17 17:09   ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 17:09 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:19)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> New command line options -L and -D <device> can respectively be used to
> list and select a GPU when more than one is present.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  benchmarks/gem_wsim.c | 62 +++++++++++++++++++++++++++++++------------
>  1 file changed, 45 insertions(+), 17 deletions(-)
> 
> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> index 96ee923fb699..ca07b670bd42 100644
> --- a/benchmarks/gem_wsim.c
> +++ b/benchmarks/gem_wsim.c
> @@ -43,6 +43,7 @@
>  #include <pthread.h>
>  #include <math.h>
>  
> +#include "igt_device_scan.h"
>  #include "intel_chipset.h"
>  #include "intel_reg.h"
>  #include "drm.h"
> @@ -2593,7 +2594,9 @@ static void print_help(void)
>  "  -S                Synchronize the sequence of random batch durations between\n"
>  "                    clients.\n"
>  "  -d                Sync between data dependencies in userspace.\n"
> -"  -f <scale>        Scale factor for batch durations."
> +"  -f <scale>        Scale factor for batch durations.\n"
> +"  -L                List GPUs.\n"
> +"  -D <gpu>          One of the GPUs from -L."

This is unlike you, I was expecting a range!
-Chris

* Re: [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase Tvrtko Ursulin
@ 2020-06-17 17:45   ` Chris Wilson
  2020-06-18  7:53     ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-17 17:45 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:18)
> +1.RCS.1000.r3-47/w27-0/r0-58/r3-80/r22-31/r3-42/r9-4/w34-0/r3-18/r3-41/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-67/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-45/r3-110/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r24-23/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r3-49/r3-103/r22-6/r3-68/r3-112/r22-29/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-67/r3-37/r25-0/r22-7/r25-59/r25-71/r25-101/r25-75/r25-20/r25-91/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r18-19/r18-26/r18-21/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r22-21/r25-22/r3-29/r25-93/r18-2/r18-14/r18-3/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r0-1!
>  07/w19-0.
>  0

This patch has been mangled.
-Chris

* Re: [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing Tvrtko Ursulin
  2020-06-17 16:07   ` Chris Wilson
@ 2020-06-18  7:14   ` Chris Wilson
  2020-06-18  7:40     ` Tvrtko Ursulin
  1 sibling, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-18  7:14 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-17 17:01:11)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Evaluation of userspace load balancing options was how this tool started
> but we have since settled on doing it in the kernel.
> 
> Tomorrow we will want to update the tool for new engine interfaces and all
> this legacy code will just be a distraction.
> 
> Rip out everything not related to explicit load balancing implemented via
> context engine maps and adjust the workloads to use it.

Hmm, if this is on the table, should we also then restrict
load-balancing wsim to gen11+ so that we can use the timed loops rather
than nop batches? That would be a huge selling point, and I'll just keep an
old checkout around for nop load balancing with all the trimmings.
-Chris

* Re: [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-18  7:14   ` Chris Wilson
@ 2020-06-18  7:40     ` Tvrtko Ursulin
  2020-06-18  7:55       ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18  7:40 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 18/06/2020 08:14, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-17 17:01:11)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Evaluation of userspace load balancing options was how this tool started
>> but we have since settled on doing it in the kernel.
>>
>> Tomorrow we will want to update the tool for new engine interfaces and all
>> this legacy code will just be a distraction.
>>
>> Rip out everything not related to explicit load balancing implemented via
>> context engine maps and adjust the workloads to use it.
> 
> Hmm, if this is on the table, should we also then restrict
> load-balancing wsim to gen11+ so that we can use the timed loops rather
> than nop batches? That would be a huge selling point, and I'll just keep an
> old checkout around for nop load balancing with all the trimmings.

That was my plan for the next step yes. Just taking your patch without 
further changes would already make it work I think. But also at some 
point I want to convert the engine selection (and engine naming in 
descriptors) to class:instance.

Why do you need the nop/old balancing stuff? I would hope going forward 
we only need to compare current balancing against any changes. So I'd 
really like to remove the userspace balancing stuff.

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats
  2020-06-17 16:58   ` Chris Wilson
@ 2020-06-18  7:46     ` Tvrtko Ursulin
  2020-06-18  7:57       ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18  7:46 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 17/06/2020 17:58, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-17 17:01:13)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Show average/min/max workload iteration and dropped period stats when 'p'
>> command is used.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   benchmarks/gem_wsim.c | 19 +++++++++++++++----
>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 9e5bfe6a36d4..60982cb73ba7 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -2101,7 +2101,8 @@ static void *run_workload(void *data)
>>          struct w_step *w;
>>          int throttle = -1;
>>          int qd_throttle = -1;
>> -       int count;
>> +       int count, missed = 0;
>> +       unsigned long time_tot = 0, time_min = ULONG_MAX, time_max = 0;
>>          int i;
>>   
>>          clock_gettime(CLOCK_MONOTONIC, &t_start);
>> @@ -2121,12 +2122,19 @@ static void *run_workload(void *data)
>>                                  do_sleep = w->delay;
>>                          } else if (w->type == PERIOD) {
>>                                  struct timespec now;
>> +                               int elapsed;
>>   
>>                                  clock_gettime(CLOCK_MONOTONIC, &now);
>> -                               do_sleep = w->period -
>> -                                          elapsed_us(&wrk->repeat_start, &now);
>> +                               elapsed = elapsed_us(&wrk->repeat_start, &now);
>> +                               do_sleep = w->period - elapsed;
>> +                               time_tot += elapsed;
>> +                               if (elapsed < time_min)
>> +                                       time_min = elapsed;
>> +                               if (elapsed > time_max)
>> +                                       time_max = elapsed;
> 
> Keep the running average?

Could do but why? I already have the count so adding up total elapsed 
frame time sounds easiest.

Regards,

Tvrtko

> 
>>                                  if (do_sleep < 0) {
>> -                                       if (verbose > 1)
>> +                                       missed++;
>> +                                       if (verbose > 2)
>>                                                  printf("%u: Dropped period @ %u/%u (%dus late)!\n",
>>                                                         wrk->id, count, i, do_sleep);
>>                                          continue;
>> @@ -2280,6 +2288,9 @@ static void *run_workload(void *data)
>>                  printf("%c%u: %.3fs elapsed (%d cycles, %.3f workloads/s).",
>>                         wrk->background ? ' ' : '*', wrk->id,
>>                         t, count, count / t);
>> +               if (time_tot)
>> +                       printf(" Time avg/min/max=%lu/%lu/%luus; %u missed.",
>> +                              time_tot / count, time_min, time_max, missed);
>>                  putchar('\n');
>>          }
>>   
>> -- 
>> 2.20.1
>>

* Re: [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase
  2020-06-17 17:45   ` Chris Wilson
@ 2020-06-18  7:53     ` Tvrtko Ursulin
  2020-06-18  8:02       ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18  7:53 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 17/06/2020 18:45, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-17 17:01:18)
>> +1.RCS.1000.r3-47/w27-0/r0-58/r3-80/r22-31/r3-42/r9-4/w34-0/r3-18/r3-41/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-67/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-45/r3-110/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r24-23/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r3-49/r3-103/r22-6/r3-68/r3-112/r22-29/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-67/r3-37/r25-0/r22-7/r25-59/r25-71/r25-101/r25-75/r25-20/r25-91/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r18-19/r18-26/r18-21/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r22-21/r25-22/r3-29/r25-93/r18-2/r18-14/r18-3/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r0-1!
>>   07/w19-0.
>>   0
> 
> This patch has been mangled.

Could it be your email client? (Very long lines in the patch.) I don't 
see corruption anywhere on my side, or on the copy I received from the 
mailing list.

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-18  7:40     ` Tvrtko Ursulin
@ 2020-06-18  7:55       ` Chris Wilson
  2020-06-18 10:03         ` Tvrtko Ursulin
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2020-06-18  7:55 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-18 08:40:25)
> 
> On 18/06/2020 08:14, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-17 17:01:11)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Evaluation of userspace load balancing options was how this tool started
> >> but we have since settled on doing it in the kernel.
> >>
> >> Tomorrow we will want to update the tool for new engine interfaces and all
> >> this legacy code will just be a distraction.
> >>
> >> Rip out everything not related to explicit load balancing implemented via
> >> context engine maps and adjust the workloads to use it.
> > 
> > Hmm, if this is on the table, should we also then restrict
> > load-balancing wsim to gen11+ so that we can use the timed loops rather
> > than nop batches? That would be a huge selling point, and I'll just keep an
> > old checkout around for nop load balancing with all the trimmings.
> 
> That was my plan for the next step yes. Just taking your patch without 
> further changes would already make it work I think. But also at some 
> point I want to convert the engine selection (and engine naming in 
> descriptors) to class:instance.
> 
> Why do you need the nop/old balancing stuff? I would hope going forward 
> we only need to compare current balancing against any changes. So I'd 
> really like to remove the userspace balancing stuff.

There are still some cases where i915 is beaten by plain old contexts;
usually that is a combination of semaphores and interrupt latency, but
some I just don't understand. There is still an uncomfortably large
variation between kernel releases, and comparing the regressions in
different balancers is useful to narrow down the problem.
-Chris

* Re: [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats
  2020-06-18  7:46     ` Tvrtko Ursulin
@ 2020-06-18  7:57       ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-18  7:57 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-18 08:46:18)
> 
> On 17/06/2020 17:58, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-17 17:01:13)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Show average/min/max workload iteration and dropped period stats when 'p'
> >> command is used.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   benchmarks/gem_wsim.c | 19 +++++++++++++++----
> >>   1 file changed, 15 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> >> index 9e5bfe6a36d4..60982cb73ba7 100644
> >> --- a/benchmarks/gem_wsim.c
> >> +++ b/benchmarks/gem_wsim.c
> >> @@ -2101,7 +2101,8 @@ static void *run_workload(void *data)
> >>          struct w_step *w;
> >>          int throttle = -1;
> >>          int qd_throttle = -1;
> >> -       int count;
> >> +       int count, missed = 0;
> >> +       unsigned long time_tot = 0, time_min = ULONG_MAX, time_max = 0;
> >>          int i;
> >>   
> >>          clock_gettime(CLOCK_MONOTONIC, &t_start);
> >> @@ -2121,12 +2122,19 @@ static void *run_workload(void *data)
> >>                                  do_sleep = w->delay;
> >>                          } else if (w->type == PERIOD) {
> >>                                  struct timespec now;
> >> +                               int elapsed;
> >>   
> >>                                  clock_gettime(CLOCK_MONOTONIC, &now);
> >> -                               do_sleep = w->period -
> >> -                                          elapsed_us(&wrk->repeat_start, &now);
> >> +                               elapsed = elapsed_us(&wrk->repeat_start, &now);
> >> +                               do_sleep = w->period - elapsed;
> >> +                               time_tot += elapsed;
> >> +                               if (elapsed < time_min)
> >> +                                       time_min = elapsed;
> >> +                               if (elapsed > time_max)
> >> +                                       time_max = elapsed;
> > 
> > Keep the running average?
> 
> Could do but why? I already have the count so adding up total elapsed 
> frame time sounds easiest.

Because I was blind and didn't see it in the printf.
-Chris
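
The approach settled on in this subthread (accumulate the total, track
min/max, and derive the average only when printing) can be sketched as
below. The struct and function names are hypothetical and exist only for
this illustration; the patch keeps plain local variables in run_workload().

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical accumulator; the names are illustrative, not gem_wsim's. */
struct period_stats {
	unsigned long total_us;
	unsigned long min_us;
	unsigned long max_us;
	unsigned int count;
};

static void stats_init(struct period_stats *s)
{
	assert(s);

	s->total_us = 0;
	s->min_us = ULONG_MAX;	/* any real sample will be smaller */
	s->max_us = 0;
	s->count = 0;
}

static void stats_add(struct period_stats *s, unsigned long elapsed_us)
{
	assert(s);

	s->total_us += elapsed_us;
	if (elapsed_us < s->min_us)
		s->min_us = elapsed_us;
	if (elapsed_us > s->max_us)
		s->max_us = elapsed_us;
	s->count++;
}

/* The average is derived once at report time; 0 if nothing was recorded. */
static unsigned long stats_avg(const struct period_stats *s)
{
	assert(s);

	return s->count ? s->total_us / s->count : 0;
}
```

Keeping only the sum and the count avoids a per-iteration division, at the
cost of the average not being available until the division is done.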

* Re: [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations
  2020-06-17 16:22   ` Chris Wilson
@ 2020-06-18  8:01     ` Tvrtko Ursulin
  2020-06-18  8:07       ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18  8:01 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 17/06/2020 17:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-17 17:01:16)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> -f <float> on the command line can be used to scale batch buffer durations
>> in all parsed workloads.
> 
> But not the period?

I had it scale both at some point but then it ended up more useful to 
only do batches. So I could stuff more clients in before saturation. I 
suppose that's an argument to have both independently controlled.

Regards,

Tvrtko
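
For illustration, independently controlled scale factors for batches and
periods could look roughly like the sketch below. Everything here is an
assumption: struct wsim_scale and the helpers are hypothetical, and the
rounding and 1us floor are choices made for the example, not behaviour
taken from the patch (whose -f option only scales batch durations).

```c
#include <assert.h>

/* Hypothetical pair of factors; the patch has a single batch scale. */
struct wsim_scale {
	double batch;	/* applied to batch durations */
	double period;	/* applied to period steps */
};

/*
 * Scale a duration in microseconds, rounding to nearest. A positive
 * input never scales down to zero.
 */
static long scale_us(long us, double scale)
{
	long scaled;

	assert(us >= 0);
	assert(scale > 0.0);

	scaled = (long)(us * scale + 0.5);

	return (us > 0 && scaled < 1) ? 1 : scaled;
}

static long scaled_batch_us(const struct wsim_scale *s, long us)
{
	assert(s);

	return scale_us(us, s->batch);
}

static long scaled_period_us(const struct wsim_scale *s, long us)
{
	assert(s);

	return scale_us(us, s->period);
}
```

Halving only the batch factor while leaving the period at 1.0 reproduces
the use case described above: lighter clients at the same frame rate, so
more of them fit before saturation.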

* Re: [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase
  2020-06-18  7:53     ` Tvrtko Ursulin
@ 2020-06-18  8:02       ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-18  8:02 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-18 08:53:47)
> 
> On 17/06/2020 18:45, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-17 17:01:18)
> >> +1.RCS.1000.r3-47/w27-0/r0-58/r3-80/r22-31/r3-42/r9-4/w34-0/r3-18/r3-41/r21-0/r9-5/r3-78/r25-4/r3-104/r3-23/r30-0/r3-88/r22-27/r22-1/r25-45/r3-50/r22-12/r22-22/r22-3/r22-0/r25-56/r3-4/r22-15/r25-113/r3-7/r22-18/r25-60/r3-81/r25-21/r3-89/r18-5/r3-93/r17-8/r25-28/r25-87/r25-9/r25-13/r25-42/r25-90/r22-16/r34-2/r3-15/w3-64/r0-67/r25-99/r25-73/r3-6/r25-40/r3-90/r22-20/r0-45/r3-110/w32-17/w32-16/w32-3/w32-1/w33-2/r28-2/r3-98/r3-85/r25-48/r3-12/r25-104/r24-23/r3-87/r3-108/r3-26/r3-96/r22-5/r22-14/r3-49/r3-103/r22-6/r3-68/r3-112/r22-29/r22-28/r25-14/r25-44/r25-19/r3-67/r25-111/r18-4/r3-66/r18-17/r4-5/r25-68/r25-86/r25-26/r25-67/r3-37/r25-0/r22-7/r25-59/r25-71/r25-101/r25-75/r25-20/r25-91/r3-2/r3-117/r3-33/r22-2/r25-55/r25-66/r25-24/r25-105/r25-61/r25-11/r25-51/r25-64/r25-70/r18-19/r18-26/r18-21/r25-81/r25-78/r25-37/r25-50/r25-102/r25-35/r18-18/r18-13/r18-12/r3-69/r3-19/r3-100/r22-21/r25-22/r3-29/r25-93/r18-2/r18-14/r18-3/r22-10/r18-23/r18-7/r18-11/r3-73/r8-0/r25-92/r25-41/w33-3/r!
>  0-1!
> >>   07/w19-0.
> >>   0
> > 
> > This patch has been mangled.
> 
> Could it be your email client? (Very long lines in the patch.) I don't 
> see corruption anywhere on my side, or on the copy I received from the 
> mailing list.

Somewhere in the chain, it's the same breakage in the file as well. It's
not something I've seen before, and it's an extremely strange choice for
breaking the subsequent lines.

Anyway, it did lead me to suggest printing the _token for the
invalid step, and perhaps changing that from "step" to "line"?
-Chris
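
A minimal sketch of that suggestion, reporting the line number together
with the offending token. format_parse_error() and the message wording are
hypothetical, not gem_wsim's actual error path; the token is cut at the
first line break so that a mangled multi-line token still yields a
one-line message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: report the 1-based line number together with
 * the token that failed to parse. Returns the snprintf() result.
 */
static int format_parse_error(char *buf, size_t len,
			      unsigned int line, const char *token)
{
	int n;

	assert(buf && len);
	assert(token);

	/* Print only up to the first line break in the token. */
	n = (int)strcspn(token, "\r\n");

	return snprintf(buf, len, "Invalid token '%.*s' at line %u!",
			n, token, line);
}
```

For a token such as the broken "r0-1!" fragment quoted earlier, the message
would then point straight at the damaged line of the workload file.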

* Re: [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes
  2020-06-17 16:31   ` Chris Wilson
@ 2020-06-18  8:06     ` Tvrtko Ursulin
  0 siblings, 0 replies; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18  8:06 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 17/06/2020 17:31, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-17 17:01:15)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> See README for more details.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   benchmarks/gem_wsim.c  | 71 +++++++++++++++++++++++++++++++++---------
>>   benchmarks/wsim/README |  4 +++
>>   2 files changed, 61 insertions(+), 14 deletions(-)
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 5893de38a98e..c1405596c46a 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -117,12 +117,18 @@ struct bond {
>>          enum intel_engine_id master;
>>   };
>>   
>> +struct work_buffer_size {
>> +       unsigned long size;
>> +       unsigned long min;
>> +       unsigned long max;
>> +};
>> +
>>   struct working_set {
>>          int id;
>>          bool shared;
>>          unsigned int nr;
>>          uint32_t *handles;
>> -       unsigned long *sizes;
>> +       struct work_buffer_size *sizes;
>>   };
>>   
>>   struct workload;
>> @@ -203,6 +209,7 @@ struct workload
>>          bool print_stats;
>>   
>>          uint32_t bb_prng;
>> +       uint32_t bo_prng;
>>   
>>          struct timespec repeat_start;
>>   
>> @@ -757,10 +764,12 @@ static int add_buffers(struct working_set *set, char *str)
>>           * 4m
>>           * 4g
>>           * 10n4k - 10 4k batches
>> +        * 4096-16k - random size in range
>>           */
>> -       unsigned long *sizes, size;
>> +       struct work_buffer_size *sizes;
>> +       unsigned long min_sz, max_sz;
>> +       char *n, *max = NULL;
>>          unsigned int add, i;
>> -       char *n;
>>   
>>          n = index(str, 'n');
>>          if (n) {
>> @@ -773,16 +782,34 @@ static int add_buffers(struct working_set *set, char *str)
>>                  add = 1;
>>          }
>>   
>> -       size = parse_size(str);
>> -       if (!size)
>> +       n = index(str, '-');
>> +       if (n) {
>> +               *n = 0;
>> +               max = ++n;
>> +       }
>> +
>> +       min_sz = parse_size(str);
>> +       if (!min_sz)
>>                  return -1;
>>   
>> +       if (max) {
>> +               max_sz = parse_size(max);
>> +               if (!max_sz)
>> +                       return -1;
>> +       } else {
>> +               max_sz = min_sz;
>> +       }
>> +
>>          sizes = realloc(set->sizes, (set->nr + add) * sizeof(*sizes));
>>          if (!sizes)
>>                  return -1;
>>   
>> -       for (i = 0; i < add; i++)
>> -               sizes[set->nr + i] = size;
>> +       for (i = 0; i < add; i++) {
>> +               struct work_buffer_size *sz = &sizes[set->nr + i];
>> +               sz->min = min_sz;
>> +               sz->max = max_sz;
>> +               sz->size = 0;
>> +       }
>>   
>>          set->nr += add;
>>          set->sizes = sizes;
>> @@ -824,7 +851,7 @@ static uint64_t engine_list_mask(const char *_str)
>>          return mask;
>>   }
>>   
>> -static void allocate_working_set(struct working_set *set);
>> +static void allocate_working_set(struct workload *wrk, struct working_set *set);
>>   
>>   #define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
>>          if ((field = strtok_r(fstart, ".", &fctx))) { \
>> @@ -1177,10 +1204,12 @@ add_step:
>>   
>>          wrk->nr_steps = nr_steps;
>>          wrk->steps = steps;
>> +       wrk->flags = flags;
>>          wrk->prio = arg->prio;
>>          wrk->sseu = arg->sseu;
>>          wrk->max_working_set_id = -1;
>>          wrk->working_sets = NULL;
>> +       wrk->bo_prng = (wrk->flags & SYNCEDCLIENTS) ? master_prng : rand();
>>   
>>          free(desc);
>>   
>> @@ -1234,7 +1263,7 @@ add_step:
>>           */
>>          for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>>                  if (w->type == WORKINGSET && w->working_set.shared)
>> -                       allocate_working_set(&w->working_set);
>> +                       allocate_working_set(wrk, &w->working_set);
>>          }
>>   
>>          wrk->max_working_set_id = -1;
>> @@ -1267,6 +1296,7 @@ clone_workload(struct workload *_wrk)
>>          igt_assert(wrk);
>>          memset(wrk, 0, sizeof(*wrk));
>>   
>> +       wrk->flags = _wrk->flags;
>>          wrk->prio = _wrk->prio;
>>          wrk->sseu = _wrk->sseu;
>>          wrk->nr_steps = _wrk->nr_steps;
> 
> wrk->flags wasn't introduced in this patch, so why do we need to copy
> them now?
> 
> I see wrk->bo_prng = flags & SYNC above, but I haven't seen them used
> again later. They used to carry the balancer info and were set up in
> main. Am I right in thinking they are still being set in main()
> as well?

I couldn't remember, but looking around it looks like you are right. I'll do 
some experiments and remove it if confirmed.

Regards,

Tvrtko
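
For readers following the `4096-16k` range syntax and the `bo_prng` seeding in the quoted hunks, here is a minimal sketch of how a size could be drawn from a [min, max] range with a per-workload PRNG state. It is not the code from the patch: `prng_next()` and `pick_size()` are hypothetical names, and the xorshift32 generator is swapped in purely for illustration. The point it shows is that two clients starting from the same seed (as with `master_prng` under SYNCEDCLIENTS) end up with the same sequence of buffer sizes.

```c
#include <stdint.h>

/* Hypothetical xorshift32 step; the state must never be zero. */
static uint32_t prng_next(uint32_t *state)
{
	uint32_t x = *state ? *state : 0x9e3779b9u;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;

	return *state = x;
}

/*
 * Hypothetical helper: pick a size in the inclusive range [min, max].
 * A fixed size is simply the degenerate range min == max. The modulo
 * introduces a small bias and only covers ranges narrower than 2^32,
 * which is assumed to be acceptable for a benchmark.
 */
static unsigned long pick_size(uint32_t *state,
			       unsigned long min, unsigned long max)
{
	if (max <= min)
		return min;

	return min + prng_next(state) % (max - min + 1);
}
```

Seeding two states with the same value and calling `pick_size()` in lockstep yields identical sizes, which is the property the synced-clients case relies on.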

* Re: [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations
  2020-06-18  8:01     ` Tvrtko Ursulin
@ 2020-06-18  8:07       ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-18  8:07 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-18 09:01:10)
> 
> On 17/06/2020 17:22, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-17 17:01:16)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> -f <float> on the command line can be used to scale batch buffer durations
> >> in all parsed workloads.
> > 
> > But not the period?
> 
> I had it scale both at some point but then it ended up more useful to 
> only do batches. So I could stuff more clients in before saturation. I 
> suppose that's an argument to have both independently controlled.

Moreover, I was trying to work out why one would want to. I was guessing
you would shrink the duration and add more clients, and there you would want
independent control, but if you just shrank the duration for a fixed
number of clients, you would also want to shrink the period.
-Chris
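
The trade-off discussed above can be sketched with a little arithmetic. This is an illustration only: `struct step_timing`, `scale_timing()` and `utilisation()` are hypothetical names, and a separate period scale is an assumption (the patch only adds `-f` for batch durations). Scaling only the duration lowers per-client utilisation, so more clients fit before saturation; scaling both by the same factor keeps utilisation unchanged.

```c
/* Hypothetical per-step timing, in microseconds. */
struct step_timing {
	unsigned int duration_us; /* batch duration */
	unsigned int period_us;   /* period the step repeats at */
};

/* Scale a microsecond value, rounding to the nearest integer. */
static unsigned int scale_us(unsigned int us, double scale)
{
	return (unsigned int)(us * scale + 0.5);
}

/* Apply independent scale factors to the batch duration and the period. */
static void scale_timing(struct step_timing *t,
			 double batch_scale, double period_scale)
{
	t->duration_us = scale_us(t->duration_us, batch_scale);
	t->period_us = scale_us(t->period_us, period_scale);
}

/* Fraction of each period one client keeps the engine busy. */
static double utilisation(const struct step_timing *t)
{
	return t->period_us ? (double)t->duration_us / t->period_us : 1.0;
}
```

For a 1000us batch every 16000us, halving only the duration halves the utilisation, while halving both leaves it at 1/16.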

* Re: [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies
  2020-06-17 16:57   ` Chris Wilson
@ 2020-06-18  9:05     ` Tvrtko Ursulin
  2020-06-18  9:22       ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18  9:05 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 17/06/2020 17:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-17 17:01:12)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Add support for defining buffer object working sets and targeting them as
>> data dependencies. For more information please see the README file.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   benchmarks/gem_wsim.c                   | 453 +++++++++++++++++++++---
>>   benchmarks/wsim/README                  |  59 +++
>>   benchmarks/wsim/cloud-gaming-60fps.wsim |  11 +
>>   benchmarks/wsim/composited-ui.wsim      |   7 +
>>   4 files changed, 476 insertions(+), 54 deletions(-)
>>   create mode 100644 benchmarks/wsim/cloud-gaming-60fps.wsim
>>   create mode 100644 benchmarks/wsim/composited-ui.wsim
>>
>> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
>> index 02fe8f5a5e69..9e5bfe6a36d4 100644
>> --- a/benchmarks/gem_wsim.c
>> +++ b/benchmarks/gem_wsim.c
>> @@ -88,14 +88,21 @@ enum w_type
>>          LOAD_BALANCE,
>>          BOND,
>>          TERMINATE,
>> -       SSEU
>> +       SSEU,
>> +       WORKINGSET,
>> +};
>> +
>> +struct dep_entry {
>> +       int target;
>> +       bool write;
>> +       int working_set; /* -1 = step dependency, >= 0 working set id */
>>   };
>>   
>>   struct deps
>>   {
>>          int nr;
>>          bool submit_fence;
>> -       int *list;
>> +       struct dep_entry *list;
>>   };
>>   
>>   struct w_arg {
>> @@ -110,6 +117,14 @@ struct bond {
>>          enum intel_engine_id master;
>>   };
>>   
>> +struct working_set {
>> +       int id;
>> +       bool shared;
>> +       unsigned int nr;
>> +       uint32_t *handles;
>> +       unsigned long *sizes;
>> +};
>> +
>>   struct workload;
>>   
>>   struct w_step
>> @@ -143,6 +158,7 @@ struct w_step
>>                          enum intel_engine_id bond_master;
>>                  };
>>                  int sseu;
>> +               struct working_set working_set;
>>          };
>>   
>>          /* Implementation details */
>> @@ -193,6 +209,9 @@ struct workload
>>          unsigned int nr_ctxs;
>>          struct ctx *ctx_list;
>>   
>> +       struct working_set **working_sets; /* array indexed by set id */
>> +       int max_working_set_id;
>> +
>>          int sync_timeline;
>>          uint32_t sync_seqno;
>>   
>> @@ -281,11 +300,120 @@ print_engine_calibrations(void)
>>          printf("\n");
>>   }
>>   
>> +static void add_dep(struct deps *deps, struct dep_entry entry)
>> +{
>> +       deps->list = realloc(deps->list, sizeof(*deps->list) * (deps->nr + 1));
>> +       igt_assert(deps->list);
>> +
>> +       deps->list[deps->nr++] = entry;
>> +}
>> +
>> +
>> +static int
>> +parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
>> +{
>> +       struct dep_entry entry = { .working_set = -1 };
>> +       bool submit_fence = false;
>> +       char *s;
>> +
>> +       switch (str[0]) {
>> +       case '-':
>> +               if (str[1] < '0' || str[1] > '9')
>> +                       return -1;
>> +
>> +               entry.target = atoi(str);
>> +               if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>> +                       return -1;
> 
> add_dep for N steps ago, using a handle.
> 
>> +
>> +               add_dep(&w->data_deps, entry);
>> +
>> +               break;
>> +       case 's':
>> +               submit_fence = true;
>> +               /* Fall-through. */
>> +       case 'f':
>> +               /* Multiple fences not yet supported. */
>> +               igt_assert_eq(w->fence_deps.nr, 0);
>> +
>> +               entry.target = atoi(++str);
>> +               if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
>> +                       return -1;
>> +
>> +               add_dep(&w->fence_deps, entry);
>> +
>> +               w->fence_deps.submit_fence = submit_fence;
> 
> add_dep for N steps ago, using the out-fence from that step
> [A post-processing step adds emit_fence to the earlier steps.]
> 
>> +               break;
>> +       case 'w':
>> +               entry.write = true;
> 
> Got confused for a moment as I was expecting the submit_fence
> fallthrough pattern.
>> +               /* Fall-through. */
>> +       case 'r':
>> +               /*
>> +                * [rw]N-<str>
>> +                * r1-<str> or w2-<str>, where N is working set id.
>> +                */
>> +               s = index(++str, '-');
>> +               if (!s)
>> +                       return -1;
>> +
>> +               entry.working_set = atoi(str);
> 
> if (entry.working_set < 0)
> 	return -1;

Yep.

> 
>> +
>> +               if (parse_working_set_deps(w->wrk, &w->data_deps, entry, ++s))
>> +                       return -1;
> 
> The new one...
> 
>> +static int
>> +parse_working_set_deps(struct workload *wrk,
>> +                      struct deps *deps,
>> +                      struct dep_entry _entry,
>> +                      char *str)
>> +{
>> +       /*
>> +        * 1 - target handle index in the specified working set.
>> +        * 2-4 - range
>> +        */
>> +       struct dep_entry entry = _entry;
>> +       char *s;
>> +
>> +       s = index(str, '-');
>> +       if (s) {
>> +               int from, to;
>> +
>> +               from = atoi(str);
>> +               if (from < 0)
>> +                       return -1;
>> +
>> +               to = atoi(++s);
>> +               if (to <= 0)
>> +                       return -1;
> 
> if to < from, we add nothing. Is that worth the error?

Yep.
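
A minimal sketch of the stricter range handling agreed above, assuming a standalone helper rather than the actual patch code. `parse_index_range()` is a hypothetical name; it uses `strtol()` instead of `atoi()` so that trailing garbage can be detected, and it treats an inverted range (`to < from`) as an error instead of silently adding nothing.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "N" or "N-M" into an inclusive index range.
 * Returns 0 on success and -1 on malformed or inverted input.
 */
static int parse_index_range(const char *str, int *from, int *to)
{
	char *end;
	long a, b;

	/* Reject signs, whitespace and empty input up front. */
	if (str[0] < '0' || str[0] > '9')
		return -1;

	errno = 0;
	a = strtol(str, &end, 10);
	if (*end == '\0') {
		b = a; /* Single index. */
	} else if (*end == '-' && end[1] >= '0' && end[1] <= '9') {
		b = strtol(end + 1, &end, 10);
		if (*end != '\0')
			return -1; /* Trailing garbage. */
	} else {
		return -1;
	}

	if (errno || b > INT_MAX)
		return -1;
	if (b < a)
		return -1; /* Inverted range would add nothing. */

	*from = (int)a;
	*to = (int)b;

	return 0;
}
```

With this, "2-4" yields the range 2..4, "3" yields 3..3, and "4-2" is rejected.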

> 
>> +
>> +               for (entry.target = from; entry.target <= to; entry.target++)
>> +                       add_dep(deps, entry);
>> +       } else {
>> +               entry.target = atoi(str);
>> +               if (entry.target < 0)
>> +                       return -1;
>> +
>> +               add_dep(deps, entry);
> 
> 
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +               break;
>> +       default:
>> +               return -1;
>> +       };
>> +
>> +       return 0;
>> +}
>> +
>>   static int
>>   parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
>>   {
>>          char *desc = strdup(_desc);
>>          char *token, *tctx = NULL, *tstart = desc;
>> +       int ret = 0;
>> +
>> +       if (!strcmp(_desc, "0"))
>> +               goto out;
> 
> Hang on, what is this special case?

For no dependencies.

If I move the check to parse_dependency then dependency of "0/0/0/0" 
would be silently accepted. It wouldn't be a big deal, who cares, but I 
thought it is better to be more strict.
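
To illustrate the strictness argument, here is a small sketch, assuming a simplified tokenizer rather than the real parse_dependencies(). `count_dependencies()` is a hypothetical name. A lone "0" means "no dependencies", while a "0" token inside a longer list such as "0/0/0/0" is rejected rather than silently accepted. Empty tokens are skipped, which mirrors what a strtok_r()-based loop would do.

```c
#include <string.h>

/*
 * Hypothetical illustration: count the tokens in a '/'-separated
 * dependency list. Returns the token count, or -1 if a "0" token
 * appears anywhere other than as the whole description.
 */
static int count_dependencies(const char *desc)
{
	const char *p = desc;
	int nr = 0;

	/* The whole field being "0" is the only accepted spelling of "none". */
	if (!strcmp(desc, "0"))
		return 0;

	while (*p) {
		size_t len = strcspn(p, "/");

		if (len == 1 && p[0] == '0')
			return -1; /* "0" mixed into a list is an error. */
		if (len)
			nr++;

		p += len;
		if (*p == '/')
			p++;
	}

	return nr;
}
```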

> 
>>   
>>          igt_assert(desc);
>>          igt_assert(!w->data_deps.nr && w->data_deps.nr == w->fence_deps.nr);
>>   static void __attribute__((format(printf, 1, 2)))
>> @@ -624,6 +722,88 @@ static int parse_engine_map(struct w_step *step, const char *_str)
>>          return 0;
>>   }
>>   
>> +static unsigned long parse_size(char *str)
>> +{
> /* "1234567890[gGmMkK]" */
>> +       const unsigned int len = strlen(str);
>> +       unsigned int mult = 1;
>> +
>> +       if (len == 0)
>> +               return 0;
>> +
>> +       switch (str[len - 1]) {
> 
> T? P? E? Let's plan ahead! :)

Error on unrecognized non-digit? Ok.

> 
>> +       case 'g':
>> +       case 'G':
>> +               mult *= 1024;
>> +               /* Fall-through. */
>> +       case 'm':
>> +       case 'M':
>> +               mult *= 1024;
>> +               /* Fall-through. */
>> +       case 'k':
>> +       case 'K':
>> +               mult *= 1024;
>> +
>> +               str[len - 1] = 0;
>> +       }
>> +
>> +       return atol(str) * mult;
> 
> Negatives?

Ok.
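
One possible shape for the stricter parser discussed above, as a sketch under stated assumptions and not the follow-up patch itself. `parse_size_strict()` is a hypothetical name. It keeps the "0 means error" convention of parse_size(), does not modify its input, and rejects negatives, unknown suffixes (such as 'T'), trailing garbage and values that overflow unsigned long.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical stricter variant of parse_size(): returns 0 on any error.
 * Accepts "<digits>[kKmMgG]" only.
 */
static unsigned long parse_size_strict(const char *str)
{
	unsigned long long val, mult = 1;
	char *end;

	/* strtoull() alone would accept a sign or leading whitespace. */
	if (str[0] < '0' || str[0] > '9')
		return 0;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno)
		return 0;

	switch (*end) {
	case 'g':
	case 'G':
		mult <<= 10;
		/* Fall-through. */
	case 'm':
	case 'M':
		mult <<= 10;
		/* Fall-through. */
	case 'k':
	case 'K':
		mult <<= 10;
		end++;
		break;
	case '\0':
		break;
	default:
		return 0; /* Unrecognised suffix. */
	}

	if (*end != '\0') /* Trailing garbage after the suffix. */
		return 0;
	if (val > ULONG_MAX / mult)
		return 0;

	return (unsigned long)(val * mult);
}
```

Adding 'T' or larger suffixes later would only need extra cases at the top of the switch, plus the existing overflow check.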

> 
>> +}
>> +
>> +static int add_buffers(struct working_set *set, char *str)
>> +{
>> +       /*
>> +        * 4096
>> +        * 4k
>> +        * 4m
>> +        * 4g
>> +        * 10n4k - 10 4k batches
>> +        */
>> +       unsigned long *sizes, size;
>> +       unsigned int add, i;
>> +       char *n;
>> +
>> +       n = index(str, 'n');
>> +       if (n) {
>> +               *n = 0;
>> +               add = atoi(str);
>> +               if (!add)
>> +                       return -1;
> 
> if (add <= 0) [int add goes without saying then]

Yep.

> 
>> +               str = ++n;
>> +       } else {
>> +               add = 1;
>> +       }
>> +
>> +       size = parse_size(str);
>> +       if (!size)
>> +               return -1;
>> +
>> +       sizes = realloc(set->sizes, (set->nr + add) * sizeof(*sizes));
>> +       if (!sizes)
>> +               return -1;
>> +
>> +       for (i = 0; i < add; i++)
>> +               sizes[set->nr + i] = size;
>> +
>> +       set->nr += add;
>> +       set->sizes = sizes;
>> +
>> +       return 0;
>> +}
> 
>> @@ -1003,6 +1209,51 @@ add_step:
>>                  }
>>          }
>>   
>> +       /*
>> +        * Check no duplicate working set ids.
>> +        */
>> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +               struct w_step *w2;
>> +
>> +               if (w->type != WORKINGSET)
>> +                       continue;
>> +
>> +               for (j = 0, w2 = wrk->steps; j < wrk->nr_steps; w2++, j++) {
>> +                       if (j == i)
>> +                               continue;
>> +                       if (w2->type != WORKINGSET)
>> +                               continue;
>> +
>> +                       check_arg(w->working_set.id == w2->working_set.id,
>> +                                 "Duplicate working set id at %u!\n", j);
>> +               }
>> +       }
>> +
>> +       /*
>> +        * Allocate shared working sets.
>> +        */
>> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +               if (w->type == WORKINGSET && w->working_set.shared)
>> +                       allocate_working_set(&w->working_set);
>> +       }
>> +
>> +       wrk->max_working_set_id = -1;
>> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +               if (w->type == WORKINGSET &&
>> +                   w->working_set.shared &&
>> +                   w->working_set.id > wrk->max_working_set_id)
>> +                       wrk->max_working_set_id = w->working_set.id;
>> +       }
>> +
>> +       wrk->working_sets = calloc(wrk->max_working_set_id + 1,
>> +                                  sizeof(*wrk->working_sets));
>> +       igt_assert(wrk->working_sets);
>> +
>> +       for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
>> +               if (w->type == WORKINGSET && w->working_set.shared)
>> +                       wrk->working_sets[w->working_set.id] = &w->working_set;
>> +       }
> 
> Ok, sharing works by reusing the same set of handles within the process.
> 
> Is there room in the parser namespace for dmabuf sharing?

Plenty of unused characters. :)
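
The id-indexed lookup of shared working sets reviewed above can be sketched in isolation as follows. This is an illustration with hypothetical names (`struct ws`, `index_shared_sets()`), not the patch code: it rejects duplicate ids across all sets, then exposes only the shared ones through a table indexed by set id.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cut-down working set descriptor. */
struct ws {
	int id;
	bool shared;
};

/*
 * Build a table indexed by set id that points at the shared sets.
 * Returns NULL on empty input, negative or duplicate ids, or allocation
 * failure. On success the caller owns the table and *max_id is set.
 */
static struct ws **index_shared_sets(struct ws *sets, unsigned int nr,
				     int *max_id)
{
	struct ws **table;
	unsigned int i;
	int max = -1;
	bool *seen;

	for (i = 0; i < nr; i++) {
		if (sets[i].id < 0)
			return NULL;
		if (sets[i].id > max)
			max = sets[i].id;
	}
	if (max < 0)
		return NULL;

	table = calloc(max + 1, sizeof(*table));
	seen = calloc(max + 1, sizeof(*seen));
	if (!table || !seen)
		goto err;

	for (i = 0; i < nr; i++) {
		if (seen[sets[i].id])
			goto err; /* Duplicate working set id. */
		seen[sets[i].id] = true;

		if (sets[i].shared)
			table[sets[i].id] = &sets[i];
	}

	free(seen);
	*max_id = max;

	return table;

err:
	free(seen);
	free(table);

	return NULL;
}
```

Entries for ids that are unused or belong to non-shared sets stay NULL, so a lookup has to check the pointer before using it.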

Regards,

Tvrtko



* Re: [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies
  2020-06-18  9:05     ` Tvrtko Ursulin
@ 2020-06-18  9:22       ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-18  9:22 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-18 10:05:56)
> 
> On 17/06/2020 17:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-17 17:01:12)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Add support for defining buffer object working sets and targeting them as
> >> data dependencies. For more information please see the README file.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   benchmarks/gem_wsim.c                   | 453 +++++++++++++++++++++---
> >>   benchmarks/wsim/README                  |  59 +++
> >>   benchmarks/wsim/cloud-gaming-60fps.wsim |  11 +
> >>   benchmarks/wsim/composited-ui.wsim      |   7 +
> >>   4 files changed, 476 insertions(+), 54 deletions(-)
> >>   create mode 100644 benchmarks/wsim/cloud-gaming-60fps.wsim
> >>   create mode 100644 benchmarks/wsim/composited-ui.wsim
> >>
> >> diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
> >> index 02fe8f5a5e69..9e5bfe6a36d4 100644
> >> --- a/benchmarks/gem_wsim.c
> >> +++ b/benchmarks/gem_wsim.c
> >> @@ -88,14 +88,21 @@ enum w_type
> >>          LOAD_BALANCE,
> >>          BOND,
> >>          TERMINATE,
> >> -       SSEU
> >> +       SSEU,
> >> +       WORKINGSET,
> >> +};
> >> +
> >> +struct dep_entry {
> >> +       int target;
> >> +       bool write;
> >> +       int working_set; /* -1 = step dependency, >= 0 working set id */
> >>   };
> >>   
> >>   struct deps
> >>   {
> >>          int nr;
> >>          bool submit_fence;
> >> -       int *list;
> >> +       struct dep_entry *list;
> >>   };
> >>   
> >>   struct w_arg {
> >> @@ -110,6 +117,14 @@ struct bond {
> >>          enum intel_engine_id master;
> >>   };
> >>   
> >> +struct working_set {
> >> +       int id;
> >> +       bool shared;
> >> +       unsigned int nr;
> >> +       uint32_t *handles;
> >> +       unsigned long *sizes;
> >> +};
> >> +
> >>   struct workload;
> >>   
> >>   struct w_step
> >> @@ -143,6 +158,7 @@ struct w_step
> >>                          enum intel_engine_id bond_master;
> >>                  };
> >>                  int sseu;
> >> +               struct working_set working_set;
> >>          };
> >>   
> >>          /* Implementation details */
> >> @@ -193,6 +209,9 @@ struct workload
> >>          unsigned int nr_ctxs;
> >>          struct ctx *ctx_list;
> >>   
> >> +       struct working_set **working_sets; /* array indexed by set id */
> >> +       int max_working_set_id;
> >> +
> >>          int sync_timeline;
> >>          uint32_t sync_seqno;
> >>   
> >> @@ -281,11 +300,120 @@ print_engine_calibrations(void)
> >>          printf("\n");
> >>   }
> >>   
> >> +static void add_dep(struct deps *deps, struct dep_entry entry)
> >> +{
> >> +       deps->list = realloc(deps->list, sizeof(*deps->list) * (deps->nr + 1));
> >> +       igt_assert(deps->list);
> >> +
> >> +       deps->list[deps->nr++] = entry;
> >> +}
> >> +
> >> +
> >> +static int
> >> +parse_dependency(unsigned int nr_steps, struct w_step *w, char *str)
> >> +{
> >> +       struct dep_entry entry = { .working_set = -1 };
> >> +       bool submit_fence = false;
> >> +       char *s;
> >> +
> >> +       switch (str[0]) {
> >> +       case '-':
> >> +               if (str[1] < '0' || str[1] > '9')
> >> +                       return -1;
> >> +
> >> +               entry.target = atoi(str);
> >> +               if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
> >> +                       return -1;
> > 
> > add_dep for N steps ago, using a handle.
> > 
> >> +
> >> +               add_dep(&w->data_deps, entry);
> >> +
> >> +               break;
> >> +       case 's':
> >> +               submit_fence = true;
> >> +               /* Fall-through. */
> >> +       case 'f':
> >> +               /* Multiple fences not yet supported. */
> >> +               igt_assert_eq(w->fence_deps.nr, 0);
> >> +
> >> +               entry.target = atoi(++str);
> >> +               if (entry.target > 0 || ((int)nr_steps + entry.target) < 0)
> >> +                       return -1;
> >> +
> >> +               add_dep(&w->fence_deps, entry);
> >> +
> >> +               w->fence_deps.submit_fence = submit_fence;
> > 
> > add_dep for N steps ago, using the out-fence from that step
> > [A post-processing step adds emit_fence to the earlier steps.]
> > 
> >> +               break;
> >> +       case 'w':
> >> +               entry.write = true;
> > 
> > Got confused for a moment as I was expecting the submit_fence
> > fallthrough pattern.
> >> +               /* Fall-through. */
> >> +       case 'r':
> >> +               /*
> >> +                * [rw]N-<str>
> >> +                * r1-<str> or w2-<str>, where N is working set id.
> >> +                */
> >> +               s = index(++str, '-');
> >> +               if (!s)
> >> +                       return -1;
> >> +
> >> +               entry.working_set = atoi(str);
> > 
> > if (entry.working_set < 0)
> >       return -1;
> 
> Yep.
> 
> > 
> >> +
> >> +               if (parse_working_set_deps(w->wrk, &w->data_deps, entry, ++s))
> >> +                       return -1;
> > 
> > The new one...
> > 
> >> +static int
> >> +parse_working_set_deps(struct workload *wrk,
> >> +                      struct deps *deps,
> >> +                      struct dep_entry _entry,
> >> +                      char *str)
> >> +{
> >> +       /*
> >> +        * 1 - target handle index in the specified working set.
> >> +        * 2-4 - range
> >> +        */
> >> +       struct dep_entry entry = _entry;
> >> +       char *s;
> >> +
> >> +       s = index(str, '-');
> >> +       if (s) {
> >> +               int from, to;
> >> +
> >> +               from = atoi(str);
> >> +               if (from < 0)
> >> +                       return -1;
> >> +
> >> +               to = atoi(++s);
> >> +               if (to <= 0)
> >> +                       return -1;
> > 
> > if to < from, we add nothing. Is that worth the error?
> 
> Yep.
> 
> > 
> >> +
> >> +               for (entry.target = from; entry.target <= to; entry.target++)
> >> +                       add_dep(deps, entry);
> >> +       } else {
> >> +               entry.target = atoi(str);
> >> +               if (entry.target < 0)
> >> +                       return -1;
> >> +
> >> +               add_dep(deps, entry);
> > 
> > 
> >> +       }
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +               break;
> >> +       default:
> >> +               return -1;
> >> +       };
> >> +
> >> +       return 0;
> >> +}
> >> +
> >>   static int
> >>   parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
> >>   {
> >>          char *desc = strdup(_desc);
> >>          char *token, *tctx = NULL, *tstart = desc;
> >> +       int ret = 0;
> >> +
> >> +       if (!strcmp(_desc, "0"))
> >> +               goto out;
> > 
> > Hang on, what is this special case?
> 
> For no dependencies.
> 
> If I move the check to parse_dependency then dependency of "0/0/0/0" 
> would be silently accepted. It wouldn't be a big deal, who cares, but I 
> thought it is better to be more strict.

It was just not clear at this point what is being matched, what the
meaning of 0 is.

/* 0 refers to self, a degenerate dependency */
?
-Chris

* Re: [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-18  7:55       ` Chris Wilson
@ 2020-06-18 10:03         ` Tvrtko Ursulin
  2020-06-18 10:05           ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Tvrtko Ursulin @ 2020-06-18 10:03 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 18/06/2020 08:55, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-18 08:40:25)
>>
>> On 18/06/2020 08:14, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-06-17 17:01:11)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Evaluation of userspace load balancing options was how this tool started
>>>> but we have since settled on doing it in the kernel.
>>>>
>>>> Tomorrow we will want to update the tool for new engine interfaces and all
>>>> this legacy code will just be a distraction.
>>>>
>>>> Rip out everything not related to explicit load balancing implemented via
>>>> context engine maps and adjust the workloads to use it.
>>>
>>> Hmm, if this is on the table, should we also then restrict
>>> load-balancing wsim to gen11+ so that we can use the timed loops rather
>>> than nop batches? That would be a huge selling point, and I'll just keep an
>>> old checkout around for nop load balancing with all the trimmings.
>>
>> That was my plan for the next step yes. Just taking your patch without
>> further changes would already make it work I think. But also at some
>> point I want to convert the engine selection (and engine naming in
>> descriptors) to class:instance.
>>
>> Why do you need the nop/old balancing stuff? I would hope going forward
>> we only need to compare current balancing against any changes. So I'd
>> really like to remove the userspace balancing stuff.
> 
> There are still some cases where i915 is beaten by plain old contexts,
> usually that is a combination of semaphores and interrupt latency, but
> some I just don't understand. There is still an uncomfortably large
> variation between kernel releases, and comparing the regressions in
> different balancers is useful to narrow down the problem.

You could create separate workloads to simulate "-b context" to a 
degree? I really want to rip this out. Can you cut your losses and 
forget it existed? :)

Regards,

Tvrtko



* Re: [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing
  2020-06-18 10:03         ` Tvrtko Ursulin
@ 2020-06-18 10:05           ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2020-06-18 10:05 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2020-06-18 11:03:11)
> 
> On 18/06/2020 08:55, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-18 08:40:25)
> >>
> >> On 18/06/2020 08:14, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2020-06-17 17:01:11)
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> Evaluation of userspace load balancing options was how this tool started
> >>>> but we have since settled on doing it in the kernel.
> >>>>
> >>>> Tomorrow we will want to update the tool for new engine interfaces and all
> >>>> this legacy code will just be a distraction.
> >>>>
> >>>> Rip out everything not related to explicit load balancing implemented via
> >>>> context engine maps and adjust the workloads to use it.
> >>>
> >>> Hmm, if this is on the table, should we also then restrict
> >>> load-balancing wsim to gen11+ so that we can use the timed loops rather
> >>> than nop batches? That would be a huge selling point, and I'll just keep an
> >>> old checkout around for nop load balancing with all the trimmings.
> >>
> >> That was my plan for the next step yes. Just taking your patch without
> >> further changes would already make it work I think. But also at some
> >> point I want to convert the engine selection (and engine naming in
> >> descriptors) to class:instance.
> >>
> >> Why do you need the nop/old balancing stuff? I would hope going forward
> >> we only need to compare current balancing against any changes. So I'd
> >> really like to remove the userspace balancing stuff.
> > 
> > There are still some cases where i915 is beaten by plain old contexts,
> > usually that is a combination of semaphores and interrupt latency, but
> > some I just don't understand. There is still an uncomfortably large
> > variation between kernel releases, and comparing the regressions in
> > different balancers is useful to narrow down the problem.
> 
> You could create separate workloads to simulate "-b context" to a 
> degree? I really want to rip this out. Can you cut your losses and 
> forget it existed? :)

It's fine; I can keep a stable benchmark around of a known checkout.
-Chris

Thread overview: 33+ messages
2020-06-17 16:01 [Intel-gfx] [PATCH i-g-t 00/10] gem_wsim improvements Tvrtko Ursulin
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 01/10] gem_wsim: Rip out userspace balancing Tvrtko Ursulin
2020-06-17 16:07   ` Chris Wilson
2020-06-18  7:14   ` Chris Wilson
2020-06-18  7:40     ` Tvrtko Ursulin
2020-06-18  7:55       ` Chris Wilson
2020-06-18 10:03         ` Tvrtko Ursulin
2020-06-18 10:05           ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 02/10] gem_wsim: Buffer objects working sets and complex dependencies Tvrtko Ursulin
2020-06-17 16:57   ` Chris Wilson
2020-06-18  9:05     ` Tvrtko Ursulin
2020-06-18  9:22       ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 03/10] gem_wsim: Show workload timing stats Tvrtko Ursulin
2020-06-17 16:58   ` Chris Wilson
2020-06-18  7:46     ` Tvrtko Ursulin
2020-06-18  7:57       ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 04/10] gem_wsim: Move BO allocation to a helper Tvrtko Ursulin
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 05/10] gem_wsim: Support random buffer sizes Tvrtko Ursulin
2020-06-17 16:31   ` Chris Wilson
2020-06-18  8:06     ` Tvrtko Ursulin
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 06/10] gem_wsim: Support scaling workload batch durations Tvrtko Ursulin
2020-06-17 16:22   ` Chris Wilson
2020-06-18  8:01     ` Tvrtko Ursulin
2020-06-18  8:07       ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 07/10] gem_wsim: Log max and active working set sizes in verbose mode Tvrtko Ursulin
2020-06-17 17:07   ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 08/10] gem_wsim: Snippet of a workload extracted from carchase Tvrtko Ursulin
2020-06-17 17:45   ` Chris Wilson
2020-06-18  7:53     ` Tvrtko Ursulin
2020-06-18  8:02       ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 09/10] gem_wsim: Implement device selection Tvrtko Ursulin
2020-06-17 17:09   ` Chris Wilson
2020-06-17 16:01 ` [Intel-gfx] [PATCH i-g-t 10/10] gem_wsim: Fix calibration handling Tvrtko Ursulin
