* [PATCH i-g-t 00/17] Media scalability tooling
From: Tvrtko Ursulin @ 2018-10-18 15:27 UTC
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A bunch of patches to trace.pl and gem_wsim which enable simulation and load
balancing analysis of the Virtual Engine work done separately by Chris Wilson.

The culmination is being able to simulate so-called frame-split media workloads.

An example workload for this looks like this (annotations added for the cover
letter only; a rough uAPI sketch for the engine map steps follows the listing):

  X.1.0			    ; disable preemption on context 1
  M.1.VCS1		    ; configure engine map on context 1
  B.1			    ; turn on load balancing on context 1
  X.2.0			    ; \
  M.2.VCS2		    ; -> same as above but for context 2
  B.2			    ; /
  b.2.1.VCS1		    ; bond VCS2 in context 2 with VCS1
  f			    ; create an unsignaled fence
  1.DEFAULT.*.f-1.0	    ; submit an infinite batch to ctx 1, with input fence
  2.DEFAULT.4000-6000.s-1.0 ; submit a 4-6ms batch to ctx 2, with submit fence from the previous step
  a.-3			    ; advance the fence created 3 lines above
  s.-2			    ; wait for ctx 2 batch to complete
  T.-4			    ; terminate ctx 1 infinite batch
  3.RCS.2000-4000.-5/-4.0   ; submit 2-4ms ctx 3 batch to RCS, with data dependencies
  3.VECS.2000.-1.0	    ; submit 2ms ctx 3 batch to VECS, data dependency on previous batch
  4.BCS.1000.-1.0	    ; submit 1ms ctx 4 batch to BCS, data dependency on previous batch
  s.-2			    ; wait for ctx 3 batch to complete
  p.16667		    ; start next iteration to hit 60fps simulation
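
The engine map, load balancing and bond steps above (M, B and b) drive the new
context engine uAPI added in patch 1 (I915_CONTEXT_PARAM_ENGINES plus the load
balance and bond extensions), with the real gem_wsim plumbing in the "Engine
map support", "Engine map load balance command" and "Engine bond command"
patches. Purely as a rough sketch, with the helper name, include path and
fd/ctx handling invented for illustration:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include "drm-uapi/i915_drm.h"

  /* Map both VCS engines onto one context and balance across them. */
  static int set_balanced_vcs_map(int fd, uint32_t ctx_id)
  {
          struct i915_context_engines_load_balance balance = {
                  .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
                  .engines_mask = ~0ULL, /* balance over the whole map */
          };
          struct {
                  struct i915_context_param_engines engines;
                  struct { uint16_t class, instance; } vcs[2]; /* class_instance[] entries */
          } map = {
                  .engines.extensions = (uintptr_t)&balance,
                  .vcs = {
                          { I915_ENGINE_CLASS_VIDEO, 0 }, /* VCS1 */
                          { I915_ENGINE_CLASS_VIDEO, 1 }, /* VCS2 */
                  },
          };
          struct drm_i915_gem_context_param arg = {
                  .ctx_id = ctx_id,
                  .param = I915_CONTEXT_PARAM_ENGINES,
                  .size = sizeof(map),
                  .value = (uintptr_t)&map,
          };

          return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
  }

The bond step ('b') chains an i915_context_engines_bond extension into the
same extension list via the next_extension pointer.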

It can be run, for instance, like this:

  ./gem_wsim -n <calibration> -w wsim/frame-split-60fps.wsim -r 60 -c 3

This runs three clients of the same workload, 60 iterations each.
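
At execbuf level the submit fence in the example ('s-1' on the ctx 2 step)
relies on the new I915_EXEC_FENCE_SUBMIT flag from patch 1, wired into
gem_wsim by the "Submit fence support" patch. A minimal sketch of the flag
pairing, with buffer/batch setup, error handling and the helper name left out
or invented:

  #include <sys/ioctl.h>
  #include <unistd.h>
  #include "drm-uapi/i915_drm.h"

  static void submit_with_submit_fence(int fd,
                                       struct drm_i915_gem_execbuffer2 *first,
                                       struct drm_i915_gem_execbuffer2 *second)
  {
          /* First batch: request an out fence, returned in the upper 32 bits
           * of rsvd2. */
          first->flags |= I915_EXEC_FENCE_OUT;
          ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, first);

          /* Second batch: the sync_file fd goes into the lower 32 bits of
           * rsvd2 and gates this batch on the first one. */
          second->flags |= I915_EXEC_FENCE_SUBMIT;
          second->rsvd2 = first->rsvd2 >> 32;
          ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, second);

          close(first->rsvd2 >> 32); /* done with the fence fd */
  }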

The run can then also be captured with trace.pl and an HTML representation of
the execution timeline analysed:

  ../scripts/trace.pl --trace ./gem_wsim ...
  perf script | ../scripts/trace.pl -s -c --gpu-timeline --html >timeline.html

[The Vis npm module needs to be present in the same directory as the HTML
file; see trace.pl --help for instructions.]

Tvrtko Ursulin (17):
  lib: Update uapi headers
  trace.pl: Virtual engine support
  trace.pl: Virtual engine preemption support
  wsim/media-bench: i915 balancing
  gem_wsim: Use IGT uapi headers
  gem_wsim: Fix shadowed local
  gem_wsim: Factor out common error handling
  gem_wsim: More wsim_err
  gem_wsim: Submit fence support
  gem_wsim: Extract str to engine lookup
  gem_wsim: Engine map support
  gem_wsim: Save some lines by changing to implicit NULL checking
  gem_wsim: Compact int command parsing with a macro
  gem_wsim: Engine map load balance command
  gem_wsim: Engine bond command
  gem_wsim: Some more example workloads
  gem_wsim: Infinite batch support

 benchmarks/gem_wsim.c                       | 1060 +++++++++++++------
 benchmarks/wsim/README                      |  111 +-
 benchmarks/wsim/frame-split-60fps.wsim      |   18 +
 benchmarks/wsim/high-composited-game.wsim   |   11 +
 benchmarks/wsim/media-1080p-player.wsim     |    5 +
 benchmarks/wsim/medium-composited-game.wsim |    9 +
 include/drm-uapi/amdgpu_drm.h               |   52 +-
 include/drm-uapi/drm.h                      |   16 +
 include/drm-uapi/drm_fourcc.h               |  224 ++++
 include/drm-uapi/drm_mode.h                 |   26 +-
 include/drm-uapi/etnaviv_drm.h              |    6 +
 include/drm-uapi/exynos_drm.h               |  240 +++++
 include/drm-uapi/i915_drm.h                 |  239 ++++-
 include/drm-uapi/msm_drm.h                  |    2 +
 include/drm-uapi/sync_file.h                |   98 --
 include/drm-uapi/tegra_drm.h                |  492 ++++++++-
 include/drm-uapi/v3d_drm.h                  |  194 ++++
 include/drm-uapi/vc4_drm.h                  |   13 +-
 include/drm-uapi/virtgpu_drm.h              |    1 +
 include/drm-uapi/vmwgfx_drm.h               |  166 ++-
 scripts/media-bench.pl                      |    9 +-
 scripts/trace.pl                            |  244 ++++-
 22 files changed, 2705 insertions(+), 531 deletions(-)
 create mode 100644 benchmarks/wsim/frame-split-60fps.wsim
 create mode 100644 benchmarks/wsim/high-composited-game.wsim
 create mode 100644 benchmarks/wsim/media-1080p-player.wsim
 create mode 100644 benchmarks/wsim/medium-composited-game.wsim
 delete mode 100644 include/drm-uapi/sync_file.h
 create mode 100644 include/drm-uapi/v3d_drm.h

-- 
2.17.1

* [PATCH i-g-t 01/17] lib: Update uapi headers
From: Tvrtko Ursulin @ 2018-10-18 15:27 UTC
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Sync with latest DRM uapi changes.
---
 include/drm-uapi/amdgpu_drm.h  |  52 +++-
 include/drm-uapi/drm.h         |  16 ++
 include/drm-uapi/drm_fourcc.h  | 224 +++++++++++++++
 include/drm-uapi/drm_mode.h    |  26 +-
 include/drm-uapi/etnaviv_drm.h |   6 +
 include/drm-uapi/exynos_drm.h  | 240 ++++++++++++++++
 include/drm-uapi/i915_drm.h    | 239 +++++++++++++++-
 include/drm-uapi/msm_drm.h     |   2 +
 include/drm-uapi/sync_file.h   |  98 -------
 include/drm-uapi/tegra_drm.h   | 492 ++++++++++++++++++++++++++++++++-
 include/drm-uapi/v3d_drm.h     | 194 +++++++++++++
 include/drm-uapi/vc4_drm.h     |  13 +-
 include/drm-uapi/virtgpu_drm.h |   1 +
 include/drm-uapi/vmwgfx_drm.h  | 166 ++++++++---
 14 files changed, 1613 insertions(+), 156 deletions(-)
 delete mode 100644 include/drm-uapi/sync_file.h
 create mode 100644 include/drm-uapi/v3d_drm.h

diff --git a/include/drm-uapi/amdgpu_drm.h b/include/drm-uapi/amdgpu_drm.h
index 1816bd8200d1..370e9a5536ef 100644
--- a/include/drm-uapi/amdgpu_drm.h
+++ b/include/drm-uapi/amdgpu_drm.h
@@ -72,12 +72,41 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
 
+/**
+ * DOC: memory domains
+ *
+ * %AMDGPU_GEM_DOMAIN_CPU	System memory that is not GPU accessible.
+ * Memory in this pool could be swapped out to disk if there is pressure.
+ *
+ * %AMDGPU_GEM_DOMAIN_GTT	GPU accessible system memory, mapped into the
+ * GPU's virtual address space via gart. Gart memory linearizes non-contiguous
+ * pages of system memory, allows GPU access system memory in a linezrized
+ * fashion.
+ *
+ * %AMDGPU_GEM_DOMAIN_VRAM	Local video memory. For APUs, it is memory
+ * carved out by the BIOS.
+ *
+ * %AMDGPU_GEM_DOMAIN_GDS	Global on-chip data storage used to share data
+ * across shader threads.
+ *
+ * %AMDGPU_GEM_DOMAIN_GWS	Global wave sync, used to synchronize the
+ * execution of all the waves on a device.
+ *
+ * %AMDGPU_GEM_DOMAIN_OA	Ordered append, used by 3D or Compute engines
+ * for appending data.
+ */
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
 #define AMDGPU_GEM_DOMAIN_VRAM		0x4
 #define AMDGPU_GEM_DOMAIN_GDS		0x8
 #define AMDGPU_GEM_DOMAIN_GWS		0x10
 #define AMDGPU_GEM_DOMAIN_OA		0x20
+#define AMDGPU_GEM_DOMAIN_MASK		(AMDGPU_GEM_DOMAIN_CPU | \
+					 AMDGPU_GEM_DOMAIN_GTT | \
+					 AMDGPU_GEM_DOMAIN_VRAM | \
+					 AMDGPU_GEM_DOMAIN_GDS | \
+					 AMDGPU_GEM_DOMAIN_GWS | \
+					 AMDGPU_GEM_DOMAIN_OA)
 
 /* Flag that CPU access will be required for the case of VRAM domain */
 #define AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED	(1 << 0)
@@ -95,6 +124,10 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID	(1 << 6)
 /* Flag that BO sharing will be explicitly synchronized */
 #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC		(1 << 7)
+/* Flag that indicates allocating MQD gart on GFX9, where the mtype
+ * for the second page onward should be set to NC.
+ */
+#define AMDGPU_GEM_CREATE_MQD_GFX9		(1 << 8)
 
 struct drm_amdgpu_gem_create_in  {
 	/** the requested memory size */
@@ -473,7 +506,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_HW_IP_UVD_ENC      5
 #define AMDGPU_HW_IP_VCN_DEC      6
 #define AMDGPU_HW_IP_VCN_ENC      7
-#define AMDGPU_HW_IP_NUM          8
+#define AMDGPU_HW_IP_VCN_JPEG     8
+#define AMDGPU_HW_IP_NUM          9
 
 #define AMDGPU_HW_IP_INSTANCE_MAX_COUNT 1
 
@@ -482,6 +516,7 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
 #define AMDGPU_CHUNK_ID_SYNCOBJ_IN      0x04
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT     0x05
+#define AMDGPU_CHUNK_ID_BO_HANDLES      0x06
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
@@ -520,6 +555,10 @@ union drm_amdgpu_cs {
 /* Preempt flag, IB should set Pre_enb bit if PREEMPT flag detected */
 #define AMDGPU_IB_FLAG_PREEMPT (1<<2)
 
+/* The IB fence should do the L2 writeback but not invalidate any shader
+ * caches (L2/vL1/sL1/I$). */
+#define AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE (1 << 3)
+
 struct drm_amdgpu_cs_chunk_ib {
 	__u32 _pad;
 	/** AMDGPU_IB_FLAG_* */
@@ -618,6 +657,16 @@ struct drm_amdgpu_cs_chunk_data {
 	#define AMDGPU_INFO_FW_SOS		0x0c
 	/* Subquery id: Query PSP ASD firmware version */
 	#define AMDGPU_INFO_FW_ASD		0x0d
+	/* Subquery id: Query VCN firmware version */
+	#define AMDGPU_INFO_FW_VCN		0x0e
+	/* Subquery id: Query GFX RLC SRLC firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_CNTL 0x0f
+	/* Subquery id: Query GFX RLC SRLG firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_GPM_MEM 0x10
+	/* Subquery id: Query GFX RLC SRLS firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_SRM_MEM 0x11
+	/* Subquery id: Query DMCU firmware version */
+	#define AMDGPU_INFO_FW_DMCU		0x12
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED		0x0f
 /* the used VRAM size */
@@ -806,6 +855,7 @@ struct drm_amdgpu_info_firmware {
 #define AMDGPU_VRAM_TYPE_GDDR5 5
 #define AMDGPU_VRAM_TYPE_HBM   6
 #define AMDGPU_VRAM_TYPE_DDR3  7
+#define AMDGPU_VRAM_TYPE_DDR4  8
 
 struct drm_amdgpu_info_device {
 	/** PCI Device ID */
diff --git a/include/drm-uapi/drm.h b/include/drm-uapi/drm.h
index f0bd91de0cf9..85c685a2075e 100644
--- a/include/drm-uapi/drm.h
+++ b/include/drm-uapi/drm.h
@@ -674,6 +674,22 @@ struct drm_get_cap {
  */
 #define DRM_CLIENT_CAP_ATOMIC	3
 
+/**
+ * DRM_CLIENT_CAP_ASPECT_RATIO
+ *
+ * If set to 1, the DRM core will provide aspect ratio information in modes.
+ */
+#define DRM_CLIENT_CAP_ASPECT_RATIO    4
+
+/**
+ * DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
+ *
+ * If set to 1, the DRM core will expose special connectors to be used for
+ * writing back to memory the scene setup in the commit. Depends on client
+ * also supporting DRM_CLIENT_CAP_ATOMIC
+ */
+#define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS	5
+
 /** DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
 struct drm_set_client_cap {
 	__u64 capability;
diff --git a/include/drm-uapi/drm_fourcc.h b/include/drm-uapi/drm_fourcc.h
index e04613d30a13..0cd40ebfa1b1 100644
--- a/include/drm-uapi/drm_fourcc.h
+++ b/include/drm-uapi/drm_fourcc.h
@@ -30,11 +30,50 @@
 extern "C" {
 #endif
 
+/**
+ * DOC: overview
+ *
+ * In the DRM subsystem, framebuffer pixel formats are described using the
+ * fourcc codes defined in `include/uapi/drm/drm_fourcc.h`. In addition to the
+ * fourcc code, a Format Modifier may optionally be provided, in order to
+ * further describe the buffer's format - for example tiling or compression.
+ *
+ * Format Modifiers
+ * ----------------
+ *
+ * Format modifiers are used in conjunction with a fourcc code, forming a
+ * unique fourcc:modifier pair. This format:modifier pair must fully define the
+ * format and data layout of the buffer, and should be the only way to describe
+ * that particular buffer.
+ *
+ * Having multiple fourcc:modifier pairs which describe the same layout should
+ * be avoided, as such aliases run the risk of different drivers exposing
+ * different names for the same data format, forcing userspace to understand
+ * that they are aliases.
+ *
+ * Format modifiers may change any property of the buffer, including the number
+ * of planes and/or the required allocation size. Format modifiers are
+ * vendor-namespaced, and as such the relationship between a fourcc code and a
+ * modifier is specific to the modifer being used. For example, some modifiers
+ * may preserve meaning - such as number of planes - from the fourcc code,
+ * whereas others may not.
+ *
+ * Vendors should document their modifier usage in as much detail as
+ * possible, to ensure maximum compatibility across devices, drivers and
+ * applications.
+ *
+ * The authoritative list of format modifier codes is found in
+ * `include/uapi/drm/drm_fourcc.h`
+ */
+
 #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
 				 ((__u32)(c) << 16) | ((__u32)(d) << 24))
 
 #define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */
 
+/* Reserve 0 for the invalid format specifier */
+#define DRM_FORMAT_INVALID	0
+
 /* color index */
 #define DRM_FORMAT_C8		fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
@@ -183,6 +222,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_QCOM    0x05
 #define DRM_FORMAT_MOD_VENDOR_VIVANTE 0x06
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
+#define DRM_FORMAT_MOD_VENDOR_ARM     0x08
 /* add more to the end as needed */
 
 #define DRM_FORMAT_RESERVED	      ((1ULL << 56) - 1)
@@ -298,6 +338,28 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_SAMSUNG_64_32_TILE	fourcc_mod_code(SAMSUNG, 1)
 
+/*
+ * Tiled, 16 (pixels) x 16 (lines) - sized macroblocks
+ *
+ * This is a simple tiled layout using tiles of 16x16 pixels in a row-major
+ * layout. For YCbCr formats Cb/Cr components are taken in such a way that
+ * they correspond to their 16x16 luma block.
+ */
+#define DRM_FORMAT_MOD_SAMSUNG_16_16_TILE	fourcc_mod_code(SAMSUNG, 2)
+
+/*
+ * Qualcomm Compressed Format
+ *
+ * Refers to a compressed variant of the base format that is compressed.
+ * Implementation may be platform and base-format specific.
+ *
+ * Each macrotile consists of m x n (mostly 4 x 4) tiles.
+ * Pixel data pitch/stride is aligned with macrotile width.
+ * Pixel data height is aligned with macrotile height.
+ * Entire pixel data buffer is aligned with 4k(bytes).
+ */
+#define DRM_FORMAT_MOD_QCOM_COMPRESSED	fourcc_mod_code(QCOM, 1)
+
 /* Vivante framebuffer modifiers */
 
 /*
@@ -384,6 +446,23 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
 	fourcc_mod_code(NVIDIA, 0x15)
 
+/*
+ * Some Broadcom modifiers take parameters, for example the number of
+ * vertical lines in the image. Reserve the lower 32 bits for modifier
+ * type, and the next 24 bits for parameters. Top 8 bits are the
+ * vendor code.
+ */
+#define __fourcc_mod_broadcom_param_shift 8
+#define __fourcc_mod_broadcom_param_bits 48
+#define fourcc_mod_broadcom_code(val, params) \
+	fourcc_mod_code(BROADCOM, ((((__u64)params) << __fourcc_mod_broadcom_param_shift) | val))
+#define fourcc_mod_broadcom_param(m) \
+	((int)(((m) >> __fourcc_mod_broadcom_param_shift) &	\
+	       ((1ULL << __fourcc_mod_broadcom_param_bits) - 1)))
+#define fourcc_mod_broadcom_mod(m) \
+	((m) & ~(((1ULL << __fourcc_mod_broadcom_param_bits) - 1) <<	\
+		 __fourcc_mod_broadcom_param_shift))
+
 /*
  * Broadcom VC4 "T" format
  *
@@ -405,6 +484,151 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_BROADCOM_VC4_T_TILED fourcc_mod_code(BROADCOM, 1)
 
+/*
+ * Broadcom SAND format
+ *
+ * This is the native format that the H.264 codec block uses.  For VC4
+ * HVS, it is only valid for H.264 (NV12/21) and RGBA modes.
+ *
+ * The image can be considered to be split into columns, and the
+ * columns are placed consecutively into memory.  The width of those
+ * columns can be either 32, 64, 128, or 256 pixels, but in practice
+ * only 128 pixel columns are used.
+ *
+ * The pitch between the start of each column is set to optimally
+ * switch between SDRAM banks. This is passed as the number of lines
+ * of column width in the modifier (we can't use the stride value due
+ * to various core checks that look at it , so you should set the
+ * stride to width*cpp).
+ *
+ * Note that the column height for this format modifier is the same
+ * for all of the planes, assuming that each column contains both Y
+ * and UV.  Some SAND-using hardware stores UV in a separate tiled
+ * image from Y to reduce the column height, which is not supported
+ * with these modifiers.
+ */
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(2, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(3, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(4, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(5, v)
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32 \
+	DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64 \
+	DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128 \
+	DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256 \
+	DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(0)
+
+/* Broadcom UIF format
+ *
+ * This is the common format for the current Broadcom multimedia
+ * blocks, including V3D 3.x and newer, newer video codecs, and
+ * displays.
+ *
+ * The image consists of utiles (64b blocks), UIF blocks (2x2 utiles),
+ * and macroblocks (4x4 UIF blocks).  Those 4x4 UIF block groups are
+ * stored in columns, with padding between the columns to ensure that
+ * moving from one column to the next doesn't hit the same SDRAM page
+ * bank.
+ *
+ * To calculate the padding, it is assumed that each hardware block
+ * and the software driving it knows the platform's SDRAM page size,
+ * number of banks, and XOR address, and that it's identical between
+ * all blocks using the format.  This tiling modifier will use XOR as
+ * necessary to reduce the padding.  If a hardware block can't do XOR,
+ * the assumption is that a no-XOR tiling modifier will be created.
+ */
+#define DRM_FORMAT_MOD_BROADCOM_UIF fourcc_mod_code(BROADCOM, 6)
+
+/*
+ * Arm Framebuffer Compression (AFBC) modifiers
+ *
+ * AFBC is a proprietary lossless image compression protocol and format.
+ * It provides fine-grained random access and minimizes the amount of data
+ * transferred between IP blocks.
+ *
+ * AFBC has several features which may be supported and/or used, which are
+ * represented using bits in the modifier. Not all combinations are valid,
+ * and different devices or use-cases may support different combinations.
+ */
+#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode)	fourcc_mod_code(ARM, __afbc_mode)
+
+/*
+ * AFBC superblock size
+ *
+ * Indicates the superblock size(s) used for the AFBC buffer. The buffer
+ * size (in pixels) must be aligned to a multiple of the superblock size.
+ * Four lowest significant bits(LSBs) are reserved for block size.
+ */
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_MASK      0xf
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_16x16     (1ULL)
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_32x8      (2ULL)
+
+/*
+ * AFBC lossless colorspace transform
+ *
+ * Indicates that the buffer makes use of the AFBC lossless colorspace
+ * transform.
+ */
+#define AFBC_FORMAT_MOD_YTR     (1ULL <<  4)
+
+/*
+ * AFBC block-split
+ *
+ * Indicates that the payload of each superblock is split. The second
+ * half of the payload is positioned at a predefined offset from the start
+ * of the superblock payload.
+ */
+#define AFBC_FORMAT_MOD_SPLIT   (1ULL <<  5)
+
+/*
+ * AFBC sparse layout
+ *
+ * This flag indicates that the payload of each superblock must be stored at a
+ * predefined position relative to the other superblocks in the same AFBC
+ * buffer. This order is the same order used by the header buffer. In this mode
+ * each superblock is given the same amount of space as an uncompressed
+ * superblock of the particular format would require, rounding up to the next
+ * multiple of 128 bytes in size.
+ */
+#define AFBC_FORMAT_MOD_SPARSE  (1ULL <<  6)
+
+/*
+ * AFBC copy-block restrict
+ *
+ * Buffers with this flag must obey the copy-block restriction. The restriction
+ * is such that there are no copy-blocks referring across the border of 8x8
+ * blocks. For the subsampled data the 8x8 limitation is also subsampled.
+ */
+#define AFBC_FORMAT_MOD_CBR     (1ULL <<  7)
+
+/*
+ * AFBC tiled layout
+ *
+ * The tiled layout groups superblocks in 8x8 or 4x4 tiles, where all
+ * superblocks inside a tile are stored together in memory. 8x8 tiles are used
+ * for pixel formats up to and including 32 bpp while 4x4 tiles are used for
+ * larger bpp formats. The order between the tiles is scan line.
+ * When the tiled layout is used, the buffer size (in pixels) must be aligned
+ * to the tile size.
+ */
+#define AFBC_FORMAT_MOD_TILED   (1ULL <<  8)
+
+/*
+ * AFBC solid color blocks
+ *
+ * Indicates that the buffer makes use of solid-color blocks, whereby bandwidth
+ * can be reduced if a whole superblock is a single color.
+ */
+#define AFBC_FORMAT_MOD_SC      (1ULL <<  9)
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/drm_mode.h b/include/drm-uapi/drm_mode.h
index 2c575794fb52..d3e0fe31efc5 100644
--- a/include/drm-uapi/drm_mode.h
+++ b/include/drm-uapi/drm_mode.h
@@ -93,6 +93,15 @@ extern "C" {
 #define DRM_MODE_PICTURE_ASPECT_NONE		0
 #define DRM_MODE_PICTURE_ASPECT_4_3		1
 #define DRM_MODE_PICTURE_ASPECT_16_9		2
+#define DRM_MODE_PICTURE_ASPECT_64_27		3
+#define DRM_MODE_PICTURE_ASPECT_256_135		4
+
+/* Content type options */
+#define DRM_MODE_CONTENT_TYPE_NO_DATA		0
+#define DRM_MODE_CONTENT_TYPE_GRAPHICS		1
+#define DRM_MODE_CONTENT_TYPE_PHOTO		2
+#define DRM_MODE_CONTENT_TYPE_CINEMA		3
+#define DRM_MODE_CONTENT_TYPE_GAME		4
 
 /* Aspect ratio flag bitmask (4 bits 22:19) */
 #define DRM_MODE_FLAG_PIC_AR_MASK		(0x0F<<19)
@@ -102,6 +111,10 @@ extern "C" {
 			(DRM_MODE_PICTURE_ASPECT_4_3<<19)
 #define  DRM_MODE_FLAG_PIC_AR_16_9 \
 			(DRM_MODE_PICTURE_ASPECT_16_9<<19)
+#define  DRM_MODE_FLAG_PIC_AR_64_27 \
+			(DRM_MODE_PICTURE_ASPECT_64_27<<19)
+#define  DRM_MODE_FLAG_PIC_AR_256_135 \
+			(DRM_MODE_PICTURE_ASPECT_256_135<<19)
 
 #define  DRM_MODE_FLAG_ALL	(DRM_MODE_FLAG_PHSYNC |		\
 				 DRM_MODE_FLAG_NHSYNC |		\
@@ -173,8 +186,9 @@ extern "C" {
 /*
  * DRM_MODE_REFLECT_<axis>
  *
- * Signals that the contents of a drm plane is reflected in the <axis> axis,
+ * Signals that the contents of a drm plane is reflected along the <axis> axis,
  * in the same way as mirroring.
+ * See kerneldoc chapter "Plane Composition Properties" for more details.
  *
  * This define is provided as a convenience, looking up the property id
  * using the name->prop id lookup is the preferred method.
@@ -338,6 +352,7 @@ enum drm_mode_subconnector {
 #define DRM_MODE_CONNECTOR_VIRTUAL      15
 #define DRM_MODE_CONNECTOR_DSI		16
 #define DRM_MODE_CONNECTOR_DPI		17
+#define DRM_MODE_CONNECTOR_WRITEBACK	18
 
 struct drm_mode_get_connector {
 
@@ -363,7 +378,7 @@ struct drm_mode_get_connector {
 	__u32 pad;
 };
 
-#define DRM_MODE_PROP_PENDING	(1<<0)
+#define DRM_MODE_PROP_PENDING	(1<<0) /* deprecated, do not use */
 #define DRM_MODE_PROP_RANGE	(1<<1)
 #define DRM_MODE_PROP_IMMUTABLE	(1<<2)
 #define DRM_MODE_PROP_ENUM	(1<<3) /* enumerated type with text strings */
@@ -598,8 +613,11 @@ struct drm_mode_crtc_lut {
 };
 
 struct drm_color_ctm {
-	/* Conversion matrix in S31.32 format. */
-	__s64 matrix[9];
+	/*
+	 * Conversion matrix in S31.32 sign-magnitude
+	 * (not two's complement!) format.
+	 */
+	__u64 matrix[9];
 };
 
 struct drm_color_lut {
diff --git a/include/drm-uapi/etnaviv_drm.h b/include/drm-uapi/etnaviv_drm.h
index e9b997a0ef27..0d5c49dc478c 100644
--- a/include/drm-uapi/etnaviv_drm.h
+++ b/include/drm-uapi/etnaviv_drm.h
@@ -55,6 +55,12 @@ struct drm_etnaviv_timespec {
 #define ETNAVIV_PARAM_GPU_FEATURES_4                0x07
 #define ETNAVIV_PARAM_GPU_FEATURES_5                0x08
 #define ETNAVIV_PARAM_GPU_FEATURES_6                0x09
+#define ETNAVIV_PARAM_GPU_FEATURES_7                0x0a
+#define ETNAVIV_PARAM_GPU_FEATURES_8                0x0b
+#define ETNAVIV_PARAM_GPU_FEATURES_9                0x0c
+#define ETNAVIV_PARAM_GPU_FEATURES_10               0x0d
+#define ETNAVIV_PARAM_GPU_FEATURES_11               0x0e
+#define ETNAVIV_PARAM_GPU_FEATURES_12               0x0f
 
 #define ETNAVIV_PARAM_GPU_STREAM_COUNT              0x10
 #define ETNAVIV_PARAM_GPU_REGISTER_MAX              0x11
diff --git a/include/drm-uapi/exynos_drm.h b/include/drm-uapi/exynos_drm.h
index a00116b5cc5c..7414cfd76419 100644
--- a/include/drm-uapi/exynos_drm.h
+++ b/include/drm-uapi/exynos_drm.h
@@ -135,6 +135,219 @@ struct drm_exynos_g2d_exec {
 	__u64					async;
 };
 
+/* Exynos DRM IPP v2 API */
+
+/**
+ * Enumerate available IPP hardware modules.
+ *
+ * @count_ipps: size of ipp_id array / number of ipp modules (set by driver)
+ * @reserved: padding
+ * @ipp_id_ptr: pointer to ipp_id array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_res {
+	__u32 count_ipps;
+	__u32 reserved;
+	__u64 ipp_id_ptr;
+};
+
+enum drm_exynos_ipp_format_type {
+	DRM_EXYNOS_IPP_FORMAT_SOURCE		= 0x01,
+	DRM_EXYNOS_IPP_FORMAT_DESTINATION	= 0x02,
+};
+
+struct drm_exynos_ipp_format {
+	__u32 fourcc;
+	__u32 type;
+	__u64 modifier;
+};
+
+enum drm_exynos_ipp_capability {
+	DRM_EXYNOS_IPP_CAP_CROP		= 0x01,
+	DRM_EXYNOS_IPP_CAP_ROTATE	= 0x02,
+	DRM_EXYNOS_IPP_CAP_SCALE	= 0x04,
+	DRM_EXYNOS_IPP_CAP_CONVERT	= 0x08,
+};
+
+/**
+ * Get IPP hardware capabilities and supported image formats.
+ *
+ * @ipp_id: id of IPP module to query
+ * @capabilities: bitmask of drm_exynos_ipp_capability (set by driver)
+ * @reserved: padding
+ * @formats_count: size of formats array (in entries) / number of filled
+ *		   formats (set by driver)
+ * @formats_ptr: pointer to formats array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_caps {
+	__u32 ipp_id;
+	__u32 capabilities;
+	__u32 reserved;
+	__u32 formats_count;
+	__u64 formats_ptr;
+};
+
+enum drm_exynos_ipp_limit_type {
+	/* size (horizontal/vertial) limits, in pixels (min, max, alignment) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SIZE		= 0x0001,
+	/* scale ratio (horizonta/vertial), 16.16 fixed point (min, max) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SCALE		= 0x0002,
+
+	/* image buffer area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_BUFFER	= 0x0001 << 16,
+	/* src/dst rectangle area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_AREA		= 0x0002 << 16,
+	/* src/dst rectangle area when rotation enabled */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_ROTATED	= 0x0003 << 16,
+
+	DRM_EXYNOS_IPP_LIMIT_TYPE_MASK		= 0x000f,
+	DRM_EXYNOS_IPP_LIMIT_SIZE_MASK		= 0x000f << 16,
+};
+
+struct drm_exynos_ipp_limit_val {
+	__u32 min;
+	__u32 max;
+	__u32 align;
+	__u32 reserved;
+};
+
+/**
+ * IPP module limitation.
+ *
+ * @type: limit type (see drm_exynos_ipp_limit_type enum)
+ * @reserved: padding
+ * @h: horizontal limits
+ * @v: vertical limits
+ */
+struct drm_exynos_ipp_limit {
+	__u32 type;
+	__u32 reserved;
+	struct drm_exynos_ipp_limit_val h;
+	struct drm_exynos_ipp_limit_val v;
+};
+
+/**
+ * Get IPP limits for given image format.
+ *
+ * @ipp_id: id of IPP module to query
+ * @fourcc: image format code (see DRM_FORMAT_* in drm_fourcc.h)
+ * @modifier: image format modifier (see DRM_FORMAT_MOD_* in drm_fourcc.h)
+ * @type: source/destination identifier (drm_exynos_ipp_format_flag enum)
+ * @limits_count: size of limits array (in entries) / number of filled entries
+ *		 (set by driver)
+ * @limits_ptr: pointer to limits array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_limits {
+	__u32 ipp_id;
+	__u32 fourcc;
+	__u64 modifier;
+	__u32 type;
+	__u32 limits_count;
+	__u64 limits_ptr;
+};
+
+enum drm_exynos_ipp_task_id {
+	/* buffer described by struct drm_exynos_ipp_task_buffer */
+	DRM_EXYNOS_IPP_TASK_BUFFER		= 0x0001,
+	/* rectangle described by struct drm_exynos_ipp_task_rect */
+	DRM_EXYNOS_IPP_TASK_RECTANGLE		= 0x0002,
+	/* transformation described by struct drm_exynos_ipp_task_transform */
+	DRM_EXYNOS_IPP_TASK_TRANSFORM		= 0x0003,
+	/* alpha configuration described by struct drm_exynos_ipp_task_alpha */
+	DRM_EXYNOS_IPP_TASK_ALPHA		= 0x0004,
+
+	/* source image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_SOURCE		= 0x0001 << 16,
+	/* destination image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_DESTINATION	= 0x0002 << 16,
+};
+
+/**
+ * Memory buffer with image data.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_BUFFER
+ * other parameters are same as for AddFB2 generic DRM ioctl
+ */
+struct drm_exynos_ipp_task_buffer {
+	__u32	id;
+	__u32	fourcc;
+	__u32	width, height;
+	__u32	gem_id[4];
+	__u32	offset[4];
+	__u32	pitch[4];
+	__u64	modifier;
+};
+
+/**
+ * Rectangle for processing.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_RECTANGLE
+ * @reserved: padding
+ * @x,@y: left corner in pixels
+ * @w,@h: width/height in pixels
+ */
+struct drm_exynos_ipp_task_rect {
+	__u32	id;
+	__u32	reserved;
+	__u32	x;
+	__u32	y;
+	__u32	w;
+	__u32	h;
+};
+
+/**
+ * Image tranformation description.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_TRANSFORM
+ * @rotation: DRM_MODE_ROTATE_* and DRM_MODE_REFLECT_* values
+ */
+struct drm_exynos_ipp_task_transform {
+	__u32	id;
+	__u32	rotation;
+};
+
+/**
+ * Image global alpha configuration for formats without alpha values.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_ALPHA
+ * @value: global alpha value (0-255)
+ */
+struct drm_exynos_ipp_task_alpha {
+	__u32	id;
+	__u32	value;
+};
+
+enum drm_exynos_ipp_flag {
+	/* generate DRM event after processing */
+	DRM_EXYNOS_IPP_FLAG_EVENT	= 0x01,
+	/* dry run, only check task parameters */
+	DRM_EXYNOS_IPP_FLAG_TEST_ONLY	= 0x02,
+	/* non-blocking processing */
+	DRM_EXYNOS_IPP_FLAG_NONBLOCK	= 0x04,
+};
+
+#define DRM_EXYNOS_IPP_FLAGS (DRM_EXYNOS_IPP_FLAG_EVENT |\
+		DRM_EXYNOS_IPP_FLAG_TEST_ONLY | DRM_EXYNOS_IPP_FLAG_NONBLOCK)
+
+/**
+ * Perform image processing described by array of drm_exynos_ipp_task_*
+ * structures (parameters array).
+ *
+ * @ipp_id: id of IPP module to run the task
+ * @flags: bitmask of drm_exynos_ipp_flag values
+ * @reserved: padding
+ * @params_size: size of parameters array (in bytes)
+ * @params_ptr: pointer to parameters array or NULL
+ * @user_data: (optional) data for drm event
+ */
+struct drm_exynos_ioctl_ipp_commit {
+	__u32 ipp_id;
+	__u32 flags;
+	__u32 reserved;
+	__u32 params_size;
+	__u64 params_ptr;
+	__u64 user_data;
+};
+
 #define DRM_EXYNOS_GEM_CREATE		0x00
 #define DRM_EXYNOS_GEM_MAP		0x01
 /* Reserved 0x03 ~ 0x05 for exynos specific gem ioctl */
@@ -147,6 +360,11 @@ struct drm_exynos_g2d_exec {
 #define DRM_EXYNOS_G2D_EXEC		0x22
 
 /* Reserved 0x30 ~ 0x33 for obsolete Exynos IPP ioctls */
+/* IPP - Image Post Processing */
+#define DRM_EXYNOS_IPP_GET_RESOURCES	0x40
+#define DRM_EXYNOS_IPP_GET_CAPS		0x41
+#define DRM_EXYNOS_IPP_GET_LIMITS	0x42
+#define DRM_EXYNOS_IPP_COMMIT		0x43
 
 #define DRM_IOCTL_EXYNOS_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_GEM_CREATE, struct drm_exynos_gem_create)
@@ -165,8 +383,20 @@ struct drm_exynos_g2d_exec {
 #define DRM_IOCTL_EXYNOS_G2D_EXEC		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_G2D_EXEC, struct drm_exynos_g2d_exec)
 
+#define DRM_IOCTL_EXYNOS_IPP_GET_RESOURCES	DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_RESOURCES, \
+		struct drm_exynos_ioctl_ipp_get_res)
+#define DRM_IOCTL_EXYNOS_IPP_GET_CAPS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_CAPS, struct drm_exynos_ioctl_ipp_get_caps)
+#define DRM_IOCTL_EXYNOS_IPP_GET_LIMITS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_LIMITS, \
+		struct drm_exynos_ioctl_ipp_get_limits)
+#define DRM_IOCTL_EXYNOS_IPP_COMMIT		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_COMMIT, struct drm_exynos_ioctl_ipp_commit)
+
 /* EXYNOS specific events */
 #define DRM_EXYNOS_G2D_EVENT		0x80000000
+#define DRM_EXYNOS_IPP_EVENT		0x80000002
 
 struct drm_exynos_g2d_event {
 	struct drm_event	base;
@@ -177,6 +407,16 @@ struct drm_exynos_g2d_event {
 	__u32			reserved;
 };
 
+struct drm_exynos_ipp_event {
+	struct drm_event	base;
+	__u64			user_data;
+	__u32			tv_sec;
+	__u32			tv_usec;
+	__u32			ipp_id;
+	__u32			sequence;
+	__u64			reserved;
+};
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d4..b14ca9695f1e 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -62,6 +62,26 @@ extern "C" {
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * i915_user_extension: Base class for defining a chain of extensions
+ *
+ * Many interfaces need to grow over time. In most cases we can simply
+ * extend the struct and have userspace pass in more data. Another option,
+ * as demonstrated by Vulkan's approach to providing extensions for forward
+ * and backward compatibility, is to use a list of optional structs to
+ * provide those extra details.
+ *
+ * The key advantage to using an extension chain is that it allows us to
+ * redefine the interface more easily than an ever growing struct of
+ * increasing complexity, and for large parts of that interface to be
+ * entirely optional. The downside is more pointer chasing; chasing across
+ * the boundary with pointers encapsulated inside u64.
+ */
+struct i915_user_extension {
+	__u64 next_extension;
+	__u64 name;
+};
+
 /*
  * MOCS indexes used for GPU surfaces, defining the cacheability of the
  * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
@@ -367,6 +387,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
 #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_v2	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_v2)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
@@ -412,6 +433,14 @@ typedef struct drm_i915_irq_wait {
 	int irq_seq;
 } drm_i915_irq_wait_t;
 
+/*
+ * Different modes of per-process Graphics Translation Table,
+ * see I915_PARAM_HAS_ALIASING_PPGTT
+ */
+#define I915_GEM_PPGTT_NONE	0
+#define I915_GEM_PPGTT_ALIASING	1
+#define I915_GEM_PPGTT_FULL	2
+
 /* Ioctl to query kernel params:
  */
 #define I915_PARAM_IRQ_ACTIVE            1
@@ -529,6 +558,35 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/*
+ * Once upon a time we supposed that writes through the GGTT would be
+ * immediately in physical memory (once flushed out of the CPU path). However,
+ * on a few different processors and chipsets, this is not necessarily the case
+ * as the writes appear to be buffered internally. Thus a read of the backing
+ * storage (physical memory) via a different path (with different physical tags
+ * to the indirect write via the GGTT) will see stale values from before
+ * the GGTT write. Inside the kernel, we can for the most part keep track of
+ * the different read/write domains in use (e.g. set-domain), but the assumption
+ * of coherency is baked into the ABI, hence reporting its true state in this
+ * parameter.
+ *
+ * Reports true when writes via mmap_gtt are immediately visible following an
+ * lfence to flush the WCB.
+ *
+ * Reports false when writes via mmap_gtt are indeterminately delayed in an in
+ * internal buffer and are _not_ immediately visible to third parties accessing
+ * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC
+ * communications channel when reporting false is strongly disadvised.
+ */
+#define I915_PARAM_MMAP_GTT_COHERENT	52
+
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
+ * execution through use of explicit fence support.
+ * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
+ */
+#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -942,7 +1000,7 @@ struct drm_i915_gem_execbuffer2 {
 	 * struct drm_i915_gem_exec_fence *fences.
 	 */
 	__u64 cliprects_ptr;
-#define I915_EXEC_RING_MASK              (7<<0)
+#define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
 #define I915_EXEC_BSD                    (2<<0)
@@ -1048,7 +1106,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/*
+ * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_SUBMIT		(1<<20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
@@ -1387,6 +1454,16 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+struct drm_i915_gem_context_create_v2 {
+	/*  output: id of new context*/
+	__u32 ctx_id;
+	__u32 flags;
+#define I915_GEM_CONTEXT_SHARE_GTT		0x1
+#define I915_GEM_CONTEXT_SINGLE_TIMELINE	0x2
+	__u32 share_ctx;
+	__u32 pad;
+};
+
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
@@ -1456,9 +1533,122 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+
+/*
+ * I915_CONTEXT_PARAM_ENGINES:
+ *
+ * Bind this context to operate on this subset of available engines. Henceforth,
+ * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
+ * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
+ * and upwards. The array created is offset by 1, such that by default
+ * I915_EXEC_DEFAULT is left empty, to be filled in as directed. Slots 1...N
+ * are then filled in using the specified (class, instance).
+ *
+ * Setting the number of engines bound to the context will revert back to
+ * default settings.
+ *
+ * See struct i915_context_param_engines.
+ *
+ * Extensions:
+ *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
+ *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
+ */
+#define I915_CONTEXT_PARAM_ENGINES	0x7
+
+/*
+ * When using the following param, value should be a pointer to
+ * drm_i915_gem_context_param_sseu.
+ */
+#define I915_CONTEXT_PARAM_SSEU		0x8
+
 	__u64 value;
 };
 
+/*
+ * i915_context_engines_load_balance:
+ *
+ * Enable load balancing across this set of engines.
+ *
+ * Into the I915_EXEC_DEFAULT slot, a virtual engine is created that when
+ * used will proxy the execbuffer request onto one of the set of engines
+ * in such a way as to distribute the load evenly across the set.
+ *
+ * The set of engines must be compatible (e.g. the same HW class) as they
+ * will share the same logical GPU context and ring.
+ *
+ * The context must be defined to use a single timeline for all engines.
+ */
+struct i915_context_engines_load_balance {
+	struct i915_user_extension base;
+
+	__u64 flags; /* all undefined flags must be zero */
+	__u64 engines_mask;
+
+	__u64 mbz[4]; /* reserved for future use; must be zero */
+};
+
+/*
+ * i915_context_engines_bond:
+ *
+ */
+struct i915_context_engines_bond {
+	struct i915_user_extension base;
+
+	__u16 master_class;
+	__u16 master_instance;
+	__u32 flags; /* all undefined flags must be zero */
+	__u64 sibling_mask;
+};
+
+struct i915_context_param_engines {
+	__u64 extensions;
+#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+#define I915_CONTEXT_ENGINES_EXT_BOND 1
+
+	struct {
+		__u16 class; /* see enum drm_i915_gem_engine_class */
+		__u16 instance;
+	} class_instance[0];
+};
+
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u16 class;
+	__u16 instance;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd1;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u64 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u64 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be inferior or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u16 min_eus_per_subslice;
+	__u16 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd2;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
@@ -1620,6 +1810,7 @@ struct drm_i915_perf_oa_config {
 struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
+#define DRM_I915_QUERY_ENGINE_INFO	2
 
 	/*
 	 * When set to zero by userspace, this is filled with the size of the
@@ -1717,6 +1908,50 @@ struct drm_i915_query_topology_info {
 	__u8 data[];
 };
 
+/**
+ * struct drm_i915_engine_info
+ *
+ * Describes one engine and it's capabilities as known to the driver.
+ */
+struct drm_i915_engine_info {
+	/** Engine class as in enum drm_i915_gem_engine_class. */
+	__u16 class;
+
+	/** Engine instance number. */
+	__u16 instance;
+
+	/** Reserved field. */
+	__u32 rsvd0;
+
+	/** Engine flags. */
+	__u64 flags;
+
+	/** Capabilities of this engine. */
+	__u64 capabilities;
+#define I915_VIDEO_CLASS_CAPABILITY_HEVC		(1 << 0)
+#define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC	(1 << 1)
+
+	/** Reserved fields. */
+	__u64 rsvd1[4];
+};
+
+/**
+ * struct drm_i915_query_engine_info
+ *
+ * Engine info query enumerates all engines known to the driver by filling in
+ * an array of struct drm_i915_engine_info structures.
+ */
+struct drm_i915_query_engine_info {
+	/** Number of struct drm_i915_engine_info structs following. */
+	__u32 num_engines;
+
+	/** MBZ */
+	__u32 rsvd[3];
+
+	/** Marker for drm_i915_engine_info structures. */
+	struct drm_i915_engine_info engines[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/msm_drm.h b/include/drm-uapi/msm_drm.h
index bbbaffad772d..c06d0a5bdd80 100644
--- a/include/drm-uapi/msm_drm.h
+++ b/include/drm-uapi/msm_drm.h
@@ -201,10 +201,12 @@ struct drm_msm_gem_submit_bo {
 #define MSM_SUBMIT_NO_IMPLICIT   0x80000000 /* disable implicit sync */
 #define MSM_SUBMIT_FENCE_FD_IN   0x40000000 /* enable input fence_fd */
 #define MSM_SUBMIT_FENCE_FD_OUT  0x20000000 /* enable output fence_fd */
+#define MSM_SUBMIT_SUDO          0x10000000 /* run submitted cmds from RB */
 #define MSM_SUBMIT_FLAGS                ( \
 		MSM_SUBMIT_NO_IMPLICIT   | \
 		MSM_SUBMIT_FENCE_FD_IN   | \
 		MSM_SUBMIT_FENCE_FD_OUT  | \
+		MSM_SUBMIT_SUDO          | \
 		0)
 
 /* Each cmdstream submit consists of a table of buffers involved, and
diff --git a/include/drm-uapi/sync_file.h b/include/drm-uapi/sync_file.h
deleted file mode 100644
index b4f2db009347..000000000000
--- a/include/drm-uapi/sync_file.h
+++ /dev/null
@@ -1,98 +0,0 @@
-/* SPDX-License-Identifier: GPL-1.0+ WITH Linux-syscall-note */
-/*
- * Copyright (C) 2012 Google, Inc.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#ifndef _LINUX_SYNC_H
-#define _LINUX_SYNC_H
-
-#include <linux/ioctl.h>
-#include <linux/types.h>
-
-/**
- * struct sync_merge_data - data passed to merge ioctl
- * @name:	name of new fence
- * @fd2:	file descriptor of second fence
- * @fence:	returns the fd of the new fence to userspace
- * @flags:	merge_data flags
- * @pad:	padding for 64-bit alignment, should always be zero
- */
-struct sync_merge_data {
-	char	name[32];
-	__s32	fd2;
-	__s32	fence;
-	__u32	flags;
-	__u32	pad;
-};
-
-/**
- * struct sync_fence_info - detailed fence information
- * @obj_name:		name of parent sync_timeline
-* @driver_name:	name of driver implementing the parent
-* @status:		status of the fence 0:active 1:signaled <0:error
- * @flags:		fence_info flags
- * @timestamp_ns:	timestamp of status change in nanoseconds
- */
-struct sync_fence_info {
-	char	obj_name[32];
-	char	driver_name[32];
-	__s32	status;
-	__u32	flags;
-	__u64	timestamp_ns;
-};
-
-/**
- * struct sync_file_info - data returned from fence info ioctl
- * @name:	name of fence
- * @status:	status of fence. 1: signaled 0:active <0:error
- * @flags:	sync_file_info flags
- * @num_fences	number of fences in the sync_file
- * @pad:	padding for 64-bit alignment, should always be zero
- * @sync_fence_info: pointer to array of structs sync_fence_info with all
- *		 fences in the sync_file
- */
-struct sync_file_info {
-	char	name[32];
-	__s32	status;
-	__u32	flags;
-	__u32	num_fences;
-	__u32	pad;
-
-	__u64	sync_fence_info;
-};
-
-#define SYNC_IOC_MAGIC		'>'
-
-/**
- * Opcodes  0, 1 and 2 were burned during a API change to avoid users of the
- * old API to get weird errors when trying to handling sync_files. The API
- * change happened during the de-stage of the Sync Framework when there was
- * no upstream users available.
- */
-
-/**
- * DOC: SYNC_IOC_MERGE - merge two fences
- *
- * Takes a struct sync_merge_data.  Creates a new fence containing copies of
- * the sync_pts in both the calling fd and sync_merge_data.fd2.  Returns the
- * new fence's fd in sync_merge_data.fence
- */
-#define SYNC_IOC_MERGE		_IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
-
-/**
- * DOC: SYNC_IOC_FILE_INFO - get detailed information on a sync_file
- *
- * Takes a struct sync_file_info. If num_fences is 0, the field is updated
- * with the actual number of fences. If num_fences is > 0, the system will
- * use the pointer provided on sync_fence_info to return up to num_fences of
- * struct sync_fence_info, with detailed fence information.
- */
-#define SYNC_IOC_FILE_INFO	_IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
-
-#endif /* _LINUX_SYNC_H */
diff --git a/include/drm-uapi/tegra_drm.h b/include/drm-uapi/tegra_drm.h
index 12f9bf848db1..6c07919c04e9 100644
--- a/include/drm-uapi/tegra_drm.h
+++ b/include/drm-uapi/tegra_drm.h
@@ -32,143 +32,615 @@ extern "C" {
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
+/**
+ * struct drm_tegra_gem_create - parameters for the GEM object creation IOCTL
+ */
 struct drm_tegra_gem_create {
+	/**
+	 * @size:
+	 *
+	 * The size, in bytes, of the buffer object to be created.
+	 */
 	__u64 size;
+
+	/**
+	 * @flags:
+	 *
+	 * A bitmask of flags that influence the creation of GEM objects:
+	 *
+	 * DRM_TEGRA_GEM_CREATE_TILED
+	 *   Use the 16x16 tiling format for this buffer.
+	 *
+	 * DRM_TEGRA_GEM_CREATE_BOTTOM_UP
+	 *   The buffer has a bottom-up layout.
+	 */
 	__u32 flags;
+
+	/**
+	 * @handle:
+	 *
+	 * The handle of the created GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 handle;
 };
 
+/**
+ * struct drm_tegra_gem_mmap - parameters for the GEM mmap IOCTL
+ */
 struct drm_tegra_gem_mmap {
+	/**
+	 * @handle:
+	 *
+	 * Handle of the GEM object to obtain an mmap offset for.
+	 */
 	__u32 handle;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @offset:
+	 *
+	 * The mmap offset for the given GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u64 offset;
 };
 
+/**
+ * struct drm_tegra_syncpt_read - parameters for the read syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_read {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to read the current value from.
+	 */
 	__u32 id;
+
+	/**
+	 * @value:
+	 *
+	 * The current syncpoint value. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
+/**
+ * struct drm_tegra_syncpt_incr - parameters for the increment syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_incr {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to increment.
+	 */
 	__u32 id;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_syncpt_wait - parameters for the wait syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_wait {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to wait on.
+	 */
 	__u32 id;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to wait.
+	 */
 	__u32 thresh;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, to wait.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @value:
+	 *
+	 * The new syncpoint value after the wait. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
 #define DRM_TEGRA_NO_TIMEOUT	(0xffffffff)
 
+/**
+ * struct drm_tegra_open_channel - parameters for the open channel IOCTL
+ */
 struct drm_tegra_open_channel {
+	/**
+	 * @client:
+	 *
+	 * The client ID for this channel.
+	 */
 	__u32 client;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. Set by the kernel upon
+	 * successful completion of the IOCTL. This context needs to be passed
+	 * to the DRM_TEGRA_CHANNEL_CLOSE or the DRM_TEGRA_SUBMIT IOCTLs.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_close_channel - parameters for the close channel IOCTL
+ */
 struct drm_tegra_close_channel {
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. This is obtained from the
+	 * DRM_TEGRA_OPEN_CHANNEL IOCTL.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_get_syncpt - parameters for the get syncpoint IOCTL
+ */
 struct drm_tegra_get_syncpt {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel for which to obtain
+	 * the syncpoint ID.
+	 */
 	__u64 context;
+
+	/**
+	 * @index:
+	 *
+	 * Index of the client syncpoint for which to obtain the ID.
+	 */
 	__u32 index;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the given syncpoint. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_get_syncpt_base - parameters for the get wait base IOCTL
+ */
 struct drm_tegra_get_syncpt_base {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying for which channel to obtain the
+	 * wait base.
+	 */
 	__u64 context;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint for which to obtain the wait base.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the wait base corresponding to the client syncpoint. Set
+	 * by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_syncpt - syncpoint increment operation
+ */
 struct drm_tegra_syncpt {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to operate on.
+	 */
 	__u32 id;
+
+	/**
+	 * @incrs:
+	 *
+	 * Number of increments to perform for the syncpoint.
+	 */
 	__u32 incrs;
 };
 
+/**
+ * struct drm_tegra_cmdbuf - structure describing a command buffer
+ */
 struct drm_tegra_cmdbuf {
+	/**
+	 * @handle:
+	 *
+	 * Handle to a GEM object containing the command buffer.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, into the GEM object identified by @handle at
+	 * which the command buffer starts.
+	 */
 	__u32 offset;
+
+	/**
+	 * @words:
+	 *
+	 * Number of 32-bit words in this command buffer.
+	 */
 	__u32 words;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_reloc - GEM object relocation structure
+ */
 struct drm_tegra_reloc {
 	struct {
+		/**
+		 * @cmdbuf.handle:
+		 *
+		 * Handle to the GEM object containing the command buffer for
+		 * which to perform this GEM object relocation.
+		 */
 		__u32 handle;
+
+		/**
+		 * @cmdbuf.offset:
+		 *
+		 * Offset, in bytes, into the command buffer at which to
+		 * insert the relocated address.
+		 */
 		__u32 offset;
 	} cmdbuf;
 	struct {
+		/**
+		 * @target.handle:
+		 *
+		 * Handle to the GEM object to be relocated.
+		 */
 		__u32 handle;
+
+		/**
+		 * @target.offset:
+		 *
+		 * Offset, in bytes, into the target GEM object at which the
+		 * relocated data starts.
+		 */
 		__u32 offset;
 	} target;
+
+	/**
+	 * @shift:
+	 *
+	 * The number of bits by which to shift relocated addresses.
+	 */
 	__u32 shift;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_waitchk - wait check structure
+ */
 struct drm_tegra_waitchk {
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object containing a command stream on which to
+	 * perform the wait check.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, of the location in the command stream to perform
+	 * the wait check on.
+	 */
 	__u32 offset;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint to wait check.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to check.
+	 */
 	__u32 thresh;
 };
 
+/**
+ * struct drm_tegra_submit - job submission structure
+ */
 struct drm_tegra_submit {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel to use for the
+	 * execution of this job.
+	 */
 	__u64 context;
+
+	/**
+	 * @num_syncpts:
+	 *
+	 * The number of syncpoints operated on by this job. This defines the
+	 * length of the array pointed to by @syncpts.
+	 */
 	__u32 num_syncpts;
+
+	/**
+	 * @num_cmdbufs:
+	 *
+	 * The number of command buffers to execute as part of this job. This
+	 * defines the length of the array pointed to by @cmdbufs.
+	 */
 	__u32 num_cmdbufs;
+
+	/**
+	 * @num_relocs:
+	 *
+	 * The number of relocations to perform before executing this job.
+	 * This defines the length of the array pointed to by @relocs.
+	 */
 	__u32 num_relocs;
+
+	/**
+	 * @num_waitchks:
+	 *
+	 * The number of wait checks to perform as part of this job. This
+	 * defines the length of the array pointed to by @waitchks.
+	 */
 	__u32 num_waitchks;
+
+	/**
+	 * @waitchk_mask:
+	 *
+	 * Bitmask of valid wait checks.
+	 */
 	__u32 waitchk_mask;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, before this job is cancelled.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @syncpts:
+	 *
+	 * A pointer to an array of &struct drm_tegra_syncpt structures that
+	 * specify the syncpoint operations performed as part of this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_syncpts.
+	 */
 	__u64 syncpts;
+
+	/**
+	 * @cmdbufs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_cmdbuf structures that
+	 * define the command buffers to execute as part of this job. The
+	 * number of elements in the array must be equal to the value given
+	 * by @num_cmdbufs.
+	 */
 	__u64 cmdbufs;
+
+	/**
+	 * @relocs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_reloc structures that
+	 * specify the relocations that need to be performed before executing
+	 * this job. The number of elements in the array must be equal to the
+	 * value given by @num_relocs.
+	 */
 	__u64 relocs;
+
+	/**
+	 * @waitchks:
+	 *
+	 * A pointer to an array of &struct drm_tegra_waitchk structures that
+	 * specify the wait checks to be performed while executing this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_waitchks.
+	 */
 	__u64 waitchks;
-	__u32 fence;		/* Return value */
 
-	__u32 reserved[5];	/* future expansion */
+	/**
+	 * @fence:
+	 *
+	 * The threshold of the syncpoint associated with this job after it
+	 * has been completed. Set by the kernel upon successful completion of
+	 * the IOCTL. This can be used with the DRM_TEGRA_SYNCPT_WAIT IOCTL to
+	 * wait for this job to be finished.
+	 */
+	__u32 fence;
+
+	/**
+	 * @reserved:
+	 *
+	 * This field is reserved for future use. Must be 0.
+	 */
+	__u32 reserved[5];
 };
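
Putting the pieces above together, a rough sketch of a single-cmdbuf
submission: open a channel, look up its syncpoint, submit with one syncpoint
increment, then hand submit.fence to the wait IOCTL. It assumes fd is an open
Tegra DRM fd, client is a valid host1x client ID and bo_handle contains
num_words of command stream; error handling is omitted:

  #include <stdint.h>
  #include <string.h>
  #include <xf86drm.h>
  #include "tegra_drm.h"

  static __u32 submit_one_cmdbuf(int fd, __u32 client, __u32 bo_handle,
  			       __u32 num_words)
  {
  	struct drm_tegra_open_channel channel = { .client = client };
  	struct drm_tegra_get_syncpt get;
  	struct drm_tegra_syncpt incr;
  	struct drm_tegra_cmdbuf cmdbuf;
  	struct drm_tegra_submit submit;

  	drmIoctl(fd, DRM_IOCTL_TEGRA_OPEN_CHANNEL, &channel);

  	/* Use the channel's first client syncpoint. */
  	memset(&get, 0, sizeof(get));
  	get.context = channel.context;
  	get.index = 0;
  	drmIoctl(fd, DRM_IOCTL_TEGRA_GET_SYNCPT, &get);

  	memset(&incr, 0, sizeof(incr));
  	incr.id = get.id;
  	incr.incrs = 1;

  	memset(&cmdbuf, 0, sizeof(cmdbuf));
  	cmdbuf.handle = bo_handle;
  	cmdbuf.offset = 0;
  	cmdbuf.words = num_words;

  	memset(&submit, 0, sizeof(submit));
  	submit.context = channel.context;
  	submit.num_syncpts = 1;
  	submit.num_cmdbufs = 1;
  	submit.syncpts = (uintptr_t)&incr;
  	submit.cmdbufs = (uintptr_t)&cmdbuf;
  	drmIoctl(fd, DRM_IOCTL_TEGRA_SUBMIT, &submit);

  	/* submit.fence can now be passed to DRM_IOCTL_TEGRA_SYNCPT_WAIT. */
  	return submit.fence;
  }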
 
 #define DRM_TEGRA_GEM_TILING_MODE_PITCH 0
 #define DRM_TEGRA_GEM_TILING_MODE_TILED 1
 #define DRM_TEGRA_GEM_TILING_MODE_BLOCK 2
 
+/**
+ * struct drm_tegra_gem_set_tiling - parameters for the set tiling IOCTL
+ */
 struct drm_tegra_gem_set_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the tiling parameters.
+	 */
 	__u32 handle;
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode to set. Must be one of:
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_PITCH
+	 *   pitch linear format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_TILED
+	 *   16x16 tiling format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_BLOCK
+	 *   16Bx2 tiling format
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The value to set for the tiling mode parameter.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_gem_get_tiling - parameters for the get tiling IOCTL
+ */
 struct drm_tegra_gem_get_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the tiling parameters.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode currently associated with the GEM object. Set by
+	 * the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The tiling mode parameter currently associated with the GEM object.
+	 * Set by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
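
A small usage sketch for the query side; DRM_IOCTL_TEGRA_GEM_GET_TILING is
assumed to be defined alongside the other wrappers in this header, outside
the visible hunk:

  #include <stdio.h>
  #include <string.h>
  #include <xf86drm.h>
  #include "tegra_drm.h"

  static void print_tiling(int fd, __u32 handle)
  {
  	struct drm_tegra_gem_get_tiling args;

  	memset(&args, 0, sizeof(args));
  	args.handle = handle;

  	if (drmIoctl(fd, DRM_IOCTL_TEGRA_GEM_GET_TILING, &args) == 0)
  		printf("tiling mode %u, value %u\n", args.mode, args.value);
  }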
 
 #define DRM_TEGRA_GEM_BOTTOM_UP		(1 << 0)
 #define DRM_TEGRA_GEM_FLAGS		(DRM_TEGRA_GEM_BOTTOM_UP)
 
+/**
+ * struct drm_tegra_gem_set_flags - parameters for the set flags IOCTL
+ */
 struct drm_tegra_gem_set_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags to set for the GEM object.
+	 */
 	__u32 flags;
 };
 
+/**
+ * struct drm_tegra_gem_get_flags - parameters for the get flags IOCTL
+ */
 struct drm_tegra_gem_get_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags currently associated with the GEM object. Set by the
+	 * kernel upon successful completion of the IOCTL.
+	 */
 	__u32 flags;
 };
 
@@ -193,7 +665,7 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
 #define DRM_IOCTL_TEGRA_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_OPEN_CHANNEL, struct drm_tegra_open_channel)
-#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_open_channel)
+#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_close_channel)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT, struct drm_tegra_get_syncpt)
 #define DRM_IOCTL_TEGRA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SUBMIT, struct drm_tegra_submit)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT_BASE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT_BASE, struct drm_tegra_get_syncpt_base)
diff --git a/include/drm-uapi/v3d_drm.h b/include/drm-uapi/v3d_drm.h
new file mode 100644
index 000000000000..7b6627783608
--- /dev/null
+++ b/include/drm-uapi/v3d_drm.h
@@ -0,0 +1,194 @@
+/*
+ * Copyright © 2014-2018 Broadcom
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef _V3D_DRM_H_
+#define _V3D_DRM_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_V3D_SUBMIT_CL                         0x00
+#define DRM_V3D_WAIT_BO                           0x01
+#define DRM_V3D_CREATE_BO                         0x02
+#define DRM_V3D_MMAP_BO                           0x03
+#define DRM_V3D_GET_PARAM                         0x04
+#define DRM_V3D_GET_BO_OFFSET                     0x05
+
+#define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
+#define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
+#define DRM_IOCTL_V3D_CREATE_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_CREATE_BO, struct drm_v3d_create_bo)
+#define DRM_IOCTL_V3D_MMAP_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
+#define DRM_IOCTL_V3D_GET_PARAM           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
+#define DRM_IOCTL_V3D_GET_BO_OFFSET       DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
+
+/**
+ * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
+ * engine.
+ *
+ * This asks the kernel to have the GPU execute an optional binner
+ * command list, and a render command list.
+ */
+struct drm_v3d_submit_cl {
+	/* Pointer to the binner command list.
+	 *
+	 * This is the first set of commands executed, which runs the
+	 * coordinate shader to determine where primitives land on the screen,
+	 * then writes out the state updates and draw calls necessary per tile
+	 * to the tile allocation BO.
+	 */
+	__u32 bcl_start;
+
+	 /** End address of the BCL (first byte after the BCL) */
+	__u32 bcl_end;
+
+	/* Offset of the render command list.
+	 *
+	 * This is the second set of commands executed, which will either
+	 * execute the tiles that have been set up by the BCL, or a fixed set
+	 * of tiles (in the case of RCL-only blits).
+	 */
+	__u32 rcl_start;
+
+	 /** End address of the RCL (first byte after the RCL) */
+	__u32 rcl_end;
+
+	/** An optional sync object to wait on before starting the BCL. */
+	__u32 in_sync_bcl;
+	/** An optional sync object to wait on before starting the RCL. */
+	__u32 in_sync_rcl;
+	/** An optional sync object to place the completion fence in. */
+	__u32 out_sync;
+
+	/* Offset of the tile alloc memory
+	 *
+	 * This is optional on V3D 3.3 (where the CL can set the value) but
+	 * required on V3D 4.1.
+	 */
+	__u32 qma;
+
+	/** Size of the tile alloc memory. */
+	__u32 qms;
+
+	/** Offset of the tile state data array. */
+	__u32 qts;
+
+	/* Pointer to a u32 array of the BOs that are referenced by the job.
+	 */
+	__u64 bo_handles;
+
+	/* Number of BO handles passed in (size is that times 4). */
+	__u32 bo_handle_count;
+
+	/* Pad, must be zero-filled. */
+	__u32 pad;
+};
+
+/**
+ * struct drm_v3d_wait_bo - ioctl argument for waiting for
+ * completion of the last DRM_V3D_SUBMIT_CL on a BO.
+ *
+ * This is useful for cases where multiple processes might be
+ * rendering to a BO and you want to wait for all rendering to be
+ * completed.
+ */
+struct drm_v3d_wait_bo {
+	__u32 handle;
+	__u32 pad;
+	__u64 timeout_ns;
+};
+
+/**
+ * struct drm_v3d_create_bo - ioctl argument for creating V3D BOs.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_create_bo {
+	__u32 size;
+	__u32 flags;
+	/** Returned GEM handle for the BO. */
+	__u32 handle;
+	/**
+	 * Returned offset for the BO in the V3D address space.  This offset
+	 * is private to the DRM fd and is valid for the lifetime of the GEM
+	 * handle.
+	 *
+	 * This offset value will always be nonzero, since various HW
+	 * units treat 0 specially.
+	 */
+	__u32 offset;
+};
+
+/**
+ * struct drm_v3d_mmap_bo - ioctl argument for mapping V3D BOs.
+ *
+ * This doesn't actually perform an mmap.  Instead, it returns the
+ * offset you need to use in an mmap on the DRM device node.  This
+ * means that tools like valgrind end up knowing about the mapped
+ * memory.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_mmap_bo {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 flags;
+	/** offset into the drm node to use for subsequent mmap call. */
+	__u64 offset;
+};
+
+enum drm_v3d_param {
+	DRM_V3D_PARAM_V3D_UIFCFG,
+	DRM_V3D_PARAM_V3D_HUB_IDENT1,
+	DRM_V3D_PARAM_V3D_HUB_IDENT2,
+	DRM_V3D_PARAM_V3D_HUB_IDENT3,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT0,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT1,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT2,
+};
+
+struct drm_v3d_get_param {
+	__u32 param;
+	__u32 pad;
+	__u64 value;
+};
+
+/**
+ * Returns the offset for the BO in the V3D address space for this DRM fd.
+ * This is the same value returned by drm_v3d_create_bo, if that was called
+ * from this DRM fd.
+ */
+struct drm_v3d_get_bo_offset {
+	__u32 handle;
+	__u32 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* _V3D_DRM_H_ */
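
The BO ioctls above compose in the usual create-then-mmap pattern; a rough
sketch assuming fd is an open v3d DRM fd, with error handling omitted:

  #include <sys/mman.h>
  #include <string.h>
  #include <xf86drm.h>
  #include "v3d_drm.h"

  static void *create_and_map_bo(int fd, __u32 size)
  {
  	struct drm_v3d_create_bo create;
  	struct drm_v3d_mmap_bo map;

  	memset(&create, 0, sizeof(create));
  	create.size = size;
  	if (drmIoctl(fd, DRM_IOCTL_V3D_CREATE_BO, &create))
  		return NULL;

  	memset(&map, 0, sizeof(map));
  	map.handle = create.handle;
  	if (drmIoctl(fd, DRM_IOCTL_V3D_MMAP_BO, &map))
  		return NULL;

  	/* The returned offset is only a token for mmap() on the DRM fd. */
  	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
  		    fd, map.offset);
  }
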
diff --git a/include/drm-uapi/vc4_drm.h b/include/drm-uapi/vc4_drm.h
index 4117117b4204..31f50de39acb 100644
--- a/include/drm-uapi/vc4_drm.h
+++ b/include/drm-uapi/vc4_drm.h
@@ -183,10 +183,17 @@ struct drm_vc4_submit_cl {
 	/* ID of the perfmon to attach to this job. 0 means no perfmon. */
 	__u32 perfmonid;
 
-	/* Unused field to align this struct on 64 bits. Must be set to 0.
-	 * If one ever needs to add an u32 field to this struct, this field
-	 * can be used.
+	/* Syncobj handle to wait on. If set, processing of this render job
+	 * will not start until the syncobj is signaled. 0 means ignore.
 	 */
+	__u32 in_sync;
+
+	/* Syncobj handle to export fence to. If set, the fence in the syncobj
+	 * will be replaced with a fence that signals upon completion of this
+	 * render job. 0 means ignore.
+	 */
+	__u32 out_sync;
+
 	__u32 pad2;
 };
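
A sketch of how the new out_sync field might be used with a syncobj to obtain
an explicit completion fence; the submit argument is assumed to be otherwise
fully populated, and DRM_IOCTL_VC4_SUBMIT_CL is defined earlier in this
header, outside the hunk:

  #include <stdint.h>
  #include <xf86drm.h>
  #include "vc4_drm.h"

  static int submit_with_out_fence(int fd, struct drm_vc4_submit_cl *submit)
  {
  	uint32_t syncobj;

  	if (drmSyncobjCreate(fd, 0, &syncobj))
  		return -1;

  	submit->out_sync = syncobj;	/* fence replaced on completion */
  	if (drmIoctl(fd, DRM_IOCTL_VC4_SUBMIT_CL, submit))
  		return -1;

  	/* Block until the render job signals the syncobj. */
  	return drmSyncobjWait(fd, &syncobj, 1, INT64_MAX, 0, NULL);
  }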
 
diff --git a/include/drm-uapi/virtgpu_drm.h b/include/drm-uapi/virtgpu_drm.h
index 91a31ffed828..9a781f0611df 100644
--- a/include/drm-uapi/virtgpu_drm.h
+++ b/include/drm-uapi/virtgpu_drm.h
@@ -63,6 +63,7 @@ struct drm_virtgpu_execbuffer {
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
+#define VIRTGPU_PARAM_CAPSET_QUERY_FIX 2 /* do we have the capset fix */
 
 struct drm_virtgpu_getparam {
 	__u64 param;
diff --git a/include/drm-uapi/vmwgfx_drm.h b/include/drm-uapi/vmwgfx_drm.h
index 0bc784f5e0db..399f58317cff 100644
--- a/include/drm-uapi/vmwgfx_drm.h
+++ b/include/drm-uapi/vmwgfx_drm.h
@@ -40,6 +40,7 @@ extern "C" {
 
 #define DRM_VMW_GET_PARAM            0
 #define DRM_VMW_ALLOC_DMABUF         1
+#define DRM_VMW_ALLOC_BO             1
 #define DRM_VMW_UNREF_DMABUF         2
 #define DRM_VMW_HANDLE_CLOSE         2
 #define DRM_VMW_CURSOR_BYPASS        3
@@ -68,6 +69,8 @@ extern "C" {
 #define DRM_VMW_GB_SURFACE_REF       24
 #define DRM_VMW_SYNCCPU              25
 #define DRM_VMW_CREATE_EXTENDED_CONTEXT 26
+#define DRM_VMW_GB_SURFACE_CREATE_EXT   27
+#define DRM_VMW_GB_SURFACE_REF_EXT      28
 
 /*************************************************************************/
 /**
@@ -79,6 +82,9 @@ extern "C" {
  *
  * DRM_VMW_PARAM_OVERLAY_IOCTL:
  * Does the driver support the overlay ioctl.
+ *
+ * DRM_VMW_PARAM_SM4_1:
+ * SM4_1 support is enabled.
  */
 
 #define DRM_VMW_PARAM_NUM_STREAMS      0
@@ -94,6 +100,8 @@ extern "C" {
 #define DRM_VMW_PARAM_MAX_MOB_SIZE     10
 #define DRM_VMW_PARAM_SCREEN_TARGET    11
 #define DRM_VMW_PARAM_DX               12
+#define DRM_VMW_PARAM_HW_CAPS2         13
+#define DRM_VMW_PARAM_SM4_1            14
 
 /**
  * enum drm_vmw_handle_type - handle type for ref ioctls
@@ -356,9 +364,9 @@ struct drm_vmw_fence_rep {
 
 /*************************************************************************/
 /**
- * DRM_VMW_ALLOC_DMABUF
+ * DRM_VMW_ALLOC_BO
  *
- * Allocate a DMA buffer that is visible also to the host.
+ * Allocate a buffer object that is visible also to the host.
  * NOTE: The buffer is
  * identified by a handle and an offset, which are private to the guest, but
 * useable in the command stream. The guest kernel may translate these
 * and patch up the command stream accordingly. In the future, the offset may
 * be zero at all times, or it may disappear from the interface before it is
  * be zero at all times, or it may disappear from the interface before it is
  * fixed.
  *
- * The DMA buffer may stay user-space mapped in the guest at all times,
+ * The buffer object may stay user-space mapped in the guest at all times,
  * and is thus suitable for sub-allocation.
  *
- * DMA buffers are mapped using the mmap() syscall on the drm device.
+ * Buffer objects are mapped using the mmap() syscall on the drm device.
  */
 
 /**
- * struct drm_vmw_alloc_dmabuf_req
+ * struct drm_vmw_alloc_bo_req
  *
  * @size: Required minimum size of the buffer.
  *
- * Input data to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Input data to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_alloc_dmabuf_req {
+struct drm_vmw_alloc_bo_req {
 	__u32 size;
 	__u32 pad64;
 };
+#define drm_vmw_alloc_dmabuf_req drm_vmw_alloc_bo_req
 
 /**
- * struct drm_vmw_dmabuf_rep
+ * struct drm_vmw_bo_rep
  *
  * @map_handle: Offset to use in the mmap() call used to map the buffer.
  * @handle: Handle unique to this buffer. Used for unreferencing.
@@ -395,50 +404,32 @@ struct drm_vmw_alloc_dmabuf_req {
  * @cur_gmr_offset: Offset to use in the command stream when this buffer is
  * referenced. See note above.
  *
- * Output data from the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Output data from the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_dmabuf_rep {
+struct drm_vmw_bo_rep {
 	__u64 map_handle;
 	__u32 handle;
 	__u32 cur_gmr_id;
 	__u32 cur_gmr_offset;
 	__u32 pad64;
 };
+#define drm_vmw_dmabuf_rep drm_vmw_bo_rep
 
 /**
- * union drm_vmw_dmabuf_arg
+ * union drm_vmw_alloc_bo_arg
  *
  * @req: Input data as described above.
  * @rep: Output data as described above.
  *
- * Argument to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Argument to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-union drm_vmw_alloc_dmabuf_arg {
-	struct drm_vmw_alloc_dmabuf_req req;
-	struct drm_vmw_dmabuf_rep rep;
-};
-
-/*************************************************************************/
-/**
- * DRM_VMW_UNREF_DMABUF - Free a DMA buffer.
- *
- */
-
-/**
- * struct drm_vmw_unref_dmabuf_arg
- *
- * @handle: Handle indicating what buffer to free. Obtained from the
- * DRM_VMW_ALLOC_DMABUF Ioctl.
- *
- * Argument to the DRM_VMW_UNREF_DMABUF Ioctl.
- */
-
-struct drm_vmw_unref_dmabuf_arg {
-	__u32 handle;
-	__u32 pad64;
+union drm_vmw_alloc_bo_arg {
+	struct drm_vmw_alloc_bo_req req;
+	struct drm_vmw_bo_rep rep;
 };
+#define drm_vmw_alloc_dmabuf_arg drm_vmw_alloc_bo_arg
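
A rough allocation sketch, building the ioctl request code locally from the
DRM_VMW_ALLOC_BO command number; fd is assumed to be an open vmwgfx DRM fd
and error handling is omitted:

  #include <sys/mman.h>
  #include <string.h>
  #include <xf86drm.h>
  #include "vmwgfx_drm.h"

  #define LOCAL_IOCTL_VMW_ALLOC_BO \
  	DRM_IOWR(DRM_COMMAND_BASE + DRM_VMW_ALLOC_BO, union drm_vmw_alloc_bo_arg)

  static void *alloc_and_map_bo(int fd, __u32 size)
  {
  	union drm_vmw_alloc_bo_arg arg;

  	memset(&arg, 0, sizeof(arg));
  	arg.req.size = size;
  	if (drmIoctl(fd, LOCAL_IOCTL_VMW_ALLOC_BO, &arg))
  		return NULL;

  	/* rep.map_handle is the offset to use with mmap() on the DRM fd. */
  	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
  		    fd, arg.rep.map_handle);
  }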
 
 /*************************************************************************/
 /**
@@ -1103,9 +1094,8 @@ union drm_vmw_extended_context_arg {
  * DRM_VMW_HANDLE_CLOSE - Close a user-space handle and release its
  * underlying resource.
  *
- * Note that this ioctl is overlaid on the DRM_VMW_UNREF_DMABUF Ioctl.
- * The ioctl arguments therefore need to be identical in layout.
- *
+ * Note that this ioctl is overlaid on the deprecated DRM_VMW_UNREF_DMABUF
+ * Ioctl.
  */
 
 /**
@@ -1119,7 +1109,107 @@ struct drm_vmw_handle_close_arg {
 	__u32 handle;
 	__u32 pad64;
 };
+#define drm_vmw_unref_dmabuf_arg drm_vmw_handle_close_arg
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_CREATE_EXT - Create a host guest-backed surface.
+ *
+ * Allocates a surface handle and queues a create surface command
+ * for the host on the first use of the surface. The surface ID can
+ * be used as the surface ID in commands referencing the surface.
+ *
+ * This new command extends DRM_VMW_GB_SURFACE_CREATE by adding version
+ * parameter and 64 bit svga flag.
+ */
+
+/**
+ * enum drm_vmw_surface_version
+ *
+ * @drm_vmw_gb_surface_v1: Corresponds to current gb surface format with
+ * svga3d surface flags split into 2, upper half and lower half.
+ */
+enum drm_vmw_surface_version {
+	drm_vmw_gb_surface_v1
+};
+
+/**
+ * struct drm_vmw_gb_surface_create_ext_req
+ *
+ * @base: Surface create parameters.
+ * @version: Version of surface create ioctl.
+ * @svga3d_flags_upper_32_bits: Upper 32 bits of svga3d flags.
+ * @multisample_pattern: Multisampling pattern when msaa is supported.
+ * @quality_level: Precision settings for each sample.
+ * @must_be_zero: Reserved for future usage.
+ *
+ * Input argument to the DRM_VMW_GB_SURFACE_CREATE_EXT Ioctl.
+ * Part of output argument for the DRM_VMW_GB_SURFACE_REF_EXT Ioctl.
+ */
+struct drm_vmw_gb_surface_create_ext_req {
+	struct drm_vmw_gb_surface_create_req base;
+	enum drm_vmw_surface_version version;
+	uint32_t svga3d_flags_upper_32_bits;
+	SVGA3dMSPattern multisample_pattern;
+	SVGA3dMSQualityLevel quality_level;
+	uint64_t must_be_zero;
+};
+
+/**
+ * union drm_vmw_gb_surface_create_ext_arg
+ *
+ * @req: Input argument as described above.
+ * @rep: Output argument as described above.
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
+union drm_vmw_gb_surface_create_ext_arg {
+	struct drm_vmw_gb_surface_create_rep rep;
+	struct drm_vmw_gb_surface_create_ext_req req;
+};
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_REF_EXT - Reference a host surface.
+ *
+ * Puts a reference on a host surface with a given handle, as previously
+ * returned by the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ * A reference will make sure the surface isn't destroyed while we hold
+ * it and will allow the calling client to use the surface handle in
+ * the command stream.
+ *
+ * On successful return, the Ioctl returns the surface information given
+ * to and returned from the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
 
+/**
+ * struct drm_vmw_gb_surface_ref_ext_rep
+ *
+ * @creq: The data used as input when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_ext_req"
+ * @crep: Additional data output when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_rep"
+ *
+ * Output Argument to the DRM_VMW_GB_SURFACE_REF_EXT ioctl.
+ */
+struct drm_vmw_gb_surface_ref_ext_rep {
+	struct drm_vmw_gb_surface_create_ext_req creq;
+	struct drm_vmw_gb_surface_create_rep crep;
+};
+
+/**
+ * union drm_vmw_gb_surface_reference_ext_arg
+ *
+ * @req: Input data as described above at "struct drm_vmw_surface_arg"
+ * @rep: Output data as described above at
+ *       "struct drm_vmw_gb_surface_ref_ext_rep"
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_REF_EXT Ioctl.
+ */
+union drm_vmw_gb_surface_reference_ext_arg {
+	struct drm_vmw_gb_surface_ref_ext_rep rep;
+	struct drm_vmw_surface_arg req;
+};
 
 #if defined(__cplusplus)
 }
-- 
2.17.1


* [Intel-gfx] [PATCH i-g-t 01/17] lib: Update uapi headers
@ 2018-10-18 15:27   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:27 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Sync with latest DRM uapi changes.
---
 include/drm-uapi/amdgpu_drm.h  |  52 +++-
 include/drm-uapi/drm.h         |  16 ++
 include/drm-uapi/drm_fourcc.h  | 224 +++++++++++++++
 include/drm-uapi/drm_mode.h    |  26 +-
 include/drm-uapi/etnaviv_drm.h |   6 +
 include/drm-uapi/exynos_drm.h  | 240 ++++++++++++++++
 include/drm-uapi/i915_drm.h    | 239 +++++++++++++++-
 include/drm-uapi/msm_drm.h     |   2 +
 include/drm-uapi/sync_file.h   |  98 -------
 include/drm-uapi/tegra_drm.h   | 492 ++++++++++++++++++++++++++++++++-
 include/drm-uapi/v3d_drm.h     | 194 +++++++++++++
 include/drm-uapi/vc4_drm.h     |  13 +-
 include/drm-uapi/virtgpu_drm.h |   1 +
 include/drm-uapi/vmwgfx_drm.h  | 166 ++++++++---
 14 files changed, 1613 insertions(+), 156 deletions(-)
 delete mode 100644 include/drm-uapi/sync_file.h
 create mode 100644 include/drm-uapi/v3d_drm.h

diff --git a/include/drm-uapi/amdgpu_drm.h b/include/drm-uapi/amdgpu_drm.h
index 1816bd8200d1..370e9a5536ef 100644
--- a/include/drm-uapi/amdgpu_drm.h
+++ b/include/drm-uapi/amdgpu_drm.h
@@ -72,12 +72,41 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
 
+/**
+ * DOC: memory domains
+ *
+ * %AMDGPU_GEM_DOMAIN_CPU	System memory that is not GPU accessible.
+ * Memory in this pool could be swapped out to disk if there is pressure.
+ *
+ * %AMDGPU_GEM_DOMAIN_GTT	GPU accessible system memory, mapped into the
+ * GPU's virtual address space via gart. Gart memory linearizes non-contiguous
+ * pages of system memory, allowing the GPU to access system memory in a
+ * linearized fashion.
+ *
+ * %AMDGPU_GEM_DOMAIN_VRAM	Local video memory. For APUs, it is memory
+ * carved out by the BIOS.
+ *
+ * %AMDGPU_GEM_DOMAIN_GDS	Global on-chip data storage used to share data
+ * across shader threads.
+ *
+ * %AMDGPU_GEM_DOMAIN_GWS	Global wave sync, used to synchronize the
+ * execution of all the waves on a device.
+ *
+ * %AMDGPU_GEM_DOMAIN_OA	Ordered append, used by 3D or Compute engines
+ * for appending data.
+ */
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
 #define AMDGPU_GEM_DOMAIN_VRAM		0x4
 #define AMDGPU_GEM_DOMAIN_GDS		0x8
 #define AMDGPU_GEM_DOMAIN_GWS		0x10
 #define AMDGPU_GEM_DOMAIN_OA		0x20
+#define AMDGPU_GEM_DOMAIN_MASK		(AMDGPU_GEM_DOMAIN_CPU | \
+					 AMDGPU_GEM_DOMAIN_GTT | \
+					 AMDGPU_GEM_DOMAIN_VRAM | \
+					 AMDGPU_GEM_DOMAIN_GDS | \
+					 AMDGPU_GEM_DOMAIN_GWS | \
+					 AMDGPU_GEM_DOMAIN_OA)
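
For illustration, a sketch of allocating a GTT BO in one of these domains;
the drm_amdgpu_gem_create union, its field names and
DRM_IOCTL_AMDGPU_GEM_CREATE are assumed from the rest of this header, which
is outside the visible hunk (error handling omitted):

  #include <string.h>
  #include <xf86drm.h>
  #include "amdgpu_drm.h"

  static __u32 alloc_gtt_bo(int fd, __u64 size)
  {
  	union drm_amdgpu_gem_create args;

  	memset(&args, 0, sizeof(args));
  	args.in.bo_size = size;
  	args.in.alignment = 4096;
  	args.in.domains = AMDGPU_GEM_DOMAIN_GTT;
  	args.in.domain_flags = 0;

  	if (drmIoctl(fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &args))
  		return 0;

  	return args.out.handle;	/* GEM handle of the new BO */
  }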
 
 /* Flag that CPU access will be required for the case of VRAM domain */
 #define AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED	(1 << 0)
@@ -95,6 +124,10 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID	(1 << 6)
 /* Flag that BO sharing will be explicitly synchronized */
 #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC		(1 << 7)
+/* Flag that indicates allocating MQD gart on GFX9, where the mtype
+ * for the second page onward should be set to NC.
+ */
+#define AMDGPU_GEM_CREATE_MQD_GFX9		(1 << 8)
 
 struct drm_amdgpu_gem_create_in  {
 	/** the requested memory size */
@@ -473,7 +506,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_HW_IP_UVD_ENC      5
 #define AMDGPU_HW_IP_VCN_DEC      6
 #define AMDGPU_HW_IP_VCN_ENC      7
-#define AMDGPU_HW_IP_NUM          8
+#define AMDGPU_HW_IP_VCN_JPEG     8
+#define AMDGPU_HW_IP_NUM          9
 
 #define AMDGPU_HW_IP_INSTANCE_MAX_COUNT 1
 
@@ -482,6 +516,7 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
 #define AMDGPU_CHUNK_ID_SYNCOBJ_IN      0x04
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT     0x05
+#define AMDGPU_CHUNK_ID_BO_HANDLES      0x06
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
@@ -520,6 +555,10 @@ union drm_amdgpu_cs {
 /* Preempt flag, IB should set Pre_enb bit if PREEMPT flag detected */
 #define AMDGPU_IB_FLAG_PREEMPT (1<<2)
 
+/* The IB fence should do the L2 writeback but not invalidate any shader
+ * caches (L2/vL1/sL1/I$). */
+#define AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE (1 << 3)
+
 struct drm_amdgpu_cs_chunk_ib {
 	__u32 _pad;
 	/** AMDGPU_IB_FLAG_* */
@@ -618,6 +657,16 @@ struct drm_amdgpu_cs_chunk_data {
 	#define AMDGPU_INFO_FW_SOS		0x0c
 	/* Subquery id: Query PSP ASD firmware version */
 	#define AMDGPU_INFO_FW_ASD		0x0d
+	/* Subquery id: Query VCN firmware version */
+	#define AMDGPU_INFO_FW_VCN		0x0e
+	/* Subquery id: Query GFX RLC SRLC firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_CNTL 0x0f
+	/* Subquery id: Query GFX RLC SRLG firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_GPM_MEM 0x10
+	/* Subquery id: Query GFX RLC SRLS firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_SRM_MEM 0x11
+	/* Subquery id: Query DMCU firmware version */
+	#define AMDGPU_INFO_FW_DMCU		0x12
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED		0x0f
 /* the used VRAM size */
@@ -806,6 +855,7 @@ struct drm_amdgpu_info_firmware {
 #define AMDGPU_VRAM_TYPE_GDDR5 5
 #define AMDGPU_VRAM_TYPE_HBM   6
 #define AMDGPU_VRAM_TYPE_DDR3  7
+#define AMDGPU_VRAM_TYPE_DDR4  8
 
 struct drm_amdgpu_info_device {
 	/** PCI Device ID */
diff --git a/include/drm-uapi/drm.h b/include/drm-uapi/drm.h
index f0bd91de0cf9..85c685a2075e 100644
--- a/include/drm-uapi/drm.h
+++ b/include/drm-uapi/drm.h
@@ -674,6 +674,22 @@ struct drm_get_cap {
  */
 #define DRM_CLIENT_CAP_ATOMIC	3
 
+/**
+ * DRM_CLIENT_CAP_ASPECT_RATIO
+ *
+ * If set to 1, the DRM core will provide aspect ratio information in modes.
+ */
+#define DRM_CLIENT_CAP_ASPECT_RATIO    4
+
+/**
+ * DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
+ *
+ * If set to 1, the DRM core will expose special connectors to be used for
+ * writing back to memory the scene setup in the commit. Depends on client
+ * also supporting DRM_CLIENT_CAP_ATOMIC
+ */
+#define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS	5
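
A minimal sketch of opting in from userspace via libdrm's drmSetClientCap(),
assuming fd is an open DRM fd:

  #include <xf86drm.h>

  /* Writeback connectors are only exposed to atomic-aware clients. */
  static void enable_writeback(int fd)
  {
  	if (drmSetClientCap(fd, DRM_CLIENT_CAP_ATOMIC, 1) == 0)
  		drmSetClientCap(fd, DRM_CLIENT_CAP_WRITEBACK_CONNECTORS, 1);
  }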
+
 /** DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
 struct drm_set_client_cap {
 	__u64 capability;
diff --git a/include/drm-uapi/drm_fourcc.h b/include/drm-uapi/drm_fourcc.h
index e04613d30a13..0cd40ebfa1b1 100644
--- a/include/drm-uapi/drm_fourcc.h
+++ b/include/drm-uapi/drm_fourcc.h
@@ -30,11 +30,50 @@
 extern "C" {
 #endif
 
+/**
+ * DOC: overview
+ *
+ * In the DRM subsystem, framebuffer pixel formats are described using the
+ * fourcc codes defined in `include/uapi/drm/drm_fourcc.h`. In addition to the
+ * fourcc code, a Format Modifier may optionally be provided, in order to
+ * further describe the buffer's format - for example tiling or compression.
+ *
+ * Format Modifiers
+ * ----------------
+ *
+ * Format modifiers are used in conjunction with a fourcc code, forming a
+ * unique fourcc:modifier pair. This format:modifier pair must fully define the
+ * format and data layout of the buffer, and should be the only way to describe
+ * that particular buffer.
+ *
+ * Having multiple fourcc:modifier pairs which describe the same layout should
+ * be avoided, as such aliases run the risk of different drivers exposing
+ * different names for the same data format, forcing userspace to understand
+ * that they are aliases.
+ *
+ * Format modifiers may change any property of the buffer, including the number
+ * of planes and/or the required allocation size. Format modifiers are
+ * vendor-namespaced, and as such the relationship between a fourcc code and a
+ * modifier is specific to the modifier being used. For example, some modifiers
+ * may preserve meaning - such as number of planes - from the fourcc code,
+ * whereas others may not.
+ *
+ * Vendors should document their modifier usage in as much detail as
+ * possible, to ensure maximum compatibility across devices, drivers and
+ * applications.
+ *
+ * The authoritative list of format modifier codes is found in
+ * `include/uapi/drm/drm_fourcc.h`
+ */
+
 #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
 				 ((__u32)(c) << 16) | ((__u32)(d) << 24))
 
 #define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */
 
+/* Reserve 0 for the invalid format specifier */
+#define DRM_FORMAT_INVALID	0
+
 /* color index */
 #define DRM_FORMAT_C8		fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
@@ -183,6 +222,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_QCOM    0x05
 #define DRM_FORMAT_MOD_VENDOR_VIVANTE 0x06
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
+#define DRM_FORMAT_MOD_VENDOR_ARM     0x08
 /* add more to the end as needed */
 
 #define DRM_FORMAT_RESERVED	      ((1ULL << 56) - 1)
@@ -298,6 +338,28 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_SAMSUNG_64_32_TILE	fourcc_mod_code(SAMSUNG, 1)
 
+/*
+ * Tiled, 16 (pixels) x 16 (lines) - sized macroblocks
+ *
+ * This is a simple tiled layout using tiles of 16x16 pixels in a row-major
+ * layout. For YCbCr formats Cb/Cr components are taken in such a way that
+ * they correspond to their 16x16 luma block.
+ */
+#define DRM_FORMAT_MOD_SAMSUNG_16_16_TILE	fourcc_mod_code(SAMSUNG, 2)
+
+/*
+ * Qualcomm Compressed Format
+ *
+ * Refers to a compressed variant of the base format.
+ * Implementation may be platform and base-format specific.
+ *
+ * Each macrotile consists of m x n (mostly 4 x 4) tiles.
+ * Pixel data pitch/stride is aligned with macrotile width.
+ * Pixel data height is aligned with macrotile height.
+ * Entire pixel data buffer is aligned with 4k(bytes).
+ */
+#define DRM_FORMAT_MOD_QCOM_COMPRESSED	fourcc_mod_code(QCOM, 1)
+
 /* Vivante framebuffer modifiers */
 
 /*
@@ -384,6 +446,23 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
 	fourcc_mod_code(NVIDIA, 0x15)
 
+/*
+ * Some Broadcom modifiers take parameters, for example the number of
+ * vertical lines in the image. Reserve the lower 32 bits for modifier
+ * type, and the next 24 bits for parameters. Top 8 bits are the
+ * vendor code.
+ */
+#define __fourcc_mod_broadcom_param_shift 8
+#define __fourcc_mod_broadcom_param_bits 48
+#define fourcc_mod_broadcom_code(val, params) \
+	fourcc_mod_code(BROADCOM, ((((__u64)params) << __fourcc_mod_broadcom_param_shift) | val))
+#define fourcc_mod_broadcom_param(m) \
+	((int)(((m) >> __fourcc_mod_broadcom_param_shift) &	\
+	       ((1ULL << __fourcc_mod_broadcom_param_bits) - 1)))
+#define fourcc_mod_broadcom_mod(m) \
+	((m) & ~(((1ULL << __fourcc_mod_broadcom_param_bits) - 1) <<	\
+		 __fourcc_mod_broadcom_param_shift))
+
 /*
  * Broadcom VC4 "T" format
  *
@@ -405,6 +484,151 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_BROADCOM_VC4_T_TILED fourcc_mod_code(BROADCOM, 1)
 
+/*
+ * Broadcom SAND format
+ *
+ * This is the native format that the H.264 codec block uses.  For VC4
+ * HVS, it is only valid for H.264 (NV12/21) and RGBA modes.
+ *
+ * The image can be considered to be split into columns, and the
+ * columns are placed consecutively into memory.  The width of those
+ * columns can be either 32, 64, 128, or 256 pixels, but in practice
+ * only 128 pixel columns are used.
+ *
+ * The pitch between the start of each column is set to optimally
+ * switch between SDRAM banks. This is passed as the number of lines
+ * of column width in the modifier (we can't use the stride value due
+ * to various core checks that look at it, so you should set the
+ * stride to width*cpp).
+ *
+ * Note that the column height for this format modifier is the same
+ * for all of the planes, assuming that each column contains both Y
+ * and UV.  Some SAND-using hardware stores UV in a separate tiled
+ * image from Y to reduce the column height, which is not supported
+ * with these modifiers.
+ */
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(2, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(3, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(4, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(5, v)
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32 \
+	DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64 \
+	DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128 \
+	DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256 \
+	DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(0)
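
A small worked example of how the parameterised Broadcom modifiers pack and
unpack, using the macros above:

  #include <assert.h>
  #include "drm_fourcc.h"

  /* Build a SAND128 modifier carrying a 96-line column height, then decode it. */
  static void sand128_example(void)
  {
  	__u64 mod = DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(96);

  	assert(fourcc_mod_broadcom_param(mod) == 96);
  	assert(fourcc_mod_broadcom_mod(mod) == DRM_FORMAT_MOD_BROADCOM_SAND128);
  }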
+
+/* Broadcom UIF format
+ *
+ * This is the common format for the current Broadcom multimedia
+ * blocks, including V3D 3.x and newer, newer video codecs, and
+ * displays.
+ *
+ * The image consists of utiles (64b blocks), UIF blocks (2x2 utiles),
+ * and macroblocks (4x4 UIF blocks).  Those 4x4 UIF block groups are
+ * stored in columns, with padding between the columns to ensure that
+ * moving from one column to the next doesn't hit the same SDRAM page
+ * bank.
+ *
+ * To calculate the padding, it is assumed that each hardware block
+ * and the software driving it knows the platform's SDRAM page size,
+ * number of banks, and XOR address, and that it's identical between
+ * all blocks using the format.  This tiling modifier will use XOR as
+ * necessary to reduce the padding.  If a hardware block can't do XOR,
+ * the assumption is that a no-XOR tiling modifier will be created.
+ */
+#define DRM_FORMAT_MOD_BROADCOM_UIF fourcc_mod_code(BROADCOM, 6)
+
+/*
+ * Arm Framebuffer Compression (AFBC) modifiers
+ *
+ * AFBC is a proprietary lossless image compression protocol and format.
+ * It provides fine-grained random access and minimizes the amount of data
+ * transferred between IP blocks.
+ *
+ * AFBC has several features which may be supported and/or used, which are
+ * represented using bits in the modifier. Not all combinations are valid,
+ * and different devices or use-cases may support different combinations.
+ */
+#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode)	fourcc_mod_code(ARM, __afbc_mode)
+
+/*
+ * AFBC superblock size
+ *
+ * Indicates the superblock size(s) used for the AFBC buffer. The buffer
+ * size (in pixels) must be aligned to a multiple of the superblock size.
+ * Four least significant bits (LSBs) are reserved for the block size.
+ */
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_MASK      0xf
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_16x16     (1ULL)
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_32x8      (2ULL)
+
+/*
+ * AFBC lossless colorspace transform
+ *
+ * Indicates that the buffer makes use of the AFBC lossless colorspace
+ * transform.
+ */
+#define AFBC_FORMAT_MOD_YTR     (1ULL <<  4)
+
+/*
+ * AFBC block-split
+ *
+ * Indicates that the payload of each superblock is split. The second
+ * half of the payload is positioned at a predefined offset from the start
+ * of the superblock payload.
+ */
+#define AFBC_FORMAT_MOD_SPLIT   (1ULL <<  5)
+
+/*
+ * AFBC sparse layout
+ *
+ * This flag indicates that the payload of each superblock must be stored at a
+ * predefined position relative to the other superblocks in the same AFBC
+ * buffer. This order is the same order used by the header buffer. In this mode
+ * each superblock is given the same amount of space as an uncompressed
+ * superblock of the particular format would require, rounding up to the next
+ * multiple of 128 bytes in size.
+ */
+#define AFBC_FORMAT_MOD_SPARSE  (1ULL <<  6)
+
+/*
+ * AFBC copy-block restrict
+ *
+ * Buffers with this flag must obey the copy-block restriction. The restriction
+ * is such that there are no copy-blocks referring across the border of 8x8
+ * blocks. For the subsampled data the 8x8 limitation is also subsampled.
+ */
+#define AFBC_FORMAT_MOD_CBR     (1ULL <<  7)
+
+/*
+ * AFBC tiled layout
+ *
+ * The tiled layout groups superblocks in 8x8 or 4x4 tiles, where all
+ * superblocks inside a tile are stored together in memory. 8x8 tiles are used
+ * for pixel formats up to and including 32 bpp while 4x4 tiles are used for
+ * larger bpp formats. The order between the tiles is scan line.
+ * When the tiled layout is used, the buffer size (in pixels) must be aligned
+ * to the tile size.
+ */
+#define AFBC_FORMAT_MOD_TILED   (1ULL <<  8)
+
+/*
+ * AFBC solid color blocks
+ *
+ * Indicates that the buffer makes use of solid-color blocks, whereby bandwidth
+ * can be reduced if a whole superblock is a single color.
+ */
+#define AFBC_FORMAT_MOD_SC      (1ULL <<  9)
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/drm_mode.h b/include/drm-uapi/drm_mode.h
index 2c575794fb52..d3e0fe31efc5 100644
--- a/include/drm-uapi/drm_mode.h
+++ b/include/drm-uapi/drm_mode.h
@@ -93,6 +93,15 @@ extern "C" {
 #define DRM_MODE_PICTURE_ASPECT_NONE		0
 #define DRM_MODE_PICTURE_ASPECT_4_3		1
 #define DRM_MODE_PICTURE_ASPECT_16_9		2
+#define DRM_MODE_PICTURE_ASPECT_64_27		3
+#define DRM_MODE_PICTURE_ASPECT_256_135		4
+
+/* Content type options */
+#define DRM_MODE_CONTENT_TYPE_NO_DATA		0
+#define DRM_MODE_CONTENT_TYPE_GRAPHICS		1
+#define DRM_MODE_CONTENT_TYPE_PHOTO		2
+#define DRM_MODE_CONTENT_TYPE_CINEMA		3
+#define DRM_MODE_CONTENT_TYPE_GAME		4
 
 /* Aspect ratio flag bitmask (4 bits 22:19) */
 #define DRM_MODE_FLAG_PIC_AR_MASK		(0x0F<<19)
@@ -102,6 +111,10 @@ extern "C" {
 			(DRM_MODE_PICTURE_ASPECT_4_3<<19)
 #define  DRM_MODE_FLAG_PIC_AR_16_9 \
 			(DRM_MODE_PICTURE_ASPECT_16_9<<19)
+#define  DRM_MODE_FLAG_PIC_AR_64_27 \
+			(DRM_MODE_PICTURE_ASPECT_64_27<<19)
+#define  DRM_MODE_FLAG_PIC_AR_256_135 \
+			(DRM_MODE_PICTURE_ASPECT_256_135<<19)
 
 #define  DRM_MODE_FLAG_ALL	(DRM_MODE_FLAG_PHSYNC |		\
 				 DRM_MODE_FLAG_NHSYNC |		\
@@ -173,8 +186,9 @@ extern "C" {
 /*
  * DRM_MODE_REFLECT_<axis>
  *
- * Signals that the contents of a drm plane is reflected in the <axis> axis,
+ * Signals that the contents of a drm plane are reflected along the <axis> axis,
  * in the same way as mirroring.
+ * See kerneldoc chapter "Plane Composition Properties" for more details.
  *
  * This define is provided as a convenience, looking up the property id
  * using the name->prop id lookup is the preferred method.
@@ -338,6 +352,7 @@ enum drm_mode_subconnector {
 #define DRM_MODE_CONNECTOR_VIRTUAL      15
 #define DRM_MODE_CONNECTOR_DSI		16
 #define DRM_MODE_CONNECTOR_DPI		17
+#define DRM_MODE_CONNECTOR_WRITEBACK	18
 
 struct drm_mode_get_connector {
 
@@ -363,7 +378,7 @@ struct drm_mode_get_connector {
 	__u32 pad;
 };
 
-#define DRM_MODE_PROP_PENDING	(1<<0)
+#define DRM_MODE_PROP_PENDING	(1<<0) /* deprecated, do not use */
 #define DRM_MODE_PROP_RANGE	(1<<1)
 #define DRM_MODE_PROP_IMMUTABLE	(1<<2)
 #define DRM_MODE_PROP_ENUM	(1<<3) /* enumerated type with text strings */
@@ -598,8 +613,11 @@ struct drm_mode_crtc_lut {
 };
 
 struct drm_color_ctm {
-	/* Conversion matrix in S31.32 format. */
-	__s64 matrix[9];
+	/*
+	 * Conversion matrix in S31.32 sign-magnitude
+	 * (not two's complement!) format.
+	 */
+	__u64 matrix[9];
 };
 
 struct drm_color_lut {
diff --git a/include/drm-uapi/etnaviv_drm.h b/include/drm-uapi/etnaviv_drm.h
index e9b997a0ef27..0d5c49dc478c 100644
--- a/include/drm-uapi/etnaviv_drm.h
+++ b/include/drm-uapi/etnaviv_drm.h
@@ -55,6 +55,12 @@ struct drm_etnaviv_timespec {
 #define ETNAVIV_PARAM_GPU_FEATURES_4                0x07
 #define ETNAVIV_PARAM_GPU_FEATURES_5                0x08
 #define ETNAVIV_PARAM_GPU_FEATURES_6                0x09
+#define ETNAVIV_PARAM_GPU_FEATURES_7                0x0a
+#define ETNAVIV_PARAM_GPU_FEATURES_8                0x0b
+#define ETNAVIV_PARAM_GPU_FEATURES_9                0x0c
+#define ETNAVIV_PARAM_GPU_FEATURES_10               0x0d
+#define ETNAVIV_PARAM_GPU_FEATURES_11               0x0e
+#define ETNAVIV_PARAM_GPU_FEATURES_12               0x0f
 
 #define ETNAVIV_PARAM_GPU_STREAM_COUNT              0x10
 #define ETNAVIV_PARAM_GPU_REGISTER_MAX              0x11
diff --git a/include/drm-uapi/exynos_drm.h b/include/drm-uapi/exynos_drm.h
index a00116b5cc5c..7414cfd76419 100644
--- a/include/drm-uapi/exynos_drm.h
+++ b/include/drm-uapi/exynos_drm.h
@@ -135,6 +135,219 @@ struct drm_exynos_g2d_exec {
 	__u64					async;
 };
 
+/* Exynos DRM IPP v2 API */
+
+/**
+ * Enumerate available IPP hardware modules.
+ *
+ * @count_ipps: size of ipp_id array / number of ipp modules (set by driver)
+ * @reserved: padding
+ * @ipp_id_ptr: pointer to ipp_id array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_res {
+	__u32 count_ipps;
+	__u32 reserved;
+	__u64 ipp_id_ptr;
+};
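
A sketch of the two-pass enumeration this struct implies: call once with a
NULL ipp_id_ptr to learn the count, then again to fill the array. The ioctl
wrapper is defined further down in this header; error handling is minimal:

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>
  #include <xf86drm.h>
  #include "exynos_drm.h"

  /* Returns a malloc()ed array of IPP ids, or NULL on failure. */
  static __u32 *get_ipp_ids(int fd, __u32 *count)
  {
  	struct drm_exynos_ioctl_ipp_get_res res;
  	__u32 *ids;

  	memset(&res, 0, sizeof(res));
  	if (drmIoctl(fd, DRM_IOCTL_EXYNOS_IPP_GET_RESOURCES, &res))
  		return NULL;

  	ids = calloc(res.count_ipps, sizeof(*ids));
  	res.ipp_id_ptr = (uintptr_t)ids;
  	if (drmIoctl(fd, DRM_IOCTL_EXYNOS_IPP_GET_RESOURCES, &res)) {
  		free(ids);
  		return NULL;
  	}

  	*count = res.count_ipps;
  	return ids;
  }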
+
+enum drm_exynos_ipp_format_type {
+	DRM_EXYNOS_IPP_FORMAT_SOURCE		= 0x01,
+	DRM_EXYNOS_IPP_FORMAT_DESTINATION	= 0x02,
+};
+
+struct drm_exynos_ipp_format {
+	__u32 fourcc;
+	__u32 type;
+	__u64 modifier;
+};
+
+enum drm_exynos_ipp_capability {
+	DRM_EXYNOS_IPP_CAP_CROP		= 0x01,
+	DRM_EXYNOS_IPP_CAP_ROTATE	= 0x02,
+	DRM_EXYNOS_IPP_CAP_SCALE	= 0x04,
+	DRM_EXYNOS_IPP_CAP_CONVERT	= 0x08,
+};
+
+/**
+ * Get IPP hardware capabilities and supported image formats.
+ *
+ * @ipp_id: id of IPP module to query
+ * @capabilities: bitmask of drm_exynos_ipp_capability (set by driver)
+ * @reserved: padding
+ * @formats_count: size of formats array (in entries) / number of filled
+ *		   formats (set by driver)
+ * @formats_ptr: pointer to formats array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_caps {
+	__u32 ipp_id;
+	__u32 capabilities;
+	__u32 reserved;
+	__u32 formats_count;
+	__u64 formats_ptr;
+};
+
+enum drm_exynos_ipp_limit_type {
+	/* size (horizontal/vertical) limits, in pixels (min, max, alignment) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SIZE		= 0x0001,
+	/* scale ratio (horizontal/vertical), 16.16 fixed point (min, max) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SCALE		= 0x0002,
+
+	/* image buffer area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_BUFFER	= 0x0001 << 16,
+	/* src/dst rectangle area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_AREA		= 0x0002 << 16,
+	/* src/dst rectangle area when rotation enabled */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_ROTATED	= 0x0003 << 16,
+
+	DRM_EXYNOS_IPP_LIMIT_TYPE_MASK		= 0x000f,
+	DRM_EXYNOS_IPP_LIMIT_SIZE_MASK		= 0x000f << 16,
+};
+
+struct drm_exynos_ipp_limit_val {
+	__u32 min;
+	__u32 max;
+	__u32 align;
+	__u32 reserved;
+};
+
+/**
+ * IPP module limitation.
+ *
+ * @type: limit type (see drm_exynos_ipp_limit_type enum)
+ * @reserved: padding
+ * @h: horizontal limits
+ * @v: vertical limits
+ */
+struct drm_exynos_ipp_limit {
+	__u32 type;
+	__u32 reserved;
+	struct drm_exynos_ipp_limit_val h;
+	struct drm_exynos_ipp_limit_val v;
+};
+
+/**
+ * Get IPP limits for given image format.
+ *
+ * @ipp_id: id of IPP module to query
+ * @fourcc: image format code (see DRM_FORMAT_* in drm_fourcc.h)
+ * @modifier: image format modifier (see DRM_FORMAT_MOD_* in drm_fourcc.h)
+ * @type: source/destination identifier (drm_exynos_ipp_format_type enum)
+ * @limits_count: size of limits array (in entries) / number of filled entries
+ *		 (set by driver)
+ * @limits_ptr: pointer to limits array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_limits {
+	__u32 ipp_id;
+	__u32 fourcc;
+	__u64 modifier;
+	__u32 type;
+	__u32 limits_count;
+	__u64 limits_ptr;
+};
+
+enum drm_exynos_ipp_task_id {
+	/* buffer described by struct drm_exynos_ipp_task_buffer */
+	DRM_EXYNOS_IPP_TASK_BUFFER		= 0x0001,
+	/* rectangle described by struct drm_exynos_ipp_task_rect */
+	DRM_EXYNOS_IPP_TASK_RECTANGLE		= 0x0002,
+	/* transformation described by struct drm_exynos_ipp_task_transform */
+	DRM_EXYNOS_IPP_TASK_TRANSFORM		= 0x0003,
+	/* alpha configuration described by struct drm_exynos_ipp_task_alpha */
+	DRM_EXYNOS_IPP_TASK_ALPHA		= 0x0004,
+
+	/* source image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_SOURCE		= 0x0001 << 16,
+	/* destination image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_DESTINATION	= 0x0002 << 16,
+};
+
+/**
+ * Memory buffer with image data.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_BUFFER
+ * other parameters are same as for AddFB2 generic DRM ioctl
+ */
+struct drm_exynos_ipp_task_buffer {
+	__u32	id;
+	__u32	fourcc;
+	__u32	width, height;
+	__u32	gem_id[4];
+	__u32	offset[4];
+	__u32	pitch[4];
+	__u64	modifier;
+};
+
+/**
+ * Rectangle for processing.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_RECTANGLE
+ * @reserved: padding
+ * @x,@y: left corner in pixels
+ * @w,@h: width/height in pixels
+ */
+struct drm_exynos_ipp_task_rect {
+	__u32	id;
+	__u32	reserved;
+	__u32	x;
+	__u32	y;
+	__u32	w;
+	__u32	h;
+};
+
+/**
+ * Image transformation description.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_TRANSFORM
+ * @rotation: DRM_MODE_ROTATE_* and DRM_MODE_REFLECT_* values
+ */
+struct drm_exynos_ipp_task_transform {
+	__u32	id;
+	__u32	rotation;
+};
+
+/**
+ * Image global alpha configuration for formats without alpha values.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_ALPHA
+ * @value: global alpha value (0-255)
+ */
+struct drm_exynos_ipp_task_alpha {
+	__u32	id;
+	__u32	value;
+};
+
+enum drm_exynos_ipp_flag {
+	/* generate DRM event after processing */
+	DRM_EXYNOS_IPP_FLAG_EVENT	= 0x01,
+	/* dry run, only check task parameters */
+	DRM_EXYNOS_IPP_FLAG_TEST_ONLY	= 0x02,
+	/* non-blocking processing */
+	DRM_EXYNOS_IPP_FLAG_NONBLOCK	= 0x04,
+};
+
+#define DRM_EXYNOS_IPP_FLAGS (DRM_EXYNOS_IPP_FLAG_EVENT |\
+		DRM_EXYNOS_IPP_FLAG_TEST_ONLY | DRM_EXYNOS_IPP_FLAG_NONBLOCK)
+
+/**
+ * Perform image processing described by array of drm_exynos_ipp_task_*
+ * structures (parameters array).
+ *
+ * @ipp_id: id of IPP module to run the task
+ * @flags: bitmask of drm_exynos_ipp_flag values
+ * @reserved: padding
+ * @params_size: size of parameters array (in bytes)
+ * @params_ptr: pointer to parameters array or NULL
+ * @user_data: (optional) data for drm event
+ */
+struct drm_exynos_ioctl_ipp_commit {
+	__u32 ipp_id;
+	__u32 flags;
+	__u32 reserved;
+	__u32 params_size;
+	__u64 params_ptr;
+	__u64 user_data;
+};
+
 #define DRM_EXYNOS_GEM_CREATE		0x00
 #define DRM_EXYNOS_GEM_MAP		0x01
 /* Reserved 0x03 ~ 0x05 for exynos specific gem ioctl */
@@ -147,6 +360,11 @@ struct drm_exynos_g2d_exec {
 #define DRM_EXYNOS_G2D_EXEC		0x22
 
 /* Reserved 0x30 ~ 0x33 for obsolete Exynos IPP ioctls */
+/* IPP - Image Post Processing */
+#define DRM_EXYNOS_IPP_GET_RESOURCES	0x40
+#define DRM_EXYNOS_IPP_GET_CAPS		0x41
+#define DRM_EXYNOS_IPP_GET_LIMITS	0x42
+#define DRM_EXYNOS_IPP_COMMIT		0x43
 
 #define DRM_IOCTL_EXYNOS_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_GEM_CREATE, struct drm_exynos_gem_create)
@@ -165,8 +383,20 @@ struct drm_exynos_g2d_exec {
 #define DRM_IOCTL_EXYNOS_G2D_EXEC		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_G2D_EXEC, struct drm_exynos_g2d_exec)
 
+#define DRM_IOCTL_EXYNOS_IPP_GET_RESOURCES	DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_RESOURCES, \
+		struct drm_exynos_ioctl_ipp_get_res)
+#define DRM_IOCTL_EXYNOS_IPP_GET_CAPS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_CAPS, struct drm_exynos_ioctl_ipp_get_caps)
+#define DRM_IOCTL_EXYNOS_IPP_GET_LIMITS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_LIMITS, \
+		struct drm_exynos_ioctl_ipp_get_limits)
+#define DRM_IOCTL_EXYNOS_IPP_COMMIT		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_COMMIT, struct drm_exynos_ioctl_ipp_commit)
+
 /* EXYNOS specific events */
 #define DRM_EXYNOS_G2D_EVENT		0x80000000
+#define DRM_EXYNOS_IPP_EVENT		0x80000002
 
 struct drm_exynos_g2d_event {
 	struct drm_event	base;
@@ -177,6 +407,16 @@ struct drm_exynos_g2d_event {
 	__u32			reserved;
 };
 
+struct drm_exynos_ipp_event {
+	struct drm_event	base;
+	__u64			user_data;
+	__u32			tv_sec;
+	__u32			tv_usec;
+	__u32			ipp_id;
+	__u32			sequence;
+	__u64			reserved;
+};
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d4..b14ca9695f1e 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -62,6 +62,26 @@ extern "C" {
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * i915_user_extension: Base class for defining a chain of extensions
+ *
+ * Many interfaces need to grow over time. In most cases we can simply
+ * extend the struct and have userspace pass in more data. Another option,
+ * as demonstrated by Vulkan's approach to providing extensions for forward
+ * and backward compatibility, is to use a list of optional structs to
+ * provide those extra details.
+ *
+ * The key advantage to using an extension chain is that it allows us to
+ * redefine the interface more easily than an ever growing struct of
+ * increasing complexity, and for large parts of that interface to be
+ * entirely optional. The downside is more pointer chasing; chasing across
+ * the boundary with pointers encapsulated inside u64.
+ */
+struct i915_user_extension {
+	__u64 next_extension;
+	__u64 name;
+};
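
A rough illustration of how such a chain is strung together from userspace;
ext_foo/ext_bar, their payloads and the name values are made-up placeholders
(the concrete context-engines extensions appear further down in this header):

  #include <stdint.h>
  #include "i915_drm.h"

  /* Placeholder extension payloads; only the embedded base is real. */
  struct ext_foo { struct i915_user_extension base; __u64 payload; };
  struct ext_bar { struct i915_user_extension base; __u64 payload; };

  #define EXT_FOO_NAME 0	/* placeholder name values, illustration only */
  #define EXT_BAR_NAME 1

  static __u64 build_chain(struct ext_foo *foo, struct ext_bar *bar)
  {
  	bar->base.name = EXT_BAR_NAME;
  	bar->base.next_extension = 0;		/* end of chain */

  	foo->base.name = EXT_FOO_NAME;
  	foo->base.next_extension = (uintptr_t)bar;

  	/* The head of the chain is what gets written into the u64
  	 * extensions field of the ioctl argument. */
  	return (uintptr_t)foo;
  }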
+
 /*
  * MOCS indexes used for GPU surfaces, defining the cacheability of the
  * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
@@ -367,6 +387,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
 #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_v2	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_v2)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
@@ -412,6 +433,14 @@ typedef struct drm_i915_irq_wait {
 	int irq_seq;
 } drm_i915_irq_wait_t;
 
+/*
+ * Different modes of per-process Graphics Translation Table,
+ * see I915_PARAM_HAS_ALIASING_PPGTT
+ */
+#define I915_GEM_PPGTT_NONE	0
+#define I915_GEM_PPGTT_ALIASING	1
+#define I915_GEM_PPGTT_FULL	2
+
 /* Ioctl to query kernel params:
  */
 #define I915_PARAM_IRQ_ACTIVE            1
@@ -529,6 +558,35 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/*
+ * Once upon a time we supposed that writes through the GGTT would be
+ * immediately in physical memory (once flushed out of the CPU path). However,
+ * on a few different processors and chipsets, this is not necessarily the case
+ * as the writes appear to be buffered internally. Thus a read of the backing
+ * storage (physical memory) via a different path (with different physical tags
+ * to the indirect write via the GGTT) will see stale values from before
+ * the GGTT write. Inside the kernel, we can for the most part keep track of
+ * the different read/write domains in use (e.g. set-domain), but the assumption
+ * of coherency is baked into the ABI, hence reporting its true state in this
+ * parameter.
+ *
+ * Reports true when writes via mmap_gtt are immediately visible following an
+ * lfence to flush the WCB.
+ *
+ * Reports false when writes via mmap_gtt are indeterminately delayed in an
+ * internal buffer and are _not_ immediately visible to third parties accessing
+ * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC
+ * communications channel when reporting false is strongly disadvised.
+ */
+#define I915_PARAM_MMAP_GTT_COHERENT	52
+
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
+ * execution through use of explicit fence support.
+ * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
+ */
+#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -942,7 +1000,7 @@ struct drm_i915_gem_execbuffer2 {
 	 * struct drm_i915_gem_exec_fence *fences.
 	 */
 	__u64 cliprects_ptr;
-#define I915_EXEC_RING_MASK              (7<<0)
+#define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
 #define I915_EXEC_BSD                    (2<<0)
@@ -1048,7 +1106,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/*
+ * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_SUBMIT		(1<<20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
@@ -1387,6 +1454,16 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+struct drm_i915_gem_context_create_v2 {
+	/* output: id of new context */
+	__u32 ctx_id;
+	__u32 flags;
+#define I915_GEM_CONTEXT_SHARE_GTT		0x1
+#define I915_GEM_CONTEXT_SINGLE_TIMELINE	0x2
+	__u32 share_ctx;
+	__u32 pad;
+};
+
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
@@ -1456,9 +1533,122 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+
+/*
+ * I915_CONTEXT_PARAM_ENGINES:
+ *
+ * Bind this context to operate on this subset of available engines. Henceforth,
+ * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
+ * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
+ * and upwards. The array created is offset by 1, such that by default
+ * I915_EXEC_DEFAULT is left empty, to be filled in as directed. Slots 1...N
+ * are then filled in using the specified (class, instance).
+ *
+ * Setting the number of engines bound to the context reverts it to the
+ * default settings.
+ *
+ * See struct i915_context_param_engines.
+ *
+ * Extensions:
+ *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
+ *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
+ */
+#define I915_CONTEXT_PARAM_ENGINES	0x7
+
+/*
+ * When using the following param, value should be a pointer to
+ * drm_i915_gem_context_param_sseu.
+ */
+#define I915_CONTEXT_PARAM_SSEU		0x8
+
 	__u64 value;
 };
 
+/*
+ * i915_context_engines_load_balance:
+ *
+ * Enable load balancing across this set of engines.
+ *
+ * Into the I915_EXEC_DEFAULT slot, a virtual engine is created that when
+ * used will proxy the execbuffer request onto one of the set of engines
+ * in such a way as to distribute the load evenly across the set.
+ *
+ * The set of engines must be compatible (e.g. the same HW class) as they
+ * will share the same logical GPU context and ring.
+ *
+ * The context must be defined to use a single timeline for all engines.
+ */
+struct i915_context_engines_load_balance {
+	struct i915_user_extension base;
+
+	__u64 flags; /* all undefined flags must be zero */
+	__u64 engines_mask;
+
+	__u64 mbz[4]; /* reserved for future use; must be zero */
+};
+
+/*
+ * i915_context_engines_bond:
+ *
+ * Bond a virtual engine to a master engine (class, instance): requests that
+ * use a submit fence from the master are restricted to the sibling engines
+ * selected by @sibling_mask.
+ */
+struct i915_context_engines_bond {
+	struct i915_user_extension base;
+
+	__u16 master_class;
+	__u16 master_instance;
+	__u32 flags; /* all undefined flags must be zero */
+	__u64 sibling_mask;
+};
+
+struct i915_context_param_engines {
+	__u64 extensions;
+#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+#define I915_CONTEXT_ENGINES_EXT_BOND 1
+
+	struct {
+		__u16 class; /* see enum drm_i915_gem_engine_class */
+		__u16 instance;
+	} class_instance[0];
+};
+
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u16 class;
+	__u16 instance;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd1;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u64 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u64 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be less than or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u16 min_eus_per_subslice;
+	__u16 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd2;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
@@ -1620,6 +1810,7 @@ struct drm_i915_perf_oa_config {
 struct drm_i915_query_item {
 	__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO    1
+#define DRM_I915_QUERY_ENGINE_INFO	2
 
 	/*
 	 * When set to zero by userspace, this is filled with the size of the
@@ -1717,6 +1908,50 @@ struct drm_i915_query_topology_info {
 	__u8 data[];
 };
 
+/**
+ * struct drm_i915_engine_info
+ *
+ * Describes one engine and its capabilities as known to the driver.
+ */
+struct drm_i915_engine_info {
+	/** Engine class as in enum drm_i915_gem_engine_class. */
+	__u16 class;
+
+	/** Engine instance number. */
+	__u16 instance;
+
+	/** Reserved field. */
+	__u32 rsvd0;
+
+	/** Engine flags. */
+	__u64 flags;
+
+	/** Capabilities of this engine. */
+	__u64 capabilities;
+#define I915_VIDEO_CLASS_CAPABILITY_HEVC		(1 << 0)
+#define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC	(1 << 1)
+
+	/** Reserved fields. */
+	__u64 rsvd1[4];
+};
+
+/**
+ * struct drm_i915_query_engine_info
+ *
+ * Engine info query enumerates all engines known to the driver by filling in
+ * an array of struct drm_i915_engine_info structures.
+ */
+struct drm_i915_query_engine_info {
+	/** Number of struct drm_i915_engine_info structs following. */
+	__u32 num_engines;
+
+	/** MBZ */
+	__u32 rsvd[3];
+
+	/** Marker for drm_i915_engine_info structures. */
+	struct drm_i915_engine_info engines[];
+};
+
 #if defined(__cplusplus)
 }
 #endif
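
For reference, a minimal userspace sketch of the intended submit fence usage
(illustrative only, not taken from the patch). It assumes the pre-existing
I915_EXEC_FENCE_OUT flag and the DRM_IOCTL_I915_GEM_EXECBUFFER2_WR ioctl, and
that both execbuffer2 structures are otherwise fully populated:

  #include <errno.h>
  #include <unistd.h>
  #include <sys/ioctl.h>

  #include "i915_drm.h" /* include/drm-uapi/i915_drm.h */

  /*
   * Chain two execbufs with a submit fence: eb2 is held back until eb1 has
   * been submitted to its engine, rather than until eb1 completes as with a
   * regular in fence.
   */
  static int submit_pair(int fd, struct drm_i915_gem_execbuffer2 *eb1,
                         struct drm_i915_gem_execbuffer2 *eb2)
  {
          int fence, ret;

          eb1->flags |= I915_EXEC_FENCE_OUT;
          if (ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, eb1))
                  return -errno;
          fence = eb1->rsvd2 >> 32; /* out fence fd in upper_32_bits(rsvd2) */

          eb2->flags |= I915_EXEC_FENCE_SUBMIT;
          eb2->rsvd2 = (__u32)fence; /* submit fence fd in lower_32_bits(rsvd2) */
          ret = ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb2) ? -errno : 0;

          close(fence);
          return ret;
  }

Because the second batch only waits for the first to reach the hardware, this
is what lets bonded engines start a frame-split pair together instead of
serialising on completion.
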
diff --git a/include/drm-uapi/msm_drm.h b/include/drm-uapi/msm_drm.h
index bbbaffad772d..c06d0a5bdd80 100644
--- a/include/drm-uapi/msm_drm.h
+++ b/include/drm-uapi/msm_drm.h
@@ -201,10 +201,12 @@ struct drm_msm_gem_submit_bo {
 #define MSM_SUBMIT_NO_IMPLICIT   0x80000000 /* disable implicit sync */
 #define MSM_SUBMIT_FENCE_FD_IN   0x40000000 /* enable input fence_fd */
 #define MSM_SUBMIT_FENCE_FD_OUT  0x20000000 /* enable output fence_fd */
+#define MSM_SUBMIT_SUDO          0x10000000 /* run submitted cmds from RB */
 #define MSM_SUBMIT_FLAGS                ( \
 		MSM_SUBMIT_NO_IMPLICIT   | \
 		MSM_SUBMIT_FENCE_FD_IN   | \
 		MSM_SUBMIT_FENCE_FD_OUT  | \
+		MSM_SUBMIT_SUDO          | \
 		0)
 
 /* Each cmdstream submit consists of a table of buffers involved, and
diff --git a/include/drm-uapi/sync_file.h b/include/drm-uapi/sync_file.h
deleted file mode 100644
index b4f2db009347..000000000000
--- a/include/drm-uapi/sync_file.h
+++ /dev/null
@@ -1,98 +0,0 @@
-/* SPDX-License-Identifier: GPL-1.0+ WITH Linux-syscall-note */
-/*
- * Copyright (C) 2012 Google, Inc.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#ifndef _LINUX_SYNC_H
-#define _LINUX_SYNC_H
-
-#include <linux/ioctl.h>
-#include <linux/types.h>
-
-/**
- * struct sync_merge_data - data passed to merge ioctl
- * @name:	name of new fence
- * @fd2:	file descriptor of second fence
- * @fence:	returns the fd of the new fence to userspace
- * @flags:	merge_data flags
- * @pad:	padding for 64-bit alignment, should always be zero
- */
-struct sync_merge_data {
-	char	name[32];
-	__s32	fd2;
-	__s32	fence;
-	__u32	flags;
-	__u32	pad;
-};
-
-/**
- * struct sync_fence_info - detailed fence information
- * @obj_name:		name of parent sync_timeline
-* @driver_name:	name of driver implementing the parent
-* @status:		status of the fence 0:active 1:signaled <0:error
- * @flags:		fence_info flags
- * @timestamp_ns:	timestamp of status change in nanoseconds
- */
-struct sync_fence_info {
-	char	obj_name[32];
-	char	driver_name[32];
-	__s32	status;
-	__u32	flags;
-	__u64	timestamp_ns;
-};
-
-/**
- * struct sync_file_info - data returned from fence info ioctl
- * @name:	name of fence
- * @status:	status of fence. 1: signaled 0:active <0:error
- * @flags:	sync_file_info flags
- * @num_fences	number of fences in the sync_file
- * @pad:	padding for 64-bit alignment, should always be zero
- * @sync_fence_info: pointer to array of structs sync_fence_info with all
- *		 fences in the sync_file
- */
-struct sync_file_info {
-	char	name[32];
-	__s32	status;
-	__u32	flags;
-	__u32	num_fences;
-	__u32	pad;
-
-	__u64	sync_fence_info;
-};
-
-#define SYNC_IOC_MAGIC		'>'
-
-/**
- * Opcodes  0, 1 and 2 were burned during a API change to avoid users of the
- * old API to get weird errors when trying to handling sync_files. The API
- * change happened during the de-stage of the Sync Framework when there was
- * no upstream users available.
- */
-
-/**
- * DOC: SYNC_IOC_MERGE - merge two fences
- *
- * Takes a struct sync_merge_data.  Creates a new fence containing copies of
- * the sync_pts in both the calling fd and sync_merge_data.fd2.  Returns the
- * new fence's fd in sync_merge_data.fence
- */
-#define SYNC_IOC_MERGE		_IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
-
-/**
- * DOC: SYNC_IOC_FILE_INFO - get detailed information on a sync_file
- *
- * Takes a struct sync_file_info. If num_fences is 0, the field is updated
- * with the actual number of fences. If num_fences is > 0, the system will
- * use the pointer provided on sync_fence_info to return up to num_fences of
- * struct sync_fence_info, with detailed fence information.
- */
-#define SYNC_IOC_FILE_INFO	_IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
-
-#endif /* _LINUX_SYNC_H */
diff --git a/include/drm-uapi/tegra_drm.h b/include/drm-uapi/tegra_drm.h
index 12f9bf848db1..6c07919c04e9 100644
--- a/include/drm-uapi/tegra_drm.h
+++ b/include/drm-uapi/tegra_drm.h
@@ -32,143 +32,615 @@ extern "C" {
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
+/**
+ * struct drm_tegra_gem_create - parameters for the GEM object creation IOCTL
+ */
 struct drm_tegra_gem_create {
+	/**
+	 * @size:
+	 *
+	 * The size, in bytes, of the buffer object to be created.
+	 */
 	__u64 size;
+
+	/**
+	 * @flags:
+	 *
+	 * A bitmask of flags that influence the creation of GEM objects:
+	 *
+	 * DRM_TEGRA_GEM_CREATE_TILED
+	 *   Use the 16x16 tiling format for this buffer.
+	 *
+	 * DRM_TEGRA_GEM_CREATE_BOTTOM_UP
+	 *   The buffer has a bottom-up layout.
+	 */
 	__u32 flags;
+
+	/**
+	 * @handle:
+	 *
+	 * The handle of the created GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 handle;
 };
 
+/**
+ * struct drm_tegra_gem_mmap - parameters for the GEM mmap IOCTL
+ */
 struct drm_tegra_gem_mmap {
+	/**
+	 * @handle:
+	 *
+	 * Handle of the GEM object to obtain an mmap offset for.
+	 */
 	__u32 handle;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @offset:
+	 *
+	 * The mmap offset for the given GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u64 offset;
 };
 
+/**
+ * struct drm_tegra_syncpt_read - parameters for the read syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_read {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to read the current value from.
+	 */
 	__u32 id;
+
+	/**
+	 * @value:
+	 *
+	 * The current syncpoint value. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
+/**
+ * struct drm_tegra_syncpt_incr - parameters for the increment syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_incr {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to increment.
+	 */
 	__u32 id;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_syncpt_wait - parameters for the wait syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_wait {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to wait on.
+	 */
 	__u32 id;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to wait.
+	 */
 	__u32 thresh;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, to wait.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @value:
+	 *
+	 * The new syncpoint value after the wait. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
 #define DRM_TEGRA_NO_TIMEOUT	(0xffffffff)
 
+/**
+ * struct drm_tegra_open_channel - parameters for the open channel IOCTL
+ */
 struct drm_tegra_open_channel {
+	/**
+	 * @client:
+	 *
+	 * The client ID for this channel.
+	 */
 	__u32 client;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. Set by the kernel upon
+	 * successful completion of the IOCTL. This context needs to be passed
+	 * to the DRM_TEGRA_CHANNEL_CLOSE or the DRM_TEGRA_SUBMIT IOCTLs.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_close_channel - parameters for the close channel IOCTL
+ */
 struct drm_tegra_close_channel {
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. This is obtained from the
+	 * DRM_TEGRA_OPEN_CHANNEL IOCTL.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_get_syncpt - parameters for the get syncpoint IOCTL
+ */
 struct drm_tegra_get_syncpt {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel for which to obtain
+	 * the syncpoint ID.
+	 */
 	__u64 context;
+
+	/**
+	 * @index:
+	 *
+	 * Index of the client syncpoint for which to obtain the ID.
+	 */
 	__u32 index;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the given syncpoint. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_get_syncpt_base - parameters for the get wait base IOCTL
+ */
 struct drm_tegra_get_syncpt_base {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying for which channel to obtain the
+	 * wait base.
+	 */
 	__u64 context;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint for which to obtain the wait base.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the wait base corresponding to the client syncpoint. Set
+	 * by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_syncpt - syncpoint increment operation
+ */
 struct drm_tegra_syncpt {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to operate on.
+	 */
 	__u32 id;
+
+	/**
+	 * @incrs:
+	 *
+	 * Number of increments to perform for the syncpoint.
+	 */
 	__u32 incrs;
 };
 
+/**
+ * struct drm_tegra_cmdbuf - structure describing a command buffer
+ */
 struct drm_tegra_cmdbuf {
+	/**
+	 * @handle:
+	 *
+	 * Handle to a GEM object containing the command buffer.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, into the GEM object identified by @handle at
+	 * which the command buffer starts.
+	 */
 	__u32 offset;
+
+	/**
+	 * @words:
+	 *
+	 * Number of 32-bit words in this command buffer.
+	 */
 	__u32 words;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_reloc - GEM object relocation structure
+ */
 struct drm_tegra_reloc {
 	struct {
+		/**
+		 * @cmdbuf.handle:
+		 *
+		 * Handle to the GEM object containing the command buffer for
+		 * which to perform this GEM object relocation.
+		 */
 		__u32 handle;
+
+		/**
+		 * @cmdbuf.offset:
+		 *
+		 * Offset, in bytes, into the command buffer at which to
+		 * insert the relocated address.
+		 */
 		__u32 offset;
 	} cmdbuf;
 	struct {
+		/**
+		 * @target.handle:
+		 *
+		 * Handle to the GEM object to be relocated.
+		 */
 		__u32 handle;
+
+		/**
+		 * @target.offset:
+		 *
+		 * Offset, in bytes, into the target GEM object at which the
+		 * relocated data starts.
+		 */
 		__u32 offset;
 	} target;
+
+	/**
+	 * @shift:
+	 *
+	 * The number of bits by which to shift relocated addresses.
+	 */
 	__u32 shift;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_waitchk - wait check structure
+ */
 struct drm_tegra_waitchk {
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object containing a command stream on which to
+	 * perform the wait check.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, of the location in the command stream to perform
+	 * the wait check on.
+	 */
 	__u32 offset;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint to wait check.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to check.
+	 */
 	__u32 thresh;
 };
 
+/**
+ * struct drm_tegra_submit - job submission structure
+ */
 struct drm_tegra_submit {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel to use for the
+	 * execution of this job.
+	 */
 	__u64 context;
+
+	/**
+	 * @num_syncpts:
+	 *
+	 * The number of syncpoints operated on by this job. This defines the
+	 * length of the array pointed to by @syncpts.
+	 */
 	__u32 num_syncpts;
+
+	/**
+	 * @num_cmdbufs:
+	 *
+	 * The number of command buffers to execute as part of this job. This
+	 * defines the length of the array pointed to by @cmdbufs.
+	 */
 	__u32 num_cmdbufs;
+
+	/**
+	 * @num_relocs:
+	 *
+	 * The number of relocations to perform before executing this job.
+	 * This defines the length of the array pointed to by @relocs.
+	 */
 	__u32 num_relocs;
+
+	/**
+	 * @num_waitchks:
+	 *
+	 * The number of wait checks to perform as part of this job. This
+	 * defines the length of the array pointed to by @waitchks.
+	 */
 	__u32 num_waitchks;
+
+	/**
+	 * @waitchk_mask:
+	 *
+	 * Bitmask of valid wait checks.
+	 */
 	__u32 waitchk_mask;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, before this job is cancelled.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @syncpts:
+	 *
+	 * A pointer to an array of &struct drm_tegra_syncpt structures that
+	 * specify the syncpoint operations performed as part of this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_syncpts.
+	 */
 	__u64 syncpts;
+
+	/**
+	 * @cmdbufs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_cmdbuf structures that
+	 * define the command buffers to execute as part of this job. The
+	 * number of elements in the array must be equal to the value given
+	 * by @num_cmdbufs.
+	 */
 	__u64 cmdbufs;
+
+	/**
+	 * @relocs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_reloc structures that
+	 * specify the relocations that need to be performed before executing
+	 * this job. The number of elements in the array must be equal to the
+	 * value given by @num_relocs.
+	 */
 	__u64 relocs;
+
+	/**
+	 * @waitchks:
+	 *
+	 * A pointer to an array of &struct drm_tegra_waitchk structures that
+	 * specify the wait checks to be performed while executing this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_waitchks.
+	 */
 	__u64 waitchks;
-	__u32 fence;		/* Return value */
 
-	__u32 reserved[5];	/* future expansion */
+	/**
+	 * @fence:
+	 *
+	 * The threshold of the syncpoint associated with this job after it
+	 * has been completed. Set by the kernel upon successful completion of
+	 * the IOCTL. This can be used with the DRM_TEGRA_SYNCPT_WAIT IOCTL to
+	 * wait for this job to be finished.
+	 */
+	__u32 fence;
+
+	/**
+	 * @reserved:
+	 *
+	 * This field is reserved for future use. Must be 0.
+	 */
+	__u32 reserved[5];
 };
 
 #define DRM_TEGRA_GEM_TILING_MODE_PITCH 0
 #define DRM_TEGRA_GEM_TILING_MODE_TILED 1
 #define DRM_TEGRA_GEM_TILING_MODE_BLOCK 2
 
+/**
+ * struct drm_tegra_gem_set_tiling - parameters for the set tiling IOCTL
+ */
 struct drm_tegra_gem_set_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the tiling parameters.
+	 */
 	__u32 handle;
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode to set. Must be one of:
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_PITCH
+	 *   pitch linear format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_TILED
+	 *   16x16 tiling format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_BLOCK
+	 *   16Bx2 tiling format
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The value to set for the tiling mode parameter.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_gem_get_tiling - parameters for the get tiling IOCTL
+ */
 struct drm_tegra_gem_get_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the tiling parameters.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode currently associated with the GEM object. Set by
+	 * the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The tiling mode parameter currently associated with the GEM object.
+	 * Set by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
 #define DRM_TEGRA_GEM_BOTTOM_UP		(1 << 0)
 #define DRM_TEGRA_GEM_FLAGS		(DRM_TEGRA_GEM_BOTTOM_UP)
 
+/**
+ * struct drm_tegra_gem_set_flags - parameters for the set flags IOCTL
+ */
 struct drm_tegra_gem_set_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags to set for the GEM object.
+	 */
 	__u32 flags;
 };
 
+/**
+ * struct drm_tegra_gem_get_flags - parameters for the get flags IOCTL
+ */
 struct drm_tegra_gem_get_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags currently associated with the GEM object. Set by the
+	 * kernel upon successful completion of the IOCTL.
+	 */
 	__u32 flags;
 };
 
@@ -193,7 +665,7 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
 #define DRM_IOCTL_TEGRA_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_OPEN_CHANNEL, struct drm_tegra_open_channel)
-#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_open_channel)
+#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_close_channel)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT, struct drm_tegra_get_syncpt)
 #define DRM_IOCTL_TEGRA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SUBMIT, struct drm_tegra_submit)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT_BASE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT_BASE, struct drm_tegra_get_syncpt_base)
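
Purely as a reading aid for the newly documented fields, a minimal submit of a
single command buffer performing one syncpoint increment could look roughly
like this (a hypothetical sketch, not from the patch; the channel context and
GEM handle are assumed to come from the OPEN_CHANNEL and GEM create IOCTLs
documented above):

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>

  #include "tegra_drm.h" /* include/drm-uapi/tegra_drm.h */

  static int tegra_submit_one(int fd, __u64 channel_ctx, __u32 cmdbuf_handle,
                              __u32 words, __u32 syncpt_id)
  {
          struct drm_tegra_syncpt syncpt = { .id = syncpt_id, .incrs = 1 };
          struct drm_tegra_cmdbuf cmdbuf = {
                  .handle = cmdbuf_handle,
                  .words = words,
          };
          struct drm_tegra_submit submit;

          memset(&submit, 0, sizeof(submit));
          submit.context = channel_ctx;
          submit.num_syncpts = 1;
          submit.num_cmdbufs = 1;
          submit.syncpts = (uintptr_t)&syncpt;
          submit.cmdbufs = (uintptr_t)&cmdbuf;
          submit.timeout = 1000; /* ms */

          if (ioctl(fd, DRM_IOCTL_TEGRA_SUBMIT, &submit))
                  return -1;

          /* submit.fence is the threshold to pass to DRM_TEGRA_SYNCPT_WAIT. */
          return (int)submit.fence;
  }
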
diff --git a/include/drm-uapi/v3d_drm.h b/include/drm-uapi/v3d_drm.h
new file mode 100644
index 000000000000..7b6627783608
--- /dev/null
+++ b/include/drm-uapi/v3d_drm.h
@@ -0,0 +1,194 @@
+/*
+ * Copyright © 2014-2018 Broadcom
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef _V3D_DRM_H_
+#define _V3D_DRM_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_V3D_SUBMIT_CL                         0x00
+#define DRM_V3D_WAIT_BO                           0x01
+#define DRM_V3D_CREATE_BO                         0x02
+#define DRM_V3D_MMAP_BO                           0x03
+#define DRM_V3D_GET_PARAM                         0x04
+#define DRM_V3D_GET_BO_OFFSET                     0x05
+
+#define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
+#define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
+#define DRM_IOCTL_V3D_CREATE_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_CREATE_BO, struct drm_v3d_create_bo)
+#define DRM_IOCTL_V3D_MMAP_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
+#define DRM_IOCTL_V3D_GET_PARAM           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
+#define DRM_IOCTL_V3D_GET_BO_OFFSET       DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
+
+/**
+ * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
+ * engine.
+ *
+ * This asks the kernel to have the GPU execute an optional binner
+ * command list, and a render command list.
+ */
+struct drm_v3d_submit_cl {
+	/* Pointer to the binner command list.
+	 *
+	 * This is the first set of commands executed, which runs the
+	 * coordinate shader to determine where primitives land on the screen,
+	 * then writes out the state updates and draw calls necessary per tile
+	 * to the tile allocation BO.
+	 */
+	__u32 bcl_start;
+
+	 /** End address of the BCL (first byte after the BCL) */
+	__u32 bcl_end;
+
+	/* Offset of the render command list.
+	 *
+	 * This is the second set of commands executed, which will either
+	 * execute the tiles that have been set up by the BCL, or a fixed set
+	 * of tiles (in the case of RCL-only blits).
+	 */
+	__u32 rcl_start;
+
+	 /** End address of the RCL (first byte after the RCL) */
+	__u32 rcl_end;
+
+	/** An optional sync object to wait on before starting the BCL. */
+	__u32 in_sync_bcl;
+	/** An optional sync object to wait on before starting the RCL. */
+	__u32 in_sync_rcl;
+	/** An optional sync object to place the completion fence in. */
+	__u32 out_sync;
+
+	/* Offset of the tile alloc memory
+	 *
+	 * This is optional on V3D 3.3 (where the CL can set the value) but
+	 * required on V3D 4.1.
+	 */
+	__u32 qma;
+
+	/** Size of the tile alloc memory. */
+	__u32 qms;
+
+	/** Offset of the tile state data array. */
+	__u32 qts;
+
+	/* Pointer to a u32 array of the BOs that are referenced by the job.
+	 */
+	__u64 bo_handles;
+
+	/* Number of BO handles passed in (size is that times 4). */
+	__u32 bo_handle_count;
+
+	/* Pad, must be zero-filled. */
+	__u32 pad;
+};
+
+/**
+ * struct drm_v3d_wait_bo - ioctl argument for waiting for
+ * completion of the last DRM_V3D_SUBMIT_CL on a BO.
+ *
+ * This is useful for cases where multiple processes might be
+ * rendering to a BO and you want to wait for all rendering to be
+ * completed.
+ */
+struct drm_v3d_wait_bo {
+	__u32 handle;
+	__u32 pad;
+	__u64 timeout_ns;
+};
+
+/**
+ * struct drm_v3d_create_bo - ioctl argument for creating V3D BOs.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_create_bo {
+	__u32 size;
+	__u32 flags;
+	/** Returned GEM handle for the BO. */
+	__u32 handle;
+	/**
+	 * Returned offset for the BO in the V3D address space.  This offset
+	 * is private to the DRM fd and is valid for the lifetime of the GEM
+	 * handle.
+	 *
+	 * This offset value will always be nonzero, since various HW
+	 * units treat 0 specially.
+	 */
+	__u32 offset;
+};
+
+/**
+ * struct drm_v3d_mmap_bo - ioctl argument for mapping V3D BOs.
+ *
+ * This doesn't actually perform an mmap.  Instead, it returns the
+ * offset you need to use in an mmap on the DRM device node.  This
+ * means that tools like valgrind end up knowing about the mapped
+ * memory.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_mmap_bo {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 flags;
+	/** offset into the drm node to use for subsequent mmap call. */
+	__u64 offset;
+};
+
+enum drm_v3d_param {
+	DRM_V3D_PARAM_V3D_UIFCFG,
+	DRM_V3D_PARAM_V3D_HUB_IDENT1,
+	DRM_V3D_PARAM_V3D_HUB_IDENT2,
+	DRM_V3D_PARAM_V3D_HUB_IDENT3,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT0,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT1,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT2,
+};
+
+struct drm_v3d_get_param {
+	__u32 param;
+	__u32 pad;
+	__u64 value;
+};
+
+/**
+ * Returns the offset for the BO in the V3D address space for this DRM fd.
+ * This is the same value returned by drm_v3d_create_bo, if that was called
+ * from this DRM fd.
+ */
+struct drm_v3d_get_bo_offset {
+	__u32 handle;
+	__u32 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* _V3D_DRM_H_ */
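
To illustrate the flow described in the comments above (a rough sketch, not
part of the patch): DRM_V3D_MMAP_BO only hands back the fake offset which is
then passed to mmap() on the DRM fd:

  #include <stddef.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>

  #include "v3d_drm.h" /* include/drm-uapi/v3d_drm.h */

  /* Allocate a V3D BO and map it into the CPU address space. */
  static void *v3d_create_and_map(int fd, __u32 size, __u32 *handle)
  {
          struct drm_v3d_create_bo create = { .size = size };
          struct drm_v3d_mmap_bo map = { 0 };
          void *ptr;

          if (ioctl(fd, DRM_IOCTL_V3D_CREATE_BO, &create))
                  return NULL;

          map.handle = create.handle;
          if (ioctl(fd, DRM_IOCTL_V3D_MMAP_BO, &map))
                  return NULL;

          *handle = create.handle;
          ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, map.offset);
          return ptr == MAP_FAILED ? NULL : ptr;
  }
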
diff --git a/include/drm-uapi/vc4_drm.h b/include/drm-uapi/vc4_drm.h
index 4117117b4204..31f50de39acb 100644
--- a/include/drm-uapi/vc4_drm.h
+++ b/include/drm-uapi/vc4_drm.h
@@ -183,10 +183,17 @@ struct drm_vc4_submit_cl {
 	/* ID of the perfmon to attach to this job. 0 means no perfmon. */
 	__u32 perfmonid;
 
-	/* Unused field to align this struct on 64 bits. Must be set to 0.
-	 * If one ever needs to add an u32 field to this struct, this field
-	 * can be used.
+	/* Syncobj handle to wait on. If set, processing of this render job
+	 * will not start until the syncobj is signaled. 0 means ignore.
 	 */
+	__u32 in_sync;
+
+	/* Syncobj handle to export fence to. If set, the fence in the syncobj
+	 * will be replaced with a fence that signals upon completion of this
+	 * render job. 0 means ignore.
+	 */
+	__u32 out_sync;
+
 	__u32 pad2;
 };
 
diff --git a/include/drm-uapi/virtgpu_drm.h b/include/drm-uapi/virtgpu_drm.h
index 91a31ffed828..9a781f0611df 100644
--- a/include/drm-uapi/virtgpu_drm.h
+++ b/include/drm-uapi/virtgpu_drm.h
@@ -63,6 +63,7 @@ struct drm_virtgpu_execbuffer {
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
+#define VIRTGPU_PARAM_CAPSET_QUERY_FIX 2 /* do we have the capset fix */
 
 struct drm_virtgpu_getparam {
 	__u64 param;
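
An illustrative check for the new parameter (a sketch, not from the patch),
assuming the DRM_IOCTL_VIRTGPU_GETPARAM ioctl from the rest of this header and
that the driver fills an int through the pointer passed in the value field, as
existing userspace does; worth double-checking against the virtio-gpu driver:

  #include <stdbool.h>
  #include <stdint.h>
  #include <sys/ioctl.h>

  #include "virtgpu_drm.h" /* include/drm-uapi/virtgpu_drm.h */

  static bool virtgpu_has_capset_fix(int fd)
  {
          int value = 0;
          struct drm_virtgpu_getparam gp = {
                  .param = VIRTGPU_PARAM_CAPSET_QUERY_FIX,
                  .value = (uintptr_t)&value, /* kernel copies the result here */
          };

          if (ioctl(fd, DRM_IOCTL_VIRTGPU_GETPARAM, &gp))
                  return false;

          return value != 0;
  }
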
diff --git a/include/drm-uapi/vmwgfx_drm.h b/include/drm-uapi/vmwgfx_drm.h
index 0bc784f5e0db..399f58317cff 100644
--- a/include/drm-uapi/vmwgfx_drm.h
+++ b/include/drm-uapi/vmwgfx_drm.h
@@ -40,6 +40,7 @@ extern "C" {
 
 #define DRM_VMW_GET_PARAM            0
 #define DRM_VMW_ALLOC_DMABUF         1
+#define DRM_VMW_ALLOC_BO             1
 #define DRM_VMW_UNREF_DMABUF         2
 #define DRM_VMW_HANDLE_CLOSE         2
 #define DRM_VMW_CURSOR_BYPASS        3
@@ -68,6 +69,8 @@ extern "C" {
 #define DRM_VMW_GB_SURFACE_REF       24
 #define DRM_VMW_SYNCCPU              25
 #define DRM_VMW_CREATE_EXTENDED_CONTEXT 26
+#define DRM_VMW_GB_SURFACE_CREATE_EXT   27
+#define DRM_VMW_GB_SURFACE_REF_EXT      28
 
 /*************************************************************************/
 /**
@@ -79,6 +82,9 @@ extern "C" {
  *
  * DRM_VMW_PARAM_OVERLAY_IOCTL:
  * Does the driver support the overlay ioctl.
+ *
+ * DRM_VMW_PARAM_SM4_1:
+ * SM4_1 support is enabled.
  */
 
 #define DRM_VMW_PARAM_NUM_STREAMS      0
@@ -94,6 +100,8 @@ extern "C" {
 #define DRM_VMW_PARAM_MAX_MOB_SIZE     10
 #define DRM_VMW_PARAM_SCREEN_TARGET    11
 #define DRM_VMW_PARAM_DX               12
+#define DRM_VMW_PARAM_HW_CAPS2         13
+#define DRM_VMW_PARAM_SM4_1            14
 
 /**
  * enum drm_vmw_handle_type - handle type for ref ioctls
@@ -356,9 +364,9 @@ struct drm_vmw_fence_rep {
 
 /*************************************************************************/
 /**
- * DRM_VMW_ALLOC_DMABUF
+ * DRM_VMW_ALLOC_BO
  *
- * Allocate a DMA buffer that is visible also to the host.
+ * Allocate a buffer object that is visible also to the host.
  * NOTE: The buffer is
  * identified by a handle and an offset, which are private to the guest, but
  * useable in the command stream. The guest kernel may translate these
@@ -366,27 +374,28 @@ struct drm_vmw_fence_rep {
  * be zero at all times, or it may disappear from the interface before it is
  * fixed.
  *
- * The DMA buffer may stay user-space mapped in the guest at all times,
+ * The buffer object may stay user-space mapped in the guest at all times,
  * and is thus suitable for sub-allocation.
  *
- * DMA buffers are mapped using the mmap() syscall on the drm device.
+ * Buffer objects are mapped using the mmap() syscall on the drm device.
  */
 
 /**
- * struct drm_vmw_alloc_dmabuf_req
+ * struct drm_vmw_alloc_bo_req
  *
  * @size: Required minimum size of the buffer.
  *
- * Input data to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Input data to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_alloc_dmabuf_req {
+struct drm_vmw_alloc_bo_req {
 	__u32 size;
 	__u32 pad64;
 };
+#define drm_vmw_alloc_dmabuf_req drm_vmw_alloc_bo_req
 
 /**
- * struct drm_vmw_dmabuf_rep
+ * struct drm_vmw_bo_rep
  *
  * @map_handle: Offset to use in the mmap() call used to map the buffer.
  * @handle: Handle unique to this buffer. Used for unreferencing.
@@ -395,50 +404,32 @@ struct drm_vmw_alloc_dmabuf_req {
  * @cur_gmr_offset: Offset to use in the command stream when this buffer is
  * referenced. See note above.
  *
- * Output data from the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Output data from the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_dmabuf_rep {
+struct drm_vmw_bo_rep {
 	__u64 map_handle;
 	__u32 handle;
 	__u32 cur_gmr_id;
 	__u32 cur_gmr_offset;
 	__u32 pad64;
 };
+#define drm_vmw_dmabuf_rep drm_vmw_bo_rep
 
 /**
- * union drm_vmw_dmabuf_arg
+ * union drm_vmw_alloc_bo_arg
  *
  * @req: Input data as described above.
  * @rep: Output data as described above.
  *
- * Argument to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Argument to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-union drm_vmw_alloc_dmabuf_arg {
-	struct drm_vmw_alloc_dmabuf_req req;
-	struct drm_vmw_dmabuf_rep rep;
-};
-
-/*************************************************************************/
-/**
- * DRM_VMW_UNREF_DMABUF - Free a DMA buffer.
- *
- */
-
-/**
- * struct drm_vmw_unref_dmabuf_arg
- *
- * @handle: Handle indicating what buffer to free. Obtained from the
- * DRM_VMW_ALLOC_DMABUF Ioctl.
- *
- * Argument to the DRM_VMW_UNREF_DMABUF Ioctl.
- */
-
-struct drm_vmw_unref_dmabuf_arg {
-	__u32 handle;
-	__u32 pad64;
+union drm_vmw_alloc_bo_arg {
+	struct drm_vmw_alloc_bo_req req;
+	struct drm_vmw_bo_rep rep;
 };
+#define drm_vmw_alloc_dmabuf_arg drm_vmw_alloc_bo_arg
 
 /*************************************************************************/
 /**
@@ -1103,9 +1094,8 @@ union drm_vmw_extended_context_arg {
  * DRM_VMW_HANDLE_CLOSE - Close a user-space handle and release its
  * underlying resource.
  *
- * Note that this ioctl is overlaid on the DRM_VMW_UNREF_DMABUF Ioctl.
- * The ioctl arguments therefore need to be identical in layout.
- *
+ * Note that this ioctl is overlaid on the deprecated DRM_VMW_UNREF_DMABUF
+ * Ioctl.
  */
 
 /**
@@ -1119,7 +1109,107 @@ struct drm_vmw_handle_close_arg {
 	__u32 handle;
 	__u32 pad64;
 };
+#define drm_vmw_unref_dmabuf_arg drm_vmw_handle_close_arg
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_CREATE_EXT - Create a host guest-backed surface.
+ *
+ * Allocates a surface handle and queues a create surface command
+ * for the host on the first use of the surface. The surface ID can
+ * be used as the surface ID in commands referencing the surface.
+ *
+ * This new command extends DRM_VMW_GB_SURFACE_CREATE by adding version
+ * parameter and 64 bit svga flag.
+ */
+
+/**
+ * enum drm_vmw_surface_version
+ *
+ * @drm_vmw_gb_surface_v1: Corresponds to the current gb surface format with
+ * svga3d surface flags split into 2, upper half and lower half.
+ */
+enum drm_vmw_surface_version {
+	drm_vmw_gb_surface_v1
+};
+
+/**
+ * struct drm_vmw_gb_surface_create_ext_req
+ *
+ * @base: Surface create parameters.
+ * @version: Version of surface create ioctl.
+ * @svga3d_flags_upper_32_bits: Upper 32 bits of svga3d flags.
+ * @multisample_pattern: Multisampling pattern when msaa is supported.
+ * @quality_level: Precision settings for each sample.
+ * @must_be_zero: Reserved for future usage.
+ *
+ * Input argument to the  DRM_VMW_GB_SURFACE_CREATE_EXT Ioctl.
+ * Part of output argument for the DRM_VMW_GB_SURFACE_REF_EXT Ioctl.
+ */
+struct drm_vmw_gb_surface_create_ext_req {
+	struct drm_vmw_gb_surface_create_req base;
+	enum drm_vmw_surface_version version;
+	uint32_t svga3d_flags_upper_32_bits;
+	SVGA3dMSPattern multisample_pattern;
+	SVGA3dMSQualityLevel quality_level;
+	uint64_t must_be_zero;
+};
+
+/**
+ * union drm_vmw_gb_surface_create_ext_arg
+ *
+ * @req: Input argument as described above.
+ * @rep: Output argument as described above.
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
+union drm_vmw_gb_surface_create_ext_arg {
+	struct drm_vmw_gb_surface_create_rep rep;
+	struct drm_vmw_gb_surface_create_ext_req req;
+};
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_REF_EXT - Reference a host surface.
+ *
+ * Puts a reference on a host surface with a given handle, as previously
+ * returned by the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ * A reference will make sure the surface isn't destroyed while we hold
+ * it and will allow the calling client to use the surface handle in
+ * the command stream.
+ *
+ * On successful return, the Ioctl returns the surface information given
+ * to and returned from the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
 
+/**
+ * struct drm_vmw_gb_surface_ref_ext_rep
+ *
+ * @creq: The data used as input when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_ext_req"
+ * @crep: Additional data output when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_rep"
+ *
+ * Output Argument to the DRM_VMW_GB_SURFACE_REF_EXT ioctl.
+ */
+struct drm_vmw_gb_surface_ref_ext_rep {
+	struct drm_vmw_gb_surface_create_ext_req creq;
+	struct drm_vmw_gb_surface_create_rep crep;
+};
+
+/**
+ * union drm_vmw_gb_surface_reference_ext_arg
+ *
+ * @req: Input data as described above at "struct drm_vmw_surface_arg"
+ * @rep: Output data as described above at
+ *       "struct drm_vmw_gb_surface_ref_ext_rep"
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_REF_EXT Ioctl.
+ */
+union drm_vmw_gb_surface_reference_ext_arg {
+	struct drm_vmw_gb_surface_ref_ext_rep rep;
+	struct drm_vmw_surface_arg req;
+};
 
 #if defined(__cplusplus)
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH i-g-t 02/17] trace.pl: Virtual engine support
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add virtual/queue timelines to both stdout and HTML output.

A new timeline is created for each queue/virtual engine to display
associated requests in queued and runnable states. Once requests are
submitted to a real engine for execution, they show up on the physical
engine timeline.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 scripts/trace.pl | 230 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 200 insertions(+), 30 deletions(-)

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 18f9f3b18396..72747b046202 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -27,10 +27,15 @@ use warnings;
 use 5.010;
 
 my $gid = 0;
-my (%db, %queue, %submit, %notify, %rings, %ctxdb, %ringmap, %reqwait, %ctxtimelines);
+my (%db, %vdb, %queue, %submit, %notify, %rings, %ctxdb, %ringmap, %reqwait, %ctxtimelines);
+my (%cids, %ctxmap);
+my $cid = 0;
+my %queues;
 my @freqs;
 
-my $max_items = 3000;
+use constant VENG => '255:0';
+
+my $max_requests = 1000;
 my $width_us = 32000;
 my $correct_durations = 0;
 my %ignore_ring;
@@ -180,21 +185,21 @@ sub arg_trace
 	return @_;
 }
 
-sub arg_max_items
+sub arg_max_requests
 {
 	my $val;
 
 	return unless scalar(@_);
 
-	if ($_[0] eq '--max-items' or $_[0] eq '-m') {
+	if ($_[0] eq '--max-requests' or $_[0] eq '-m') {
 		shift @_;
 		$val = shift @_;
-	} elsif ($_[0] =~ /--max-items=(\d+)/) {
+	} elsif ($_[0] =~ /--max-requests=(\d+)/) {
 		shift @_;
 		$val = $1;
 	}
 
-	$max_items = int($val) if defined $val;
+	$max_requests = int($val) if defined $val;
 
 	return @_;
 }
@@ -291,7 +296,7 @@ while (@args) {
 	@args = arg_avg_delay_stats(@args);
 	@args = arg_gpu_timeline(@args);
 	@args = arg_trace(@args);
-	@args = arg_max_items(@args);
+	@args = arg_max_requests(@args);
 	@args = arg_zoom_width(@args);
 	@args = arg_split_requests(@args);
 	@args = arg_ignore_ring(@args);
@@ -380,6 +385,7 @@ while (<>) {
 		my %rw;
 
 		next if exists $reqwait{$key};
+		die if $ring eq VENG and not exists $queues{$ctx};
 
 		$rw{'key'} = $key;
 		$rw{'ring'} = $ring;
@@ -388,9 +394,19 @@ while (<>) {
 		$rw{'start'} = $time;
 		$reqwait{$key} = \%rw;
 	} elsif ($tp_name eq 'i915:i915_request_wait_end:') {
-		next unless exists $reqwait{$key};
+		die if $ring eq VENG and not exists $queues{$ctx};
+
+		if (exists $reqwait{$key}) {
+			$reqwait{$key}->{'end'} = $time;
+		} else { # Virtual engine
+			my $vkey = db_key(VENG, $ctx, $seqno);
+
+			die unless exists $reqwait{$vkey};
 
-		$reqwait{$key}->{'end'} = $time;
+			# If the wait started on the virtual engine, attribute
+			# it to it completely.
+			$reqwait{$vkey}->{'end'} = $time;
+		}
 	} elsif ($tp_name eq 'i915:i915_request_add:') {
 		if (exists $queue{$key}) {
 			$ctxdb{$orig_ctx}++;
@@ -401,19 +417,52 @@ while (<>) {
 		}
 
 		$queue{$key} = $time;
+		if ($ring eq VENG and not exists $queues{$ctx}) {
+			$queues{$ctx} = 1 ;
+			$cids{$ctx} = $cid++;
+			$ctxmap{$cids{$ctx}} = $ctx;
+		}
 	} elsif ($tp_name eq 'i915:i915_request_submit:') {
 		die if exists $submit{$key};
 		die unless exists $queue{$key};
+		die if $ring eq VENG and not exists $queues{$ctx};
 
 		$submit{$key} = $time;
 	} elsif ($tp_name eq 'i915:i915_request_in:') {
+		my ($q, $s);
 		my %req;
 
 		# preemption
 		delete $db{$key} if exists $db{$key};
 
-		die unless exists $queue{$key};
-		die unless exists $submit{$key};
+		unless (exists $queue{$key}) {
+			# Virtual engine
+			my $vkey = db_key(VENG, $ctx, $seqno);
+			my %req;
+
+			die unless exists $queues{$ctx};
+			die unless exists $queue{$vkey};
+			die unless exists $submit{$vkey};
+
+			# Create separate request record on the queue timeline
+			$q = $queue{$vkey};
+			$s = $submit{$vkey};
+			$req{'queue'} = $q;
+			$req{'submit'} = $s;
+			$req{'start'} = $time;
+			$req{'end'} = $time;
+			$req{'ring'} = VENG;
+			$req{'seqno'} = $seqno;
+			$req{'ctx'} = $ctx;
+			$req{'name'} = $ctx . '/' . $seqno;
+			$req{'global'} = $tp{'global'};
+			$req{'port'} = $tp{'port'};
+
+			$vdb{$vkey} = \%req;
+		} else {
+			$q = $queue{$key};
+			$s = $submit{$key};
+		}
 
 		$req{'start'} = $time;
 		$req{'ring'} = $ring;
@@ -423,8 +472,9 @@ while (<>) {
 		$req{'name'} = $ctx . '/' . $seqno;
 		$req{'global'} = $tp{'global'};
 		$req{'port'} = $tp{'port'};
-		$req{'queue'} = $queue{$key};
-		$req{'submit'} = $submit{$key};
+		$req{'queue'} = $q;
+		$req{'submit'} = $s;
+		$req{'virtual'} = 1 if exists $queues{$ctx};
 		$rings{$ring} = $gid++ unless exists $rings{$ring};
 		$ringmap{$rings{$ring}} = $ring;
 		$db{$key} = \%req;
@@ -720,8 +770,10 @@ foreach my $key (@sorted_keys) {
 
 	$running{$ring} += $end - $start if $correct_durations or
 					    not exists $db{$key}->{'no-end'};
-	$runnable{$ring} += $db{$key}->{'execute-delay'};
-	$queued{$ring} += $start - $db{$key}->{'execute-delay'} - $db{$key}->{'queue'};
+	unless (exists $db{$key}->{'virtual'}) {
+		$runnable{$ring} += $db{$key}->{'execute-delay'};
+		$queued{$ring} += $start - $db{$key}->{'execute-delay'} - $db{$key}->{'queue'};
+	}
 
 	$batch_count{$ring}++;
 
@@ -840,6 +892,12 @@ foreach my $key (keys %reqwait) {
 	$reqw{$reqwait{$key}->{'ring'}} += $reqwait{$key}->{'end'} - $reqwait{$key}->{'start'};
 }
 
+# Add up all request waits per virtual engine
+my %vreqw;
+foreach my $key (keys %reqwait) {
+	$vreqw{$reqwait{$key}->{'ctx'}} += $reqwait{$key}->{'end'} - $reqwait{$key}->{'start'};
+}
+
 say sprintf('GPU: %.2f%% idle, %.2f%% busy',
 	     $flat_busy{'gpu-idle'}, $flat_busy{'gpu-busy'}) unless $html;
 
@@ -961,18 +1019,24 @@ ENDHTML
 sub html_stats
 {
 	my ($stats, $group, $id) = @_;
+	my $veng = exists $stats->{'virtual'} ? 1 : 0;
 	my $name;
 
-	$name = 'Ring' . $group;
+	$name = $veng ? 'Virtual' : 'Ring';
+	$name .= $group;
 	$name .= '<br><small><br>';
-	$name .= sprintf('%.2f', $stats->{'idle'}) . '% idle<br><br>';
-	$name .= sprintf('%.2f', $stats->{'busy'}) . '% busy<br>';
+	unless ($veng) {
+		$name .= sprintf('%.2f', $stats->{'idle'}) . '% idle<br><br>';
+		$name .= sprintf('%.2f', $stats->{'busy'}) . '% busy<br>';
+	}
 	$name .= sprintf('%.2f', $stats->{'runnable'}) . '% runnable<br>';
 	$name .= sprintf('%.2f', $stats->{'queued'}) . '% queued<br><br>';
 	$name .= sprintf('%.2f', $stats->{'wait'}) . '% wait<br><br>';
 	$name .= $stats->{'count'} . ' batches<br>';
-	$name .= sprintf('%.2f', $stats->{'avg'}) . 'us avg batch<br>';
-	$name .= sprintf('%.2f', $stats->{'total-avg'}) . 'us avg engine batch<br>';
+	unless ($veng) {
+		$name .= sprintf('%.2f', $stats->{'avg'}) . 'us avg batch<br>';
+		$name .= sprintf('%.2f', $stats->{'total-avg'}) . 'us avg engine batch<br>';
+	}
 	$name .= '</small>';
 
 	print "\t{id: $id, content: '$name'},\n";
@@ -981,17 +1045,24 @@ sub html_stats
 sub stdio_stats
 {
 	my ($stats, $group, $id) = @_;
+	my $veng = exists $stats->{'virtual'} ? 1 : 0;
 	my $str;
 
-	$str = 'Ring' . $group . ': ';
+	$str = $veng ? 'Virtual' : 'Ring';
+	$str .= $group . ': ';
 	$str .= $stats->{'count'} . ' batches, ';
-	$str .= sprintf('%.2f (%.2f) avg batch us, ', $stats->{'avg'}, $stats->{'total-avg'});
-	$str .= sprintf('%.2f', $stats->{'idle'}) . '% idle, ';
-	$str .= sprintf('%.2f', $stats->{'busy'}) . '% busy, ';
+	unless ($veng) {
+		$str .= sprintf('%.2f (%.2f) avg batch us, ',
+				$stats->{'avg'}, $stats->{'total-avg'});
+		$str .= sprintf('%.2f', $stats->{'idle'}) . '% idle, ';
+		$str .= sprintf('%.2f', $stats->{'busy'}) . '% busy, ';
+	}
+
 	$str .= sprintf('%.2f', $stats->{'runnable'}) . '% runnable, ';
 	$str .= sprintf('%.2f', $stats->{'queued'}) . '% queued, ';
 	$str .= sprintf('%.2f', $stats->{'wait'}) . '% wait';
-	if ($avg_delay_stats) {
+
+	if ($avg_delay_stats and not $veng) {
 		$str .= ', submit/execute/save-avg=(';
 		$str .= sprintf('%.2f/%.2f/%.2f)', $stats->{'submit'}, $stats->{'execute'}, $stats->{'save'});
 	}
@@ -1013,8 +1084,16 @@ foreach my $group (sort keys %rings) {
 
 	$stats{'idle'} = (1.0 - $flat_busy{$ring} / $elapsed) * 100.0;
 	$stats{'busy'} = $running{$ring} / $elapsed * 100.0;
-	$stats{'runnable'} = $runnable{$ring} / $elapsed * 100.0;
-	$stats{'queued'} = $queued{$ring} / $elapsed * 100.0;
+	if (exists $runnable{$ring}) {
+		$stats{'runnable'} = $runnable{$ring} / $elapsed * 100.0;
+	} else {
+		$stats{'runnable'} = 0;
+	}
+	if (exists $queued{$ring}) {
+		$stats{'queued'} = $queued{$ring} / $elapsed * 100.0;
+	} else {
+		$stats{'queued'} = 0;
+	}
 	$reqw{$ring} = 0 unless exists $reqw{$ring};
 	$stats{'wait'} = $reqw{$ring} / $elapsed * 100.0;
 	$stats{'count'} = $batch_count{$ring};
@@ -1031,6 +1110,59 @@ foreach my $group (sort keys %rings) {
 	}
 }
 
+sub sortVQueue {
+	my $as = $vdb{$a}->{'queue'};
+	my $bs = $vdb{$b}->{'queue'};
+	my $val;
+
+	$val = $as <=> $bs;
+	$val = $a cmp $b if $val == 0;
+
+	return $val;
+}
+
+my @sorted_vkeys = sort sortVQueue keys %vdb;
+my (%vqueued, %vrunnable);
+
+foreach my $key (@sorted_vkeys) {
+	my $ctx = $vdb{$key}->{'ctx'};
+
+	$vdb{$key}->{'submit-delay'} = $vdb{$key}->{'submit'} - $vdb{$key}->{'queue'};
+	$vdb{$key}->{'execute-delay'} = $vdb{$key}->{'start'} - $vdb{$key}->{'submit'};
+
+	$vqueued{$ctx} += $vdb{$key}->{'submit-delay'};
+	$vrunnable{$ctx} += $vdb{$key}->{'execute-delay'};
+}
+
+my $veng_id = $engine_start_id + scalar(keys %rings);
+
+foreach my $cid (sort keys %ctxmap) {
+	my $ctx = $ctxmap{$cid};
+	my $elapsed = $last_ts - $first_ts;
+	my %stats;
+
+	$stats{'virtual'} = 1;
+	if (exists $vrunnable{$ctx}) {
+		$stats{'runnable'} = $vrunnable{$ctx} / $elapsed * 100.0;
+	} else {
+		$stats{'runnable'} = 0;
+	}
+	if (exists $vqueued{$ctx}) {
+		$stats{'queued'} = $vqueued{$ctx} / $elapsed * 100.0;
+	} else {
+		$stats{'queued'} = 0;
+	}
+	$vreqw{$ctx} = 0 unless exists $vreqw{$ctx};
+	$stats{'wait'} = $vreqw{$ctx} / $elapsed * 100.0;
+	$stats{'count'} = scalar(grep {$ctx == $vdb{$_}->{'ctx'}} keys %vdb);
+
+	if ($html) {
+		html_stats(\%stats, $cid, $veng_id++);
+	} else {
+		stdio_stats(\%stats, $cid, $veng_id++);
+	}
+}
+
 exit 0 unless $html;
 
 print <<ENDHTML;
@@ -1134,6 +1266,7 @@ sub box_style
 }
 
 my $i = 0;
+my $req = 0;
 foreach my $key (sort sortQueue keys %db) {
 	my ($name, $ctx, $seqno) = ($db{$key}->{'name'}, $db{$key}->{'ctx'}, $db{$key}->{'seqno'});
 	my ($queue, $start, $notify, $end) = ($db{$key}->{'queue'}, $db{$key}->{'start'}, $db{$key}->{'notify'}, $db{$key}->{'end'});
@@ -1147,7 +1280,7 @@ foreach my $key (sort sortQueue keys %db) {
 	my $skey;
 
 	# submit to execute
-	unless (exists $skip_box{'queue'}) {
+	unless (exists $skip_box{'queue'} or exists $db{$key}->{'virtual'}) {
 		$skey = 2 * $max_seqno * $ctx + 2 * $seqno;
 		$style = box_style($ctx, 'queue');
 		$content = "$name<br>$db{$key}->{'submit-delay'}us <small>($db{$key}->{'execute-delay'}us)</small>";
@@ -1158,7 +1291,7 @@ foreach my $key (sort sortQueue keys %db) {
 
 	# execute to start
 	$engine_start = $db{$key}->{'start'} unless defined $engine_start;
-	unless (exists $skip_box{'ready'}) {
+	unless (exists $skip_box{'ready'} or exists $db{$key}->{'virtual'}) {
 		$skey = 2 * $max_seqno * $ctx + 2 * $seqno + 1;
 		$style = box_style($ctx, 'ready');
 		$content = "<small>$name<br>$db{$key}->{'execute-delay'}us</small>";
@@ -1199,7 +1332,7 @@ foreach my $key (sort sortQueue keys %db) {
 
 	$last_ts = $end;
 
-	last if $i > $max_items;
+	last if ++$req > $max_requests;
 }
 
 push @freqs, [$prev_freq_ts, $last_ts, $prev_freq] if $prev_freq;
@@ -1232,6 +1365,43 @@ if ($gpu_timeline) {
 	}
 }
 
+$req = 0;
+$veng_id = $engine_start_id + scalar(keys %rings);
+foreach my $key (@sorted_vkeys) {
+	my ($name, $ctx, $seqno) = ($vdb{$key}->{'name'}, $vdb{$key}->{'ctx'}, $vdb{$key}->{'seqno'});
+	my $queue = $vdb{$key}->{'queue'};
+	my $submit = $vdb{$key}->{'submit'};
+	my $engine_start = $db{$key}->{'engine-start'};
+	my ($content, $style, $startend, $skey);
+	my $group = $veng_id + $cids{$ctx};
+	my $subgroup = $ctx - $min_ctx;
+	my $type = ' type: \'range\',';
+	my $duration;
+
+	# submit to execute
+	unless (exists $skip_box{'queue'}) {
+		$skey = 2 * $max_seqno * $ctx + 2 * $seqno;
+		$style = box_style($ctx, 'queue');
+		$content = "$name<br>$vdb{$key}->{'submit-delay'}us <small>($vdb{$key}->{'execute-delay'}us)</small>";
+		$startend = 'start: ' . $queue . ', end: ' . $submit;
+		print "\t{id: $i, key: $skey, $type group: $group, subgroup: $subgroup, subgroupOrder: $subgroup, content: '$content', $startend, style: \'$style\'},\n";
+		$i++;
+	}
+
+	# execute to start
+	$engine_start = $vdb{$key}->{'start'} unless defined $engine_start;
+	unless (exists $skip_box{'ready'}) {
+		$skey = 2 * $max_seqno * $ctx + 2 * $seqno + 1;
+		$style = box_style($ctx, 'ready');
+		$content = "<small>$name<br>$vdb{$key}->{'execute-delay'}us</small>";
+		$startend = 'start: ' . $submit . ', end: ' . $engine_start;
+		print "\t{id: $i, key: $skey, $type group: $group, subgroup: $subgroup, subgroupOrder: $subgroup, content: '$content', $startend, style: \'$style\'},\n";
+		$i++;
+	}
+
+	last if ++$req > $max_requests;
+}
+
 my $end_ts = $first_ts + $width_us;
 $first_ts = $first_ts;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [igt-dev] [PATCH i-g-t 02/17] trace.pl: Virtual engine support
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add virtual/queue timelines to both stdout and HTML output.

A new timeline is created for each queue/virtual engine to display
associated requests in queued and runnable states. Once requests are
submitted to a real engine for execution, they show up on the physical
engine timeline.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 scripts/trace.pl | 230 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 200 insertions(+), 30 deletions(-)

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 18f9f3b18396..72747b046202 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -27,10 +27,15 @@ use warnings;
 use 5.010;
 
 my $gid = 0;
-my (%db, %queue, %submit, %notify, %rings, %ctxdb, %ringmap, %reqwait, %ctxtimelines);
+my (%db, %vdb, %queue, %submit, %notify, %rings, %ctxdb, %ringmap, %reqwait, %ctxtimelines);
+my (%cids, %ctxmap);
+my $cid = 0;
+my %queues;
 my @freqs;
 
-my $max_items = 3000;
+use constant VENG => '255:0';
+
+my $max_requests = 1000;
 my $width_us = 32000;
 my $correct_durations = 0;
 my %ignore_ring;
@@ -180,21 +185,21 @@ sub arg_trace
 	return @_;
 }
 
-sub arg_max_items
+sub arg_max_requests
 {
 	my $val;
 
 	return unless scalar(@_);
 
-	if ($_[0] eq '--max-items' or $_[0] eq '-m') {
+	if ($_[0] eq '--max-requests' or $_[0] eq '-m') {
 		shift @_;
 		$val = shift @_;
-	} elsif ($_[0] =~ /--max-items=(\d+)/) {
+	} elsif ($_[0] =~ /--max-requests=(\d+)/) {
 		shift @_;
 		$val = $1;
 	}
 
-	$max_items = int($val) if defined $val;
+	$max_requests = int($val) if defined $val;
 
 	return @_;
 }
@@ -291,7 +296,7 @@ while (@args) {
 	@args = arg_avg_delay_stats(@args);
 	@args = arg_gpu_timeline(@args);
 	@args = arg_trace(@args);
-	@args = arg_max_items(@args);
+	@args = arg_max_requests(@args);
 	@args = arg_zoom_width(@args);
 	@args = arg_split_requests(@args);
 	@args = arg_ignore_ring(@args);
@@ -380,6 +385,7 @@ while (<>) {
 		my %rw;
 
 		next if exists $reqwait{$key};
+		die if $ring eq VENG and not exists $queues{$ctx};
 
 		$rw{'key'} = $key;
 		$rw{'ring'} = $ring;
@@ -388,9 +394,19 @@ while (<>) {
 		$rw{'start'} = $time;
 		$reqwait{$key} = \%rw;
 	} elsif ($tp_name eq 'i915:i915_request_wait_end:') {
-		next unless exists $reqwait{$key};
+		die if $ring eq VENG and not exists $queues{$ctx};
+
+		if (exists $reqwait{$key}) {
+			$reqwait{$key}->{'end'} = $time;
+		} else { # Virtual engine
+			my $vkey = db_key(VENG, $ctx, $seqno);
+
+			die unless exists $reqwait{$vkey};
 
-		$reqwait{$key}->{'end'} = $time;
+			# If the wait started on the virtual engine, attribute
+			# it to it completely.
+			$reqwait{$vkey}->{'end'} = $time;
+		}
 	} elsif ($tp_name eq 'i915:i915_request_add:') {
 		if (exists $queue{$key}) {
 			$ctxdb{$orig_ctx}++;
@@ -401,19 +417,52 @@ while (<>) {
 		}
 
 		$queue{$key} = $time;
+		if ($ring eq VENG and not exists $queues{$ctx}) {
+			$queues{$ctx} = 1 ;
+			$cids{$ctx} = $cid++;
+			$ctxmap{$cids{$ctx}} = $ctx;
+		}
 	} elsif ($tp_name eq 'i915:i915_request_submit:') {
 		die if exists $submit{$key};
 		die unless exists $queue{$key};
+		die if $ring eq VENG and not exists $queues{$ctx};
 
 		$submit{$key} = $time;
 	} elsif ($tp_name eq 'i915:i915_request_in:') {
+		my ($q, $s);
 		my %req;
 
 		# preemption
 		delete $db{$key} if exists $db{$key};
 
-		die unless exists $queue{$key};
-		die unless exists $submit{$key};
+		unless (exists $queue{$key}) {
+			# Virtual engine
+			my $vkey = db_key(VENG, $ctx, $seqno);
+			my %req;
+
+			die unless exists $queues{$ctx};
+			die unless exists $queue{$vkey};
+			die unless exists $submit{$vkey};
+
+			# Create separate request record on the queue timeline
+			$q = $queue{$vkey};
+			$s = $submit{$vkey};
+			$req{'queue'} = $q;
+			$req{'submit'} = $s;
+			$req{'start'} = $time;
+			$req{'end'} = $time;
+			$req{'ring'} = VENG;
+			$req{'seqno'} = $seqno;
+			$req{'ctx'} = $ctx;
+			$req{'name'} = $ctx . '/' . $seqno;
+			$req{'global'} = $tp{'global'};
+			$req{'port'} = $tp{'port'};
+
+			$vdb{$vkey} = \%req;
+		} else {
+			$q = $queue{$key};
+			$s = $submit{$key};
+		}
 
 		$req{'start'} = $time;
 		$req{'ring'} = $ring;
@@ -423,8 +472,9 @@ while (<>) {
 		$req{'name'} = $ctx . '/' . $seqno;
 		$req{'global'} = $tp{'global'};
 		$req{'port'} = $tp{'port'};
-		$req{'queue'} = $queue{$key};
-		$req{'submit'} = $submit{$key};
+		$req{'queue'} = $q;
+		$req{'submit'} = $s;
+		$req{'virtual'} = 1 if exists $queues{$ctx};
 		$rings{$ring} = $gid++ unless exists $rings{$ring};
 		$ringmap{$rings{$ring}} = $ring;
 		$db{$key} = \%req;
@@ -720,8 +770,10 @@ foreach my $key (@sorted_keys) {
 
 	$running{$ring} += $end - $start if $correct_durations or
 					    not exists $db{$key}->{'no-end'};
-	$runnable{$ring} += $db{$key}->{'execute-delay'};
-	$queued{$ring} += $start - $db{$key}->{'execute-delay'} - $db{$key}->{'queue'};
+	unless (exists $db{$key}->{'virtual'}) {
+		$runnable{$ring} += $db{$key}->{'execute-delay'};
+		$queued{$ring} += $start - $db{$key}->{'execute-delay'} - $db{$key}->{'queue'};
+	}
 
 	$batch_count{$ring}++;
 
@@ -840,6 +892,12 @@ foreach my $key (keys %reqwait) {
 	$reqw{$reqwait{$key}->{'ring'}} += $reqwait{$key}->{'end'} - $reqwait{$key}->{'start'};
 }
 
+# Add up all request waits per virtual engine
+my %vreqw;
+foreach my $key (keys %reqwait) {
+	$vreqw{$reqwait{$key}->{'ctx'}} += $reqwait{$key}->{'end'} - $reqwait{$key}->{'start'};
+}
+
 say sprintf('GPU: %.2f%% idle, %.2f%% busy',
 	     $flat_busy{'gpu-idle'}, $flat_busy{'gpu-busy'}) unless $html;
 
@@ -961,18 +1019,24 @@ ENDHTML
 sub html_stats
 {
 	my ($stats, $group, $id) = @_;
+	my $veng = exists $stats->{'virtual'} ? 1 : 0;
 	my $name;
 
-	$name = 'Ring' . $group;
+	$name = $veng ? 'Virtual' : 'Ring';
+	$name .= $group;
 	$name .= '<br><small><br>';
-	$name .= sprintf('%.2f', $stats->{'idle'}) . '% idle<br><br>';
-	$name .= sprintf('%.2f', $stats->{'busy'}) . '% busy<br>';
+	unless ($veng) {
+		$name .= sprintf('%.2f', $stats->{'idle'}) . '% idle<br><br>';
+		$name .= sprintf('%.2f', $stats->{'busy'}) . '% busy<br>';
+	}
 	$name .= sprintf('%.2f', $stats->{'runnable'}) . '% runnable<br>';
 	$name .= sprintf('%.2f', $stats->{'queued'}) . '% queued<br><br>';
 	$name .= sprintf('%.2f', $stats->{'wait'}) . '% wait<br><br>';
 	$name .= $stats->{'count'} . ' batches<br>';
-	$name .= sprintf('%.2f', $stats->{'avg'}) . 'us avg batch<br>';
-	$name .= sprintf('%.2f', $stats->{'total-avg'}) . 'us avg engine batch<br>';
+	unless ($veng) {
+		$name .= sprintf('%.2f', $stats->{'avg'}) . 'us avg batch<br>';
+		$name .= sprintf('%.2f', $stats->{'total-avg'}) . 'us avg engine batch<br>';
+	}
 	$name .= '</small>';
 
 	print "\t{id: $id, content: '$name'},\n";
@@ -981,17 +1045,24 @@ sub html_stats
 sub stdio_stats
 {
 	my ($stats, $group, $id) = @_;
+	my $veng = exists $stats->{'virtual'} ? 1 : 0;
 	my $str;
 
-	$str = 'Ring' . $group . ': ';
+	$str = $veng ? 'Virtual' : 'Ring';
+	$str .= $group . ': ';
 	$str .= $stats->{'count'} . ' batches, ';
-	$str .= sprintf('%.2f (%.2f) avg batch us, ', $stats->{'avg'}, $stats->{'total-avg'});
-	$str .= sprintf('%.2f', $stats->{'idle'}) . '% idle, ';
-	$str .= sprintf('%.2f', $stats->{'busy'}) . '% busy, ';
+	unless ($veng) {
+		$str .= sprintf('%.2f (%.2f) avg batch us, ',
+				$stats->{'avg'}, $stats->{'total-avg'});
+		$str .= sprintf('%.2f', $stats->{'idle'}) . '% idle, ';
+		$str .= sprintf('%.2f', $stats->{'busy'}) . '% busy, ';
+	}
+
 	$str .= sprintf('%.2f', $stats->{'runnable'}) . '% runnable, ';
 	$str .= sprintf('%.2f', $stats->{'queued'}) . '% queued, ';
 	$str .= sprintf('%.2f', $stats->{'wait'}) . '% wait';
-	if ($avg_delay_stats) {
+
+	if ($avg_delay_stats and not $veng) {
 		$str .= ', submit/execute/save-avg=(';
 		$str .= sprintf('%.2f/%.2f/%.2f)', $stats->{'submit'}, $stats->{'execute'}, $stats->{'save'});
 	}
@@ -1013,8 +1084,16 @@ foreach my $group (sort keys %rings) {
 
 	$stats{'idle'} = (1.0 - $flat_busy{$ring} / $elapsed) * 100.0;
 	$stats{'busy'} = $running{$ring} / $elapsed * 100.0;
-	$stats{'runnable'} = $runnable{$ring} / $elapsed * 100.0;
-	$stats{'queued'} = $queued{$ring} / $elapsed * 100.0;
+	if (exists $runnable{$ring}) {
+		$stats{'runnable'} = $runnable{$ring} / $elapsed * 100.0;
+	} else {
+		$stats{'runnable'} = 0;
+	}
+	if (exists $queued{$ring}) {
+		$stats{'queued'} = $queued{$ring} / $elapsed * 100.0;
+	} else {
+		$stats{'queued'} = 0;
+	}
 	$reqw{$ring} = 0 unless exists $reqw{$ring};
 	$stats{'wait'} = $reqw{$ring} / $elapsed * 100.0;
 	$stats{'count'} = $batch_count{$ring};
@@ -1031,6 +1110,59 @@ foreach my $group (sort keys %rings) {
 	}
 }
 
+sub sortVQueue {
+	my $as = $vdb{$a}->{'queue'};
+	my $bs = $vdb{$b}->{'queue'};
+	my $val;
+
+	$val = $as <=> $bs;
+	$val = $a cmp $b if $val == 0;
+
+	return $val;
+}
+
+my @sorted_vkeys = sort sortVQueue keys %vdb;
+my (%vqueued, %vrunnable);
+
+foreach my $key (@sorted_vkeys) {
+	my $ctx = $vdb{$key}->{'ctx'};
+
+	$vdb{$key}->{'submit-delay'} = $vdb{$key}->{'submit'} - $vdb{$key}->{'queue'};
+	$vdb{$key}->{'execute-delay'} = $vdb{$key}->{'start'} - $vdb{$key}->{'submit'};
+
+	$vqueued{$ctx} += $vdb{$key}->{'submit-delay'};
+	$vrunnable{$ctx} += $vdb{$key}->{'execute-delay'};
+}
+
+my $veng_id = $engine_start_id + scalar(keys %rings);
+
+foreach my $cid (sort keys %ctxmap) {
+	my $ctx = $ctxmap{$cid};
+	my $elapsed = $last_ts - $first_ts;
+	my %stats;
+
+	$stats{'virtual'} = 1;
+	if (exists $vrunnable{$ctx}) {
+		$stats{'runnable'} = $vrunnable{$ctx} / $elapsed * 100.0;
+	} else {
+		$stats{'runnable'} = 0;
+	}
+	if (exists $vqueued{$ctx}) {
+		$stats{'queued'} = $vqueued{$ctx} / $elapsed * 100.0;
+	} else {
+		$stats{'queued'} = 0;
+	}
+	$vreqw{$ctx} = 0 unless exists $vreqw{$ctx};
+	$stats{'wait'} = $vreqw{$ctx} / $elapsed * 100.0;
+	$stats{'count'} = scalar(grep {$ctx == $vdb{$_}->{'ctx'}} keys %vdb);
+
+	if ($html) {
+		html_stats(\%stats, $cid, $veng_id++);
+	} else {
+		stdio_stats(\%stats, $cid, $veng_id++);
+	}
+}
+
 exit 0 unless $html;
 
 print <<ENDHTML;
@@ -1134,6 +1266,7 @@ sub box_style
 }
 
 my $i = 0;
+my $req = 0;
 foreach my $key (sort sortQueue keys %db) {
 	my ($name, $ctx, $seqno) = ($db{$key}->{'name'}, $db{$key}->{'ctx'}, $db{$key}->{'seqno'});
 	my ($queue, $start, $notify, $end) = ($db{$key}->{'queue'}, $db{$key}->{'start'}, $db{$key}->{'notify'}, $db{$key}->{'end'});
@@ -1147,7 +1280,7 @@ foreach my $key (sort sortQueue keys %db) {
 	my $skey;
 
 	# submit to execute
-	unless (exists $skip_box{'queue'}) {
+	unless (exists $skip_box{'queue'} or exists $db{$key}->{'virtual'}) {
 		$skey = 2 * $max_seqno * $ctx + 2 * $seqno;
 		$style = box_style($ctx, 'queue');
 		$content = "$name<br>$db{$key}->{'submit-delay'}us <small>($db{$key}->{'execute-delay'}us)</small>";
@@ -1158,7 +1291,7 @@ foreach my $key (sort sortQueue keys %db) {
 
 	# execute to start
 	$engine_start = $db{$key}->{'start'} unless defined $engine_start;
-	unless (exists $skip_box{'ready'}) {
+	unless (exists $skip_box{'ready'} or exists $db{$key}->{'virtual'}) {
 		$skey = 2 * $max_seqno * $ctx + 2 * $seqno + 1;
 		$style = box_style($ctx, 'ready');
 		$content = "<small>$name<br>$db{$key}->{'execute-delay'}us</small>";
@@ -1199,7 +1332,7 @@ foreach my $key (sort sortQueue keys %db) {
 
 	$last_ts = $end;
 
-	last if $i > $max_items;
+	last if ++$req > $max_requests;
 }
 
 push @freqs, [$prev_freq_ts, $last_ts, $prev_freq] if $prev_freq;
@@ -1232,6 +1365,43 @@ if ($gpu_timeline) {
 	}
 }
 
+$req = 0;
+$veng_id = $engine_start_id + scalar(keys %rings);
+foreach my $key (@sorted_vkeys) {
+	my ($name, $ctx, $seqno) = ($vdb{$key}->{'name'}, $vdb{$key}->{'ctx'}, $vdb{$key}->{'seqno'});
+	my $queue = $vdb{$key}->{'queue'};
+	my $submit = $vdb{$key}->{'submit'};
+	my $engine_start = $db{$key}->{'engine-start'};
+	my ($content, $style, $startend, $skey);
+	my $group = $veng_id + $cids{$ctx};
+	my $subgroup = $ctx - $min_ctx;
+	my $type = ' type: \'range\',';
+	my $duration;
+
+	# submit to execute
+	unless (exists $skip_box{'queue'}) {
+		$skey = 2 * $max_seqno * $ctx + 2 * $seqno;
+		$style = box_style($ctx, 'queue');
+		$content = "$name<br>$vdb{$key}->{'submit-delay'}us <small>($vdb{$key}->{'execute-delay'}us)</small>";
+		$startend = 'start: ' . $queue . ', end: ' . $submit;
+		print "\t{id: $i, key: $skey, $type group: $group, subgroup: $subgroup, subgroupOrder: $subgroup, content: '$content', $startend, style: \'$style\'},\n";
+		$i++;
+	}
+
+	# execute to start
+	$engine_start = $vdb{$key}->{'start'} unless defined $engine_start;
+	unless (exists $skip_box{'ready'}) {
+		$skey = 2 * $max_seqno * $ctx + 2 * $seqno + 1;
+		$style = box_style($ctx, 'ready');
+		$content = "<small>$name<br>$vdb{$key}->{'execute-delay'}us</small>";
+		$startend = 'start: ' . $submit . ', end: ' . $engine_start;
+		print "\t{id: $i, key: $skey, $type group: $group, subgroup: $subgroup, subgroupOrder: $subgroup, content: '$content', $startend, style: \'$style\'},\n";
+		$i++;
+	}
+
+	last if ++$req > $max_requests;
+}
+
 my $end_ts = $first_ts + $width_us;
 $first_ts = $first_ts;
 
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH i-g-t 03/17] trace.pl: Virtual engine preemption support
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Use the 'completed?' tracepoint field to detect more robustly when a
request has been preempted and remove it from the engine database if so.

Otherwise the script can hit a scenario where the same global seqno is
seen multiple times on one engine, which aborts processing.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 scripts/trace.pl | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/scripts/trace.pl b/scripts/trace.pl
index 72747b046202..a55e4f39539a 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -481,12 +481,16 @@ while (<>) {
 	} elsif ($tp_name eq 'i915:i915_request_out:') {
 		my $gkey = global_key($ring, $tp{'global'});
 
-		die unless exists $db{$key};
-		die unless exists $db{$key}->{'start'};
-		die if exists $db{$key}->{'end'};
+		if ($tp{'completed?'}) {
+			die unless exists $db{$key};
+			die unless exists $db{$key}->{'start'};
+			die if exists $db{$key}->{'end'};
 
-		$db{$key}->{'end'} = $time;
-		$db{$key}->{'notify'} = $notify{$gkey} if exists $notify{$gkey};
+			$db{$key}->{'end'} = $time;
+			$db{$key}->{'notify'} = $notify{$gkey} if exists $notify{$gkey};
+		} else {
+			delete $db{$key};
+		}
 	} elsif ($tp_name eq 'i915:intel_engine_notify:') {
 		my $gkey = global_key($ring, $seqno);
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH i-g-t 04/17] wsim/media-bench: i915 balancing
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Support the i915 virtual engine from gem_wsim (-b i915) and media-bench.pl.
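
To sketch the flow this patch wires up (a sketch only: it assumes the
local_* uAPI definitions added below are hoisted to file scope, and
create_balanced_vcs_ctx()/drm_fd are hypothetical names rather than
something the patch itself adds), a context load balanced across both
VCS instances could be set up roughly like this:

  static uint32_t create_balanced_vcs_ctx(int drm_fd)
  {
      struct local_drm_i915_gem_context_create_v2 create = {
          .flags = LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE,
      };
      struct local_i915_context_engines_load_balance balance = {
          .base.name = LOCAL_I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
          .engines_mask = -1, /* balance across all listed engines */
      };
      struct local_i915_context_param_engines engines = {
          .extensions = to_user_pointer(&balance),
          .engines = {
              { .class = I915_ENGINE_CLASS_VIDEO, .instance = 0 },
              { .class = I915_ENGINE_CLASS_VIDEO, .instance = 1 },
          },
      };
      struct drm_i915_gem_context_param param = {
          .param = LOCAL_I915_CONTEXT_PARAM_ENGINES,
          .size = sizeof(engines),
          .value = to_user_pointer(&engines),
      };

      /* Create the context, then attach the balanced VCS engine set. */
      drmIoctl(drm_fd, LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create);
      igt_assert(create.ctx_id);

      param.ctx_id = create.ctx_id;
      gem_context_set_param(drm_fd, &param);

      return create.ctx_id;
  }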

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 289 ++++++++++++++++++++++++++++++++++-------
 scripts/media-bench.pl |   9 +-
 2 files changed, 251 insertions(+), 47 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index e0709487897b..e1c73855150b 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -59,6 +59,20 @@
 #define LOCAL_I915_EXEC_FENCE_IN              (1<<16)
 #define LOCAL_I915_EXEC_FENCE_OUT             (1<<17)
 
+struct local_drm_i915_gem_context_create_v2 {
+	/*  output: id of new context*/
+	__u32 ctx_id;
+	__u32 flags;
+#define LOCAL_I915_GEM_CONTEXT_SHARE_GTT	0x1
+#define LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE	0x2
+	__u32 share_ctx;
+	__u32 pad;
+};
+
+#define LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct local_drm_i915_gem_context_create_v2)
+
+#define LOCAL_I915_CONTEXT_PARAM_ENGINES	0x7
+
 enum intel_engine_id {
 	RCS,
 	BCS,
@@ -143,6 +157,14 @@ struct w_step
 
 DECLARE_EWMA(uint64_t, rt, 4, 2)
 
+struct ctx {
+	uint32_t id;
+	int priority;
+	bool targets_instance;
+	bool wants_balance;
+	unsigned int static_vcs;
+};
+
 struct workload
 {
 	unsigned int id;
@@ -164,11 +186,7 @@ struct workload
 	struct timespec repeat_start;
 
 	unsigned int nr_ctxs;
-	struct {
-		uint32_t id;
-		int priority;
-		unsigned int static_vcs;
-	} *ctx_list;
+	struct ctx *ctx_list;
 
 	int sync_timeline;
 	uint32_t sync_seqno;
@@ -225,6 +243,7 @@ static int fd;
 #define HEARTBEAT	(1<<7)
 #define GLOBAL_BALANCE	(1<<8)
 #define DEPSYNC		(1<<9)
+#define I915		(1<<10)
 
 #define SEQNO_IDX(engine) ((engine) * 16)
 #define SEQNO_OFFSET(engine) (SEQNO_IDX(engine) * sizeof(uint32_t))
@@ -836,7 +855,11 @@ eb_set_engine(struct drm_i915_gem_execbuffer2 *eb,
 	if (engine == VCS2 && (flags & VCS2REMAP))
 		engine = BCS;
 
-	eb->flags = eb_engine_map[engine];
+	if ((flags & I915) && engine == VCS) {
+		eb->flags = 0;
+	} else {
+		eb->flags = eb_engine_map[engine];
+	}
 }
 
 static void
@@ -862,6 +885,23 @@ get_status_objects(struct workload *wrk)
 		return wrk->status_object;
 }
 
+static struct ctx *
+__get_ctx(struct workload *wrk, struct w_step *w)
+{
+	return &wrk->ctx_list[w->context * 2];
+}
+
+static uint32_t
+get_ctxid(struct workload *wrk, struct w_step *w)
+{
+	struct ctx *ctx = __get_ctx(wrk, w);
+
+	if (ctx->targets_instance && ctx->wants_balance && w->engine == VCS)
+		return wrk->ctx_list[w->context * 2 + 1].id;
+	else
+		return wrk->ctx_list[w->context * 2].id;
+}
+
 static void
 alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 {
@@ -914,7 +954,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 
 	w->eb.buffers_ptr = to_user_pointer(w->obj);
 	w->eb.buffer_count = j + 1;
-	w->eb.rsvd1 = wrk->ctx_list[w->context].id;
+	w->eb.rsvd1 = get_ctxid(wrk, w);
 
 	if (flags & SWAPVCS && engine == VCS1)
 		engine = VCS2;
@@ -927,17 +967,29 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 		printf("%x|", w->obj[i].handle);
 	printf(" %10lu flags=%llx bb=%x[%u] ctx[%u]=%u\n",
 		w->bb_sz, w->eb.flags, w->bb_handle, j, w->context,
-		wrk->ctx_list[w->context].id);
+		get_ctxid(wrk, w));
 #endif
 }
 
+static void __ctx_set_prio(uint32_t ctx_id, unsigned int prio)
+{
+	struct drm_i915_gem_context_param param = {
+		.ctx_id = ctx_id,
+		.param = I915_CONTEXT_PARAM_PRIORITY,
+		.value = prio,
+	};
+
+	if (prio)
+		gem_context_set_param(fd, &param);
+}
+
 static void
 prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 {
 	unsigned int ctx_vcs = 0;
 	int max_ctx = -1;
 	struct w_step *w;
-	int i;
+	int i, j;
 
 	wrk->id = id;
 	wrk->prng = rand();
@@ -968,44 +1020,174 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		}
 	}
 
+	/*
+	 * Pre-scan workload steps to allocate context list storage.
+	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-		if ((int)w->context > max_ctx) {
-			int delta = w->context + 1 - wrk->nr_ctxs;
+		int ctx = w->context * 2 + 1; /* Odd slots are special. */
+		int delta;
+
+		if (ctx <= max_ctx)
+			continue;
+
+		delta = ctx + 1 - wrk->nr_ctxs;
+
+		wrk->nr_ctxs += delta;
+		wrk->ctx_list = realloc(wrk->ctx_list,
+					wrk->nr_ctxs * sizeof(*wrk->ctx_list));
+		memset(&wrk->ctx_list[wrk->nr_ctxs - delta], 0,
+			delta * sizeof(*wrk->ctx_list));
+
+		max_ctx = ctx;
+	}
+
+	/*
+	 * Identify if contexts target specific engine instances and if they
+	 * want to be balanced.
+	 */
+	for (j = 0; j < wrk->nr_ctxs; j += 2) {
+		bool targets = false;
+		bool balance = false;
+
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->type != BATCH)
+				continue;
+
+			if (w->context != (j / 2))
+				continue;
 
-			wrk->nr_ctxs += delta;
-			wrk->ctx_list = realloc(wrk->ctx_list,
-						wrk->nr_ctxs *
-						sizeof(*wrk->ctx_list));
-			memset(&wrk->ctx_list[wrk->nr_ctxs - delta], 0,
-			       delta * sizeof(*wrk->ctx_list));
+			if (w->engine == VCS)
+				balance = true;
+			else
+				targets = true;
+		}
 
-			max_ctx = w->context;
+		if (flags & I915) {
+			wrk->ctx_list[j].targets_instance = targets;
+			wrk->ctx_list[j].wants_balance = balance;
 		}
+	}
 
-		if (!wrk->ctx_list[w->context].id) {
-			struct drm_i915_gem_context_create arg = {};
+	/*
+	 * Create and configure contexts.
+	 */
+	for (i = 0; i < wrk->nr_ctxs; i += 2) {
+		struct ctx *ctx = &wrk->ctx_list[i];
+		uint32_t ctx_id, share_ctx = 0;
 
-			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &arg);
-			igt_assert(arg.ctx_id);
+		if (ctx->id)
+			continue;
 
-			wrk->ctx_list[w->context].id = arg.ctx_id;
+		if (flags & I915) {
+			struct local_drm_i915_gem_context_create_v2 args = { };
 
-			if (flags & GLOBAL_BALANCE) {
-				wrk->ctx_list[w->context].static_vcs = context_vcs_rr;
-				context_vcs_rr ^= 1;
-			} else {
-				wrk->ctx_list[w->context].static_vcs = ctx_vcs;
-				ctx_vcs ^= 1;
-			}
+			/* Find existing context to share ppgtt with. */
+			for (j = 0; j < wrk->nr_ctxs; j++) {
+				if (!wrk->ctx_list[j].id)
+					continue;
 
-			if (wrk->prio) {
-				struct drm_i915_gem_context_param param = {
-					.ctx_id = arg.ctx_id,
-					.param = I915_CONTEXT_PARAM_PRIORITY,
-					.value = wrk->prio,
-				};
-				gem_context_set_param(fd, &param);
+				args.flags |= LOCAL_I915_GEM_CONTEXT_SHARE_GTT;
+				args.share_ctx = share_ctx =
+					wrk->ctx_list[j].id;
+				break;
 			}
+
+			if (!ctx->targets_instance)
+				args.flags |= LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE;
+
+			drmIoctl(fd, LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE,
+				 &args);
+
+			ctx_id = args.ctx_id;
+		} else {
+			struct drm_i915_gem_context_create args = {};
+
+			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
+			ctx_id = args.ctx_id;
+		}
+
+		igt_assert(ctx_id);
+		ctx->id = ctx_id;
+
+		if (flags & GLOBAL_BALANCE) {
+			ctx->static_vcs = context_vcs_rr;
+			context_vcs_rr ^= 1;
+		} else {
+			ctx->static_vcs = ctx_vcs;
+			ctx_vcs ^= 1;
+		}
+
+		__ctx_set_prio(ctx_id, wrk->prio);
+
+		/*
+		 * Do we need a separate context to satisfy workloads which
+		 * both want to target specific engines and be balanced by i915?
+		 */
+		if ((flags & I915) && ctx->wants_balance &&
+		    ctx->targets_instance) {
+			struct local_drm_i915_gem_context_create_v2 args = {};
+
+			igt_assert(share_ctx);
+
+			args.flags = LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE |
+				     LOCAL_I915_GEM_CONTEXT_SHARE_GTT;
+			args.share_ctx = share_ctx;
+
+			drmIoctl(fd, LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE,
+				 &args);
+
+			igt_assert(args.ctx_id);
+			ctx_id = args.ctx_id;
+			wrk->ctx_list[i + 1].id = args.ctx_id;
+
+			__ctx_set_prio(ctx_id, wrk->prio);
+		}
+
+		if (ctx->wants_balance) {
+			#define LOCAL_I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+
+			struct local_i915_user_extension {
+				__u64 next_extension;
+				__u64 name;
+			};
+
+			struct local_i915_context_engines_load_balance {
+				struct local_i915_user_extension base;
+
+				__u64 flags; /* all undefined flags must be zero */
+				__u64 engines_mask;
+
+				__u64 mbz[4]; /* reserved for future use; must be zero */
+			} load_balance = {
+				.base.name = LOCAL_I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
+				.engines_mask = -1,
+			};
+
+			struct local_i915_context_param_engines {
+				__u64 extensions;
+
+				struct {
+					__u16 class; /* see enum drm_i915_gem_engine_class */
+					__u16 instance;
+				} engines[2];
+			} __attribute__((packed)) set_engines = {
+				.extensions = to_user_pointer(&load_balance),
+				.engines = {
+					{ .class = I915_ENGINE_CLASS_VIDEO,
+					  .instance = 0 },
+					{ .class = I915_ENGINE_CLASS_VIDEO,
+					  .instance = 1 },
+				},
+			};
+
+			struct drm_i915_gem_context_param param = {
+				.ctx_id = ctx_id,
+				.param = LOCAL_I915_CONTEXT_PARAM_ENGINES,
+				.size = sizeof(set_engines),
+				.value = to_user_pointer(&set_engines),
+			};
+
+			gem_context_set_param(fd, &param);
 		}
 	}
 
@@ -1380,7 +1562,7 @@ static enum intel_engine_id
 context_balance(const struct workload_balancer *balancer,
 		struct workload *wrk, struct w_step *w)
 {
-	return get_vcs_engine(wrk->ctx_list[w->context].static_vcs);
+	return get_vcs_engine(__get_ctx(wrk, w)->static_vcs);
 }
 
 static unsigned int
@@ -1574,6 +1756,12 @@ static const struct workload_balancer all_balancers[] = {
 		.get_qd = get_engine_busy,
 		.balance = busy_avg_balance,
 	},
+	{
+		.id = 11,
+		.name = "i915",
+		.desc = "i915 balancing.",
+		.flags = I915,
+	},
 };
 
 static unsigned int
@@ -1952,7 +2140,8 @@ static void *run_workload(void *data)
 			last_sync = false;
 
 			wrk->nr_bb[engine]++;
-			if (engine == VCS && wrk->balancer) {
+			if (engine == VCS && wrk->balancer &&
+			    wrk->balancer->balance) {
 				engine = wrk->balancer->balance(wrk->balancer,
 								wrk, w);
 				wrk->nr_bb[engine]++;
@@ -2379,6 +2568,12 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	if ((flags & VCS2REMAP) && (flags & I915)) {
+		if (verbose)
+			fprintf(stderr, "VCS remapping not supported with i915 balancing!\n");
+		return 1;
+	}
+
 	if (!nop_calibration) {
 		if (verbose > 1)
 			printf("Calibrating nop delay with %u%% tolerance...\n",
@@ -2464,11 +2659,17 @@ int main(int argc, char **argv)
 		printf("%u client%s.\n", clients, clients > 1 ? "s" : "");
 		if (flags & SWAPVCS)
 			printf("Swapping VCS rings between clients.\n");
-		if (flags & GLOBAL_BALANCE)
-			printf("Using %s balancer in global mode.\n",
-			       balancer->name);
-		else if (balancer)
+		if (flags & GLOBAL_BALANCE) {
+			if (flags & I915) {
+				printf("Ignoring global balancing with i915!\n");
+				flags &= ~GLOBAL_BALANCE;
+			} else {
+				printf("Using %s balancer in global mode.\n",
+				       balancer->name);
+			}
+		} else if (balancer) {
 			printf("Using %s balancer.\n", balancer->name);
+		}
 	}
 
 	if (master_workload >= 0 && clients == 1)
@@ -2485,7 +2686,7 @@ int main(int argc, char **argv)
 		if (flags & SWAPVCS && i & 1)
 			flags_ &= ~SWAPVCS;
 
-		if (flags & GLOBAL_BALANCE) {
+		if ((flags & GLOBAL_BALANCE) && !(flags & I915)) {
 			w[i]->balancer = &global_balancer;
 			w[i]->global_wrk = w[0];
 			w[i]->global_balancer = balancer;
diff --git a/scripts/media-bench.pl b/scripts/media-bench.pl
index 066b542f95df..ddf9c0ec05c8 100755
--- a/scripts/media-bench.pl
+++ b/scripts/media-bench.pl
@@ -49,10 +49,11 @@ my $nop;
 my %opts;
 
 my @balancers = ( 'rr', 'rand', 'qd', 'qdr', 'qdavg', 'rt', 'rtr', 'rtavg',
-		  'context', 'busy', 'busy-avg' );
+		  'context', 'busy', 'busy-avg', 'i915' );
 my %bal_skip_H = ( 'rr' => 1, 'rand' => 1, 'context' => 1, , 'busy' => 1,
-		   'busy-avg' => 1 );
-my %bal_skip_R = ( 'context' => 1 );
+		   'busy-avg' => 1, 'i915' => 1 );
+my %bal_skip_R = ( 'context' => 1, 'i915' => 1 );
+my %bal_skip_G = ( 'i915' => 1 );
 
 my @workloads = (
 	'media_load_balance_17i7.wsim',
@@ -498,6 +499,8 @@ foreach my $wrk (@saturation_workloads) {
 				my $bid;
 
 				if ($bal ne '') {
+					next GBAL if $G =~ '-G' and exists $bal_skip_G{$bal};
+
 					push @xargs, "-b $bal";
 					push @xargs, '-R' unless exists $bal_skip_R{$bal};
 					push @xargs, $G if $G ne '';
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH i-g-t 05/17] gem_wsim: Use IGT uapi headers
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We are moving towards bumping the uAPI headers more often instead of
using too many local struct/ioctl/param definitions, since the latter
are more challenging to rebase and maintain.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 68 +++++++++++--------------------------------
 1 file changed, 17 insertions(+), 51 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index e1c73855150b..adfc2b1bc819 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -41,7 +41,6 @@
 #include <limits.h>
 #include <pthread.h>
 
-
 #include "intel_chipset.h"
 #include "intel_reg.h"
 #include "drm.h"
@@ -56,23 +55,6 @@
 
 #include "ewma.h"
 
-#define LOCAL_I915_EXEC_FENCE_IN              (1<<16)
-#define LOCAL_I915_EXEC_FENCE_OUT             (1<<17)
-
-struct local_drm_i915_gem_context_create_v2 {
-	/*  output: id of new context*/
-	__u32 ctx_id;
-	__u32 flags;
-#define LOCAL_I915_GEM_CONTEXT_SHARE_GTT	0x1
-#define LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE	0x2
-	__u32 share_ctx;
-	__u32 pad;
-};
-
-#define LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct local_drm_i915_gem_context_create_v2)
-
-#define LOCAL_I915_CONTEXT_PARAM_ENGINES	0x7
-
 enum intel_engine_id {
 	RCS,
 	BCS,
@@ -873,7 +855,7 @@ eb_update_flags(struct w_step *w, enum intel_engine_id engine,
 
 	igt_assert(w->emit_fence <= 0);
 	if (w->emit_fence)
-		w->eb.flags |= LOCAL_I915_EXEC_FENCE_OUT;
+		w->eb.flags |= I915_EXEC_FENCE_OUT;
 }
 
 static struct drm_i915_gem_exec_object2 *
@@ -1079,24 +1061,23 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			continue;
 
 		if (flags & I915) {
-			struct local_drm_i915_gem_context_create_v2 args = { };
+			struct drm_i915_gem_context_create_v2 args = { };
 
 			/* Find existing context to share ppgtt with. */
 			for (j = 0; j < wrk->nr_ctxs; j++) {
 				if (!wrk->ctx_list[j].id)
 					continue;
 
-				args.flags |= LOCAL_I915_GEM_CONTEXT_SHARE_GTT;
+				args.flags |= I915_GEM_CONTEXT_SHARE_GTT;
 				args.share_ctx = share_ctx =
 					wrk->ctx_list[j].id;
 				break;
 			}
 
 			if (!ctx->targets_instance)
-				args.flags |= LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE;
+				args.flags |= I915_GEM_CONTEXT_SINGLE_TIMELINE;
 
-			drmIoctl(fd, LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE,
-				 &args);
+			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
 
 			ctx_id = args.ctx_id;
 		} else {
@@ -1125,16 +1106,15 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		 */
 		if ((flags & I915) && ctx->wants_balance &&
 		    ctx->targets_instance) {
-			struct local_drm_i915_gem_context_create_v2 args = {};
+			struct drm_i915_gem_context_create_v2 args = {};
 
 			igt_assert(share_ctx);
 
-			args.flags = LOCAL_I915_GEM_CONTEXT_SINGLE_TIMELINE |
-				     LOCAL_I915_GEM_CONTEXT_SHARE_GTT;
+			args.flags = I915_GEM_CONTEXT_SINGLE_TIMELINE |
+				     I915_GEM_CONTEXT_SHARE_GTT;
 			args.share_ctx = share_ctx;
 
-			drmIoctl(fd, LOCAL_DRM_IOCTL_I915_GEM_CONTEXT_CREATE,
-				 &args);
+			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
 
 			igt_assert(args.ctx_id);
 			ctx_id = args.ctx_id;
@@ -1144,24 +1124,10 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		}
 
 		if (ctx->wants_balance) {
-			#define LOCAL_I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
-
-			struct local_i915_user_extension {
-				__u64 next_extension;
-				__u64 name;
-			};
-
-			struct local_i915_context_engines_load_balance {
-				struct local_i915_user_extension base;
-
-				__u64 flags; /* all undefined flags must be zero */
-				__u64 engines_mask;
-
-				__u64 mbz[4]; /* reserved for future use; must be zero */
-			} load_balance = {
-				.base.name = LOCAL_I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
-				.engines_mask = -1,
-			};
+			struct i915_context_engines_load_balance load_balance =
+				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
+				  .engines_mask = -1,
+				};
 
 			struct local_i915_context_param_engines {
 				__u64 extensions;
@@ -1182,7 +1148,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 
 			struct drm_i915_gem_context_param param = {
 				.ctx_id = ctx_id,
-				.param = LOCAL_I915_CONTEXT_PARAM_ENGINES,
+				.param = I915_CONTEXT_PARAM_ENGINES,
 				.size = sizeof(set_engines),
 				.value = to_user_pointer(&set_engines),
 			};
@@ -1994,16 +1960,16 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 		igt_assert(tgt >= 0 && tgt < w->idx);
 		igt_assert(wrk->steps[tgt].emit_fence > 0);
 
-		w->eb.flags |= LOCAL_I915_EXEC_FENCE_IN;
+		w->eb.flags |= I915_EXEC_FENCE_IN;
 		w->eb.rsvd2 = wrk->steps[tgt].emit_fence;
 	}
 
-	if (w->eb.flags & LOCAL_I915_EXEC_FENCE_OUT)
+	if (w->eb.flags & I915_EXEC_FENCE_OUT)
 		gem_execbuf_wr(fd, &w->eb);
 	else
 		gem_execbuf(fd, &w->eb);
 
-	if (w->eb.flags & LOCAL_I915_EXEC_FENCE_OUT) {
+	if (w->eb.flags & I915_EXEC_FENCE_OUT) {
 		w->emit_fence = w->eb.rsvd2 >> 32;
 		igt_assert(w->emit_fence > 0);
 	}
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH i-g-t 06/17] gem_wsim: Fix shadowed local
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index adfc2b1bc819..2561817622f6 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -1170,7 +1170,6 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	 */
 	for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
 		struct w_step *w2;
-		int j;
 
 		if (w->type != PREEMPTION)
 			continue;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH i-g-t 07/17] gem_wsim: Factor out common error handling
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

There is a repeated error handling pattern which can be moved to a
macro for better readability in the command parsing loop.
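
For reference, a typical parser branch collapses to the shape below after
the conversion (lifted from the hunks that follow). Note that check_arg()
expands to a 'return NULL' from the enclosing function, so it is only
usable in parsers that return a pointer, such as parse_workload():

  tmp = atoi(field);
  check_arg(tmp <= 0, "Invalid delay at step %u!\n", nr_steps);
  step.type = DELAY;
  step.delay = tmp;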

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 244 +++++++++++++++---------------------------
 1 file changed, 88 insertions(+), 156 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 2561817622f6..a6ee6c493424 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -290,6 +290,27 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 	return 0;
 }
 
+static void __attribute__((format(printf, 1, 2)))
+wsim_err(const char *fmt, ...)
+{
+	va_list ap;
+
+	if (!verbose)
+		return;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+}
+
+#define check_arg(cond, fmt, ...) \
+{ \
+	if (cond) { \
+		wsim_err(fmt, __VA_ARGS__); \
+		return NULL; \
+	} \
+}
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -320,14 +341,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid delay at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp <= 0,
+						  "Invalid delay at step %u!\n",
+						  nr_steps);
 					step.type = DELAY;
 					step.delay = tmp;
 					goto add_step;
@@ -336,14 +352,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid period at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp <= 0,
+						  "Invalid period at step %u!\n",
+						  nr_steps);
 					step.type = PERIOD;
 					step.period = tmp;
 					goto add_step;
@@ -353,25 +364,17 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				while ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0 && nr == 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid context at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
-					if (nr == 0) {
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 1,
+						  "Invalid priority format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0)
 						step.context = tmp;
-					} else if (nr == 1) {
+					else
 						step.priority = tmp;
-					} else {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid priority format at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
 
 					nr++;
 				}
@@ -382,15 +385,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp >= 0 ||
-					    ((int)nr_steps + tmp) < 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid sync target at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp >= 0 ||
+						  ((int)nr_steps + tmp) < 0,
+						  "Invalid sync target at step %u!\n",
+						  nr_steps);
 					step.type = SYNC;
 					step.target = tmp;
 					goto add_step;
@@ -399,14 +397,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp < 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid throttle at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp < 0,
+						  "Invalid throttle at step %u!\n",
+						  nr_steps);
 					step.type = THROTTLE;
 					step.throttle = tmp;
 					goto add_step;
@@ -415,14 +408,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp < 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid qd throttle at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp < 0,
+						  "Invalid qd throttle at step %u!\n",
+						  nr_steps);
 					step.type = QD_THROTTLE;
 					step.throttle = tmp;
 					goto add_step;
@@ -431,14 +419,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp >= 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid sw fence signal at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp >= 0,
+						  "Invalid sw fence signal at step %u!\n",
+						  nr_steps);
 					step.type = SW_FENCE_SIGNAL;
 					step.target = tmp;
 					goto add_step;
@@ -451,31 +434,20 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				while ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0 && nr == 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid context at step %u!\n",
-								nr_steps);
-						return NULL;
-					} else if (tmp < 0 && nr == 1) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid preemption period at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
-					if (nr == 0) {
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr == 1 && tmp < 0,
+						  "Invalid preemption period at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 1,
+						  "Invalid preemption format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0)
 						step.context = tmp;
-					} else if (nr == 1) {
+					else
 						step.period = tmp;
-					} else {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid preemption format at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
 
 					nr++;
 				}
@@ -485,13 +457,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			}
 
 			tmp = atoi(field);
-			if (tmp < 0) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid ctx id at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
+				  nr_steps);
 			step.context = tmp;
 
 			valid++;
@@ -512,13 +479,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				}
 			}
 
-			if (old_valid == valid) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid engine id at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(old_valid == valid,
+				  "Invalid engine id at step %u!\n", nr_steps);
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
@@ -528,25 +490,19 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			fstart = NULL;
 
 			tmpl = strtol(field, &sep, 10);
-			if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid duration at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
+				  tmpl == LONG_MAX,
+				  "Invalid duration at step %u!\n", nr_steps);
 			step.duration.min = tmpl;
 
 			if (sep && *sep == '-') {
 				tmpl = strtol(sep + 1, NULL, 10);
-				if (tmpl <= 0 || tmpl <= step.duration.min ||
-				    tmpl == LONG_MIN || tmpl == LONG_MAX) {
-					if (verbose)
-						fprintf(stderr,
-							"Invalid duration range at step %u!\n",
-							nr_steps);
-					return NULL;
-				}
+				check_arg(tmpl <= 0 ||
+					  tmpl <= step.duration.min ||
+					  tmpl == LONG_MIN ||
+					  tmpl == LONG_MAX,
+					  "Invalid duration range at step %u!\n",
+					  nr_steps);
 				step.duration.max = tmpl;
 			} else {
 				step.duration.max = step.duration.min;
@@ -559,13 +515,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			fstart = NULL;
 
 			tmp = parse_dependencies(nr_steps, &step, field);
-			if (tmp < 0) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid dependency at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(tmp < 0,
+				  "Invalid dependency at step %u!\n", nr_steps);
 
 			valid++;
 		}
@@ -573,25 +524,16 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
 			fstart = NULL;
 
-			if (strlen(field) != 1 ||
-			    (field[0] != '0' && field[0] != '1')) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid wait boolean at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(strlen(field) != 1 ||
+				  (field[0] != '0' && field[0] != '1'),
+				  "Invalid wait boolean at step %u!\n",
+				  nr_steps);
 			step.sync = field[0] - '0';
 
 			valid++;
 		}
 
-		if (valid != 5) {
-			if (verbose)
-				fprintf(stderr, "Invalid record at step %u!\n",
-					nr_steps);
-			return NULL;
-		}
+		check_arg(valid != 5, "Invalid record at step %u!\n", nr_steps);
 
 		step.type = BATCH;
 
@@ -636,15 +578,10 @@ add_step:
 	for (i = 0; i < nr_steps; i++) {
 		for (j = 0; j < steps[i].fence_deps.nr; j++) {
 			tmp = steps[i].idx + steps[i].fence_deps.list[j];
-			if (tmp < 0 || tmp >= i ||
-			    (steps[tmp].type != BATCH &&
-			     steps[tmp].type != SW_FENCE)) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid dependency target %u!\n",
-						i);
-				return NULL;
-			}
+			check_arg(tmp < 0 || tmp >= i ||
+				  (steps[tmp].type != BATCH &&
+				   steps[tmp].type != SW_FENCE),
+				  "Invalid dependency target %u!\n", i);
 			steps[tmp].emit_fence = -1;
 		}
 	}
@@ -653,14 +590,9 @@ add_step:
 	for (i = 0; i < nr_steps; i++) {
 		if (steps[i].type == SW_FENCE_SIGNAL) {
 			tmp = steps[i].idx + steps[i].target;
-			if (tmp < 0 || tmp >= i ||
-			    steps[tmp].type != SW_FENCE) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid sw fence target %u!\n",
-						i);
-				return NULL;
-			}
+			check_arg(tmp < 0 || tmp >= i ||
+				  steps[tmp].type != SW_FENCE,
+				  "Invalid sw fence target %u!\n", i);
 		}
 	}
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH i-g-t 07/17] gem_wsim: Factor out common error handling
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

There is a repeated pattern with error handling which can be moved to a
macro for better readability in the command parsing loop.
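
For reference, here is a minimal standalone sketch of the pattern. The
wsim_err() and check_arg() bodies are lifted from the hunk below; the
parse_delay() caller and main() are purely illustrative stand-ins for the
real parsing loop:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int verbose = 1; /* stands in for the tool's verbosity flag */

static void __attribute__((format(printf, 1, 2)))
wsim_err(const char *fmt, ...)
{
	va_list ap;

	if (!verbose)
		return;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Only usable in functions returning a pointer, such as parse_workload(). */
#define check_arg(cond, fmt, ...) \
{ \
	if (cond) { \
		wsim_err(fmt, __VA_ARGS__); \
		return NULL; \
	} \
}

/* Illustrative caller: one call replaces the old if (verbose) + return NULL. */
static void *parse_delay(const char *field, unsigned int nr_steps)
{
	int tmp = atoi(field);

	check_arg(tmp <= 0, "Invalid delay at step %u!\n", nr_steps);

	return malloc(sizeof(tmp)); /* placeholder for the real step object */
}

int main(void)
{
	return parse_delay("0", 1) ? 0 : 1; /* exercises the error path */
}

Every validation site in parse_workload() then collapses to a single
check_arg() call, as the hunks below show.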

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 244 +++++++++++++++---------------------------
 1 file changed, 88 insertions(+), 156 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 2561817622f6..a6ee6c493424 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -290,6 +290,27 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 	return 0;
 }
 
+static void __attribute__((format(printf, 1, 2)))
+wsim_err(const char *fmt, ...)
+{
+	va_list ap;
+
+	if (!verbose)
+		return;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+}
+
+#define check_arg(cond, fmt, ...) \
+{ \
+	if (cond) { \
+		wsim_err(fmt, __VA_ARGS__); \
+		return NULL; \
+	} \
+}
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -320,14 +341,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid delay at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp <= 0,
+						  "Invalid delay at step %u!\n",
+						  nr_steps);
 					step.type = DELAY;
 					step.delay = tmp;
 					goto add_step;
@@ -336,14 +352,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid period at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp <= 0,
+						  "Invalid period at step %u!\n",
+						  nr_steps);
 					step.type = PERIOD;
 					step.period = tmp;
 					goto add_step;
@@ -353,25 +364,17 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				while ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0 && nr == 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid context at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
-					if (nr == 0) {
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 1,
+						  "Invalid priority format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0)
 						step.context = tmp;
-					} else if (nr == 1) {
+					else
 						step.priority = tmp;
-					} else {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid priority format at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
 
 					nr++;
 				}
@@ -382,15 +385,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp >= 0 ||
-					    ((int)nr_steps + tmp) < 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid sync target at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp >= 0 ||
+						  ((int)nr_steps + tmp) < 0,
+						  "Invalid sync target at step %u!\n",
+						  nr_steps);
 					step.type = SYNC;
 					step.target = tmp;
 					goto add_step;
@@ -399,14 +397,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp < 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid throttle at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp < 0,
+						  "Invalid throttle at step %u!\n",
+						  nr_steps);
 					step.type = THROTTLE;
 					step.throttle = tmp;
 					goto add_step;
@@ -415,14 +408,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp < 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid qd throttle at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp < 0,
+						  "Invalid qd throttle at step %u!\n",
+						  nr_steps);
 					step.type = QD_THROTTLE;
 					step.throttle = tmp;
 					goto add_step;
@@ -431,14 +419,9 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				if ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp >= 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid sw fence signal at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
+					check_arg(tmp >= 0,
+						  "Invalid sw fence signal at step %u!\n",
+						  nr_steps);
 					step.type = SW_FENCE_SIGNAL;
 					step.target = tmp;
 					goto add_step;
@@ -451,31 +434,20 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				while ((field = strtok_r(fstart, ".", &fctx)) !=
 				    NULL) {
 					tmp = atoi(field);
-					if (tmp <= 0 && nr == 0) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid context at step %u!\n",
-								nr_steps);
-						return NULL;
-					} else if (tmp < 0 && nr == 1) {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid preemption period at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
-
-					if (nr == 0) {
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr == 1 && tmp < 0,
+						  "Invalid preemption period at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 1,
+						  "Invalid preemption format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0)
 						step.context = tmp;
-					} else if (nr == 1) {
+					else
 						step.period = tmp;
-					} else {
-						if (verbose)
-							fprintf(stderr,
-								"Invalid preemption format at step %u!\n",
-								nr_steps);
-						return NULL;
-					}
 
 					nr++;
 				}
@@ -485,13 +457,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			}
 
 			tmp = atoi(field);
-			if (tmp < 0) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid ctx id at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(tmp < 0, "Invalid ctx id at step %u!\n",
+				  nr_steps);
 			step.context = tmp;
 
 			valid++;
@@ -512,13 +479,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				}
 			}
 
-			if (old_valid == valid) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid engine id at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(old_valid == valid,
+				  "Invalid engine id at step %u!\n", nr_steps);
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
@@ -528,25 +490,19 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			fstart = NULL;
 
 			tmpl = strtol(field, &sep, 10);
-			if (tmpl <= 0 || tmpl == LONG_MIN || tmpl == LONG_MAX) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid duration at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
+				  tmpl == LONG_MAX,
+				  "Invalid duration at step %u!\n", nr_steps);
 			step.duration.min = tmpl;
 
 			if (sep && *sep == '-') {
 				tmpl = strtol(sep + 1, NULL, 10);
-				if (tmpl <= 0 || tmpl <= step.duration.min ||
-				    tmpl == LONG_MIN || tmpl == LONG_MAX) {
-					if (verbose)
-						fprintf(stderr,
-							"Invalid duration range at step %u!\n",
-							nr_steps);
-					return NULL;
-				}
+				check_arg(tmpl <= 0 ||
+					  tmpl <= step.duration.min ||
+					  tmpl == LONG_MIN ||
+					  tmpl == LONG_MAX,
+					  "Invalid duration range at step %u!\n",
+					  nr_steps);
 				step.duration.max = tmpl;
 			} else {
 				step.duration.max = step.duration.min;
@@ -559,13 +515,8 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			fstart = NULL;
 
 			tmp = parse_dependencies(nr_steps, &step, field);
-			if (tmp < 0) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid dependency at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(tmp < 0,
+				  "Invalid dependency at step %u!\n", nr_steps);
 
 			valid++;
 		}
@@ -573,25 +524,16 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
 			fstart = NULL;
 
-			if (strlen(field) != 1 ||
-			    (field[0] != '0' && field[0] != '1')) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid wait boolean at step %u!\n",
-						nr_steps);
-				return NULL;
-			}
+			check_arg(strlen(field) != 1 ||
+				  (field[0] != '0' && field[0] != '1'),
+				  "Invalid wait boolean at step %u!\n",
+				  nr_steps);
 			step.sync = field[0] - '0';
 
 			valid++;
 		}
 
-		if (valid != 5) {
-			if (verbose)
-				fprintf(stderr, "Invalid record at step %u!\n",
-					nr_steps);
-			return NULL;
-		}
+		check_arg(valid != 5, "Invalid record at step %u!\n", nr_steps);
 
 		step.type = BATCH;
 
@@ -636,15 +578,10 @@ add_step:
 	for (i = 0; i < nr_steps; i++) {
 		for (j = 0; j < steps[i].fence_deps.nr; j++) {
 			tmp = steps[i].idx + steps[i].fence_deps.list[j];
-			if (tmp < 0 || tmp >= i ||
-			    (steps[tmp].type != BATCH &&
-			     steps[tmp].type != SW_FENCE)) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid dependency target %u!\n",
-						i);
-				return NULL;
-			}
+			check_arg(tmp < 0 || tmp >= i ||
+				  (steps[tmp].type != BATCH &&
+				   steps[tmp].type != SW_FENCE),
+				  "Invalid dependency target %u!\n", i);
 			steps[tmp].emit_fence = -1;
 		}
 	}
@@ -653,14 +590,9 @@ add_step:
 	for (i = 0; i < nr_steps; i++) {
 		if (steps[i].type == SW_FENCE_SIGNAL) {
 			tmp = steps[i].idx + steps[i].target;
-			if (tmp < 0 || tmp >= i ||
-			    steps[tmp].type != SW_FENCE) {
-				if (verbose)
-					fprintf(stderr,
-						"Invalid sw fence target %u!\n",
-						i);
-				return NULL;
-			}
+			check_arg(tmp < 0 || tmp >= i ||
+				  steps[tmp].type != SW_FENCE,
+				  "Invalid sw fence target %u!\n", i);
 		}
 	}
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH i-g-t 08/17] gem_wsim: More wsim_err
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few more opportunities to compact the code by using the error logging
helper.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 54 ++++++++++++-------------------------------
 1 file changed, 15 insertions(+), 39 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a6ee6c493424..0010f46c357d 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -2366,9 +2366,7 @@ int main(int argc, char **argv)
 		switch (c) {
 		case 'W':
 			if (master_workload >= 0) {
-				if (verbose)
-					fprintf(stderr,
-						"Only one master workload can be given!\n");
+				wsim_err("Only one master workload can be given!\n");
 				return 1;
 			}
 			master_workload = nr_w_args;
@@ -2381,9 +2379,7 @@ int main(int argc, char **argv)
 			break;
 		case 'a':
 			if (append_workload_arg) {
-				if (verbose)
-					fprintf(stderr,
-						"Only one append workload can be given!\n");
+				wsim_err("Only one append workload can be given!\n");
 				return 1;
 			}
 			append_workload_arg = optarg;
@@ -2444,10 +2440,8 @@ int main(int argc, char **argv)
 			}
 
 			if (!balancer) {
-				if (verbose)
-					fprintf(stderr,
-						"Unknown balancing mode '%s'!\n",
-						optarg);
+				wsim_err("Unknown balancing mode '%s'!\n",
+					 optarg);
 				return 1;
 			}
 			break;
@@ -2460,14 +2454,12 @@ int main(int argc, char **argv)
 	}
 
 	if ((flags & HEARTBEAT) && !(flags & SEQNO)) {
-		if (verbose)
-			fprintf(stderr, "Heartbeat needs a seqno based balancer!\n");
+		wsim_err("Heartbeat needs a seqno based balancer!\n");
 		return 1;
 	}
 
 	if ((flags & VCS2REMAP) && (flags & I915)) {
-		if (verbose)
-			fprintf(stderr, "VCS remapping not supported with i915 balancing!\n");
+		wsim_err("VCS remapping not supported with i915 balancing!\n");
 		return 1;
 	}
 
@@ -2484,31 +2476,24 @@ int main(int argc, char **argv)
 	}
 
 	if (!nr_w_args) {
-		if (verbose)
-			fprintf(stderr, "No workload descriptor(s)!\n");
+		wsim_err("No workload descriptor(s)!\n");
 		return 1;
 	}
 
 	if (nr_w_args > 1 && clients > 1) {
-		if (verbose)
-			fprintf(stderr,
-				"Cloned clients cannot be combined with multiple workloads!\n");
+		wsim_err("Cloned clients cannot be combined with multiple workloads!\n");
 		return 1;
 	}
 
 	if ((flags & GLOBAL_BALANCE) && !balancer) {
-		if (verbose)
-			fprintf(stderr,
-				"Balancer not specified in global balancing mode!\n");
+		wsim_err("Balancer not specified in global balancing mode!\n");
 		return 1;
 	}
 
 	if (append_workload_arg) {
 		append_workload_arg = load_workload_descriptor(append_workload_arg);
 		if (!append_workload_arg) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to load append workload descriptor!\n");
+			wsim_err("Failed to load append workload descriptor!\n");
 			return 1;
 		}
 	}
@@ -2517,9 +2502,7 @@ int main(int argc, char **argv)
 		struct w_arg arg = { NULL, append_workload_arg, 0 };
 		app_w = parse_workload(&arg, flags, NULL);
 		if (!app_w) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to parse append workload!\n");
+			wsim_err("Failed to parse append workload!\n");
 			return 1;
 		}
 	}
@@ -2531,18 +2514,13 @@ int main(int argc, char **argv)
 		w_args[i].desc = load_workload_descriptor(w_args[i].filename);
 
 		if (!w_args[i].desc) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to load workload descriptor %u!\n",
-					i);
+			wsim_err("Failed to load workload descriptor %u!\n", i);
 			return 1;
 		}
 
 		wrk[i] = parse_workload(&w_args[i], flags, app_w);
 		if (!wrk[i]) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to parse workload %u!\n", i);
+			wsim_err("Failed to parse workload %u!\n", i);
 			return 1;
 		}
 	}
@@ -2602,10 +2580,8 @@ int main(int argc, char **argv)
 		if (balancer && balancer->init) {
 			int ret = balancer->init(balancer, w[i]);
 			if (ret) {
-				if (verbose)
-					fprintf(stderr,
-						"Failed to initialize balancing! (%u=%d)\n",
-						i, ret);
+				wsim_err("Failed to initialize balancing! (%u=%d)\n",
+					 i, ret);
 				return 1;
 			}
 		}
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [igt-dev] [PATCH i-g-t 08/17] gem_wsim: More wsim_err
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few more opportunities to compact the code by using the error logging
helper.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 54 ++++++++++++-------------------------------
 1 file changed, 15 insertions(+), 39 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a6ee6c493424..0010f46c357d 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -2366,9 +2366,7 @@ int main(int argc, char **argv)
 		switch (c) {
 		case 'W':
 			if (master_workload >= 0) {
-				if (verbose)
-					fprintf(stderr,
-						"Only one master workload can be given!\n");
+				wsim_err("Only one master workload can be given!\n");
 				return 1;
 			}
 			master_workload = nr_w_args;
@@ -2381,9 +2379,7 @@ int main(int argc, char **argv)
 			break;
 		case 'a':
 			if (append_workload_arg) {
-				if (verbose)
-					fprintf(stderr,
-						"Only one append workload can be given!\n");
+				wsim_err("Only one append workload can be given!\n");
 				return 1;
 			}
 			append_workload_arg = optarg;
@@ -2444,10 +2440,8 @@ int main(int argc, char **argv)
 			}
 
 			if (!balancer) {
-				if (verbose)
-					fprintf(stderr,
-						"Unknown balancing mode '%s'!\n",
-						optarg);
+				wsim_err("Unknown balancing mode '%s'!\n",
+					 optarg);
 				return 1;
 			}
 			break;
@@ -2460,14 +2454,12 @@ int main(int argc, char **argv)
 	}
 
 	if ((flags & HEARTBEAT) && !(flags & SEQNO)) {
-		if (verbose)
-			fprintf(stderr, "Heartbeat needs a seqno based balancer!\n");
+		wsim_err("Heartbeat needs a seqno based balancer!\n");
 		return 1;
 	}
 
 	if ((flags & VCS2REMAP) && (flags & I915)) {
-		if (verbose)
-			fprintf(stderr, "VCS remapping not supported with i915 balancing!\n");
+		wsim_err("VCS remapping not supported with i915 balancing!\n");
 		return 1;
 	}
 
@@ -2484,31 +2476,24 @@ int main(int argc, char **argv)
 	}
 
 	if (!nr_w_args) {
-		if (verbose)
-			fprintf(stderr, "No workload descriptor(s)!\n");
+		wsim_err("No workload descriptor(s)!\n");
 		return 1;
 	}
 
 	if (nr_w_args > 1 && clients > 1) {
-		if (verbose)
-			fprintf(stderr,
-				"Cloned clients cannot be combined with multiple workloads!\n");
+		wsim_err("Cloned clients cannot be combined with multiple workloads!\n");
 		return 1;
 	}
 
 	if ((flags & GLOBAL_BALANCE) && !balancer) {
-		if (verbose)
-			fprintf(stderr,
-				"Balancer not specified in global balancing mode!\n");
+		wsim_err("Balancer not specified in global balancing mode!\n");
 		return 1;
 	}
 
 	if (append_workload_arg) {
 		append_workload_arg = load_workload_descriptor(append_workload_arg);
 		if (!append_workload_arg) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to load append workload descriptor!\n");
+			wsim_err("Failed to load append workload descriptor!\n");
 			return 1;
 		}
 	}
@@ -2517,9 +2502,7 @@ int main(int argc, char **argv)
 		struct w_arg arg = { NULL, append_workload_arg, 0 };
 		app_w = parse_workload(&arg, flags, NULL);
 		if (!app_w) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to parse append workload!\n");
+			wsim_err("Failed to parse append workload!\n");
 			return 1;
 		}
 	}
@@ -2531,18 +2514,13 @@ int main(int argc, char **argv)
 		w_args[i].desc = load_workload_descriptor(w_args[i].filename);
 
 		if (!w_args[i].desc) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to load workload descriptor %u!\n",
-					i);
+			wsim_err("Failed to load workload descriptor %u!\n", i);
 			return 1;
 		}
 
 		wrk[i] = parse_workload(&w_args[i], flags, app_w);
 		if (!wrk[i]) {
-			if (verbose)
-				fprintf(stderr,
-					"Failed to parse workload %u!\n", i);
+			wsim_err("Failed to parse workload %u!\n", i);
 			return 1;
 		}
 	}
@@ -2602,10 +2580,8 @@ int main(int argc, char **argv)
 		if (balancer && balancer->init) {
 			int ret = balancer->init(balancer, w[i]);
 			if (ret) {
-				if (verbose)
-					fprintf(stderr,
-						"Failed to initialize balancing! (%u=%d)\n",
-						i, ret);
+				wsim_err("Failed to initialize balancing! (%u=%d)\n",
+					 i, ret);
 				return 1;
 			}
 		}
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev


* [PATCH i-g-t 09/17] gem_wsim: Submit fence support
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add support for submit fences in a way similar to how normal input fences
are handled. Eg:

  1.RCS.500-1000.0.0
  1.VCS1.3000.s-1.0
  1.VCS2.3000.s-2.0

Submit fences are signalled when the originating request enters the
submission backend.
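
In execbuf terms the only difference is which fence flag is set on the same
input fd. A rough standalone sketch of the selection done in do_eb() below
(the helper name and include path are illustrative; the flags come from the
uapi header refresh earlier in the series):

#include <stdbool.h>
#include "drm-uapi/i915_drm.h"	/* assumed path to the series' local uapi copy */

/* Pick between a submit fence and a regular input fence for one execbuf. */
static void set_input_fence(struct drm_i915_gem_execbuffer2 *eb,
			    bool submit_fence, int fence_fd)
{
	if (submit_fence)
		eb->flags |= I915_EXEC_FENCE_SUBMIT;	/* signalled at submission */
	else
		eb->flags |= I915_EXEC_FENCE_IN;	/* signalled on completion */

	eb->rsvd2 = fence_fd;	/* input fence fd goes in the lower half of rsvd2 */
}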

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 20 ++++++++++++++++----
 benchmarks/wsim/README | 17 +++++++++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 0010f46c357d..a77a322ee309 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -86,6 +86,7 @@ enum w_type
 struct deps
 {
 	int nr;
+	bool submit_fence;
 	int *list;
 };
 
@@ -254,17 +255,23 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 		   w->data_deps.list == w->fence_deps.list);
 
 	while ((token = strtok_r(tstart, "/", &tctx)) != NULL) {
+		bool submit_fence = false;
 		char *str = token;
 		struct deps *deps;
 		int dep;
 
 		tstart = NULL;
 
-		if (strlen(token) > 1 && token[0] == 'f') {
+		if (str[0] == '-' || (str[0] >= '0' && str[0] <= '9')) {
+			deps = &w->data_deps;
+		} else {
+			if (str[0] == 's')
+				submit_fence = true;
+			else if (str[0] != 'f')
+				return -1;
+
 			deps = &w->fence_deps;
 			str++;
-		} else {
-			deps = &w->data_deps;
 		}
 
 		dep = atoi(str);
@@ -282,6 +289,7 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 					     sizeof(*deps->list) * deps->nr);
 			igt_assert(deps->list);
 			deps->list[deps->nr - 1] = dep;
+			deps->submit_fence = submit_fence;
 		}
 	}
 
@@ -1891,7 +1899,11 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 		igt_assert(tgt >= 0 && tgt < w->idx);
 		igt_assert(wrk->steps[tgt].emit_fence > 0);
 
-		w->eb.flags |= I915_EXEC_FENCE_IN;
+		if (w->fence_deps.submit_fence)
+			w->eb.flags |= I915_EXEC_FENCE_SUBMIT;
+		else
+			w->eb.flags |= I915_EXEC_FENCE_IN;
+
 		w->eb.rsvd2 = wrk->steps[tgt].emit_fence;
 	}
 
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 205cd6c93afb..4786f116b4ac 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -114,6 +114,23 @@ runnable. When the second RCS batch completes the standalone fence is signaled
 which allows the two VCS batches to be executed. Finally we wait until the both
 VCS batches have completed before starting the (optional) next iteration.
 
+Submit fences
+-------------
+
+Submit fences are a type of input fence which is signalled when the originating
+batch buffer is submitted to the GPU. (In contrast to normal sync fences, which
+are signalled on completion.)
+
+Submit fences use the same syntax as sync fences, with a lower-case
+'s' selecting them. Eg:
+
+  1.RCS.500-1000.0.0
+  1.VCS1.3000.s-1.0
+  1.VCS2.3000.s-2.0
+
+Here the VCS1 and VCS2 batches will only be submitted for execution once the RCS
+batch enters the GPU.
+
 Context priority
 ----------------
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [igt-dev] [PATCH i-g-t 09/17] gem_wsim: Submit fence support
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add support for submit fences in a way similar to how normal input fences
are handled. Eg:

  1.RCS.500-1000.0.0
  1.VCS1.3000.s-1.0
  1.VCS2.3000.s-2.0

Submit fences are signalled when the originating request enters the
submission backend.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 20 ++++++++++++++++----
 benchmarks/wsim/README | 17 +++++++++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 0010f46c357d..a77a322ee309 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -86,6 +86,7 @@ enum w_type
 struct deps
 {
 	int nr;
+	bool submit_fence;
 	int *list;
 };
 
@@ -254,17 +255,23 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 		   w->data_deps.list == w->fence_deps.list);
 
 	while ((token = strtok_r(tstart, "/", &tctx)) != NULL) {
+		bool submit_fence = false;
 		char *str = token;
 		struct deps *deps;
 		int dep;
 
 		tstart = NULL;
 
-		if (strlen(token) > 1 && token[0] == 'f') {
+		if (str[0] == '-' || (str[0] >= '0' && str[0] <= '9')) {
+			deps = &w->data_deps;
+		} else {
+			if (str[0] == 's')
+				submit_fence = true;
+			else if (str[0] != 'f')
+				return -1;
+
 			deps = &w->fence_deps;
 			str++;
-		} else {
-			deps = &w->data_deps;
 		}
 
 		dep = atoi(str);
@@ -282,6 +289,7 @@ parse_dependencies(unsigned int nr_steps, struct w_step *w, char *_desc)
 					     sizeof(*deps->list) * deps->nr);
 			igt_assert(deps->list);
 			deps->list[deps->nr - 1] = dep;
+			deps->submit_fence = submit_fence;
 		}
 	}
 
@@ -1891,7 +1899,11 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 		igt_assert(tgt >= 0 && tgt < w->idx);
 		igt_assert(wrk->steps[tgt].emit_fence > 0);
 
-		w->eb.flags |= I915_EXEC_FENCE_IN;
+		if (w->fence_deps.submit_fence)
+			w->eb.flags |= I915_EXEC_FENCE_SUBMIT;
+		else
+			w->eb.flags |= I915_EXEC_FENCE_IN;
+
 		w->eb.rsvd2 = wrk->steps[tgt].emit_fence;
 	}
 
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 205cd6c93afb..4786f116b4ac 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -114,6 +114,23 @@ runnable. When the second RCS batch completes the standalone fence is signaled
 which allows the two VCS batches to be executed. Finally we wait until the both
 VCS batches have completed before starting the (optional) next iteration.
 
+Submit fences
+-------------
+
+Submit fences are a type of input fence which is signalled when the originating
+batch buffer is submitted to the GPU. (In contrast to normal sync fences, which
+are signalled on completion.)
+
+Submit fences use the same syntax as sync fences, with a lower-case
+'s' selecting them. Eg:
+
+  1.RCS.500-1000.0.0
+  1.VCS1.3000.s-1.0
+  1.VCS2.3000.s-2.0
+
+Here the VCS1 and VCS2 batches will only be submitted for execution once the RCS
+batch enters the GPU.
+
 Context priority
 ----------------
 
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev


* [PATCH i-g-t 10/17] gem_wsim: Extract str to engine lookup
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a77a322ee309..17325d2ceaf6 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -319,6 +319,18 @@ wsim_err(const char *fmt, ...)
 	} \
 }
 
+static int str_to_engine(const char *str)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(ring_str_map); i++) {
+		if (!strcasecmp(str, ring_str_map[i]))
+			return i;
+	}
+
+	return -1;
+}
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -473,22 +485,18 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
-			unsigned int old_valid = valid;
-
 			fstart = NULL;
 
-			for (i = 0; i < ARRAY_SIZE(ring_str_map); i++) {
-				if (!strcasecmp(field, ring_str_map[i])) {
-					step.engine = i;
-					if (step.engine == BCS)
-						bcs_used = true;
-					valid++;
-					break;
-				}
-			}
-
-			check_arg(old_valid == valid,
+			i = str_to_engine(field);
+			check_arg(i < 0,
 				  "Invalid engine id at step %u!\n", nr_steps);
+			if (i >= 0)
+				valid++;
+
+			step.engine = i;
+
+			if (step.engine == BCS)
+				bcs_used = true;
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH i-g-t 10/17] gem_wsim: Extract str to engine lookup
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a77a322ee309..17325d2ceaf6 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -319,6 +319,18 @@ wsim_err(const char *fmt, ...)
 	} \
 }
 
+static int str_to_engine(const char *str)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(ring_str_map); i++) {
+		if (!strcasecmp(str, ring_str_map[i]))
+			return i;
+	}
+
+	return -1;
+}
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -473,22 +485,18 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
-			unsigned int old_valid = valid;
-
 			fstart = NULL;
 
-			for (i = 0; i < ARRAY_SIZE(ring_str_map); i++) {
-				if (!strcasecmp(field, ring_str_map[i])) {
-					step.engine = i;
-					if (step.engine == BCS)
-						bcs_used = true;
-					valid++;
-					break;
-				}
-			}
-
-			check_arg(old_valid == valid,
+			i = str_to_engine(field);
+			check_arg(i < 0,
 				  "Invalid engine id at step %u!\n", nr_steps);
+			if (i >= 0)
+				valid++;
+
+			step.engine = i;
+
+			if (step.engine == BCS)
+				bcs_used = true;
 		}
 
 		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH i-g-t 11/17] gem_wsim: Engine map support
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Support new i915 uAPI for configuring contexts with engine maps.

Please refer to the README file for a more detailed explanation.
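
For reference, a rough standalone sketch of what the per-context setup boils
down to, mirroring the locally defined engines layout in the patch (the uAPI
was still settling at this point, hence the packed local struct and the
video-class-only mapping; the helper name and include path are illustrative,
while I915_CONTEXT_PARAM_ENGINES and I915_ENGINE_CLASS_VIDEO come from the
uapi header refresh at the start of the series):

#include <stdint.h>
#include <sys/ioctl.h>
#include "drm-uapi/i915_drm.h"	/* assumed path to the series' local uapi copy */

/* Same memory layout as the local_i915_context_param_engines in the patch. */
struct local_engine_map {
	uint64_t extensions;
	struct {
		uint16_t engine_class;		/* enum drm_i915_gem_engine_class */
		uint16_t engine_instance;
	} engines[2];
} __attribute__((packed));

/* Restrict ctx_id to VCS1+VCS2, i.e. what the '1.M.VCS1|VCS2' command asks for. */
static int set_vcs_engine_map(int i915_fd, uint32_t ctx_id)
{
	struct local_engine_map map = {
		.engines = {
			{ I915_ENGINE_CLASS_VIDEO, 0 },	/* VCS1 */
			{ I915_ENGINE_CLASS_VIDEO, 1 },	/* VCS2 */
		},
	};
	struct drm_i915_gem_context_param param = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(map),
		.value = (uintptr_t)&map,
	};

	return ioctl(i915_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &param);
}

Batch steps on such a context then select an engine by its position in the
map, which is what find_engine_in_map() below translates into the execbuf
engine selector.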

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 212 ++++++++++++++++++++++++++++++++++-------
 benchmarks/wsim/README |  17 +++-
 2 files changed, 192 insertions(+), 37 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 17325d2ceaf6..fbec23ad1753 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -56,6 +56,7 @@
 #include "ewma.h"
 
 enum intel_engine_id {
+	DEFAULT,
 	RCS,
 	BCS,
 	VCS,
@@ -80,7 +81,8 @@ enum w_type
 	SW_FENCE,
 	SW_FENCE_SIGNAL,
 	CTX_PRIORITY,
-	PREEMPTION
+	PREEMPTION,
+	ENGINE_MAP
 };
 
 struct deps
@@ -114,6 +116,10 @@ struct w_step
 		int throttle;
 		int fence_signal;
 		int priority;
+		struct {
+			unsigned int engine_map_count;
+			enum intel_engine_id *engine_map;
+		};
 	};
 
 	/* Implementation details */
@@ -143,6 +149,8 @@ DECLARE_EWMA(uint64_t, rt, 4, 2)
 struct ctx {
 	uint32_t id;
 	int priority;
+	unsigned int engine_map_count;
+	enum intel_engine_id *engine_map;
 	bool targets_instance;
 	bool wants_balance;
 	unsigned int static_vcs;
@@ -201,10 +209,10 @@ struct workload
 		int fd;
 		bool first;
 		unsigned int num_engines;
-		unsigned int engine_map[5];
+		unsigned int engine_map[NUM_ENGINES];
 		uint64_t t_prev;
-		uint64_t prev[5];
-		double busy[5];
+		uint64_t prev[NUM_ENGINES];
+		double busy[NUM_ENGINES];
 	} busy_balancer;
 };
 
@@ -235,6 +243,7 @@ static int fd;
 #define REG(x) (volatile uint32_t *)((volatile char *)igt_global_mmio + x)
 
 static const char *ring_str_map[NUM_ENGINES] = {
+	[DEFAULT] = "DEFAULT",
 	[RCS] = "RCS",
 	[BCS] = "BCS",
 	[VCS] = "VCS",
@@ -331,6 +340,37 @@ static int str_to_engine(const char *str)
 	return -1;
 }
 
+static int parse_engine_map(struct w_step *step, const char *_str)
+{
+	char *token, *tctx = NULL, *tstart = (char *)_str;
+
+	while ((token = strtok_r(tstart, "|", &tctx))) {
+		enum intel_engine_id engine;
+
+		tstart = NULL;
+
+		if (!strcmp(token, "DEFAULT"))
+			return -1;
+		else if (!strcmp(token, "VCS"))
+			return -1;
+
+		engine = str_to_engine(token);
+		if ((int)engine < 0)
+			return -1;
+
+		if (engine != VCS1 && engine != VCS2)
+			return -1; /* TODO */
+
+		step->engine_map_count++;
+		step->engine_map = realloc(step->engine_map,
+					   step->engine_map_count *
+					   sizeof(step->engine_map[0]));
+		step->engine_map[step->engine_map_count - 1] = engine;
+	}
+
+	return 0;
+}
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -449,6 +489,33 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			} else if (!strcmp(field, "f")) {
 				step.type = SW_FENCE;
 				goto add_step;
+			} else if (!strcmp(field, "M")) {
+				unsigned int nr = 0;
+				while ((field = strtok_r(fstart, ".", &fctx)) !=
+				    NULL) {
+					tmp = atoi(field);
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 1,
+						  "Invalid engine map format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0) {
+						step.context = tmp;
+					} else {
+						tmp = parse_engine_map(&step,
+								       field);
+						check_arg(tmp < 0,
+							  "Invalid engine map list at step %u!\n",
+							  nr_steps);
+					}
+
+					nr++;
+				}
+
+				step.type = ENGINE_MAP;
+				goto add_step;
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
 				while ((field = strtok_r(fstart, ".", &fctx)) !=
@@ -490,9 +557,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			i = str_to_engine(field);
 			check_arg(i < 0,
 				  "Invalid engine id at step %u!\n", nr_steps);
-			if (i >= 0)
-				valid++;
-
+			valid++;
 			step.engine = i;
 
 			if (step.engine == BCS)
@@ -769,6 +834,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 }
 
 static const unsigned int eb_engine_map[NUM_ENGINES] = {
+	[DEFAULT] = I915_EXEC_DEFAULT,
 	[RCS] = I915_EXEC_RENDER,
 	[BCS] = I915_EXEC_BLT,
 	[VCS] = I915_EXEC_BSD,
@@ -785,18 +851,42 @@ eb_set_engine(struct drm_i915_gem_execbuffer2 *eb,
 	if (engine == VCS2 && (flags & VCS2REMAP))
 		engine = BCS;
 
-	if ((flags & I915) && engine == VCS) {
+	if ((flags & I915) && engine == VCS)
 		eb->flags = 0;
-	} else {
+	else
 		eb->flags = eb_engine_map[engine];
+}
+
+static unsigned int
+find_engine_in_map(struct ctx *ctx, enum intel_engine_id engine)
+{
+	unsigned int i;
+
+	for (i = 0; i < ctx->engine_map_count; i++) {
+		if (ctx->engine_map[i] == engine)
+			return i + 1;
 	}
+
+	igt_assert(0);
+	return 0;
+}
+
+static struct ctx *
+__get_ctx(struct workload *wrk, struct w_step *w)
+{
+	return &wrk->ctx_list[w->context * 2];
 }
 
 static void
-eb_update_flags(struct w_step *w, enum intel_engine_id engine,
-		unsigned int flags)
+eb_update_flags(struct workload *wrk, struct w_step *w,
+		enum intel_engine_id engine, unsigned int flags)
 {
-	eb_set_engine(&w->eb, engine, flags);
+	struct ctx *ctx = __get_ctx(wrk, w);
+
+	if (ctx->engine_map)
+		w->eb.flags = find_engine_in_map(ctx, engine);
+	else
+		eb_set_engine(&w->eb, engine, flags);
 
 	w->eb.flags |= I915_EXEC_HANDLE_LUT;
 	w->eb.flags |= I915_EXEC_NO_RELOC;
@@ -815,12 +905,6 @@ get_status_objects(struct workload *wrk)
 		return wrk->status_object;
 }
 
-static struct ctx *
-__get_ctx(struct workload *wrk, struct w_step *w)
-{
-	return &wrk->ctx_list[w->context * 2];
-}
-
 static uint32_t
 get_ctxid(struct workload *wrk, struct w_step *w)
 {
@@ -890,7 +974,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 		engine = VCS2;
 	else if (flags & SWAPVCS && engine == VCS2)
 		engine = VCS1;
-	eb_update_flags(w, engine, flags);
+	eb_update_flags(wrk, w, engine, flags);
 #ifdef DEBUG
 	printf("%u: %u:|", w->idx, w->eb.buffer_count);
 	for (i = 0; i <= j; i++)
@@ -913,7 +997,7 @@ static void __ctx_set_prio(uint32_t ctx_id, unsigned int prio)
 		gem_context_set_param(fd, &param);
 }
 
-static void
+static int
 prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 {
 	unsigned int ctx_vcs = 0;
@@ -974,30 +1058,53 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	/*
 	 * Identify if contexts target specific engine instances and if they
 	 * want to be balanced.
+	 *
+	 * Transfer over engine map configuration from the workload step.
 	 */
 	for (j = 0; j < wrk->nr_ctxs; j += 2) {
 		bool targets = false;
 		bool balance = false;
 
 		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-			if (w->type != BATCH)
-				continue;
-
 			if (w->context != (j / 2))
 				continue;
 
-			if (w->engine == VCS)
-				balance = true;
-			else
-				targets = true;
+			if (w->type == BATCH) {
+				if (w->engine == VCS)
+					balance = true;
+				else
+					targets = true;
+			} else if (w->type == ENGINE_MAP) {
+				wrk->ctx_list[j].engine_map = w->engine_map;
+				wrk->ctx_list[j].engine_map_count =
+					w->engine_map_count;
+			}
 		}
 
-		if (flags & I915) {
-			wrk->ctx_list[j].targets_instance = targets;
+		wrk->ctx_list[j].targets_instance = targets;
+		if (flags & I915)
 			wrk->ctx_list[j].wants_balance = balance;
+	}
+
+	/*
+	 * Ensure VCS is not allowed with engine map contexts.
+	 */
+	for (j = 0; j < wrk->nr_ctxs; j += 2) {
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->context != (j / 2))
+				continue;
+
+			if (w->type != BATCH)
+				continue;
+
+			if (wrk->ctx_list[j].engine_map && w->engine == VCS) {
+				wsim_err("Batches targetting engine maps must use explicit engines!\n");
+				return -1;
+			}
 		}
 	}
 
+
 	/*
 	 * Create and configure contexts.
 	 */
@@ -1008,7 +1115,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		if (ctx->id)
 			continue;
 
-		if (flags & I915) {
+		if ((flags & I915) || ctx->engine_map) {
 			struct drm_i915_gem_context_create_v2 args = { };
 
 			/* Find existing context to share ppgtt with. */
@@ -1022,7 +1129,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				break;
 			}
 
-			if (!ctx->targets_instance)
+			if ((!ctx->engine_map && !ctx->targets_instance))
 				args.flags |= I915_GEM_CONTEXT_SINGLE_TIMELINE;
 
 			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
@@ -1053,7 +1160,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		 * both want to target specific engines and be balanced by i915?
 		 */
 		if ((flags & I915) && ctx->wants_balance &&
-		    ctx->targets_instance) {
+		    ctx->targets_instance && !ctx->engine_map) {
 			struct drm_i915_gem_context_create_v2 args = {};
 
 			igt_assert(share_ctx);
@@ -1071,7 +1178,33 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			__ctx_set_prio(ctx_id, wrk->prio);
 		}
 
-		if (ctx->wants_balance) {
+		if (ctx->engine_map) {
+			struct local_i915_context_param_engines {
+				__u64 extensions;
+
+				struct {
+					__u16 class; /* see enum drm_i915_gem_engine_class */
+					__u16 instance;
+				} engines[ctx->engine_map_count];
+			} __attribute__((packed)) set_engines;
+			struct drm_i915_gem_context_param param = {
+				.ctx_id = ctx_id,
+				.param = I915_CONTEXT_PARAM_ENGINES,
+				.size = sizeof(set_engines),
+				.value = to_user_pointer(&set_engines),
+			};
+
+			set_engines.extensions = 0;
+
+			for (j = 0; j < ctx->engine_map_count; j++) {
+				set_engines.engines[j].class =
+					I915_ENGINE_CLASS_VIDEO; /* FIXME */
+				set_engines.engines[j].instance =
+					ctx->engine_map[j] - VCS1; /* FIXME */
+			}
+
+			gem_context_set_param(fd, &param);
+		} else if (ctx->wants_balance) {
 			struct i915_context_engines_load_balance load_balance =
 				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
 				  .engines_mask = -1,
@@ -1151,6 +1284,8 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 
 		alloc_step_batch(wrk, w, _flags);
 	}
+
+	return 0;
 }
 
 static double elapsed(const struct timespec *start, const struct timespec *end)
@@ -1888,7 +2023,7 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 	uint32_t seqno = new_seqno(wrk, engine);
 	unsigned int i;
 
-	eb_update_flags(w, engine, flags);
+	eb_update_flags(wrk, w, engine, flags);
 
 	if (flags & SEQNO)
 		update_bb_seqno(w, engine, seqno);
@@ -2037,7 +2172,8 @@ static void *run_workload(void *data)
 								    w->priority;
 				}
 				continue;
-			} else if (w->type == PREEMPTION) {
+			} else if (w->type == PREEMPTION ||
+				   w->type == ENGINE_MAP) {
 				continue;
 			}
 
@@ -2595,7 +2731,11 @@ int main(int argc, char **argv)
 		w[i]->print_stats = verbose > 1 ||
 				    (verbose > 0 && master_workload == i);
 
-		prepare_workload(i, w[i], flags_);
+		if (prepare_workload(i, w[i], flags_)) {
+			wsim_err("Failed to prepare workload %u!\n", i);
+			return 1;
+		}
+
 
 		if (balancer && balancer->init) {
 			int ret = balancer->init(balancer, w[i]);
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 4786f116b4ac..20e3e358cd2e 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -3,6 +3,7 @@ Workload descriptor format
 
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>].<int <= 0>[/<int <= 0>][...].<0|1>,...
+M.<uint>.<str>[|<str>]...
 P|X.<uint>.<int>
 d|p|s|t|q|a.<int>,...
 f
@@ -23,10 +24,11 @@ Additional workload steps are also supported:
  'q' - Throttle to n max queue depth.
  'f' - Create a sync fence.
  'a' - Advance the previously created sync fence.
+ 'M' - Set up engine map.
  'P' - Context priority.
  'X' - Context preemption control.
 
-Engine ids: RCS, BCS, VCS, VCS1, VCS2, VECS
+Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
 
 Example (leading spaces must not be present in the actual file):
 ----------------------------------------------------------------
@@ -161,3 +163,16 @@ The same context is then marked to have batches which can be preempted every
 
 Same as with context priority, context preemption commands are valid until
 optionally overriden by another preemption control change on the same context.
+
+Engine maps
+-----------
+
+Engine maps are a per-context feature which changes how engine selection is
+done in the driver.
+
+Example:
+
+  1.M.VCS1|VCS2
+
+This sets up context 1 with an engine map containing the VCS1 and VCS2 engines.
+Submission to this context can now only reference these two engines.
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [igt-dev] [PATCH i-g-t 11/17] gem_wsim: Engine map support
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Support new i915 uAPI for configuring contexts with engine maps.

Please refer to the README file for a more detailed explanation.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 212 ++++++++++++++++++++++++++++++++++-------
 benchmarks/wsim/README |  17 +++-
 2 files changed, 192 insertions(+), 37 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 17325d2ceaf6..fbec23ad1753 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -56,6 +56,7 @@
 #include "ewma.h"
 
 enum intel_engine_id {
+	DEFAULT,
 	RCS,
 	BCS,
 	VCS,
@@ -80,7 +81,8 @@ enum w_type
 	SW_FENCE,
 	SW_FENCE_SIGNAL,
 	CTX_PRIORITY,
-	PREEMPTION
+	PREEMPTION,
+	ENGINE_MAP
 };
 
 struct deps
@@ -114,6 +116,10 @@ struct w_step
 		int throttle;
 		int fence_signal;
 		int priority;
+		struct {
+			unsigned int engine_map_count;
+			enum intel_engine_id *engine_map;
+		};
 	};
 
 	/* Implementation details */
@@ -143,6 +149,8 @@ DECLARE_EWMA(uint64_t, rt, 4, 2)
 struct ctx {
 	uint32_t id;
 	int priority;
+	unsigned int engine_map_count;
+	enum intel_engine_id *engine_map;
 	bool targets_instance;
 	bool wants_balance;
 	unsigned int static_vcs;
@@ -201,10 +209,10 @@ struct workload
 		int fd;
 		bool first;
 		unsigned int num_engines;
-		unsigned int engine_map[5];
+		unsigned int engine_map[NUM_ENGINES];
 		uint64_t t_prev;
-		uint64_t prev[5];
-		double busy[5];
+		uint64_t prev[NUM_ENGINES];
+		double busy[NUM_ENGINES];
 	} busy_balancer;
 };
 
@@ -235,6 +243,7 @@ static int fd;
 #define REG(x) (volatile uint32_t *)((volatile char *)igt_global_mmio + x)
 
 static const char *ring_str_map[NUM_ENGINES] = {
+	[DEFAULT] = "DEFAULT",
 	[RCS] = "RCS",
 	[BCS] = "BCS",
 	[VCS] = "VCS",
@@ -331,6 +340,37 @@ static int str_to_engine(const char *str)
 	return -1;
 }
 
+static int parse_engine_map(struct w_step *step, const char *_str)
+{
+	char *token, *tctx = NULL, *tstart = (char *)_str;
+
+	while ((token = strtok_r(tstart, "|", &tctx))) {
+		enum intel_engine_id engine;
+
+		tstart = NULL;
+
+		if (!strcmp(token, "DEFAULT"))
+			return -1;
+		else if (!strcmp(token, "VCS"))
+			return -1;
+
+		engine = str_to_engine(token);
+		if ((int)engine < 0)
+			return -1;
+
+		if (engine != VCS1 && engine != VCS2)
+			return -1; /* TODO */
+
+		step->engine_map_count++;
+		step->engine_map = realloc(step->engine_map,
+					   step->engine_map_count *
+					   sizeof(step->engine_map[0]));
+		step->engine_map[step->engine_map_count - 1] = engine;
+	}
+
+	return 0;
+}
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -449,6 +489,33 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			} else if (!strcmp(field, "f")) {
 				step.type = SW_FENCE;
 				goto add_step;
+			} else if (!strcmp(field, "M")) {
+				unsigned int nr = 0;
+				while ((field = strtok_r(fstart, ".", &fctx)) !=
+				    NULL) {
+					tmp = atoi(field);
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 1,
+						  "Invalid engine map format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0) {
+						step.context = tmp;
+					} else {
+						tmp = parse_engine_map(&step,
+								       field);
+						check_arg(tmp < 0,
+							  "Invalid engine map list at step %u!\n",
+							  nr_steps);
+					}
+
+					nr++;
+				}
+
+				step.type = ENGINE_MAP;
+				goto add_step;
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
 				while ((field = strtok_r(fstart, ".", &fctx)) !=
@@ -490,9 +557,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			i = str_to_engine(field);
 			check_arg(i < 0,
 				  "Invalid engine id at step %u!\n", nr_steps);
-			if (i >= 0)
-				valid++;
-
+			valid++;
 			step.engine = i;
 
 			if (step.engine == BCS)
@@ -769,6 +834,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 }
 
 static const unsigned int eb_engine_map[NUM_ENGINES] = {
+	[DEFAULT] = I915_EXEC_DEFAULT,
 	[RCS] = I915_EXEC_RENDER,
 	[BCS] = I915_EXEC_BLT,
 	[VCS] = I915_EXEC_BSD,
@@ -785,18 +851,42 @@ eb_set_engine(struct drm_i915_gem_execbuffer2 *eb,
 	if (engine == VCS2 && (flags & VCS2REMAP))
 		engine = BCS;
 
-	if ((flags & I915) && engine == VCS) {
+	if ((flags & I915) && engine == VCS)
 		eb->flags = 0;
-	} else {
+	else
 		eb->flags = eb_engine_map[engine];
+}
+
+static unsigned int
+find_engine_in_map(struct ctx *ctx, enum intel_engine_id engine)
+{
+	unsigned int i;
+
+	for (i = 0; i < ctx->engine_map_count; i++) {
+		if (ctx->engine_map[i] == engine)
+			return i + 1;
 	}
+
+	igt_assert(0);
+	return 0;
+}
+
+static struct ctx *
+__get_ctx(struct workload *wrk, struct w_step *w)
+{
+	return &wrk->ctx_list[w->context * 2];
 }
 
 static void
-eb_update_flags(struct w_step *w, enum intel_engine_id engine,
-		unsigned int flags)
+eb_update_flags(struct workload *wrk, struct w_step *w,
+		enum intel_engine_id engine, unsigned int flags)
 {
-	eb_set_engine(&w->eb, engine, flags);
+	struct ctx *ctx = __get_ctx(wrk, w);
+
+	if (ctx->engine_map)
+		w->eb.flags = find_engine_in_map(ctx, engine);
+	else
+		eb_set_engine(&w->eb, engine, flags);
 
 	w->eb.flags |= I915_EXEC_HANDLE_LUT;
 	w->eb.flags |= I915_EXEC_NO_RELOC;
@@ -815,12 +905,6 @@ get_status_objects(struct workload *wrk)
 		return wrk->status_object;
 }
 
-static struct ctx *
-__get_ctx(struct workload *wrk, struct w_step *w)
-{
-	return &wrk->ctx_list[w->context * 2];
-}
-
 static uint32_t
 get_ctxid(struct workload *wrk, struct w_step *w)
 {
@@ -890,7 +974,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 		engine = VCS2;
 	else if (flags & SWAPVCS && engine == VCS2)
 		engine = VCS1;
-	eb_update_flags(w, engine, flags);
+	eb_update_flags(wrk, w, engine, flags);
 #ifdef DEBUG
 	printf("%u: %u:|", w->idx, w->eb.buffer_count);
 	for (i = 0; i <= j; i++)
@@ -913,7 +997,7 @@ static void __ctx_set_prio(uint32_t ctx_id, unsigned int prio)
 		gem_context_set_param(fd, &param);
 }
 
-static void
+static int
 prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 {
 	unsigned int ctx_vcs = 0;
@@ -974,30 +1058,53 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	/*
 	 * Identify if contexts target specific engine instances and if they
 	 * want to be balanced.
+	 *
+	 * Transfer over engine map configuration from the workload step.
 	 */
 	for (j = 0; j < wrk->nr_ctxs; j += 2) {
 		bool targets = false;
 		bool balance = false;
 
 		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
-			if (w->type != BATCH)
-				continue;
-
 			if (w->context != (j / 2))
 				continue;
 
-			if (w->engine == VCS)
-				balance = true;
-			else
-				targets = true;
+			if (w->type == BATCH) {
+				if (w->engine == VCS)
+					balance = true;
+				else
+					targets = true;
+			} else if (w->type == ENGINE_MAP) {
+				wrk->ctx_list[j].engine_map = w->engine_map;
+				wrk->ctx_list[j].engine_map_count =
+					w->engine_map_count;
+			}
 		}
 
-		if (flags & I915) {
-			wrk->ctx_list[j].targets_instance = targets;
+		wrk->ctx_list[j].targets_instance = targets;
+		if (flags & I915)
 			wrk->ctx_list[j].wants_balance = balance;
+	}
+
+	/*
+	 * Ensure VCS is not allowed with engine map contexts.
+	 */
+	for (j = 0; j < wrk->nr_ctxs; j += 2) {
+		for (i = 0, w = wrk->steps; i < wrk->nr_steps; i++, w++) {
+			if (w->context != (j / 2))
+				continue;
+
+			if (w->type != BATCH)
+				continue;
+
+			if (wrk->ctx_list[j].engine_map && w->engine == VCS) {
+				wsim_err("Batches targeting engine maps must use explicit engines!\n");
+				return -1;
+			}
 		}
 	}
 
+
 	/*
 	 * Create and configure contexts.
 	 */
@@ -1008,7 +1115,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		if (ctx->id)
 			continue;
 
-		if (flags & I915) {
+		if ((flags & I915) || ctx->engine_map) {
 			struct drm_i915_gem_context_create_v2 args = { };
 
 			/* Find existing context to share ppgtt with. */
@@ -1022,7 +1129,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				break;
 			}
 
-			if (!ctx->targets_instance)
+			if ((!ctx->engine_map && !ctx->targets_instance))
 				args.flags |= I915_GEM_CONTEXT_SINGLE_TIMELINE;
 
 			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
@@ -1053,7 +1160,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 		 * both want to target specific engines and be balanced by i915?
 		 */
 		if ((flags & I915) && ctx->wants_balance &&
-		    ctx->targets_instance) {
+		    ctx->targets_instance && !ctx->engine_map) {
 			struct drm_i915_gem_context_create_v2 args = {};
 
 			igt_assert(share_ctx);
@@ -1071,7 +1178,33 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			__ctx_set_prio(ctx_id, wrk->prio);
 		}
 
-		if (ctx->wants_balance) {
+		if (ctx->engine_map) {
+			struct local_i915_context_param_engines {
+				__u64 extensions;
+
+				struct {
+					__u16 class; /* see enum drm_i915_gem_engine_class */
+					__u16 instance;
+				} engines[ctx->engine_map_count];
+			} __attribute__((packed)) set_engines;
+			struct drm_i915_gem_context_param param = {
+				.ctx_id = ctx_id,
+				.param = I915_CONTEXT_PARAM_ENGINES,
+				.size = sizeof(set_engines),
+				.value = to_user_pointer(&set_engines),
+			};
+
+			set_engines.extensions = 0;
+
+			for (j = 0; j < ctx->engine_map_count; j++) {
+				set_engines.engines[j].class =
+					I915_ENGINE_CLASS_VIDEO; /* FIXME */
+				set_engines.engines[j].instance =
+					ctx->engine_map[j] - VCS1; /* FIXME */
+			}
+
+			gem_context_set_param(fd, &param);
+		} else if (ctx->wants_balance) {
 			struct i915_context_engines_load_balance load_balance =
 				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
 				  .engines_mask = -1,
@@ -1151,6 +1284,8 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 
 		alloc_step_batch(wrk, w, _flags);
 	}
+
+	return 0;
 }
 
 static double elapsed(const struct timespec *start, const struct timespec *end)
@@ -1888,7 +2023,7 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 	uint32_t seqno = new_seqno(wrk, engine);
 	unsigned int i;
 
-	eb_update_flags(w, engine, flags);
+	eb_update_flags(wrk, w, engine, flags);
 
 	if (flags & SEQNO)
 		update_bb_seqno(w, engine, seqno);
@@ -2037,7 +2172,8 @@ static void *run_workload(void *data)
 								    w->priority;
 				}
 				continue;
-			} else if (w->type == PREEMPTION) {
+			} else if (w->type == PREEMPTION ||
+				   w->type == ENGINE_MAP) {
 				continue;
 			}
 
@@ -2595,7 +2731,11 @@ int main(int argc, char **argv)
 		w[i]->print_stats = verbose > 1 ||
 				    (verbose > 0 && master_workload == i);
 
-		prepare_workload(i, w[i], flags_);
+		if (prepare_workload(i, w[i], flags_)) {
+			wsim_err("Failed to prepare workload %u!\n", i);
+			return 1;
+		}
+
 
 		if (balancer && balancer->init) {
 			int ret = balancer->init(balancer, w[i]);
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 4786f116b4ac..20e3e358cd2e 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -3,6 +3,7 @@ Workload descriptor format
 
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>].<int <= 0>[/<int <= 0>][...].<0|1>,...
+M.<uint>.<str>[|<str>]...
 P|X.<uint>.<int>
 d|p|s|t|q|a.<int>,...
 f
@@ -23,10 +24,11 @@ Additional workload steps are also supported:
  'q' - Throttle to n max queue depth.
  'f' - Create a sync fence.
  'a' - Advance the previously created sync fence.
+ 'M' - Set up engine map.
  'P' - Context priority.
  'X' - Context preemption control.
 
-Engine ids: RCS, BCS, VCS, VCS1, VCS2, VECS
+Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
 
 Example (leading spaces must not be present in the actual file):
 ----------------------------------------------------------------
@@ -161,3 +163,16 @@ The same context is then marked to have batches which can be preempted every
 
 Same as with context priority, context preemption commands are valid until
 optionally overriden by another preemption control change on the same context.
+
+Engine maps
+-----------
+
+Engine maps are a per-context feature which changes the way engine selection is
+done in the driver.
+
+Example:
+
+  1.M.VCS1|VCS2
+
+This sets up context 1 with an engine map containing the VCS1 and VCS2 engines.
+Submission to this context can now only reference these two engines.
-- 
2.17.1


* [PATCH i-g-t 12/17] gem_wsim: Save some lines by changing to implicit NULL checking
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We can improve the parsing loop readability a bit more by avoiding some
line breaks caused by explicit NULL checks.
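
For illustration, a minimal standalone sketch of the pattern (not taken from
the patch; the input string and the printout are made up):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          char buf[] = "1.RCS.1000.0.0";
          char *fstart = buf, *fctx = NULL, *field;

          /* The truth value of the assignment is tested directly instead of
           * spelling out "!= NULL", keeping the condition on a single line.
           */
          while ((field = strtok_r(fstart, ".", &fctx))) {
                  fstart = NULL;
                  printf("field: %s\n", field);
          }

          return 0;
  }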

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 39 +++++++++++++++------------------------
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index fbec23ad1753..59243af5cde8 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -386,7 +386,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 	igt_assert(desc);
 
-	while ((_token = strtok_r(tstart, ",", &tctx)) != NULL) {
+	while ((_token = strtok_r(tstart, ",", &tctx))) {
 		tstart = NULL;
 		token = strdup(_token);
 		igt_assert(token);
@@ -394,12 +394,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 		valid = 0;
 		memset(&step, 0, sizeof(step));
 
-		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
+		if ((field = strtok_r(fstart, ".", &fctx))) {
 			fstart = NULL;
 
 			if (!strcmp(field, "d")) {
-				if ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				if ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0,
 						  "Invalid delay at step %u!\n",
@@ -409,8 +408,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 					goto add_step;
 				}
 			} else if (!strcmp(field, "p")) {
-				if ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				if ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp <= 0,
 						  "Invalid period at step %u!\n",
@@ -421,8 +419,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				}
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
-				while ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
 						  "Invalid context at step %u!\n",
@@ -442,8 +439,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				step.type = CTX_PRIORITY;
 				goto add_step;
 			} else if (!strcmp(field, "s")) {
-				if ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				if ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp >= 0 ||
 						  ((int)nr_steps + tmp) < 0,
@@ -454,8 +450,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 					goto add_step;
 				}
 			} else if (!strcmp(field, "t")) {
-				if ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				if ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp < 0,
 						  "Invalid throttle at step %u!\n",
@@ -465,8 +460,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 					goto add_step;
 				}
 			} else if (!strcmp(field, "q")) {
-				if ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				if ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp < 0,
 						  "Invalid qd throttle at step %u!\n",
@@ -476,8 +470,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 					goto add_step;
 				}
 			} else if (!strcmp(field, "a")) {
-				if ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				if ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(tmp >= 0,
 						  "Invalid sw fence signal at step %u!\n",
@@ -491,8 +484,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				goto add_step;
 			} else if (!strcmp(field, "M")) {
 				unsigned int nr = 0;
-				while ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
 						  "Invalid context at step %u!\n",
@@ -518,8 +510,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				goto add_step;
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
-				while ((field = strtok_r(fstart, ".", &fctx)) !=
-				    NULL) {
+				while ((field = strtok_r(fstart, ".", &fctx))) {
 					tmp = atoi(field);
 					check_arg(nr == 0 && tmp <= 0,
 						  "Invalid context at step %u!\n",
@@ -551,7 +542,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
+		if ((field = strtok_r(fstart, ".", &fctx))) {
 			fstart = NULL;
 
 			i = str_to_engine(field);
@@ -564,7 +555,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				bcs_used = true;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
+		if ((field = strtok_r(fstart, ".", &fctx))) {
 			char *sep = NULL;
 			long int tmpl;
 
@@ -592,7 +583,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
+		if ((field = strtok_r(fstart, ".", &fctx))) {
 			fstart = NULL;
 
 			tmp = parse_dependencies(nr_steps, &step, field);
@@ -602,7 +593,7 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			valid++;
 		}
 
-		if ((field = strtok_r(fstart, ".", &fctx)) != NULL) {
+		if ((field = strtok_r(fstart, ".", &fctx))) {
 			fstart = NULL;
 
 			check_arg(strlen(field) != 1 ||
-- 
2.17.1


* [PATCH i-g-t 13/17] gem_wsim: Compact int command parsing with a macro
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Parsing an integer workload descriptor field is a common pattern which we
can extract into a helper macro, further improving the readability of the
main parsing loop.
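
To show the idea outside of the parser, here is a simplified standalone
sketch (the check() helper and the "d.1500" input are made up for the
example; the real int_field() additionally sets the step type and jumps to
the common add_step label):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Simplified stand-in for check_arg(): print the error and bail out. */
  #define check(cond, msg) \
          do { if (cond) { fprintf(stderr, "%s", msg); exit(1); } } while (0)

  /* One macro wraps the repeated tokenize -> atoi -> validate -> store
   * sequence.  As in the real int_field(), the condition argument may
   * refer to the local 'tmp' variable.
   */
  #define int_field(dst, cond, err) \
          do { \
                  char *field = strtok_r(NULL, ".", &fctx); \
                  int tmp = field ? atoi(field) : 0; \
                  check(!field || (cond), err); \
                  (dst) = tmp; \
          } while (0)

  int main(void)
  {
          char buf[] = "d.1500", *fctx = NULL;
          int delay = 0;

          strtok_r(buf, ".", &fctx);  /* consume the "d" command token */
          int_field(delay, tmp <= 0, "Invalid delay!\n");
          printf("delay=%d us\n", delay);

          return 0;
  }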

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c | 80 ++++++++++++++-----------------------------
 1 file changed, 25 insertions(+), 55 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 59243af5cde8..b805ecd9a680 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -371,6 +371,15 @@ static int parse_engine_map(struct w_step *step, const char *_str)
 	return 0;
 }
 
+#define int_field(_STEP_, _FIELD_, _COND_, _ERR_) \
+	if ((field = strtok_r(fstart, ".", &fctx))) { \
+		tmp = atoi(field); \
+		check_arg(_COND_, _ERR_, nr_steps); \
+		step.type = _STEP_; \
+		step._FIELD_ = tmp; \
+		goto add_step; \
+	} \
+
 static struct workload *
 parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 {
@@ -398,25 +407,11 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 			fstart = NULL;
 
 			if (!strcmp(field, "d")) {
-				if ((field = strtok_r(fstart, ".", &fctx))) {
-					tmp = atoi(field);
-					check_arg(tmp <= 0,
-						  "Invalid delay at step %u!\n",
-						  nr_steps);
-					step.type = DELAY;
-					step.delay = tmp;
-					goto add_step;
-				}
+				int_field(DELAY, delay, tmp <= 0,
+					  "Invalid delay at step %u!\n");
 			} else if (!strcmp(field, "p")) {
-				if ((field = strtok_r(fstart, ".", &fctx))) {
-					tmp = atoi(field);
-					check_arg(tmp <= 0,
-						  "Invalid period at step %u!\n",
-						  nr_steps);
-					step.type = PERIOD;
-					step.period = tmp;
-					goto add_step;
-				}
+				int_field(PERIOD, period, tmp <= 0,
+					  "Invalid period at step %u!\n");
 			} else if (!strcmp(field, "P")) {
 				unsigned int nr = 0;
 				while ((field = strtok_r(fstart, ".", &fctx))) {
@@ -439,46 +434,21 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 				step.type = CTX_PRIORITY;
 				goto add_step;
 			} else if (!strcmp(field, "s")) {
-				if ((field = strtok_r(fstart, ".", &fctx))) {
-					tmp = atoi(field);
-					check_arg(tmp >= 0 ||
-						  ((int)nr_steps + tmp) < 0,
-						  "Invalid sync target at step %u!\n",
-						  nr_steps);
-					step.type = SYNC;
-					step.target = tmp;
-					goto add_step;
-				}
+				int_field(SYNC, target,
+					  tmp >= 0 || ((int)nr_steps + tmp) < 0,
+					  "Invalid sync target at step %u!\n");
 			} else if (!strcmp(field, "t")) {
-				if ((field = strtok_r(fstart, ".", &fctx))) {
-					tmp = atoi(field);
-					check_arg(tmp < 0,
-						  "Invalid throttle at step %u!\n",
-						  nr_steps);
-					step.type = THROTTLE;
-					step.throttle = tmp;
-					goto add_step;
-				}
+				int_field(THROTTLE, throttle,
+					  tmp < 0,
+					  "Invalid throttle at step %u!\n");
 			} else if (!strcmp(field, "q")) {
-				if ((field = strtok_r(fstart, ".", &fctx))) {
-					tmp = atoi(field);
-					check_arg(tmp < 0,
-						  "Invalid qd throttle at step %u!\n",
-						  nr_steps);
-					step.type = QD_THROTTLE;
-					step.throttle = tmp;
-					goto add_step;
-				}
+				int_field(QD_THROTTLE, throttle,
+					  tmp < 0,
+					  "Invalid qd throttle at step %u!\n");
 			} else if (!strcmp(field, "a")) {
-				if ((field = strtok_r(fstart, ".", &fctx))) {
-					tmp = atoi(field);
-					check_arg(tmp >= 0,
-						  "Invalid sw fence signal at step %u!\n",
-						  nr_steps);
-					step.type = SW_FENCE_SIGNAL;
-					step.target = tmp;
-					goto add_step;
-				}
+				int_field(SW_FENCE_SIGNAL, target,
+					  tmp >= 0,
+					  "Invalid sw fence signal at step %u!\n");
 			} else if (!strcmp(field, "f")) {
 				step.type = SW_FENCE;
 				goto add_step;
-- 
2.17.1


* [PATCH i-g-t 14/17] gem_wsim: Engine map load balance command
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A new workload command for enabling a load balanced context map (aka
Virtual Engine). Example usage:

  1.B

This turns on load balancing for context one, assuming it has already been
configured with an engine map. Only the DEFAULT engine specifier can be used
with load balanced engine maps.
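
For instance, a minimal made-up workload combining the engine map and load
balancing commands could look like this:

  M.1.VCS1|VCS2
  B.1
  1.DEFAULT.1000.0.0

Context 1 is given a VCS1/VCS2 engine map, balancing is enabled, and a 1ms
batch is then submitted with the DEFAULT specifier so the driver is free to
pick either engine.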

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 62 +++++++++++++++++++++++++++++++++++++-----
 benchmarks/wsim/README | 18 ++++++++++++
 2 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index b805ecd9a680..a772e2c588b5 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -82,7 +82,8 @@ enum w_type
 	SW_FENCE_SIGNAL,
 	CTX_PRIORITY,
 	PREEMPTION,
-	ENGINE_MAP
+	ENGINE_MAP,
+	LOAD_BALANCE,
 };
 
 struct deps
@@ -120,6 +121,7 @@ struct w_step
 			unsigned int engine_map_count;
 			enum intel_engine_id *engine_map;
 		};
+		bool load_balance;
 	};
 
 	/* Implementation details */
@@ -502,6 +504,25 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 				step.type = PREEMPTION;
 				goto add_step;
+			} else if (!strcmp(field, "B")) {
+				unsigned int nr = 0;
+				while ((field = strtok_r(fstart, ".", &fctx))) {
+					tmp = atoi(field);
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 0,
+						  "Invalid load balance format at step %u!\n",
+						  nr_steps);
+
+					step.context = tmp;
+					step.load_balance = true;
+
+					nr++;
+				}
+
+				step.type = LOAD_BALANCE;
+				goto add_step;
 			}
 
 			tmp = atoi(field);
@@ -828,7 +849,7 @@ find_engine_in_map(struct ctx *ctx, enum intel_engine_id engine)
 			return i + 1;
 	}
 
-	igt_assert(0);
+	igt_assert(ctx->wants_balance);
 	return 0;
 }
 
@@ -1039,12 +1060,19 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				wrk->ctx_list[j].engine_map = w->engine_map;
 				wrk->ctx_list[j].engine_map_count =
 					w->engine_map_count;
+			} else if (w->type == LOAD_BALANCE) {
+				if (!wrk->ctx_list[j].engine_map) {
+					wsim_err("Load balancing needs an engine map!\n");
+					return 1;
+				}
+				wrk->ctx_list[j].wants_balance =
+					w->load_balance;
 			}
 		}
 
 		wrk->ctx_list[j].targets_instance = targets;
 		if (flags & I915)
-			wrk->ctx_list[j].wants_balance = balance;
+			wrk->ctx_list[j].wants_balance |= balance;
 	}
 
 	/*
@@ -1058,10 +1086,19 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 			if (w->type != BATCH)
 				continue;
 
-			if (wrk->ctx_list[j].engine_map && w->engine == VCS) {
+			if (wrk->ctx_list[j].engine_map &&
+			    !wrk->ctx_list[j].wants_balance &&
+			    (w->engine == VCS || w->engine == DEFAULT)) {
 				wsim_err("Batches targeting engine maps must use explicit engines!\n");
 				return -1;
 			}
+
+			if (wrk->ctx_list[j].engine_map &&
+			    wrk->ctx_list[j].wants_balance &&
+			    w->engine != DEFAULT) {
+				wsim_err("Batches targeting load balanced maps must not use explicit engines!\n");
+				return -1;
+			}
 		}
 	}
 
@@ -1090,7 +1127,8 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				break;
 			}
 
-			if ((!ctx->engine_map && !ctx->targets_instance))
+			if ((!ctx->engine_map && !ctx->targets_instance) ||
+			    (ctx->engine_map && ctx->wants_balance))
 				args.flags |= I915_GEM_CONTEXT_SINGLE_TIMELINE;
 
 			drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &args);
@@ -1154,8 +1192,17 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				.size = sizeof(set_engines),
 				.value = to_user_pointer(&set_engines),
 			};
+			struct i915_context_engines_load_balance load_balance =
+				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
+				  .engines_mask = -1,
+				};
 
-			set_engines.extensions = 0;
+			if (ctx->wants_balance) {
+				set_engines.extensions =
+					to_user_pointer(&load_balance);
+			} else {
+				set_engines.extensions = 0;
+			}
 
 			for (j = 0; j < ctx->engine_map_count; j++) {
 				set_engines.engines[j].class =
@@ -2134,7 +2181,8 @@ static void *run_workload(void *data)
 				}
 				continue;
 			} else if (w->type == PREEMPTION ||
-				   w->type == ENGINE_MAP) {
+				   w->type == ENGINE_MAP ||
+				   w->type == LOAD_BALANCE) {
 				continue;
 			}
 
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 20e3e358cd2e..58dada675357 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -3,6 +3,7 @@ Workload descriptor format
 
 ctx.engine.duration_us.dependency.wait,...
 <uint>.<str>.<uint>[-<uint>].<int <= 0>[/<int <= 0>][...].<0|1>,...
+B.<uint>
 M.<uint>.<str>[|<str>]...
 P|X.<uint>.<int>
 d|p|s|t|q|a.<int>,...
@@ -24,6 +25,7 @@ Additional workload steps are also supported:
  'q' - Throttle to n max queue depth.
  'f' - Create a sync fence.
  'a' - Advance the previously created sync fence.
+ 'B' - Turn on context load balancing.
  'M' - Set up engine map.
  'P' - Context priority.
  'X' - Context preemption control.
@@ -176,3 +178,19 @@ Example:
 
 This sets up context 1 with an engine map containing the VCS1 and VCS2 engines.
 Submission to this context can now only reference these two engines.
+
+Context load balancing
+----------------------
+
+Context load balancing (aka Virtual Engine) is an i915 feature where the driver
+will pick the best (most idle) engine to submit to from the previously
+configured engine map.
+
+Example:
+
+  1.B
+
+This enables load balancing for context number one.
+
+Submissions to load balanced contexts are only allowed to use the DEFAULT engine
+specifier.
-- 
2.17.1


* [PATCH i-g-t 15/17] gem_wsim: Engine bond command
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Engine bonds are an i915 uAPI applicable to load balanced contexts with an
engine map. They allow expressing rules of engine selection between two
contexts when their submissions are also tied together with submit fences.

Please refer to the README for a more detailed description.
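
As a condensed preview of the README example added below (assuming contexts
1 and 2 have already been given balanced engine maps of RCS|VECS and
VCS1|VCS2 respectively), a single bond step:

  b.2.1.RCS

reads as: if the driver picked RCS for the master submission on context 1,
it must pick the engine selected by mask 0x1, i.e. VCS1, for the bonded
submission on context 2.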

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 100 ++++++++++++++++++++++++++++++++++++++---
 benchmarks/wsim/README |  50 +++++++++++++++++++++
 2 files changed, 143 insertions(+), 7 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a772e2c588b5..b5ade7b33883 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -84,6 +84,7 @@ enum w_type
 	PREEMPTION,
 	ENGINE_MAP,
 	LOAD_BALANCE,
+	BOND,
 };
 
 struct deps
@@ -99,6 +100,11 @@ struct w_arg {
 	int prio;
 };
 
+struct bond {
+	uint64_t mask;
+	enum intel_engine_id master;
+};
+
 struct w_step
 {
 	/* Workload step metadata */
@@ -122,6 +128,10 @@ struct w_step
 			enum intel_engine_id *engine_map;
 		};
 		bool load_balance;
+		struct {
+			uint64_t bond_mask;
+			enum intel_engine_id bond_master;
+		};
 	};
 
 	/* Implementation details */
@@ -153,6 +163,8 @@ struct ctx {
 	int priority;
 	unsigned int engine_map_count;
 	enum intel_engine_id *engine_map;
+	unsigned int bond_count;
+	struct bond *bonds;
 	bool targets_instance;
 	bool wants_balance;
 	unsigned int static_vcs;
@@ -523,6 +535,40 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 				step.type = LOAD_BALANCE;
 				goto add_step;
+			} else if (!strcmp(field, "b")) {
+				unsigned int nr = 0;
+				while ((field = strtok_r(fstart, ".", &fctx))) {
+					tmp = atoi(field);
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr == 1 &&
+						  (tmp < -1 || tmp == 0),
+						  "Invalid siblings mask at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 2,
+						  "Invalid bond format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0) {
+						step.context = tmp;
+					} else if (nr == 1) {
+						step.bond_mask = tmp;
+					} else if (nr == 2) {
+						tmp = str_to_engine(field);
+						check_arg(tmp <= 0 ||
+							  tmp == VCS ||
+							  tmp == DEFAULT,
+							  "Invalid master engine at step %u!\n",
+							  nr_steps);
+						step.bond_master = tmp;
+					}
+
+					nr++;
+				}
+
+				step.type = BOND;
+				goto add_step;
 			}
 
 			tmp = atoi(field);
@@ -1044,6 +1090,8 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	 * Transfer over engine map configuration from the workload step.
 	 */
 	for (j = 0; j < wrk->nr_ctxs; j += 2) {
+		struct ctx *ctx = &wrk->ctx_list[j];
+
 		bool targets = false;
 		bool balance = false;
 
@@ -1057,16 +1105,28 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				else
 					targets = true;
 			} else if (w->type == ENGINE_MAP) {
-				wrk->ctx_list[j].engine_map = w->engine_map;
-				wrk->ctx_list[j].engine_map_count =
-					w->engine_map_count;
+				ctx->engine_map = w->engine_map;
+				ctx->engine_map_count = w->engine_map_count;
 			} else if (w->type == LOAD_BALANCE) {
-				if (!wrk->ctx_list[j].engine_map) {
+				if (!ctx->engine_map) {
 					wsim_err("Load balancing needs an engine map!\n");
 					return 1;
 				}
-				wrk->ctx_list[j].wants_balance =
-					w->load_balance;
+				ctx->wants_balance = w->load_balance;
+			} else if (w->type == BOND) {
+				if (!ctx->wants_balance) {
+					wsim_err("Engine bonds need load balancing engine map!\n");
+					return 1;
+				}
+				ctx->bond_count++;
+				ctx->bonds = realloc(ctx->bonds,
+						     ctx->bond_count *
+						     sizeof(struct bond));
+				igt_assert(ctx->bonds);
+				ctx->bonds[ctx->bond_count - 1].mask =
+					w->bond_mask;
+				ctx->bonds[ctx->bond_count - 1].master =
+					w->bond_master;
 			}
 		}
 
@@ -1196,6 +1256,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
 				  .engines_mask = -1,
 				};
+			struct i915_context_engines_bond *bonds = NULL;
 
 			if (ctx->wants_balance) {
 				set_engines.extensions =
@@ -1211,7 +1272,31 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 					ctx->engine_map[j] - VCS1; /* FIXME */
 			}
 
+			if (ctx->bond_count) {
+				bonds = calloc(ctx->bond_count, sizeof(*bonds));
+				load_balance.base.next_extension =
+					to_user_pointer(&bonds[0]);
+			}
+
+			for (j = 0; j < ctx->bond_count; j++) {
+				struct i915_context_engines_bond *bond =
+					&bonds[j];
+
+				if (j < (ctx->bond_count - 1))
+					bond->base.next_extension =
+						to_user_pointer(bond + 1);
+
+				bond->base.name = I915_CONTEXT_ENGINES_EXT_BOND;
+				bond->master_class = I915_ENGINE_CLASS_VIDEO;
+				bond->master_instance =
+					ctx->bonds[j].master - VCS1;
+				bond->sibling_mask = ctx->bonds[j].mask;
+			}
+
 			gem_context_set_param(fd, &param);
+
+			if (bonds)
+				free(bonds);
 		} else if (ctx->wants_balance) {
 			struct i915_context_engines_load_balance load_balance =
 				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
@@ -2182,7 +2267,8 @@ static void *run_workload(void *data)
 				continue;
 			} else if (w->type == PREEMPTION ||
 				   w->type == ENGINE_MAP ||
-				   w->type == LOAD_BALANCE) {
+				   w->type == LOAD_BALANCE ||
+				   w->type == BOND) {
 				continue;
 			}
 
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 58dada675357..f2974992ab68 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -7,6 +7,7 @@ B.<uint>
 M.<uint>.<str>[|<str>]...
 P|X.<uint>.<int>
 d|p|s|t|q|a.<int>,...
+b.<uint>.<uint>.<str>
 f
 
 For duration a range can be given from which a random value will be picked
@@ -26,6 +27,7 @@ Additional workload steps are also supported:
  'f' - Create a sync fence.
  'a' - Advance the previously created sync fence.
  'B' - Turn on context load balancing.
+ 'b' - Set up engine bonds.
  'M' - Set up engine map.
  'P' - Context priority.
  'X' - Context preemption control.
@@ -194,3 +196,51 @@ This enables load balancing for context number one.
 
 Submissions to load balanced contexts are only allowed to use the DEFAULT engine
 specifier.
+
+Engine bonds
+------------
+
+Engine bonds are extensions on load balanced contexts. They allow expressing
+rules of engine selection between two co-operating contexts tied with submit
+fences. In other words, the rule expression is telling the driver: "If you pick
+this engine for context one, then you have to pick that engine for context two".
+
+Syntax is:
+  b.<context>.<engine_mask>.<master_engine>
+
+Engine mask is a bitmask representing engines in the engine map configured for
+the same context.
+
+There can be multiple bonds tied to the same context.
+
+Example:
+
+  M.1.RCS|VECS
+  B.1
+  M.2.VCS1|VCS2
+  B.2
+  b.2.1.RCS
+  b.2.2.VECS
+
+This tells the driver that if it picked RCS for context one, it has to pick VCS1
+for context two. And if it picked VECS for context one, it has to pick VCS2 for
+context two.
+
+If we extend the above example with more workload directives:
+
+  1.DEFAULT.1000.0.0
+  2.DEFAULT.1000.s-1.0
+
+We get to a fully functional example where two batch buffers are submitted in a
+load balanced fashion, telling the driver they should run simultaneously and
+that valid engine pairs are either RCS + VCS1 (for the two contexts
+respectively), or VECS + VCS2.
+
+This can also be extended using sync fences to improve the chances of the first
+submission not getting on the hardware after the second one. The second block
+would then look like:
+
+  f
+  1.DEFAULT.1000.f-1.0
+  2.DEFAULT.1000.s-1.0
+  a.-3
-- 
2.17.1


* [igt-dev] [PATCH i-g-t 15/17] gem_wsim: Engine bond command
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  0 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Engine bonds are an i915 uAPI applicable to load balanced contexts with
engine map. They allow expression rules of engine selection between two
contexts when submissions are also tied with submit fences.

Please refer to the README for a more detailed description.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c  | 100 ++++++++++++++++++++++++++++++++++++++---
 benchmarks/wsim/README |  50 +++++++++++++++++++++
 2 files changed, 143 insertions(+), 7 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index a772e2c588b5..b5ade7b33883 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -84,6 +84,7 @@ enum w_type
 	PREEMPTION,
 	ENGINE_MAP,
 	LOAD_BALANCE,
+	BOND,
 };
 
 struct deps
@@ -99,6 +100,11 @@ struct w_arg {
 	int prio;
 };
 
+struct bond {
+	uint64_t mask;
+	enum intel_engine_id master;
+};
+
 struct w_step
 {
 	/* Workload step metadata */
@@ -122,6 +128,10 @@ struct w_step
 			enum intel_engine_id *engine_map;
 		};
 		bool load_balance;
+		struct {
+			uint64_t bond_mask;
+			enum intel_engine_id bond_master;
+		};
 	};
 
 	/* Implementation details */
@@ -153,6 +163,8 @@ struct ctx {
 	int priority;
 	unsigned int engine_map_count;
 	enum intel_engine_id *engine_map;
+	unsigned int bond_count;
+	struct bond *bonds;
 	bool targets_instance;
 	bool wants_balance;
 	unsigned int static_vcs;
@@ -523,6 +535,40 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 				step.type = LOAD_BALANCE;
 				goto add_step;
+			} else if (!strcmp(field, "b")) {
+				unsigned int nr = 0;
+				while ((field = strtok_r(fstart, ".", &fctx))) {
+					tmp = atoi(field);
+					check_arg(nr == 0 && tmp <= 0,
+						  "Invalid context at step %u!\n",
+						  nr_steps);
+					check_arg(nr == 1 &&
+						  (tmp < -1 || tmp == 0),
+						  "Invalid siblings mask at step %u!\n",
+						  nr_steps);
+					check_arg(nr > 2,
+						  "Invalid bond format at step %u!\n",
+						  nr_steps);
+
+					if (nr == 0) {
+						step.context = tmp;
+					} else if (nr == 1) {
+						step.bond_mask = tmp;
+					} else if (nr == 2) {
+						tmp = str_to_engine(field);
+						check_arg(tmp <= 0 ||
+							  tmp == VCS ||
+							  tmp == DEFAULT,
+							  "Invalid master engine at step %u!\n",
+							  nr_steps);
+						step.bond_master = tmp;
+					}
+
+					nr++;
+				}
+
+				step.type = BOND;
+				goto add_step;
 			}
 
 			tmp = atoi(field);
@@ -1044,6 +1090,8 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 	 * Transfer over engine map configuration from the workload step.
 	 */
 	for (j = 0; j < wrk->nr_ctxs; j += 2) {
+		struct ctx *ctx = &wrk->ctx_list[j];
+
 		bool targets = false;
 		bool balance = false;
 
@@ -1057,16 +1105,28 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				else
 					targets = true;
 			} else if (w->type == ENGINE_MAP) {
-				wrk->ctx_list[j].engine_map = w->engine_map;
-				wrk->ctx_list[j].engine_map_count =
-					w->engine_map_count;
+				ctx->engine_map = w->engine_map;
+				ctx->engine_map_count = w->engine_map_count;
 			} else if (w->type == LOAD_BALANCE) {
-				if (!wrk->ctx_list[j].engine_map) {
+				if (!ctx->engine_map) {
 					wsim_err("Load balancing needs an engine map!\n");
 					return 1;
 				}
-				wrk->ctx_list[j].wants_balance =
-					w->load_balance;
+				ctx->wants_balance = w->load_balance;
+			} else if (w->type == BOND) {
+				if (!ctx->wants_balance) {
+					wsim_err("Engine bonds need load balancing engine map!\n");
+					return 1;
+				}
+				ctx->bond_count++;
+				ctx->bonds = realloc(ctx->bonds,
+						     ctx->bond_count *
+						     sizeof(struct bond));
+				igt_assert(ctx->bonds);
+				ctx->bonds[ctx->bond_count - 1].mask =
+					w->bond_mask;
+				ctx->bonds[ctx->bond_count - 1].master =
+					w->bond_master;
 			}
 		}
 
@@ -1196,6 +1256,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
 				  .engines_mask = -1,
 				};
+			struct i915_context_engines_bond *bonds = NULL;
 
 			if (ctx->wants_balance) {
 				set_engines.extensions =
@@ -1211,7 +1272,31 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
 					ctx->engine_map[j] - VCS1; /* FIXME */
 			}
 
+			if (ctx->bond_count) {
+				bonds = calloc(ctx->bond_count, sizeof(*bonds));
+				load_balance.base.next_extension =
+					to_user_pointer(&bonds[0]);
+			}
+
+			for (j = 0; j < ctx->bond_count; j++) {
+				struct i915_context_engines_bond *bond =
+					&bonds[j];
+
+				if (j < (ctx->bond_count - 1))
+					bond->base.next_extension =
+						to_user_pointer(bond + 1);
+
+				bond->base.name = I915_CONTEXT_ENGINES_EXT_BOND;
+				bond->master_class = I915_ENGINE_CLASS_VIDEO;
+				bond->master_instance =
+					ctx->bonds[j].master - VCS1;
+				bond->sibling_mask = ctx->bonds[j].mask;
+			}
+
 			gem_context_set_param(fd, &param);
+
+			if (bonds)
+				free(bonds);
 		} else if (ctx->wants_balance) {
 			struct i915_context_engines_load_balance load_balance =
 				{ .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
@@ -2182,7 +2267,8 @@ static void *run_workload(void *data)
 				continue;
 			} else if (w->type == PREEMPTION ||
 				   w->type == ENGINE_MAP ||
-				   w->type == LOAD_BALANCE) {
+				   w->type == LOAD_BALANCE ||
+				   w->type == BOND) {
 				continue;
 			}
 
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index 58dada675357..f2974992ab68 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -7,6 +7,7 @@ B.<uint>
 M.<uint>.<str>[|<str>]...
 P|X.<uint>.<int>
 d|p|s|t|q|a.<int>,...
+b.<uint>.<uint>.<str>
 f
 
 For duration a range can be given from which a random value will be picked
@@ -26,6 +27,7 @@ Additional workload steps are also supported:
  'f' - Create a sync fence.
  'a' - Advance the previously created sync fence.
  'B' - Turn on context load balancing.
+ 'b' - Set up engine bonds.
  'M' - Set up engine map.
  'P' - Context priority.
  'X' - Context preemption control.
@@ -194,3 +196,51 @@ This enables load balancing for context number one.
 
 Submissions to load balanced contexts are only allowed to use the DEFAULT engine
 specifier.
+
+Engine bonds
+------------
+
+Engine bonds are extensions on load balanced contexts. They allow expressing
+rules of engine selection between two co-operating contexts tied with submit
+fences. In other words, the rule expression is telling the driver: "If you pick
+this engine for context one, then you have to pick that engine for context two".
+
+Syntax is:
+  b.<context>.<engine_mask>.<master_engine>
+
+Engine mask is a bitmask representing engines in the engine map configured for
+the same context.
+
+There can be multiple bonds tied to the same context.
+
+Example:
+
+  M.1.RCS|VECS
+  B.1
+  M.2.VCS1|VCS2
+  B.2
+  b.2.1.RCS
+  b.2.2.VECS
+
+This tells the driver that if it picked RCS for context one, it has to pick VCS1
+for context two. And if it picked VECS for context one, it has to pick VCS1 for
+context two.
+
+If we extend the above example with more workload directives:
+
+  1.DEFAULT.1000.0.0
+  2.DEFAULT.1000.s-1.0
+
+We get to a fully functional example where two batch buffers are submitted in a
+load balanced fashion, telling the driver they should run simultaneously and
+that valid engine pairs are either RCS + VCS1 (for two contexts respectively),
+or VECS + VCS2.
+
+This can also be extended using sync fences to reduce the chance of the first
+submission reaching the hardware before the second one has been queued. The
+second block would then look like:
+
+  f
+  1.DEFAULT.1000.f-1.0
+  2.DEFAULT.1000.s-1.0
+  a.-3
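
For reference, a minimal C sketch of what the two bonds from the example above
would look like when programmed through the extension chain used in the patch
(struct and enum names are taken from the uapi headers bundled with this
series and are still provisional; note the patch itself currently hard codes
the video class for the master engine):

  struct i915_context_engines_load_balance load_balance = {
          .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
          .engines_mask = -1,
  };
  struct i915_context_engines_bond bonds[2] = {};

  /* b.2.1.RCS: an RCS master pairs with the first engine in the map. */
  bonds[0].base.name = I915_CONTEXT_ENGINES_EXT_BOND;
  bonds[0].base.next_extension = to_user_pointer(&bonds[1]);
  bonds[0].master_class = I915_ENGINE_CLASS_RENDER;
  bonds[0].master_instance = 0;
  bonds[0].sibling_mask = 0x1;

  /* b.2.2.VECS: a VECS master pairs with the second engine in the map. */
  bonds[1].base.name = I915_CONTEXT_ENGINES_EXT_BOND;
  bonds[1].master_class = I915_ENGINE_CLASS_VIDEO_ENHANCE;
  bonds[1].master_instance = 0;
  bonds[1].sibling_mask = 0x2;

  /* Chain the bonds behind the load balancing extension, then install the
   * whole lot with gem_context_set_param() as done in prepare_workload(). */
  load_balance.base.next_extension = to_user_pointer(&bonds[0]);
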
-- 
2.17.1

* [PATCH i-g-t 16/17] gem_wsim: Some more example workloads
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few additional workloads useful for experimenting with scheduling.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/wsim/frame-split-60fps.wsim      | 16 ++++++++++++++++
 benchmarks/wsim/high-composited-game.wsim   | 11 +++++++++++
 benchmarks/wsim/media-1080p-player.wsim     |  5 +++++
 benchmarks/wsim/medium-composited-game.wsim |  9 +++++++++
 4 files changed, 41 insertions(+)
 create mode 100644 benchmarks/wsim/frame-split-60fps.wsim
 create mode 100644 benchmarks/wsim/high-composited-game.wsim
 create mode 100644 benchmarks/wsim/media-1080p-player.wsim
 create mode 100644 benchmarks/wsim/medium-composited-game.wsim

diff --git a/benchmarks/wsim/frame-split-60fps.wsim b/benchmarks/wsim/frame-split-60fps.wsim
new file mode 100644
index 000000000000..cfbfcd39be7d
--- /dev/null
+++ b/benchmarks/wsim/frame-split-60fps.wsim
@@ -0,0 +1,16 @@
+X.1.0
+M.1.VCS1
+B.1
+X.2.0
+M.2.VCS2
+B.2
+b.2.1.VCS1
+f
+1.DEFAULT.4000-6000.f-1.0
+2.DEFAULT.4000-6000.s-1.0
+a.-3
+3.RCS.2000-4000.-3/-2.0
+3.VECS.2000.-1.0
+4.BCS.1000.-1.0
+s.-2
+p.16667
diff --git a/benchmarks/wsim/high-composited-game.wsim b/benchmarks/wsim/high-composited-game.wsim
new file mode 100644
index 000000000000..a90a2b2be95b
--- /dev/null
+++ b/benchmarks/wsim/high-composited-game.wsim
@@ -0,0 +1,11 @@
+1.RCS.500.0.0
+1.RCS.2000.0.0
+1.RCS.2000.0.0
+1.RCS.2000.0.0
+1.RCS.2000.0.0
+1.RCS.2000.0.0
+1.RCS.2000.0.0
+P.2.1
+2.BCS.1000.-2.0
+2.RCS.2000.-1.1
+p.16667
diff --git a/benchmarks/wsim/media-1080p-player.wsim b/benchmarks/wsim/media-1080p-player.wsim
new file mode 100644
index 000000000000..bcbb0cfd2ad3
--- /dev/null
+++ b/benchmarks/wsim/media-1080p-player.wsim
@@ -0,0 +1,5 @@
+1.VCS.5000-10000.0.0
+2.RCS.1000-2000.-1.0
+P.3.1
+3.BCS.1000.-2.0
+p.16667
diff --git a/benchmarks/wsim/medium-composited-game.wsim b/benchmarks/wsim/medium-composited-game.wsim
new file mode 100644
index 000000000000..580883516168
--- /dev/null
+++ b/benchmarks/wsim/medium-composited-game.wsim
@@ -0,0 +1,9 @@
+1.RCS.1000-2000.0.0
+1.RCS.1000-2000.0.0
+1.RCS.1000-2000.0.0
+1.RCS.1000-2000.0.0
+1.RCS.1000-2000.0.0
+P.2.1
+2.BCS.1000.-2.0
+2.RCS.2000.-1.1
+p.16667
-- 
2.17.1

* [PATCH i-g-t 17/17] gem_wsim: Infinite batch support
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 15:28   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-18 15:28 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

For simulating frame split workloads it is useful to express a batch which
ends at the same time as the parallel submission on the respective bonded
engine. For this we add support for infinite batch durations and the batch
terminate command ('T'). Syntax looks like this:

  1.RCS.*.0.0
  T.-1

First step starts an infinite batch, and second command terminates the
infinite batch with the usual relative workload step addressing.
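
As a slightly larger, hypothetical illustration (engine choices and durations
are arbitrary here), the new duration and terminate steps can be combined with
an ordinary batch like so:

  1.RCS.*.0.0
  2.VCS.4000.0.0
  s.-1
  T.-3

The unbound RCS batch runs until the 4ms VCS batch has been waited upon, after
which the 'T' step, pointing three steps back, ends it.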

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_wsim.c                  | 119 +++++++++++++++++++------
 benchmarks/wsim/README                 |   9 +-
 benchmarks/wsim/frame-split-60fps.wsim |   6 +-
 3 files changed, 102 insertions(+), 32 deletions(-)

diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index b5ade7b33883..3669c1f7f1c9 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -85,6 +85,7 @@ enum w_type
 	ENGINE_MAP,
 	LOAD_BALANCE,
 	BOND,
+	TERMINATE,
 };
 
 struct deps
@@ -112,6 +113,7 @@ struct w_step
 	unsigned int context;
 	unsigned int engine;
 	struct duration duration;
+	bool unbound_duration;
 	struct deps data_deps;
 	struct deps fence_deps;
 	int emit_fence;
@@ -142,7 +144,7 @@ struct w_step
 
 	struct drm_i915_gem_execbuffer2 eb;
 	struct drm_i915_gem_exec_object2 *obj;
-	struct drm_i915_gem_relocation_entry reloc[4];
+	struct drm_i915_gem_relocation_entry reloc[5];
 	unsigned long bb_sz;
 	uint32_t bb_handle;
 	uint32_t *mapped_batch;
@@ -153,6 +155,7 @@ struct w_step
 	uint32_t *rt1_address;
 	uint32_t *latch_value;
 	uint32_t *latch_address;
+	uint32_t *recursive_bb_start;
 	unsigned int mapped_len;
 };
 
@@ -492,6 +495,10 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 				step.type = ENGINE_MAP;
 				goto add_step;
+			} else if (!strcmp(field, "T")) {
+				int_field(TERMINATE, target,
+					  tmp >= 0 || ((int)nr_steps + tmp) < 0,
+					  "Invalid terminate target at step %u!\n");
 			} else if (!strcmp(field, "X")) {
 				unsigned int nr = 0;
 				while ((field = strtok_r(fstart, ".", &fctx))) {
@@ -598,23 +605,28 @@ parse_workload(struct w_arg *arg, unsigned int flags, struct workload *app_w)
 
 			fstart = NULL;
 
-			tmpl = strtol(field, &sep, 10);
-			check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
-				  tmpl == LONG_MAX,
-				  "Invalid duration at step %u!\n", nr_steps);
-			step.duration.min = tmpl;
-
-			if (sep && *sep == '-') {
-				tmpl = strtol(sep + 1, NULL, 10);
-				check_arg(tmpl <= 0 ||
-					  tmpl <= step.duration.min ||
-					  tmpl == LONG_MIN ||
+			if (field[0] == '*') {
+				step.unbound_duration = true;
+			} else {
+				tmpl = strtol(field, &sep, 10);
+				check_arg(tmpl <= 0 || tmpl == LONG_MIN ||
 					  tmpl == LONG_MAX,
-					  "Invalid duration range at step %u!\n",
+					  "Invalid duration at step %u!\n",
 					  nr_steps);
-				step.duration.max = tmpl;
-			} else {
-				step.duration.max = step.duration.min;
+				step.duration.min = tmpl;
+
+				if (sep && *sep == '-') {
+					tmpl = strtol(sep + 1, NULL, 10);
+					check_arg(tmpl <= 0 ||
+						tmpl <= step.duration.min ||
+						tmpl == LONG_MIN ||
+						tmpl == LONG_MAX,
+						"Invalid duration range at step %u!\n",
+						nr_steps);
+					step.duration.max = tmpl;
+				} else {
+					step.duration.max = step.duration.min;
+				}
 			}
 
 			valid++;
@@ -773,7 +785,7 @@ init_bb(struct w_step *w, unsigned int flags)
 	unsigned int i;
 	uint32_t *ptr;
 
-	if (!arb_period)
+	if (w->unbound_duration || !arb_period)
 		return;
 
 	gem_set_domain(fd, w->bb_handle,
@@ -793,6 +805,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 	const uint32_t bbe = 0xa << 23;
 	unsigned long mmap_start, mmap_len;
 	unsigned long batch_start = w->bb_sz;
+	unsigned int r = 0;
 	uint32_t *ptr, *cs;
 
 	igt_assert(((flags & RT) && (flags & SEQNO)) || !(flags & RT));
@@ -803,6 +816,9 @@ terminate_bb(struct w_step *w, unsigned int flags)
 	if (flags & RT)
 		batch_start -= 12 * sizeof(uint32_t);
 
+	if (w->unbound_duration)
+		batch_start -= 4 * sizeof(uint32_t); /* MI_ARB_CHK + MI_BATCH_BUFFER_START */
+
 	mmap_start = rounddown(batch_start, PAGE_SIZE);
 	mmap_len = w->bb_sz - mmap_start;
 
@@ -812,8 +828,19 @@ terminate_bb(struct w_step *w, unsigned int flags)
 	ptr = gem_mmap__wc(fd, w->bb_handle, mmap_start, mmap_len, PROT_WRITE);
 	cs = (uint32_t *)((char *)ptr + batch_start - mmap_start);
 
+	if (w->unbound_duration) {
+		w->reloc[r++].offset = batch_start + 2 * sizeof(uint32_t);
+		batch_start += 4 * sizeof(uint32_t);
+
+		*cs++ = w->preempt_us ? 0x5 << 23 /* MI_ARB_CHK; */ : MI_NOOP;
+		w->recursive_bb_start = cs;
+		*cs++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
+		*cs++ = 0;
+		*cs++ = 0;
+	}
+
 	if (flags & SEQNO) {
-		w->reloc[0].offset = batch_start + sizeof(uint32_t);
+		w->reloc[r++].offset = batch_start + sizeof(uint32_t);
 		batch_start += 4 * sizeof(uint32_t);
 
 		*cs++ = MI_STORE_DWORD_IMM;
@@ -825,7 +852,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 	}
 
 	if (flags & RT) {
-		w->reloc[1].offset = batch_start + sizeof(uint32_t);
+		w->reloc[r++].offset = batch_start + sizeof(uint32_t);
 		batch_start += 4 * sizeof(uint32_t);
 
 		*cs++ = MI_STORE_DWORD_IMM;
@@ -835,7 +862,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 		w->rt0_value = cs;
 		*cs++ = 0;
 
-		w->reloc[2].offset = batch_start + 2 * sizeof(uint32_t);
+		w->reloc[r++].offset = batch_start + 2 * sizeof(uint32_t);
 		batch_start += 4 * sizeof(uint32_t);
 
 		*cs++ = 0x24 << 23 | 2; /* MI_STORE_REG_MEM */
@@ -844,7 +871,7 @@ terminate_bb(struct w_step *w, unsigned int flags)
 		*cs++ = 0;
 		*cs++ = 0;
 
-		w->reloc[3].offset = batch_start + sizeof(uint32_t);
+		w->reloc[r++].offset = batch_start + sizeof(uint32_t);
 		batch_start += 4 * sizeof(uint32_t);
 
 		*cs++ = MI_STORE_DWORD_IMM;
@@ -979,19 +1006,28 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, unsigned int flags)
 		}
 	}
 
-	w->bb_sz = get_bb_sz(w->duration.max);
-	w->bb_handle = w->obj[j].handle = gem_create(fd, w->bb_sz);
+	if (w->unbound_duration)
+		/* nops + MI_ARB_CHK + MI_BATCH_BUFFER_START */
+		w->bb_sz = max(64, get_bb_sz(w->preempt_us)) +
+			   (1 + 3) * sizeof(uint32_t);
+	else
+		w->bb_sz = get_bb_sz(w->duration.max);
+	w->bb_handle = w->obj[j].handle = gem_create(fd, w->bb_sz + (w->unbound_duration ? 4096 : 0));
 	init_bb(w, flags);
 	terminate_bb(w, flags);
 
-	if (flags & SEQNO) {
+	if ((flags & SEQNO) || w->unbound_duration) {
 		w->obj[j].relocs_ptr = to_user_pointer(&w->reloc);
+		if (flags & SEQNO)
+			w->obj[j].relocation_count++;
 		if (flags & RT)
-			w->obj[j].relocation_count = 4;
-		else
-			w->obj[j].relocation_count = 1;
+			w->obj[j].relocation_count += 3;
+		if (w->unbound_duration)
+			w->obj[j].relocation_count++;
 		for (i = 0; i < w->obj[j].relocation_count; i++)
 			w->reloc[i].target_handle = 1;
+		if (w->unbound_duration)
+			w->reloc[0].target_handle = j;
 	}
 
 	w->eb.buffers_ptr = to_user_pointer(w->obj);
@@ -1988,6 +2024,18 @@ update_bb_rt(struct w_step *w, enum intel_engine_id engine, uint32_t seqno)
 	}
 }
 
+static void
+update_bb_start(struct w_step *w)
+{
+	if (!w->unbound_duration)
+		return;
+
+	gem_set_domain(fd, w->bb_handle,
+		       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
+
+	*w->recursive_bb_start = MI_BATCH_BUFFER_START | (1 << 8) | 1;
+}
+
 static void w_sync_to(struct workload *wrk, struct w_step *w, int target)
 {
 	if (target < 0)
@@ -2123,9 +2171,13 @@ do_eb(struct workload *wrk, struct w_step *w, enum intel_engine_id engine,
 	if (flags & RT)
 		update_bb_rt(w, engine, seqno);
 
+	update_bb_start(w);
+
 	w->eb.batch_start_offset =
+		w->unbound_duration ?
+		0 :
 		ALIGN(w->bb_sz - get_bb_sz(get_duration(w)),
-			2 * sizeof(uint32_t));
+		      2 * sizeof(uint32_t));
 
 	for (i = 0; i < w->fence_deps.nr; i++) {
 		int tgt = w->idx + w->fence_deps.list[i];
@@ -2265,6 +2317,17 @@ static void *run_workload(void *data)
 								    w->priority;
 				}
 				continue;
+			} else if (w->type == TERMINATE) {
+				unsigned int t_idx = i + w->target;
+
+				igt_assert(t_idx >= 0 && t_idx < i);
+				igt_assert(wrk->steps[t_idx].type == BATCH);
+				igt_assert(wrk->steps[t_idx].unbound_duration);
+
+				*wrk->steps[t_idx].recursive_bb_start =
+					MI_BATCH_BUFFER_END;
+				__sync_synchronize();
+				continue;
 			} else if (w->type == PREEMPTION ||
 				   w->type == ENGINE_MAP ||
 				   w->type == LOAD_BALANCE ||
diff --git a/benchmarks/wsim/README b/benchmarks/wsim/README
index f2974992ab68..439ea3650e3d 100644
--- a/benchmarks/wsim/README
+++ b/benchmarks/wsim/README
@@ -2,11 +2,11 @@ Workload descriptor format
 ==========================
 
 ctx.engine.duration_us.dependency.wait,...
-<uint>.<str>.<uint>[-<uint>].<int <= 0>[/<int <= 0>][...].<0|1>,...
+<uint>.<str>.<uint>[-<uint>]|*.<int <= 0>[/<int <= 0>][...].<0|1>,...
 B.<uint>
 M.<uint>.<str>[|<str>]...
 P|X.<uint>.<int>
-d|p|s|t|q|a.<int>,...
+d|p|s|t|q|a|T.<int>,...
 b.<uint>.<uint>.<str>
 f
 
@@ -30,6 +30,7 @@ Additional workload steps are also supported:
  'b' - Set up engine bonds.
  'M' - Set up engine map.
  'P' - Context priority.
+ 'T' - Terminate an infinite batch.
  'X' - Context preemption control.
 
 Engine ids: DEFAULT, RCS, BCS, VCS, VCS1, VCS2, VECS
@@ -77,6 +78,10 @@ Example:
 
 In this case the last step has a data dependency on both the first and second steps.
 
+Batch durations can also be specified as infinite by using '*' in the
+duration field. Such batches must be ended by the terminate command ('T'),
+otherwise they will cause a GPU hang to be reported.
+
 Sync (fd) fences
 ----------------
 
diff --git a/benchmarks/wsim/frame-split-60fps.wsim b/benchmarks/wsim/frame-split-60fps.wsim
index cfbfcd39be7d..ea89da3add48 100644
--- a/benchmarks/wsim/frame-split-60fps.wsim
+++ b/benchmarks/wsim/frame-split-60fps.wsim
@@ -6,10 +6,12 @@ M.2.VCS2
 B.2
 b.2.1.VCS1
 f
-1.DEFAULT.4000-6000.f-1.0
+1.DEFAULT.*.f-1.0
 2.DEFAULT.4000-6000.s-1.0
 a.-3
-3.RCS.2000-4000.-3/-2.0
+s.-2
+T.-4
+3.RCS.2000-4000.-5/-4.0
 3.VECS.2000.-1.0
 4.BCS.1000.-1.0
 s.-2
-- 
2.17.1

* [igt-dev] ✗ Fi.CI.BAT: failure for Media scalability tooling
  2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
                   ` (17 preceding siblings ...)
  (?)
@ 2018-10-18 15:37 ` Patchwork
  -1 siblings, 0 replies; 41+ messages in thread
From: Patchwork @ 2018-10-18 15:37 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: Media scalability tooling
URL   : https://patchwork.freedesktop.org/series/51193/
State : failure

== Summary ==

IGT patchset build failed on latest successful build
7766b1e2348b32cc8ed58a972c6fd53b20279549 tests/kms_selftest: Integrate kernel selftest test-drm_modeset

ninja: Entering directory `build'
[1/671] Generating version.h with a custom command.
[2/669] Linking static target lib/libigt-drmtest_c.a.
[3/669] Linking static target lib/libigt-i915_gem_context_c.a.
[4/669] Linking static target lib/libigt-i915_gem_scheduler_c.a.
[5/669] Linking static target lib/libigt-i915_gem_submission_c.a.
[6/669] Linking static target lib/libigt-i915_gem_ring_c.a.
[7/669] Linking static target lib/libigt-igt_debugfs_c.a.
[8/669] Linking static target lib/libigt-igt_device_c.a.
[9/669] Linking static target lib/libigt-igt_aux_c.a.
[10/669] Linking static target lib/libigt-igt_gt_c.a.
[11/669] Linking static target lib/libigt-igt_syncobj_c.a.
[12/669] Linking static target lib/libigt-igt_sysfs_c.a.
[13/669] Linking static target lib/libigt-igt_vgem_c.a.
[14/669] Linking static target lib/libigt-intel_batchbuffer_c.a.
[15/669] Linking static target lib/libigt-intel_chipset_c.a.
[16/669] Linking static target lib/libigt-intel_mmio_c.a.
[17/669] Linking static target lib/libigt-ioctl_wrappers_c.a.
[18/669] Linking static target lib/libigt-media_spin_c.a.
[19/669] Linking static target lib/libigt-media_fill_c.a.
[20/669] Linking static target lib/libigt-gpgpu_fill_c.a.
[21/669] Linking static target lib/libigt-gpu_cmds_c.a.
[22/669] Linking static target lib/libigt-rendercopy_i915_c.a.
[23/669] Linking static target lib/libigt-rendercopy_i830_c.a.
[24/669] Linking static target lib/libigt-rendercopy_gen4_c.a.
[25/669] Linking static target lib/libigt-rendercopy_gen6_c.a.
[26/669] Linking static target lib/libigt-rendercopy_gen7_c.a.
[27/669] Linking static target lib/libigt-rendercopy_gen8_c.a.
[28/669] Linking static target lib/libigt-rendercopy_gen9_c.a.
[29/669] Compiling C object 'lib/igt-sw_sync_c@sta/sw_sync.c.o'.
FAILED: lib/igt-sw_sync_c@sta/sw_sync.c.o 
ccache cc  -Ilib/igt-sw_sync_c@sta -Ilib -I../lib -I. -I../ -I../lib/stubs/syscalls -I../include/drm-uapi -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/libpng16 -I/usr/include/freetype2 -I/usr/include/libpng12 -I/opt/igt/include -I/opt/igt/include/libdrm -I/usr/include/x86_64-linux-gnu -I/usr/include -I/home/cidrm/kernel_headers/include -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=gnu11 -O0 -g -D_GNU_SOURCE -include config.h -Wno-unused-parameter -Wno-sign-compare -Wno-missing-field-initializers -Wno-clobbered -Wno-type-limits -Wimplicit-fallthrough=0 -fPIC -pthread '-DIGT_DATADIR="/opt/igt/share/igt-gpu-tools"' '-DIGT_SRCDIR="/home/cidrm/igt-gpu-tools/tests"' '-DIGT_LOG_DOMAIN="sw_sync"' -MD -MQ 'lib/igt-sw_sync_c@sta/sw_sync.c.o' -MF 'lib/igt-sw_sync_c@sta/sw_sync.c.o.d' -o 'lib/igt-sw_sync_c@sta/sw_sync.c.o' -c ../lib/sw_sync.c
../lib/sw_sync.c:36:10: fatal error: sync_file.h: No such file or directory
 #include "sync_file.h"
          ^~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

* Re: [igt-dev] [PATCH i-g-t 15/17] gem_wsim: Engine bond command
  2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-18 21:48     ` Chris Wilson
  -1 siblings, 0 replies; 41+ messages in thread
From: Chris Wilson @ 2018-10-18 21:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-10-18 16:28:13)
> @@ -1196,6 +1256,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
>                                 { .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
>                                   .engines_mask = -1,
>                                 };
> +                       struct i915_context_engines_bond *bonds = NULL;
>  
>                         if (ctx->wants_balance) {
>                                 set_engines.extensions =
> @@ -1211,7 +1272,31 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
>                                         ctx->engine_map[j] - VCS1; /* FIXME */
>                         }
>  
> +                       if (ctx->bond_count) {
> +                               bonds = calloc(ctx->bond_count, sizeof(*bonds));
> +                               load_balance.base.next_extension =
> +                                       to_user_pointer(&bonds[0]);
> +                       }
> +
> +                       for (j = 0; j < ctx->bond_count; j++) {
> +                               struct i915_context_engines_bond *bond =
> +                                       &bonds[j];
> +
> +                               if (j < (ctx->bond_count - 1))
> +                                       bond->base.next_extension =
> +                                               to_user_pointer(bond + 1);
> +
> +                               bond->base.name = I915_CONTEXT_ENGINES_EXT_BOND;
> +                               bond->master_class = I915_ENGINE_CLASS_VIDEO;
> +                               bond->master_instance =
> +                                       ctx->bonds[j].master - VCS1;
> +                               bond->sibling_mask = ctx->bonds[j].mask;
> +                       }
> +
>                         gem_context_set_param(fd, &param);
> +
> +                       if (bonds)
> +                               free(bonds);

free(NULL) is legal, so just free(bonds) here.

Looking at how you have constructed the map for the extension, I'm
reasonably happy with how this works in practice (outside of the igt
tests). Is the flexibility (of next_extension and separate bond structs)
too much?
-Chris
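
In other words the tail of the quoted hunk could, per the suggestion above, be
reduced to something like:

  gem_context_set_param(fd, &param);

  free(bonds); /* free(NULL) is defined to be a no-op */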

* Re: [igt-dev] [PATCH i-g-t 15/17] gem_wsim: Engine bond command
  2018-10-18 21:48     ` Chris Wilson
@ 2018-10-26 11:11       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 41+ messages in thread
From: Tvrtko Ursulin @ 2018-10-26 11:11 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx


On 18/10/2018 22:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-10-18 16:28:13)
>> @@ -1196,6 +1256,7 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
>>                                  { .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
>>                                    .engines_mask = -1,
>>                                  };
>> +                       struct i915_context_engines_bond *bonds = NULL;
>>   
>>                          if (ctx->wants_balance) {
>>                                  set_engines.extensions =
>> @@ -1211,7 +1272,31 @@ prepare_workload(unsigned int id, struct workload *wrk, unsigned int flags)
>>                                          ctx->engine_map[j] - VCS1; /* FIXME */
>>                          }
>>   
>> +                       if (ctx->bond_count) {
>> +                               bonds = calloc(ctx->bond_count, sizeof(*bonds));
>> +                               load_balance.base.next_extension =
>> +                                       to_user_pointer(&bonds[0]);
>> +                       }
>> +
>> +                       for (j = 0; j < ctx->bond_count; j++) {
>> +                               struct i915_context_engines_bond *bond =
>> +                                       &bonds[j];
>> +
>> +                               if (j < (ctx->bond_count - 1))
>> +                                       bond->base.next_extension =
>> +                                               to_user_pointer(bond + 1);
>> +
>> +                               bond->base.name = I915_CONTEXT_ENGINES_EXT_BOND;
>> +                               bond->master_class = I915_ENGINE_CLASS_VIDEO;
>> +                               bond->master_instance =
>> +                                       ctx->bonds[j].master - VCS1;
>> +                               bond->sibling_mask = ctx->bonds[j].mask;
>> +                       }
>> +
>>                          gem_context_set_param(fd, &param);
>> +
>> +                       if (bonds)
>> +                               free(bonds);
> 
> free(NULL) is legal, so just free(bonds) here.

The only one I am usually sure of is kfree. :)

> Looking at how you have constructed the map for the extension, I'm
> reasonably happy with how this works in practice (outside of the igt
> tests). Is the flexibility (of next_extension and separate bond structs)
> too much?

Shrug. My only gripe is that it is a nice generic mechanism used only in
ctx set param, which feels a bit on its head (*), but in principle I
think it is fine.

Regards,

Tvrtko

(*) Sounds like it would be nicer if it was a core concept in more of
our uapis. Going from the bottom up like create_context + extension
this + extension that. And the same for our other ioctls, where
applicable. But perhaps it is a stretch that it would fit many more places.
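
A rough sketch of the generic pattern being alluded to, with purely made up
struct and function names (this is not an existing uapi, just an illustration
of chaining optional features off a common header):

  #include <stdint.h>

  struct ext_base {
          uint32_t name;            /* which extension follows this header */
          uint32_t flags;           /* must be zero for now */
          uint64_t next_extension;  /* pointer to the next ext_base, or 0 */
  };

  /* Consumer side: walk an arbitrarily long chain of extensions. */
  static int walk_extensions(uint64_t first,
                             int (*handle)(struct ext_base *ext, void *data),
                             void *data)
  {
          struct ext_base *ext = (struct ext_base *)(uintptr_t)first;

          while (ext) {
                  int err = handle(ext, data);

                  if (err)
                          return err;

                  ext = (struct ext_base *)(uintptr_t)ext->next_extension;
          }

          return 0;
  }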

end of thread, other threads:[~2018-10-26 11:11 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-18 15:27 [PATCH i-g-t 00/17] Media scalability tooling Tvrtko Ursulin
2018-10-18 15:27 ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:27 ` [PATCH i-g-t 01/17] lib: Update uapi headers Tvrtko Ursulin
2018-10-18 15:27   ` [Intel-gfx] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 02/17] trace.pl: Virtual engine support Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 03/17] trace.pl: Virtual engine preemption support Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 04/17] wsim/media-bench: i915 balancing Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 05/17] gem_wsim: Use IGT uapi headers Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 06/17] gem_wsim: Fix shadowed local Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 07/17] gem_wsim: Factor out common error handling Tvrtko Ursulin
2018-10-18 15:28   ` [Intel-gfx] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 08/17] gem_wsim: More wsim_err Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 09/17] gem_wsim: Submit fence support Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 10/17] gem_wsim: Extract str to engine lookup Tvrtko Ursulin
2018-10-18 15:28   ` [Intel-gfx] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 11/17] gem_wsim: Engine map support Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 12/17] gem_wsim: Save some lines by changing to implicit NULL checking Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 13/17] gem_wsim: Compact int command parsing with a macro Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 14/17] gem_wsim: Engine map load balance command Tvrtko Ursulin
2018-10-18 15:28   ` [Intel-gfx] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 15/17] gem_wsim: Engine bond command Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 21:48   ` Chris Wilson
2018-10-18 21:48     ` Chris Wilson
2018-10-26 11:11     ` Tvrtko Ursulin
2018-10-26 11:11       ` Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 16/17] gem_wsim: Some more example workloads Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:28 ` [PATCH i-g-t 17/17] gem_wsim: Infinite batch support Tvrtko Ursulin
2018-10-18 15:28   ` [igt-dev] " Tvrtko Ursulin
2018-10-18 15:37 ` [igt-dev] ✗ Fi.CI.BAT: failure for Media scalability tooling Patchwork
