All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC i-g-t 0/6] 21st century intel_gpu_top & Co.
@ 2018-10-03 12:07 ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

IGT patches accompanying the similary named i915 series. Most notably to sketch
an improved intel_gpu_top which now, like the real top, can show the per client
engine utilisation:

intel-gpu-top - load avg  3.30,  1.51,  0.08;  949/ 949 MHz;    0% RC6;  14.66 Watts;     3605 irqs/s

      IMC reads:     4651 MiB/s
     IMC writes:       25 MiB/s

          ENGINE      BUSY                                                                                Q   r   R MI_SEMA MI_WAIT
     Render/3D/0    61.51% |█████████████████████████████████████████████▌                            |   3   0   1      0%      0%
       Blitter/0     0.00% |                                                                          |   0   0   0      0%      0%
         Video/0    60.86% |█████████████████████████████████████████████                             |   1   0   1      0%      0%
         Video/1    59.04% |███████████████████████████████████████████▋                              |   1   0   1      0%      0%
  VideoEnhance/0     0.00% |                                                                          |   0   0   0      0%      0%

  PID            NAME     Render/3D/0            Blitter/0              Video/0               Video/1            VideoEnhance/0
23373        gem_wsim |█████▎              ||                    ||████████▍           ||█████▎              ||                    |
23374        gem_wsim |███▉                ||                    ||██▏                 ||███                 ||                    |
23375        gem_wsim |███                 ||                    ||█▍                  ||███▌                ||                    |

Tvrtko Ursulin (6):
  include: DRM uAPI headers update
  intel-gpu-overlay: Add engine queue stats
  intel-gpu-overlay: Show 1s, 30s and 15m GPU load
  tests/perf_pmu: Add tests for engine queued/runnable/running stats
  intel-gpu-top: Add queue depths and load average
  intel-gpu-top: Support for client stats

 include/drm-uapi/amdgpu_drm.h  |  52 +++-
 include/drm-uapi/drm.h         |  16 ++
 include/drm-uapi/drm_fourcc.h  | 215 ++++++++++++++
 include/drm-uapi/drm_mode.h    |  26 +-
 include/drm-uapi/etnaviv_drm.h |   6 +
 include/drm-uapi/exynos_drm.h  | 240 ++++++++++++++++
 include/drm-uapi/i915_drm.h    |  49 +++-
 include/drm-uapi/msm_drm.h     |   2 +
 include/drm-uapi/tegra_drm.h   | 492 ++++++++++++++++++++++++++++++++-
 include/drm-uapi/v3d_drm.h     | 194 +++++++++++++
 include/drm-uapi/vc4_drm.h     |  13 +-
 include/drm-uapi/virtgpu_drm.h |   1 +
 include/drm-uapi/vmwgfx_drm.h  | 166 ++++++++---
 overlay/gpu-top.c              |  81 +++++-
 overlay/gpu-top.h              |  22 +-
 overlay/overlay.c              |  35 ++-
 tests/perf_pmu.c               | 259 +++++++++++++++++
 tools/Makefile.am              |   2 +-
 tools/intel_gpu_top.c          | 480 ++++++++++++++++++++++++++++++--
 tools/meson.build              |   2 +-
 20 files changed, 2264 insertions(+), 89 deletions(-)
 create mode 100644 include/drm-uapi/v3d_drm.h

-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 0/6] 21st century intel_gpu_top & Co.
@ 2018-10-03 12:07 ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

IGT patches accompanying the similary named i915 series. Most notably to sketch
an improved intel_gpu_top which now, like the real top, can show the per client
engine utilisation:

intel-gpu-top - load avg  3.30,  1.51,  0.08;  949/ 949 MHz;    0% RC6;  14.66 Watts;     3605 irqs/s

      IMC reads:     4651 MiB/s
     IMC writes:       25 MiB/s

          ENGINE      BUSY                                                                                Q   r   R MI_SEMA MI_WAIT
     Render/3D/0    61.51% |█████████████████████████████████████████████▌                            |   3   0   1      0%      0%
       Blitter/0     0.00% |                                                                          |   0   0   0      0%      0%
         Video/0    60.86% |█████████████████████████████████████████████                             |   1   0   1      0%      0%
         Video/1    59.04% |███████████████████████████████████████████▋                              |   1   0   1      0%      0%
  VideoEnhance/0     0.00% |                                                                          |   0   0   0      0%      0%

  PID            NAME     Render/3D/0            Blitter/0              Video/0               Video/1            VideoEnhance/0
23373        gem_wsim |█████▎              ||                    ||████████▍           ||█████▎              ||                    |
23374        gem_wsim |███▉                ||                    ||██▏                 ||███                 ||                    |
23375        gem_wsim |███                 ||                    ||█▍                  ||███▌                ||                    |

Tvrtko Ursulin (6):
  include: DRM uAPI headers update
  intel-gpu-overlay: Add engine queue stats
  intel-gpu-overlay: Show 1s, 30s and 15m GPU load
  tests/perf_pmu: Add tests for engine queued/runnable/running stats
  intel-gpu-top: Add queue depths and load average
  intel-gpu-top: Support for client stats

 include/drm-uapi/amdgpu_drm.h  |  52 +++-
 include/drm-uapi/drm.h         |  16 ++
 include/drm-uapi/drm_fourcc.h  | 215 ++++++++++++++
 include/drm-uapi/drm_mode.h    |  26 +-
 include/drm-uapi/etnaviv_drm.h |   6 +
 include/drm-uapi/exynos_drm.h  | 240 ++++++++++++++++
 include/drm-uapi/i915_drm.h    |  49 +++-
 include/drm-uapi/msm_drm.h     |   2 +
 include/drm-uapi/tegra_drm.h   | 492 ++++++++++++++++++++++++++++++++-
 include/drm-uapi/v3d_drm.h     | 194 +++++++++++++
 include/drm-uapi/vc4_drm.h     |  13 +-
 include/drm-uapi/virtgpu_drm.h |   1 +
 include/drm-uapi/vmwgfx_drm.h  | 166 ++++++++---
 overlay/gpu-top.c              |  81 +++++-
 overlay/gpu-top.h              |  22 +-
 overlay/overlay.c              |  35 ++-
 tests/perf_pmu.c               | 259 +++++++++++++++++
 tools/Makefile.am              |   2 +-
 tools/intel_gpu_top.c          | 480 ++++++++++++++++++++++++++++++--
 tools/meson.build              |   2 +-
 20 files changed, 2264 insertions(+), 89 deletions(-)
 create mode 100644 include/drm-uapi/v3d_drm.h

-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC i-g-t 1/6] include: DRM uAPI headers update
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 include/drm-uapi/amdgpu_drm.h  |  52 +++-
 include/drm-uapi/drm.h         |  16 ++
 include/drm-uapi/drm_fourcc.h  | 215 ++++++++++++++
 include/drm-uapi/drm_mode.h    |  26 +-
 include/drm-uapi/etnaviv_drm.h |   6 +
 include/drm-uapi/exynos_drm.h  | 240 ++++++++++++++++
 include/drm-uapi/i915_drm.h    |  49 +++-
 include/drm-uapi/msm_drm.h     |   2 +
 include/drm-uapi/tegra_drm.h   | 492 ++++++++++++++++++++++++++++++++-
 include/drm-uapi/v3d_drm.h     | 194 +++++++++++++
 include/drm-uapi/vc4_drm.h     |  13 +-
 include/drm-uapi/virtgpu_drm.h |   1 +
 include/drm-uapi/vmwgfx_drm.h  | 166 ++++++++---
 13 files changed, 1415 insertions(+), 57 deletions(-)
 create mode 100644 include/drm-uapi/v3d_drm.h

diff --git a/include/drm-uapi/amdgpu_drm.h b/include/drm-uapi/amdgpu_drm.h
index 1816bd8200d1..370e9a5536ef 100644
--- a/include/drm-uapi/amdgpu_drm.h
+++ b/include/drm-uapi/amdgpu_drm.h
@@ -72,12 +72,41 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
 
+/**
+ * DOC: memory domains
+ *
+ * %AMDGPU_GEM_DOMAIN_CPU	System memory that is not GPU accessible.
+ * Memory in this pool could be swapped out to disk if there is pressure.
+ *
+ * %AMDGPU_GEM_DOMAIN_GTT	GPU accessible system memory, mapped into the
+ * GPU's virtual address space via gart. Gart memory linearizes non-contiguous
+ * pages of system memory, allows GPU access system memory in a linezrized
+ * fashion.
+ *
+ * %AMDGPU_GEM_DOMAIN_VRAM	Local video memory. For APUs, it is memory
+ * carved out by the BIOS.
+ *
+ * %AMDGPU_GEM_DOMAIN_GDS	Global on-chip data storage used to share data
+ * across shader threads.
+ *
+ * %AMDGPU_GEM_DOMAIN_GWS	Global wave sync, used to synchronize the
+ * execution of all the waves on a device.
+ *
+ * %AMDGPU_GEM_DOMAIN_OA	Ordered append, used by 3D or Compute engines
+ * for appending data.
+ */
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
 #define AMDGPU_GEM_DOMAIN_VRAM		0x4
 #define AMDGPU_GEM_DOMAIN_GDS		0x8
 #define AMDGPU_GEM_DOMAIN_GWS		0x10
 #define AMDGPU_GEM_DOMAIN_OA		0x20
+#define AMDGPU_GEM_DOMAIN_MASK		(AMDGPU_GEM_DOMAIN_CPU | \
+					 AMDGPU_GEM_DOMAIN_GTT | \
+					 AMDGPU_GEM_DOMAIN_VRAM | \
+					 AMDGPU_GEM_DOMAIN_GDS | \
+					 AMDGPU_GEM_DOMAIN_GWS | \
+					 AMDGPU_GEM_DOMAIN_OA)
 
 /* Flag that CPU access will be required for the case of VRAM domain */
 #define AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED	(1 << 0)
@@ -95,6 +124,10 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID	(1 << 6)
 /* Flag that BO sharing will be explicitly synchronized */
 #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC		(1 << 7)
+/* Flag that indicates allocating MQD gart on GFX9, where the mtype
+ * for the second page onward should be set to NC.
+ */
+#define AMDGPU_GEM_CREATE_MQD_GFX9		(1 << 8)
 
 struct drm_amdgpu_gem_create_in  {
 	/** the requested memory size */
@@ -473,7 +506,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_HW_IP_UVD_ENC      5
 #define AMDGPU_HW_IP_VCN_DEC      6
 #define AMDGPU_HW_IP_VCN_ENC      7
-#define AMDGPU_HW_IP_NUM          8
+#define AMDGPU_HW_IP_VCN_JPEG     8
+#define AMDGPU_HW_IP_NUM          9
 
 #define AMDGPU_HW_IP_INSTANCE_MAX_COUNT 1
 
@@ -482,6 +516,7 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
 #define AMDGPU_CHUNK_ID_SYNCOBJ_IN      0x04
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT     0x05
+#define AMDGPU_CHUNK_ID_BO_HANDLES      0x06
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
@@ -520,6 +555,10 @@ union drm_amdgpu_cs {
 /* Preempt flag, IB should set Pre_enb bit if PREEMPT flag detected */
 #define AMDGPU_IB_FLAG_PREEMPT (1<<2)
 
+/* The IB fence should do the L2 writeback but not invalidate any shader
+ * caches (L2/vL1/sL1/I$). */
+#define AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE (1 << 3)
+
 struct drm_amdgpu_cs_chunk_ib {
 	__u32 _pad;
 	/** AMDGPU_IB_FLAG_* */
@@ -618,6 +657,16 @@ struct drm_amdgpu_cs_chunk_data {
 	#define AMDGPU_INFO_FW_SOS		0x0c
 	/* Subquery id: Query PSP ASD firmware version */
 	#define AMDGPU_INFO_FW_ASD		0x0d
+	/* Subquery id: Query VCN firmware version */
+	#define AMDGPU_INFO_FW_VCN		0x0e
+	/* Subquery id: Query GFX RLC SRLC firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_CNTL 0x0f
+	/* Subquery id: Query GFX RLC SRLG firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_GPM_MEM 0x10
+	/* Subquery id: Query GFX RLC SRLS firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_SRM_MEM 0x11
+	/* Subquery id: Query DMCU firmware version */
+	#define AMDGPU_INFO_FW_DMCU		0x12
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED		0x0f
 /* the used VRAM size */
@@ -806,6 +855,7 @@ struct drm_amdgpu_info_firmware {
 #define AMDGPU_VRAM_TYPE_GDDR5 5
 #define AMDGPU_VRAM_TYPE_HBM   6
 #define AMDGPU_VRAM_TYPE_DDR3  7
+#define AMDGPU_VRAM_TYPE_DDR4  8
 
 struct drm_amdgpu_info_device {
 	/** PCI Device ID */
diff --git a/include/drm-uapi/drm.h b/include/drm-uapi/drm.h
index f0bd91de0cf9..85c685a2075e 100644
--- a/include/drm-uapi/drm.h
+++ b/include/drm-uapi/drm.h
@@ -674,6 +674,22 @@ struct drm_get_cap {
  */
 #define DRM_CLIENT_CAP_ATOMIC	3
 
+/**
+ * DRM_CLIENT_CAP_ASPECT_RATIO
+ *
+ * If set to 1, the DRM core will provide aspect ratio information in modes.
+ */
+#define DRM_CLIENT_CAP_ASPECT_RATIO    4
+
+/**
+ * DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
+ *
+ * If set to 1, the DRM core will expose special connectors to be used for
+ * writing back to memory the scene setup in the commit. Depends on client
+ * also supporting DRM_CLIENT_CAP_ATOMIC
+ */
+#define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS	5
+
 /** DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
 struct drm_set_client_cap {
 	__u64 capability;
diff --git a/include/drm-uapi/drm_fourcc.h b/include/drm-uapi/drm_fourcc.h
index e04613d30a13..139632b87181 100644
--- a/include/drm-uapi/drm_fourcc.h
+++ b/include/drm-uapi/drm_fourcc.h
@@ -30,11 +30,50 @@
 extern "C" {
 #endif
 
+/**
+ * DOC: overview
+ *
+ * In the DRM subsystem, framebuffer pixel formats are described using the
+ * fourcc codes defined in `include/uapi/drm/drm_fourcc.h`. In addition to the
+ * fourcc code, a Format Modifier may optionally be provided, in order to
+ * further describe the buffer's format - for example tiling or compression.
+ *
+ * Format Modifiers
+ * ----------------
+ *
+ * Format modifiers are used in conjunction with a fourcc code, forming a
+ * unique fourcc:modifier pair. This format:modifier pair must fully define the
+ * format and data layout of the buffer, and should be the only way to describe
+ * that particular buffer.
+ *
+ * Having multiple fourcc:modifier pairs which describe the same layout should
+ * be avoided, as such aliases run the risk of different drivers exposing
+ * different names for the same data format, forcing userspace to understand
+ * that they are aliases.
+ *
+ * Format modifiers may change any property of the buffer, including the number
+ * of planes and/or the required allocation size. Format modifiers are
+ * vendor-namespaced, and as such the relationship between a fourcc code and a
+ * modifier is specific to the modifer being used. For example, some modifiers
+ * may preserve meaning - such as number of planes - from the fourcc code,
+ * whereas others may not.
+ *
+ * Vendors should document their modifier usage in as much detail as
+ * possible, to ensure maximum compatibility across devices, drivers and
+ * applications.
+ *
+ * The authoritative list of format modifier codes is found in
+ * `include/uapi/drm/drm_fourcc.h`
+ */
+
 #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
 				 ((__u32)(c) << 16) | ((__u32)(d) << 24))
 
 #define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */
 
+/* Reserve 0 for the invalid format specifier */
+#define DRM_FORMAT_INVALID	0
+
 /* color index */
 #define DRM_FORMAT_C8		fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
@@ -183,6 +222,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_QCOM    0x05
 #define DRM_FORMAT_MOD_VENDOR_VIVANTE 0x06
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
+#define DRM_FORMAT_MOD_VENDOR_ARM     0x08
 /* add more to the end as needed */
 
 #define DRM_FORMAT_RESERVED	      ((1ULL << 56) - 1)
@@ -298,6 +338,19 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_SAMSUNG_64_32_TILE	fourcc_mod_code(SAMSUNG, 1)
 
+/*
+ * Qualcomm Compressed Format
+ *
+ * Refers to a compressed variant of the base format that is compressed.
+ * Implementation may be platform and base-format specific.
+ *
+ * Each macrotile consists of m x n (mostly 4 x 4) tiles.
+ * Pixel data pitch/stride is aligned with macrotile width.
+ * Pixel data height is aligned with macrotile height.
+ * Entire pixel data buffer is aligned with 4k(bytes).
+ */
+#define DRM_FORMAT_MOD_QCOM_COMPRESSED	fourcc_mod_code(QCOM, 1)
+
 /* Vivante framebuffer modifiers */
 
 /*
@@ -384,6 +437,23 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
 	fourcc_mod_code(NVIDIA, 0x15)
 
+/*
+ * Some Broadcom modifiers take parameters, for example the number of
+ * vertical lines in the image. Reserve the lower 32 bits for modifier
+ * type, and the next 24 bits for parameters. Top 8 bits are the
+ * vendor code.
+ */
+#define __fourcc_mod_broadcom_param_shift 8
+#define __fourcc_mod_broadcom_param_bits 48
+#define fourcc_mod_broadcom_code(val, params) \
+	fourcc_mod_code(BROADCOM, ((((__u64)params) << __fourcc_mod_broadcom_param_shift) | val))
+#define fourcc_mod_broadcom_param(m) \
+	((int)(((m) >> __fourcc_mod_broadcom_param_shift) &	\
+	       ((1ULL << __fourcc_mod_broadcom_param_bits) - 1)))
+#define fourcc_mod_broadcom_mod(m) \
+	((m) & ~(((1ULL << __fourcc_mod_broadcom_param_bits) - 1) <<	\
+		 __fourcc_mod_broadcom_param_shift))
+
 /*
  * Broadcom VC4 "T" format
  *
@@ -405,6 +475,151 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_BROADCOM_VC4_T_TILED fourcc_mod_code(BROADCOM, 1)
 
+/*
+ * Broadcom SAND format
+ *
+ * This is the native format that the H.264 codec block uses.  For VC4
+ * HVS, it is only valid for H.264 (NV12/21) and RGBA modes.
+ *
+ * The image can be considered to be split into columns, and the
+ * columns are placed consecutively into memory.  The width of those
+ * columns can be either 32, 64, 128, or 256 pixels, but in practice
+ * only 128 pixel columns are used.
+ *
+ * The pitch between the start of each column is set to optimally
+ * switch between SDRAM banks. This is passed as the number of lines
+ * of column width in the modifier (we can't use the stride value due
+ * to various core checks that look at it , so you should set the
+ * stride to width*cpp).
+ *
+ * Note that the column height for this format modifier is the same
+ * for all of the planes, assuming that each column contains both Y
+ * and UV.  Some SAND-using hardware stores UV in a separate tiled
+ * image from Y to reduce the column height, which is not supported
+ * with these modifiers.
+ */
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(2, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(3, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(4, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(5, v)
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32 \
+	DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64 \
+	DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128 \
+	DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256 \
+	DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(0)
+
+/* Broadcom UIF format
+ *
+ * This is the common format for the current Broadcom multimedia
+ * blocks, including V3D 3.x and newer, newer video codecs, and
+ * displays.
+ *
+ * The image consists of utiles (64b blocks), UIF blocks (2x2 utiles),
+ * and macroblocks (4x4 UIF blocks).  Those 4x4 UIF block groups are
+ * stored in columns, with padding between the columns to ensure that
+ * moving from one column to the next doesn't hit the same SDRAM page
+ * bank.
+ *
+ * To calculate the padding, it is assumed that each hardware block
+ * and the software driving it knows the platform's SDRAM page size,
+ * number of banks, and XOR address, and that it's identical between
+ * all blocks using the format.  This tiling modifier will use XOR as
+ * necessary to reduce the padding.  If a hardware block can't do XOR,
+ * the assumption is that a no-XOR tiling modifier will be created.
+ */
+#define DRM_FORMAT_MOD_BROADCOM_UIF fourcc_mod_code(BROADCOM, 6)
+
+/*
+ * Arm Framebuffer Compression (AFBC) modifiers
+ *
+ * AFBC is a proprietary lossless image compression protocol and format.
+ * It provides fine-grained random access and minimizes the amount of data
+ * transferred between IP blocks.
+ *
+ * AFBC has several features which may be supported and/or used, which are
+ * represented using bits in the modifier. Not all combinations are valid,
+ * and different devices or use-cases may support different combinations.
+ */
+#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode)	fourcc_mod_code(ARM, __afbc_mode)
+
+/*
+ * AFBC superblock size
+ *
+ * Indicates the superblock size(s) used for the AFBC buffer. The buffer
+ * size (in pixels) must be aligned to a multiple of the superblock size.
+ * Four lowest significant bits(LSBs) are reserved for block size.
+ */
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_MASK      0xf
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_16x16     (1ULL)
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_32x8      (2ULL)
+
+/*
+ * AFBC lossless colorspace transform
+ *
+ * Indicates that the buffer makes use of the AFBC lossless colorspace
+ * transform.
+ */
+#define AFBC_FORMAT_MOD_YTR     (1ULL <<  4)
+
+/*
+ * AFBC block-split
+ *
+ * Indicates that the payload of each superblock is split. The second
+ * half of the payload is positioned at a predefined offset from the start
+ * of the superblock payload.
+ */
+#define AFBC_FORMAT_MOD_SPLIT   (1ULL <<  5)
+
+/*
+ * AFBC sparse layout
+ *
+ * This flag indicates that the payload of each superblock must be stored at a
+ * predefined position relative to the other superblocks in the same AFBC
+ * buffer. This order is the same order used by the header buffer. In this mode
+ * each superblock is given the same amount of space as an uncompressed
+ * superblock of the particular format would require, rounding up to the next
+ * multiple of 128 bytes in size.
+ */
+#define AFBC_FORMAT_MOD_SPARSE  (1ULL <<  6)
+
+/*
+ * AFBC copy-block restrict
+ *
+ * Buffers with this flag must obey the copy-block restriction. The restriction
+ * is such that there are no copy-blocks referring across the border of 8x8
+ * blocks. For the subsampled data the 8x8 limitation is also subsampled.
+ */
+#define AFBC_FORMAT_MOD_CBR     (1ULL <<  7)
+
+/*
+ * AFBC tiled layout
+ *
+ * The tiled layout groups superblocks in 8x8 or 4x4 tiles, where all
+ * superblocks inside a tile are stored together in memory. 8x8 tiles are used
+ * for pixel formats up to and including 32 bpp while 4x4 tiles are used for
+ * larger bpp formats. The order between the tiles is scan line.
+ * When the tiled layout is used, the buffer size (in pixels) must be aligned
+ * to the tile size.
+ */
+#define AFBC_FORMAT_MOD_TILED   (1ULL <<  8)
+
+/*
+ * AFBC solid color blocks
+ *
+ * Indicates that the buffer makes use of solid-color blocks, whereby bandwidth
+ * can be reduced if a whole superblock is a single color.
+ */
+#define AFBC_FORMAT_MOD_SC      (1ULL <<  9)
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/drm_mode.h b/include/drm-uapi/drm_mode.h
index 2c575794fb52..d3e0fe31efc5 100644
--- a/include/drm-uapi/drm_mode.h
+++ b/include/drm-uapi/drm_mode.h
@@ -93,6 +93,15 @@ extern "C" {
 #define DRM_MODE_PICTURE_ASPECT_NONE		0
 #define DRM_MODE_PICTURE_ASPECT_4_3		1
 #define DRM_MODE_PICTURE_ASPECT_16_9		2
+#define DRM_MODE_PICTURE_ASPECT_64_27		3
+#define DRM_MODE_PICTURE_ASPECT_256_135		4
+
+/* Content type options */
+#define DRM_MODE_CONTENT_TYPE_NO_DATA		0
+#define DRM_MODE_CONTENT_TYPE_GRAPHICS		1
+#define DRM_MODE_CONTENT_TYPE_PHOTO		2
+#define DRM_MODE_CONTENT_TYPE_CINEMA		3
+#define DRM_MODE_CONTENT_TYPE_GAME		4
 
 /* Aspect ratio flag bitmask (4 bits 22:19) */
 #define DRM_MODE_FLAG_PIC_AR_MASK		(0x0F<<19)
@@ -102,6 +111,10 @@ extern "C" {
 			(DRM_MODE_PICTURE_ASPECT_4_3<<19)
 #define  DRM_MODE_FLAG_PIC_AR_16_9 \
 			(DRM_MODE_PICTURE_ASPECT_16_9<<19)
+#define  DRM_MODE_FLAG_PIC_AR_64_27 \
+			(DRM_MODE_PICTURE_ASPECT_64_27<<19)
+#define  DRM_MODE_FLAG_PIC_AR_256_135 \
+			(DRM_MODE_PICTURE_ASPECT_256_135<<19)
 
 #define  DRM_MODE_FLAG_ALL	(DRM_MODE_FLAG_PHSYNC |		\
 				 DRM_MODE_FLAG_NHSYNC |		\
@@ -173,8 +186,9 @@ extern "C" {
 /*
  * DRM_MODE_REFLECT_<axis>
  *
- * Signals that the contents of a drm plane is reflected in the <axis> axis,
+ * Signals that the contents of a drm plane is reflected along the <axis> axis,
  * in the same way as mirroring.
+ * See kerneldoc chapter "Plane Composition Properties" for more details.
  *
  * This define is provided as a convenience, looking up the property id
  * using the name->prop id lookup is the preferred method.
@@ -338,6 +352,7 @@ enum drm_mode_subconnector {
 #define DRM_MODE_CONNECTOR_VIRTUAL      15
 #define DRM_MODE_CONNECTOR_DSI		16
 #define DRM_MODE_CONNECTOR_DPI		17
+#define DRM_MODE_CONNECTOR_WRITEBACK	18
 
 struct drm_mode_get_connector {
 
@@ -363,7 +378,7 @@ struct drm_mode_get_connector {
 	__u32 pad;
 };
 
-#define DRM_MODE_PROP_PENDING	(1<<0)
+#define DRM_MODE_PROP_PENDING	(1<<0) /* deprecated, do not use */
 #define DRM_MODE_PROP_RANGE	(1<<1)
 #define DRM_MODE_PROP_IMMUTABLE	(1<<2)
 #define DRM_MODE_PROP_ENUM	(1<<3) /* enumerated type with text strings */
@@ -598,8 +613,11 @@ struct drm_mode_crtc_lut {
 };
 
 struct drm_color_ctm {
-	/* Conversion matrix in S31.32 format. */
-	__s64 matrix[9];
+	/*
+	 * Conversion matrix in S31.32 sign-magnitude
+	 * (not two's complement!) format.
+	 */
+	__u64 matrix[9];
 };
 
 struct drm_color_lut {
diff --git a/include/drm-uapi/etnaviv_drm.h b/include/drm-uapi/etnaviv_drm.h
index e9b997a0ef27..0d5c49dc478c 100644
--- a/include/drm-uapi/etnaviv_drm.h
+++ b/include/drm-uapi/etnaviv_drm.h
@@ -55,6 +55,12 @@ struct drm_etnaviv_timespec {
 #define ETNAVIV_PARAM_GPU_FEATURES_4                0x07
 #define ETNAVIV_PARAM_GPU_FEATURES_5                0x08
 #define ETNAVIV_PARAM_GPU_FEATURES_6                0x09
+#define ETNAVIV_PARAM_GPU_FEATURES_7                0x0a
+#define ETNAVIV_PARAM_GPU_FEATURES_8                0x0b
+#define ETNAVIV_PARAM_GPU_FEATURES_9                0x0c
+#define ETNAVIV_PARAM_GPU_FEATURES_10               0x0d
+#define ETNAVIV_PARAM_GPU_FEATURES_11               0x0e
+#define ETNAVIV_PARAM_GPU_FEATURES_12               0x0f
 
 #define ETNAVIV_PARAM_GPU_STREAM_COUNT              0x10
 #define ETNAVIV_PARAM_GPU_REGISTER_MAX              0x11
diff --git a/include/drm-uapi/exynos_drm.h b/include/drm-uapi/exynos_drm.h
index a00116b5cc5c..7414cfd76419 100644
--- a/include/drm-uapi/exynos_drm.h
+++ b/include/drm-uapi/exynos_drm.h
@@ -135,6 +135,219 @@ struct drm_exynos_g2d_exec {
 	__u64					async;
 };
 
+/* Exynos DRM IPP v2 API */
+
+/**
+ * Enumerate available IPP hardware modules.
+ *
+ * @count_ipps: size of ipp_id array / number of ipp modules (set by driver)
+ * @reserved: padding
+ * @ipp_id_ptr: pointer to ipp_id array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_res {
+	__u32 count_ipps;
+	__u32 reserved;
+	__u64 ipp_id_ptr;
+};
+
+enum drm_exynos_ipp_format_type {
+	DRM_EXYNOS_IPP_FORMAT_SOURCE		= 0x01,
+	DRM_EXYNOS_IPP_FORMAT_DESTINATION	= 0x02,
+};
+
+struct drm_exynos_ipp_format {
+	__u32 fourcc;
+	__u32 type;
+	__u64 modifier;
+};
+
+enum drm_exynos_ipp_capability {
+	DRM_EXYNOS_IPP_CAP_CROP		= 0x01,
+	DRM_EXYNOS_IPP_CAP_ROTATE	= 0x02,
+	DRM_EXYNOS_IPP_CAP_SCALE	= 0x04,
+	DRM_EXYNOS_IPP_CAP_CONVERT	= 0x08,
+};
+
+/**
+ * Get IPP hardware capabilities and supported image formats.
+ *
+ * @ipp_id: id of IPP module to query
+ * @capabilities: bitmask of drm_exynos_ipp_capability (set by driver)
+ * @reserved: padding
+ * @formats_count: size of formats array (in entries) / number of filled
+ *		   formats (set by driver)
+ * @formats_ptr: pointer to formats array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_caps {
+	__u32 ipp_id;
+	__u32 capabilities;
+	__u32 reserved;
+	__u32 formats_count;
+	__u64 formats_ptr;
+};
+
+enum drm_exynos_ipp_limit_type {
+	/* size (horizontal/vertial) limits, in pixels (min, max, alignment) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SIZE		= 0x0001,
+	/* scale ratio (horizonta/vertial), 16.16 fixed point (min, max) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SCALE		= 0x0002,
+
+	/* image buffer area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_BUFFER	= 0x0001 << 16,
+	/* src/dst rectangle area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_AREA		= 0x0002 << 16,
+	/* src/dst rectangle area when rotation enabled */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_ROTATED	= 0x0003 << 16,
+
+	DRM_EXYNOS_IPP_LIMIT_TYPE_MASK		= 0x000f,
+	DRM_EXYNOS_IPP_LIMIT_SIZE_MASK		= 0x000f << 16,
+};
+
+struct drm_exynos_ipp_limit_val {
+	__u32 min;
+	__u32 max;
+	__u32 align;
+	__u32 reserved;
+};
+
+/**
+ * IPP module limitation.
+ *
+ * @type: limit type (see drm_exynos_ipp_limit_type enum)
+ * @reserved: padding
+ * @h: horizontal limits
+ * @v: vertical limits
+ */
+struct drm_exynos_ipp_limit {
+	__u32 type;
+	__u32 reserved;
+	struct drm_exynos_ipp_limit_val h;
+	struct drm_exynos_ipp_limit_val v;
+};
+
+/**
+ * Get IPP limits for given image format.
+ *
+ * @ipp_id: id of IPP module to query
+ * @fourcc: image format code (see DRM_FORMAT_* in drm_fourcc.h)
+ * @modifier: image format modifier (see DRM_FORMAT_MOD_* in drm_fourcc.h)
+ * @type: source/destination identifier (drm_exynos_ipp_format_flag enum)
+ * @limits_count: size of limits array (in entries) / number of filled entries
+ *		 (set by driver)
+ * @limits_ptr: pointer to limits array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_limits {
+	__u32 ipp_id;
+	__u32 fourcc;
+	__u64 modifier;
+	__u32 type;
+	__u32 limits_count;
+	__u64 limits_ptr;
+};
+
+enum drm_exynos_ipp_task_id {
+	/* buffer described by struct drm_exynos_ipp_task_buffer */
+	DRM_EXYNOS_IPP_TASK_BUFFER		= 0x0001,
+	/* rectangle described by struct drm_exynos_ipp_task_rect */
+	DRM_EXYNOS_IPP_TASK_RECTANGLE		= 0x0002,
+	/* transformation described by struct drm_exynos_ipp_task_transform */
+	DRM_EXYNOS_IPP_TASK_TRANSFORM		= 0x0003,
+	/* alpha configuration described by struct drm_exynos_ipp_task_alpha */
+	DRM_EXYNOS_IPP_TASK_ALPHA		= 0x0004,
+
+	/* source image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_SOURCE		= 0x0001 << 16,
+	/* destination image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_DESTINATION	= 0x0002 << 16,
+};
+
+/**
+ * Memory buffer with image data.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_BUFFER
+ * other parameters are same as for AddFB2 generic DRM ioctl
+ */
+struct drm_exynos_ipp_task_buffer {
+	__u32	id;
+	__u32	fourcc;
+	__u32	width, height;
+	__u32	gem_id[4];
+	__u32	offset[4];
+	__u32	pitch[4];
+	__u64	modifier;
+};
+
+/**
+ * Rectangle for processing.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_RECTANGLE
+ * @reserved: padding
+ * @x,@y: left corner in pixels
+ * @w,@h: width/height in pixels
+ */
+struct drm_exynos_ipp_task_rect {
+	__u32	id;
+	__u32	reserved;
+	__u32	x;
+	__u32	y;
+	__u32	w;
+	__u32	h;
+};
+
+/**
+ * Image tranformation description.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_TRANSFORM
+ * @rotation: DRM_MODE_ROTATE_* and DRM_MODE_REFLECT_* values
+ */
+struct drm_exynos_ipp_task_transform {
+	__u32	id;
+	__u32	rotation;
+};
+
+/**
+ * Image global alpha configuration for formats without alpha values.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_ALPHA
+ * @value: global alpha value (0-255)
+ */
+struct drm_exynos_ipp_task_alpha {
+	__u32	id;
+	__u32	value;
+};
+
+enum drm_exynos_ipp_flag {
+	/* generate DRM event after processing */
+	DRM_EXYNOS_IPP_FLAG_EVENT	= 0x01,
+	/* dry run, only check task parameters */
+	DRM_EXYNOS_IPP_FLAG_TEST_ONLY	= 0x02,
+	/* non-blocking processing */
+	DRM_EXYNOS_IPP_FLAG_NONBLOCK	= 0x04,
+};
+
+#define DRM_EXYNOS_IPP_FLAGS (DRM_EXYNOS_IPP_FLAG_EVENT |\
+		DRM_EXYNOS_IPP_FLAG_TEST_ONLY | DRM_EXYNOS_IPP_FLAG_NONBLOCK)
+
+/**
+ * Perform image processing described by array of drm_exynos_ipp_task_*
+ * structures (parameters array).
+ *
+ * @ipp_id: id of IPP module to run the task
+ * @flags: bitmask of drm_exynos_ipp_flag values
+ * @reserved: padding
+ * @params_size: size of parameters array (in bytes)
+ * @params_ptr: pointer to parameters array or NULL
+ * @user_data: (optional) data for drm event
+ */
+struct drm_exynos_ioctl_ipp_commit {
+	__u32 ipp_id;
+	__u32 flags;
+	__u32 reserved;
+	__u32 params_size;
+	__u64 params_ptr;
+	__u64 user_data;
+};
+
 #define DRM_EXYNOS_GEM_CREATE		0x00
 #define DRM_EXYNOS_GEM_MAP		0x01
 /* Reserved 0x03 ~ 0x05 for exynos specific gem ioctl */
@@ -147,6 +360,11 @@ struct drm_exynos_g2d_exec {
 #define DRM_EXYNOS_G2D_EXEC		0x22
 
 /* Reserved 0x30 ~ 0x33 for obsolete Exynos IPP ioctls */
+/* IPP - Image Post Processing */
+#define DRM_EXYNOS_IPP_GET_RESOURCES	0x40
+#define DRM_EXYNOS_IPP_GET_CAPS		0x41
+#define DRM_EXYNOS_IPP_GET_LIMITS	0x42
+#define DRM_EXYNOS_IPP_COMMIT		0x43
 
 #define DRM_IOCTL_EXYNOS_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_GEM_CREATE, struct drm_exynos_gem_create)
@@ -165,8 +383,20 @@ struct drm_exynos_g2d_exec {
 #define DRM_IOCTL_EXYNOS_G2D_EXEC		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_G2D_EXEC, struct drm_exynos_g2d_exec)
 
+#define DRM_IOCTL_EXYNOS_IPP_GET_RESOURCES	DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_RESOURCES, \
+		struct drm_exynos_ioctl_ipp_get_res)
+#define DRM_IOCTL_EXYNOS_IPP_GET_CAPS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_CAPS, struct drm_exynos_ioctl_ipp_get_caps)
+#define DRM_IOCTL_EXYNOS_IPP_GET_LIMITS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_LIMITS, \
+		struct drm_exynos_ioctl_ipp_get_limits)
+#define DRM_IOCTL_EXYNOS_IPP_COMMIT		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_COMMIT, struct drm_exynos_ioctl_ipp_commit)
+
 /* EXYNOS specific events */
 #define DRM_EXYNOS_G2D_EVENT		0x80000000
+#define DRM_EXYNOS_IPP_EVENT		0x80000002
 
 struct drm_exynos_g2d_event {
 	struct drm_event	base;
@@ -177,6 +407,16 @@ struct drm_exynos_g2d_event {
 	__u32			reserved;
 };
 
+struct drm_exynos_ipp_event {
+	struct drm_event	base;
+	__u64			user_data;
+	__u32			tv_sec;
+	__u32			tv_usec;
+	__u32			ipp_id;
+	__u32			sequence;
+	__u64			reserved;
+};
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d4..cf9dcf040321 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -110,9 +110,17 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1000)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1000)
+#define I915_SAMPLE_RUNNING_DIVISOR (1000)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +141,15 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
@@ -412,6 +429,14 @@ typedef struct drm_i915_irq_wait {
 	int irq_seq;
 } drm_i915_irq_wait_t;
 
+/*
+ * Different modes of per-process Graphics Translation Table,
+ * see I915_PARAM_HAS_ALIASING_PPGTT
+ */
+#define I915_GEM_PPGTT_NONE	0
+#define I915_GEM_PPGTT_ALIASING	1
+#define I915_GEM_PPGTT_FULL	2
+
 /* Ioctl to query kernel params:
  */
 #define I915_PARAM_IRQ_ACTIVE            1
@@ -529,6 +554,28 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/*
+ * Once upon a time we supposed that writes through the GGTT would be
+ * immediately in physical memory (once flushed out of the CPU path). However,
+ * on a few different processors and chipsets, this is not necessarily the case
+ * as the writes appear to be buffered internally. Thus a read of the backing
+ * storage (physical memory) via a different path (with different physical tags
+ * to the indirect write via the GGTT) will see stale values from before
+ * the GGTT write. Inside the kernel, we can for the most part keep track of
+ * the different read/write domains in use (e.g. set-domain), but the assumption
+ * of coherency is baked into the ABI, hence reporting its true state in this
+ * parameter.
+ *
+ * Reports true when writes via mmap_gtt are immediately visible following an
+ * lfence to flush the WCB.
+ *
+ * Reports false when writes via mmap_gtt are indeterminately delayed in an in
+ * internal buffer and are _not_ immediately visible to third parties accessing
+ * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC
+ * communications channel when reporting false is strongly disadvised.
+ */
+#define I915_PARAM_MMAP_GTT_COHERENT	52
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
diff --git a/include/drm-uapi/msm_drm.h b/include/drm-uapi/msm_drm.h
index bbbaffad772d..c06d0a5bdd80 100644
--- a/include/drm-uapi/msm_drm.h
+++ b/include/drm-uapi/msm_drm.h
@@ -201,10 +201,12 @@ struct drm_msm_gem_submit_bo {
 #define MSM_SUBMIT_NO_IMPLICIT   0x80000000 /* disable implicit sync */
 #define MSM_SUBMIT_FENCE_FD_IN   0x40000000 /* enable input fence_fd */
 #define MSM_SUBMIT_FENCE_FD_OUT  0x20000000 /* enable output fence_fd */
+#define MSM_SUBMIT_SUDO          0x10000000 /* run submitted cmds from RB */
 #define MSM_SUBMIT_FLAGS                ( \
 		MSM_SUBMIT_NO_IMPLICIT   | \
 		MSM_SUBMIT_FENCE_FD_IN   | \
 		MSM_SUBMIT_FENCE_FD_OUT  | \
+		MSM_SUBMIT_SUDO          | \
 		0)
 
 /* Each cmdstream submit consists of a table of buffers involved, and
diff --git a/include/drm-uapi/tegra_drm.h b/include/drm-uapi/tegra_drm.h
index 12f9bf848db1..6c07919c04e9 100644
--- a/include/drm-uapi/tegra_drm.h
+++ b/include/drm-uapi/tegra_drm.h
@@ -32,143 +32,615 @@ extern "C" {
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
+/**
+ * struct drm_tegra_gem_create - parameters for the GEM object creation IOCTL
+ */
 struct drm_tegra_gem_create {
+	/**
+	 * @size:
+	 *
+	 * The size, in bytes, of the buffer object to be created.
+	 */
 	__u64 size;
+
+	/**
+	 * @flags:
+	 *
+	 * A bitmask of flags that influence the creation of GEM objects:
+	 *
+	 * DRM_TEGRA_GEM_CREATE_TILED
+	 *   Use the 16x16 tiling format for this buffer.
+	 *
+	 * DRM_TEGRA_GEM_CREATE_BOTTOM_UP
+	 *   The buffer has a bottom-up layout.
+	 */
 	__u32 flags;
+
+	/**
+	 * @handle:
+	 *
+	 * The handle of the created GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 handle;
 };
 
+/**
+ * struct drm_tegra_gem_mmap - parameters for the GEM mmap IOCTL
+ */
 struct drm_tegra_gem_mmap {
+	/**
+	 * @handle:
+	 *
+	 * Handle of the GEM object to obtain an mmap offset for.
+	 */
 	__u32 handle;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @offset:
+	 *
+	 * The mmap offset for the given GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u64 offset;
 };
 
+/**
+ * struct drm_tegra_syncpt_read - parameters for the read syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_read {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to read the current value from.
+	 */
 	__u32 id;
+
+	/**
+	 * @value:
+	 *
+	 * The current syncpoint value. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
+/**
+ * struct drm_tegra_syncpt_incr - parameters for the increment syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_incr {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to increment.
+	 */
 	__u32 id;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_syncpt_wait - parameters for the wait syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_wait {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to wait on.
+	 */
 	__u32 id;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to wait.
+	 */
 	__u32 thresh;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, to wait.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @value:
+	 *
+	 * The new syncpoint value after the wait. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
 #define DRM_TEGRA_NO_TIMEOUT	(0xffffffff)
 
+/**
+ * struct drm_tegra_open_channel - parameters for the open channel IOCTL
+ */
 struct drm_tegra_open_channel {
+	/**
+	 * @client:
+	 *
+	 * The client ID for this channel.
+	 */
 	__u32 client;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. Set by the kernel upon
+	 * successful completion of the IOCTL. This context needs to be passed
+	 * to the DRM_TEGRA_CHANNEL_CLOSE or the DRM_TEGRA_SUBMIT IOCTLs.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_close_channel - parameters for the close channel IOCTL
+ */
 struct drm_tegra_close_channel {
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. This is obtained from the
+	 * DRM_TEGRA_OPEN_CHANNEL IOCTL.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_get_syncpt - parameters for the get syncpoint IOCTL
+ */
 struct drm_tegra_get_syncpt {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel for which to obtain
+	 * the syncpoint ID.
+	 */
 	__u64 context;
+
+	/**
+	 * @index:
+	 *
+	 * Index of the client syncpoint for which to obtain the ID.
+	 */
 	__u32 index;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the given syncpoint. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_get_syncpt_base - parameters for the get wait base IOCTL
+ */
 struct drm_tegra_get_syncpt_base {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying for which channel to obtain the
+	 * wait base.
+	 */
 	__u64 context;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint for which to obtain the wait base.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the wait base corresponding to the client syncpoint. Set
+	 * by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_syncpt - syncpoint increment operation
+ */
 struct drm_tegra_syncpt {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to operate on.
+	 */
 	__u32 id;
+
+	/**
+	 * @incrs:
+	 *
+	 * Number of increments to perform for the syncpoint.
+	 */
 	__u32 incrs;
 };
 
+/**
+ * struct drm_tegra_cmdbuf - structure describing a command buffer
+ */
 struct drm_tegra_cmdbuf {
+	/**
+	 * @handle:
+	 *
+	 * Handle to a GEM object containing the command buffer.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, into the GEM object identified by @handle at
+	 * which the command buffer starts.
+	 */
 	__u32 offset;
+
+	/**
+	 * @words:
+	 *
+	 * Number of 32-bit words in this command buffer.
+	 */
 	__u32 words;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_reloc - GEM object relocation structure
+ */
 struct drm_tegra_reloc {
 	struct {
+		/**
+		 * @cmdbuf.handle:
+		 *
+		 * Handle to the GEM object containing the command buffer for
+		 * which to perform this GEM object relocation.
+		 */
 		__u32 handle;
+
+		/**
+		 * @cmdbuf.offset:
+		 *
+		 * Offset, in bytes, into the command buffer at which to
+		 * insert the relocated address.
+		 */
 		__u32 offset;
 	} cmdbuf;
 	struct {
+		/**
+		 * @target.handle:
+		 *
+		 * Handle to the GEM object to be relocated.
+		 */
 		__u32 handle;
+
+		/**
+		 * @target.offset:
+		 *
+		 * Offset, in bytes, into the target GEM object at which the
+		 * relocated data starts.
+		 */
 		__u32 offset;
 	} target;
+
+	/**
+	 * @shift:
+	 *
+	 * The number of bits by which to shift relocated addresses.
+	 */
 	__u32 shift;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_waitchk - wait check structure
+ */
 struct drm_tegra_waitchk {
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object containing a command stream on which to
+	 * perform the wait check.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, of the location in the command stream to perform
+	 * the wait check on.
+	 */
 	__u32 offset;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint to wait check.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to check.
+	 */
 	__u32 thresh;
 };
 
+/**
+ * struct drm_tegra_submit - job submission structure
+ */
 struct drm_tegra_submit {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel to use for the
+	 * execution of this job.
+	 */
 	__u64 context;
+
+	/**
+	 * @num_syncpts:
+	 *
+	 * The number of syncpoints operated on by this job. This defines the
+	 * length of the array pointed to by @syncpts.
+	 */
 	__u32 num_syncpts;
+
+	/**
+	 * @num_cmdbufs:
+	 *
+	 * The number of command buffers to execute as part of this job. This
+	 * defines the length of the array pointed to by @cmdbufs.
+	 */
 	__u32 num_cmdbufs;
+
+	/**
+	 * @num_relocs:
+	 *
+	 * The number of relocations to perform before executing this job.
+	 * This defines the length of the array pointed to by @relocs.
+	 */
 	__u32 num_relocs;
+
+	/**
+	 * @num_waitchks:
+	 *
+	 * The number of wait checks to perform as part of this job. This
+	 * defines the length of the array pointed to by @waitchks.
+	 */
 	__u32 num_waitchks;
+
+	/**
+	 * @waitchk_mask:
+	 *
+	 * Bitmask of valid wait checks.
+	 */
 	__u32 waitchk_mask;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, before this job is cancelled.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @syncpts:
+	 *
+	 * A pointer to an array of &struct drm_tegra_syncpt structures that
+	 * specify the syncpoint operations performed as part of this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_syncpts.
+	 */
 	__u64 syncpts;
+
+	/**
+	 * @cmdbufs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_cmdbuf structures that
+	 * define the command buffers to execute as part of this job. The
+	 * number of elements in the array must be equal to the value given
+	 * by @num_syncpts.
+	 */
 	__u64 cmdbufs;
+
+	/**
+	 * @relocs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_reloc structures that
+	 * specify the relocations that need to be performed before executing
+	 * this job. The number of elements in the array must be equal to the
+	 * value given by @num_relocs.
+	 */
 	__u64 relocs;
+
+	/**
+	 * @waitchks:
+	 *
+	 * A pointer to an array of &struct drm_tegra_waitchk structures that
+	 * specify the wait checks to be performed while executing this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_waitchks.
+	 */
 	__u64 waitchks;
-	__u32 fence;		/* Return value */
 
-	__u32 reserved[5];	/* future expansion */
+	/**
+	 * @fence:
+	 *
+	 * The threshold of the syncpoint associated with this job after it
+	 * has been completed. Set by the kernel upon successful completion of
+	 * the IOCTL. This can be used with the DRM_TEGRA_SYNCPT_WAIT IOCTL to
+	 * wait for this job to be finished.
+	 */
+	__u32 fence;
+
+	/**
+	 * @reserved:
+	 *
+	 * This field is reserved for future use. Must be 0.
+	 */
+	__u32 reserved[5];
 };
 
 #define DRM_TEGRA_GEM_TILING_MODE_PITCH 0
 #define DRM_TEGRA_GEM_TILING_MODE_TILED 1
 #define DRM_TEGRA_GEM_TILING_MODE_BLOCK 2
 
+/**
+ * struct drm_tegra_gem_set_tiling - parameters for the set tiling IOCTL
+ */
 struct drm_tegra_gem_set_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the tiling parameters.
+	 */
 	__u32 handle;
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode to set. Must be one of:
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_PITCH
+	 *   pitch linear format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_TILED
+	 *   16x16 tiling format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_BLOCK
+	 *   16Bx2 tiling format
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The value to set for the tiling mode parameter.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_gem_get_tiling - parameters for the get tiling IOCTL
+ */
 struct drm_tegra_gem_get_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the tiling parameters.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode currently associated with the GEM object. Set by
+	 * the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The tiling mode parameter currently associated with the GEM object.
+	 * Set by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
 #define DRM_TEGRA_GEM_BOTTOM_UP		(1 << 0)
 #define DRM_TEGRA_GEM_FLAGS		(DRM_TEGRA_GEM_BOTTOM_UP)
 
+/**
+ * struct drm_tegra_gem_set_flags - parameters for the set flags IOCTL
+ */
 struct drm_tegra_gem_set_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags to set for the GEM object.
+	 */
 	__u32 flags;
 };
 
+/**
+ * struct drm_tegra_gem_get_flags - parameters for the get flags IOCTL
+ */
 struct drm_tegra_gem_get_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags currently associated with the GEM object. Set by the
+	 * kernel upon successful completion of the IOCTL.
+	 */
 	__u32 flags;
 };
 
@@ -193,7 +665,7 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
 #define DRM_IOCTL_TEGRA_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_OPEN_CHANNEL, struct drm_tegra_open_channel)
-#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_open_channel)
+#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_close_channel)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT, struct drm_tegra_get_syncpt)
 #define DRM_IOCTL_TEGRA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SUBMIT, struct drm_tegra_submit)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT_BASE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT_BASE, struct drm_tegra_get_syncpt_base)
diff --git a/include/drm-uapi/v3d_drm.h b/include/drm-uapi/v3d_drm.h
new file mode 100644
index 000000000000..7b6627783608
--- /dev/null
+++ b/include/drm-uapi/v3d_drm.h
@@ -0,0 +1,194 @@
+/*
+ * Copyright © 2014-2018 Broadcom
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef _V3D_DRM_H_
+#define _V3D_DRM_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_V3D_SUBMIT_CL                         0x00
+#define DRM_V3D_WAIT_BO                           0x01
+#define DRM_V3D_CREATE_BO                         0x02
+#define DRM_V3D_MMAP_BO                           0x03
+#define DRM_V3D_GET_PARAM                         0x04
+#define DRM_V3D_GET_BO_OFFSET                     0x05
+
+#define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
+#define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
+#define DRM_IOCTL_V3D_CREATE_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_CREATE_BO, struct drm_v3d_create_bo)
+#define DRM_IOCTL_V3D_MMAP_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
+#define DRM_IOCTL_V3D_GET_PARAM           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
+#define DRM_IOCTL_V3D_GET_BO_OFFSET       DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
+
+/**
+ * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
+ * engine.
+ *
+ * This asks the kernel to have the GPU execute an optional binner
+ * command list, and a render command list.
+ */
+struct drm_v3d_submit_cl {
+	/* Pointer to the binner command list.
+	 *
+	 * This is the first set of commands executed, which runs the
+	 * coordinate shader to determine where primitives land on the screen,
+	 * then writes out the state updates and draw calls necessary per tile
+	 * to the tile allocation BO.
+	 */
+	__u32 bcl_start;
+
+	 /** End address of the BCL (first byte after the BCL) */
+	__u32 bcl_end;
+
+	/* Offset of the render command list.
+	 *
+	 * This is the second set of commands executed, which will either
+	 * execute the tiles that have been set up by the BCL, or a fixed set
+	 * of tiles (in the case of RCL-only blits).
+	 */
+	__u32 rcl_start;
+
+	 /** End address of the RCL (first byte after the RCL) */
+	__u32 rcl_end;
+
+	/** An optional sync object to wait on before starting the BCL. */
+	__u32 in_sync_bcl;
+	/** An optional sync object to wait on before starting the RCL. */
+	__u32 in_sync_rcl;
+	/** An optional sync object to place the completion fence in. */
+	__u32 out_sync;
+
+	/* Offset of the tile alloc memory
+	 *
+	 * This is optional on V3D 3.3 (where the CL can set the value) but
+	 * required on V3D 4.1.
+	 */
+	__u32 qma;
+
+	/** Size of the tile alloc memory. */
+	__u32 qms;
+
+	/** Offset of the tile state data array. */
+	__u32 qts;
+
+	/* Pointer to a u32 array of the BOs that are referenced by the job.
+	 */
+	__u64 bo_handles;
+
+	/* Number of BO handles passed in (size is that times 4). */
+	__u32 bo_handle_count;
+
+	/* Pad, must be zero-filled. */
+	__u32 pad;
+};
+
+/**
+ * struct drm_v3d_wait_bo - ioctl argument for waiting for
+ * completion of the last DRM_V3D_SUBMIT_CL on a BO.
+ *
+ * This is useful for cases where multiple processes might be
+ * rendering to a BO and you want to wait for all rendering to be
+ * completed.
+ */
+struct drm_v3d_wait_bo {
+	__u32 handle;
+	__u32 pad;
+	__u64 timeout_ns;
+};
+
+/**
+ * struct drm_v3d_create_bo - ioctl argument for creating V3D BOs.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_create_bo {
+	__u32 size;
+	__u32 flags;
+	/** Returned GEM handle for the BO. */
+	__u32 handle;
+	/**
+	 * Returned offset for the BO in the V3D address space.  This offset
+	 * is private to the DRM fd and is valid for the lifetime of the GEM
+	 * handle.
+	 *
+	 * This offset value will always be nonzero, since various HW
+	 * units treat 0 specially.
+	 */
+	__u32 offset;
+};
+
+/**
+ * struct drm_v3d_mmap_bo - ioctl argument for mapping V3D BOs.
+ *
+ * This doesn't actually perform an mmap.  Instead, it returns the
+ * offset you need to use in an mmap on the DRM device node.  This
+ * means that tools like valgrind end up knowing about the mapped
+ * memory.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_mmap_bo {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 flags;
+	/** offset into the drm node to use for subsequent mmap call. */
+	__u64 offset;
+};
+
+enum drm_v3d_param {
+	DRM_V3D_PARAM_V3D_UIFCFG,
+	DRM_V3D_PARAM_V3D_HUB_IDENT1,
+	DRM_V3D_PARAM_V3D_HUB_IDENT2,
+	DRM_V3D_PARAM_V3D_HUB_IDENT3,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT0,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT1,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT2,
+};
+
+struct drm_v3d_get_param {
+	__u32 param;
+	__u32 pad;
+	__u64 value;
+};
+
+/**
+ * Returns the offset for the BO in the V3D address space for this DRM fd.
+ * This is the same value returned by drm_v3d_create_bo, if that was called
+ * from this DRM fd.
+ */
+struct drm_v3d_get_bo_offset {
+	__u32 handle;
+	__u32 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* _V3D_DRM_H_ */
diff --git a/include/drm-uapi/vc4_drm.h b/include/drm-uapi/vc4_drm.h
index 4117117b4204..31f50de39acb 100644
--- a/include/drm-uapi/vc4_drm.h
+++ b/include/drm-uapi/vc4_drm.h
@@ -183,10 +183,17 @@ struct drm_vc4_submit_cl {
 	/* ID of the perfmon to attach to this job. 0 means no perfmon. */
 	__u32 perfmonid;
 
-	/* Unused field to align this struct on 64 bits. Must be set to 0.
-	 * If one ever needs to add an u32 field to this struct, this field
-	 * can be used.
+	/* Syncobj handle to wait on. If set, processing of this render job
+	 * will not start until the syncobj is signaled. 0 means ignore.
 	 */
+	__u32 in_sync;
+
+	/* Syncobj handle to export fence to. If set, the fence in the syncobj
+	 * will be replaced with a fence that signals upon completion of this
+	 * render job. 0 means ignore.
+	 */
+	__u32 out_sync;
+
 	__u32 pad2;
 };
 
diff --git a/include/drm-uapi/virtgpu_drm.h b/include/drm-uapi/virtgpu_drm.h
index 91a31ffed828..9a781f0611df 100644
--- a/include/drm-uapi/virtgpu_drm.h
+++ b/include/drm-uapi/virtgpu_drm.h
@@ -63,6 +63,7 @@ struct drm_virtgpu_execbuffer {
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
+#define VIRTGPU_PARAM_CAPSET_QUERY_FIX 2 /* do we have the capset fix */
 
 struct drm_virtgpu_getparam {
 	__u64 param;
diff --git a/include/drm-uapi/vmwgfx_drm.h b/include/drm-uapi/vmwgfx_drm.h
index 0bc784f5e0db..399f58317cff 100644
--- a/include/drm-uapi/vmwgfx_drm.h
+++ b/include/drm-uapi/vmwgfx_drm.h
@@ -40,6 +40,7 @@ extern "C" {
 
 #define DRM_VMW_GET_PARAM            0
 #define DRM_VMW_ALLOC_DMABUF         1
+#define DRM_VMW_ALLOC_BO             1
 #define DRM_VMW_UNREF_DMABUF         2
 #define DRM_VMW_HANDLE_CLOSE         2
 #define DRM_VMW_CURSOR_BYPASS        3
@@ -68,6 +69,8 @@ extern "C" {
 #define DRM_VMW_GB_SURFACE_REF       24
 #define DRM_VMW_SYNCCPU              25
 #define DRM_VMW_CREATE_EXTENDED_CONTEXT 26
+#define DRM_VMW_GB_SURFACE_CREATE_EXT   27
+#define DRM_VMW_GB_SURFACE_REF_EXT      28
 
 /*************************************************************************/
 /**
@@ -79,6 +82,9 @@ extern "C" {
  *
  * DRM_VMW_PARAM_OVERLAY_IOCTL:
  * Does the driver support the overlay ioctl.
+ *
+ * DRM_VMW_PARAM_SM4_1
+ * SM4_1 support is enabled.
  */
 
 #define DRM_VMW_PARAM_NUM_STREAMS      0
@@ -94,6 +100,8 @@ extern "C" {
 #define DRM_VMW_PARAM_MAX_MOB_SIZE     10
 #define DRM_VMW_PARAM_SCREEN_TARGET    11
 #define DRM_VMW_PARAM_DX               12
+#define DRM_VMW_PARAM_HW_CAPS2         13
+#define DRM_VMW_PARAM_SM4_1            14
 
 /**
  * enum drm_vmw_handle_type - handle type for ref ioctls
@@ -356,9 +364,9 @@ struct drm_vmw_fence_rep {
 
 /*************************************************************************/
 /**
- * DRM_VMW_ALLOC_DMABUF
+ * DRM_VMW_ALLOC_BO
  *
- * Allocate a DMA buffer that is visible also to the host.
+ * Allocate a buffer object that is visible also to the host.
  * NOTE: The buffer is
  * identified by a handle and an offset, which are private to the guest, but
  * useable in the command stream. The guest kernel may translate these
@@ -366,27 +374,28 @@ struct drm_vmw_fence_rep {
  * be zero at all times, or it may disappear from the interface before it is
  * fixed.
  *
- * The DMA buffer may stay user-space mapped in the guest at all times,
+ * The buffer object may stay user-space mapped in the guest at all times,
  * and is thus suitable for sub-allocation.
  *
- * DMA buffers are mapped using the mmap() syscall on the drm device.
+ * Buffer objects are mapped using the mmap() syscall on the drm device.
  */
 
 /**
- * struct drm_vmw_alloc_dmabuf_req
+ * struct drm_vmw_alloc_bo_req
  *
  * @size: Required minimum size of the buffer.
  *
- * Input data to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Input data to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_alloc_dmabuf_req {
+struct drm_vmw_alloc_bo_req {
 	__u32 size;
 	__u32 pad64;
 };
+#define drm_vmw_alloc_dmabuf_req drm_vmw_alloc_bo_req
 
 /**
- * struct drm_vmw_dmabuf_rep
+ * struct drm_vmw_bo_rep
  *
  * @map_handle: Offset to use in the mmap() call used to map the buffer.
  * @handle: Handle unique to this buffer. Used for unreferencing.
@@ -395,50 +404,32 @@ struct drm_vmw_alloc_dmabuf_req {
  * @cur_gmr_offset: Offset to use in the command stream when this buffer is
  * referenced. See note above.
  *
- * Output data from the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Output data from the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_dmabuf_rep {
+struct drm_vmw_bo_rep {
 	__u64 map_handle;
 	__u32 handle;
 	__u32 cur_gmr_id;
 	__u32 cur_gmr_offset;
 	__u32 pad64;
 };
+#define drm_vmw_dmabuf_rep drm_vmw_bo_rep
 
 /**
- * union drm_vmw_dmabuf_arg
+ * union drm_vmw_alloc_bo_arg
  *
  * @req: Input data as described above.
  * @rep: Output data as described above.
  *
- * Argument to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Argument to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-union drm_vmw_alloc_dmabuf_arg {
-	struct drm_vmw_alloc_dmabuf_req req;
-	struct drm_vmw_dmabuf_rep rep;
-};
-
-/*************************************************************************/
-/**
- * DRM_VMW_UNREF_DMABUF - Free a DMA buffer.
- *
- */
-
-/**
- * struct drm_vmw_unref_dmabuf_arg
- *
- * @handle: Handle indicating what buffer to free. Obtained from the
- * DRM_VMW_ALLOC_DMABUF Ioctl.
- *
- * Argument to the DRM_VMW_UNREF_DMABUF Ioctl.
- */
-
-struct drm_vmw_unref_dmabuf_arg {
-	__u32 handle;
-	__u32 pad64;
+union drm_vmw_alloc_bo_arg {
+	struct drm_vmw_alloc_bo_req req;
+	struct drm_vmw_bo_rep rep;
 };
+#define drm_vmw_alloc_dmabuf_arg drm_vmw_alloc_bo_arg
 
 /*************************************************************************/
 /**
@@ -1103,9 +1094,8 @@ union drm_vmw_extended_context_arg {
  * DRM_VMW_HANDLE_CLOSE - Close a user-space handle and release its
  * underlying resource.
  *
- * Note that this ioctl is overlaid on the DRM_VMW_UNREF_DMABUF Ioctl.
- * The ioctl arguments therefore need to be identical in layout.
- *
+ * Note that this ioctl is overlaid on the deprecated DRM_VMW_UNREF_DMABUF
+ * Ioctl.
  */
 
 /**
@@ -1119,7 +1109,107 @@ struct drm_vmw_handle_close_arg {
 	__u32 handle;
 	__u32 pad64;
 };
+#define drm_vmw_unref_dmabuf_arg drm_vmw_handle_close_arg
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_CREATE_EXT - Create a host guest-backed surface.
+ *
+ * Allocates a surface handle and queues a create surface command
+ * for the host on the first use of the surface. The surface ID can
+ * be used as the surface ID in commands referencing the surface.
+ *
+ * This new command extends DRM_VMW_GB_SURFACE_CREATE by adding version
+ * parameter and 64 bit svga flag.
+ */
+
+/**
+ * enum drm_vmw_surface_version
+ *
+ * @drm_vmw_surface_gb_v1: Corresponds to current gb surface format with
+ * svga3d surface flags split into 2, upper half and lower half.
+ */
+enum drm_vmw_surface_version {
+	drm_vmw_gb_surface_v1
+};
+
+/**
+ * struct drm_vmw_gb_surface_create_ext_req
+ *
+ * @base: Surface create parameters.
+ * @version: Version of surface create ioctl.
+ * @svga3d_flags_upper_32_bits: Upper 32 bits of svga3d flags.
+ * @multisample_pattern: Multisampling pattern when msaa is supported.
+ * @quality_level: Precision settings for each sample.
+ * @must_be_zero: Reserved for future usage.
+ *
+ * Input argument to the  DRM_VMW_GB_SURFACE_CREATE_EXT Ioctl.
+ * Part of output argument for the DRM_VMW_GB_SURFACE_REF_EXT Ioctl.
+ */
+struct drm_vmw_gb_surface_create_ext_req {
+	struct drm_vmw_gb_surface_create_req base;
+	enum drm_vmw_surface_version version;
+	uint32_t svga3d_flags_upper_32_bits;
+	SVGA3dMSPattern multisample_pattern;
+	SVGA3dMSQualityLevel quality_level;
+	uint64_t must_be_zero;
+};
+
+/**
+ * union drm_vmw_gb_surface_create_ext_arg
+ *
+ * @req: Input argument as described above.
+ * @rep: Output argument as described above.
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
+union drm_vmw_gb_surface_create_ext_arg {
+	struct drm_vmw_gb_surface_create_rep rep;
+	struct drm_vmw_gb_surface_create_ext_req req;
+};
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_REF_EXT - Reference a host surface.
+ *
+ * Puts a reference on a host surface with a given handle, as previously
+ * returned by the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ * A reference will make sure the surface isn't destroyed while we hold
+ * it and will allow the calling client to use the surface handle in
+ * the command stream.
+ *
+ * On successful return, the Ioctl returns the surface information given
+ * to and returned from the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
 
+/**
+ * struct drm_vmw_gb_surface_ref_ext_rep
+ *
+ * @creq: The data used as input when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_ext_req"
+ * @crep: Additional data output when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_rep"
+ *
+ * Output Argument to the DRM_VMW_GB_SURFACE_REF_EXT ioctl.
+ */
+struct drm_vmw_gb_surface_ref_ext_rep {
+	struct drm_vmw_gb_surface_create_ext_req creq;
+	struct drm_vmw_gb_surface_create_rep crep;
+};
+
+/**
+ * union drm_vmw_gb_surface_reference_ext_arg
+ *
+ * @req: Input data as described above at "struct drm_vmw_surface_arg"
+ * @rep: Output data as described above at
+ *       "struct drm_vmw_gb_surface_ref_ext_rep"
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_REF Ioctl.
+ */
+union drm_vmw_gb_surface_reference_ext_arg {
+	struct drm_vmw_gb_surface_ref_ext_rep rep;
+	struct drm_vmw_surface_arg req;
+};
 
 #if defined(__cplusplus)
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 1/6] include: DRM uAPI headers update
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 include/drm-uapi/amdgpu_drm.h  |  52 +++-
 include/drm-uapi/drm.h         |  16 ++
 include/drm-uapi/drm_fourcc.h  | 215 ++++++++++++++
 include/drm-uapi/drm_mode.h    |  26 +-
 include/drm-uapi/etnaviv_drm.h |   6 +
 include/drm-uapi/exynos_drm.h  | 240 ++++++++++++++++
 include/drm-uapi/i915_drm.h    |  49 +++-
 include/drm-uapi/msm_drm.h     |   2 +
 include/drm-uapi/tegra_drm.h   | 492 ++++++++++++++++++++++++++++++++-
 include/drm-uapi/v3d_drm.h     | 194 +++++++++++++
 include/drm-uapi/vc4_drm.h     |  13 +-
 include/drm-uapi/virtgpu_drm.h |   1 +
 include/drm-uapi/vmwgfx_drm.h  | 166 ++++++++---
 13 files changed, 1415 insertions(+), 57 deletions(-)
 create mode 100644 include/drm-uapi/v3d_drm.h

diff --git a/include/drm-uapi/amdgpu_drm.h b/include/drm-uapi/amdgpu_drm.h
index 1816bd8200d1..370e9a5536ef 100644
--- a/include/drm-uapi/amdgpu_drm.h
+++ b/include/drm-uapi/amdgpu_drm.h
@@ -72,12 +72,41 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
 
+/**
+ * DOC: memory domains
+ *
+ * %AMDGPU_GEM_DOMAIN_CPU	System memory that is not GPU accessible.
+ * Memory in this pool could be swapped out to disk if there is pressure.
+ *
+ * %AMDGPU_GEM_DOMAIN_GTT	GPU accessible system memory, mapped into the
+ * GPU's virtual address space via gart. Gart memory linearizes non-contiguous
+ * pages of system memory, allows GPU access system memory in a linezrized
+ * fashion.
+ *
+ * %AMDGPU_GEM_DOMAIN_VRAM	Local video memory. For APUs, it is memory
+ * carved out by the BIOS.
+ *
+ * %AMDGPU_GEM_DOMAIN_GDS	Global on-chip data storage used to share data
+ * across shader threads.
+ *
+ * %AMDGPU_GEM_DOMAIN_GWS	Global wave sync, used to synchronize the
+ * execution of all the waves on a device.
+ *
+ * %AMDGPU_GEM_DOMAIN_OA	Ordered append, used by 3D or Compute engines
+ * for appending data.
+ */
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
 #define AMDGPU_GEM_DOMAIN_VRAM		0x4
 #define AMDGPU_GEM_DOMAIN_GDS		0x8
 #define AMDGPU_GEM_DOMAIN_GWS		0x10
 #define AMDGPU_GEM_DOMAIN_OA		0x20
+#define AMDGPU_GEM_DOMAIN_MASK		(AMDGPU_GEM_DOMAIN_CPU | \
+					 AMDGPU_GEM_DOMAIN_GTT | \
+					 AMDGPU_GEM_DOMAIN_VRAM | \
+					 AMDGPU_GEM_DOMAIN_GDS | \
+					 AMDGPU_GEM_DOMAIN_GWS | \
+					 AMDGPU_GEM_DOMAIN_OA)
 
 /* Flag that CPU access will be required for the case of VRAM domain */
 #define AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED	(1 << 0)
@@ -95,6 +124,10 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID	(1 << 6)
 /* Flag that BO sharing will be explicitly synchronized */
 #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC		(1 << 7)
+/* Flag that indicates allocating MQD gart on GFX9, where the mtype
+ * for the second page onward should be set to NC.
+ */
+#define AMDGPU_GEM_CREATE_MQD_GFX9		(1 << 8)
 
 struct drm_amdgpu_gem_create_in  {
 	/** the requested memory size */
@@ -473,7 +506,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_HW_IP_UVD_ENC      5
 #define AMDGPU_HW_IP_VCN_DEC      6
 #define AMDGPU_HW_IP_VCN_ENC      7
-#define AMDGPU_HW_IP_NUM          8
+#define AMDGPU_HW_IP_VCN_JPEG     8
+#define AMDGPU_HW_IP_NUM          9
 
 #define AMDGPU_HW_IP_INSTANCE_MAX_COUNT 1
 
@@ -482,6 +516,7 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
 #define AMDGPU_CHUNK_ID_SYNCOBJ_IN      0x04
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT     0x05
+#define AMDGPU_CHUNK_ID_BO_HANDLES      0x06
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
@@ -520,6 +555,10 @@ union drm_amdgpu_cs {
 /* Preempt flag, IB should set Pre_enb bit if PREEMPT flag detected */
 #define AMDGPU_IB_FLAG_PREEMPT (1<<2)
 
+/* The IB fence should do the L2 writeback but not invalidate any shader
+ * caches (L2/vL1/sL1/I$). */
+#define AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE (1 << 3)
+
 struct drm_amdgpu_cs_chunk_ib {
 	__u32 _pad;
 	/** AMDGPU_IB_FLAG_* */
@@ -618,6 +657,16 @@ struct drm_amdgpu_cs_chunk_data {
 	#define AMDGPU_INFO_FW_SOS		0x0c
 	/* Subquery id: Query PSP ASD firmware version */
 	#define AMDGPU_INFO_FW_ASD		0x0d
+	/* Subquery id: Query VCN firmware version */
+	#define AMDGPU_INFO_FW_VCN		0x0e
+	/* Subquery id: Query GFX RLC SRLC firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_CNTL 0x0f
+	/* Subquery id: Query GFX RLC SRLG firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_GPM_MEM 0x10
+	/* Subquery id: Query GFX RLC SRLS firmware version */
+	#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_SRM_MEM 0x11
+	/* Subquery id: Query DMCU firmware version */
+	#define AMDGPU_INFO_FW_DMCU		0x12
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED		0x0f
 /* the used VRAM size */
@@ -806,6 +855,7 @@ struct drm_amdgpu_info_firmware {
 #define AMDGPU_VRAM_TYPE_GDDR5 5
 #define AMDGPU_VRAM_TYPE_HBM   6
 #define AMDGPU_VRAM_TYPE_DDR3  7
+#define AMDGPU_VRAM_TYPE_DDR4  8
 
 struct drm_amdgpu_info_device {
 	/** PCI Device ID */
diff --git a/include/drm-uapi/drm.h b/include/drm-uapi/drm.h
index f0bd91de0cf9..85c685a2075e 100644
--- a/include/drm-uapi/drm.h
+++ b/include/drm-uapi/drm.h
@@ -674,6 +674,22 @@ struct drm_get_cap {
  */
 #define DRM_CLIENT_CAP_ATOMIC	3
 
+/**
+ * DRM_CLIENT_CAP_ASPECT_RATIO
+ *
+ * If set to 1, the DRM core will provide aspect ratio information in modes.
+ */
+#define DRM_CLIENT_CAP_ASPECT_RATIO    4
+
+/**
+ * DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
+ *
+ * If set to 1, the DRM core will expose special connectors to be used for
+ * writing back to memory the scene setup in the commit. Depends on client
+ * also supporting DRM_CLIENT_CAP_ATOMIC
+ */
+#define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS	5
+
 /** DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
 struct drm_set_client_cap {
 	__u64 capability;
diff --git a/include/drm-uapi/drm_fourcc.h b/include/drm-uapi/drm_fourcc.h
index e04613d30a13..139632b87181 100644
--- a/include/drm-uapi/drm_fourcc.h
+++ b/include/drm-uapi/drm_fourcc.h
@@ -30,11 +30,50 @@
 extern "C" {
 #endif
 
+/**
+ * DOC: overview
+ *
+ * In the DRM subsystem, framebuffer pixel formats are described using the
+ * fourcc codes defined in `include/uapi/drm/drm_fourcc.h`. In addition to the
+ * fourcc code, a Format Modifier may optionally be provided, in order to
+ * further describe the buffer's format - for example tiling or compression.
+ *
+ * Format Modifiers
+ * ----------------
+ *
+ * Format modifiers are used in conjunction with a fourcc code, forming a
+ * unique fourcc:modifier pair. This format:modifier pair must fully define the
+ * format and data layout of the buffer, and should be the only way to describe
+ * that particular buffer.
+ *
+ * Having multiple fourcc:modifier pairs which describe the same layout should
+ * be avoided, as such aliases run the risk of different drivers exposing
+ * different names for the same data format, forcing userspace to understand
+ * that they are aliases.
+ *
+ * Format modifiers may change any property of the buffer, including the number
+ * of planes and/or the required allocation size. Format modifiers are
+ * vendor-namespaced, and as such the relationship between a fourcc code and a
+ * modifier is specific to the modifer being used. For example, some modifiers
+ * may preserve meaning - such as number of planes - from the fourcc code,
+ * whereas others may not.
+ *
+ * Vendors should document their modifier usage in as much detail as
+ * possible, to ensure maximum compatibility across devices, drivers and
+ * applications.
+ *
+ * The authoritative list of format modifier codes is found in
+ * `include/uapi/drm/drm_fourcc.h`
+ */
+
 #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
 				 ((__u32)(c) << 16) | ((__u32)(d) << 24))
 
 #define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */
 
+/* Reserve 0 for the invalid format specifier */
+#define DRM_FORMAT_INVALID	0
+
 /* color index */
 #define DRM_FORMAT_C8		fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
@@ -183,6 +222,7 @@ extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_QCOM    0x05
 #define DRM_FORMAT_MOD_VENDOR_VIVANTE 0x06
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
+#define DRM_FORMAT_MOD_VENDOR_ARM     0x08
 /* add more to the end as needed */
 
 #define DRM_FORMAT_RESERVED	      ((1ULL << 56) - 1)
@@ -298,6 +338,19 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_SAMSUNG_64_32_TILE	fourcc_mod_code(SAMSUNG, 1)
 
+/*
+ * Qualcomm Compressed Format
+ *
+ * Refers to a compressed variant of the base format that is compressed.
+ * Implementation may be platform and base-format specific.
+ *
+ * Each macrotile consists of m x n (mostly 4 x 4) tiles.
+ * Pixel data pitch/stride is aligned with macrotile width.
+ * Pixel data height is aligned with macrotile height.
+ * Entire pixel data buffer is aligned with 4k(bytes).
+ */
+#define DRM_FORMAT_MOD_QCOM_COMPRESSED	fourcc_mod_code(QCOM, 1)
+
 /* Vivante framebuffer modifiers */
 
 /*
@@ -384,6 +437,23 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
 	fourcc_mod_code(NVIDIA, 0x15)
 
+/*
+ * Some Broadcom modifiers take parameters, for example the number of
+ * vertical lines in the image. Reserve the lower 32 bits for modifier
+ * type, and the next 24 bits for parameters. Top 8 bits are the
+ * vendor code.
+ */
+#define __fourcc_mod_broadcom_param_shift 8
+#define __fourcc_mod_broadcom_param_bits 48
+#define fourcc_mod_broadcom_code(val, params) \
+	fourcc_mod_code(BROADCOM, ((((__u64)params) << __fourcc_mod_broadcom_param_shift) | val))
+#define fourcc_mod_broadcom_param(m) \
+	((int)(((m) >> __fourcc_mod_broadcom_param_shift) &	\
+	       ((1ULL << __fourcc_mod_broadcom_param_bits) - 1)))
+#define fourcc_mod_broadcom_mod(m) \
+	((m) & ~(((1ULL << __fourcc_mod_broadcom_param_bits) - 1) <<	\
+		 __fourcc_mod_broadcom_param_shift))
+
 /*
  * Broadcom VC4 "T" format
  *
@@ -405,6 +475,151 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_BROADCOM_VC4_T_TILED fourcc_mod_code(BROADCOM, 1)
 
+/*
+ * Broadcom SAND format
+ *
+ * This is the native format that the H.264 codec block uses.  For VC4
+ * HVS, it is only valid for H.264 (NV12/21) and RGBA modes.
+ *
+ * The image can be considered to be split into columns, and the
+ * columns are placed consecutively into memory.  The width of those
+ * columns can be either 32, 64, 128, or 256 pixels, but in practice
+ * only 128 pixel columns are used.
+ *
+ * The pitch between the start of each column is set to optimally
+ * switch between SDRAM banks. This is passed as the number of lines
+ * of column width in the modifier (we can't use the stride value due
+ * to various core checks that look at it , so you should set the
+ * stride to width*cpp).
+ *
+ * Note that the column height for this format modifier is the same
+ * for all of the planes, assuming that each column contains both Y
+ * and UV.  Some SAND-using hardware stores UV in a separate tiled
+ * image from Y to reduce the column height, which is not supported
+ * with these modifiers.
+ */
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(2, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(3, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(4, v)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(v) \
+	fourcc_mod_broadcom_code(5, v)
+
+#define DRM_FORMAT_MOD_BROADCOM_SAND32 \
+	DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND64 \
+	DRM_FORMAT_MOD_BROADCOM_SAND64_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND128 \
+	DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(0)
+#define DRM_FORMAT_MOD_BROADCOM_SAND256 \
+	DRM_FORMAT_MOD_BROADCOM_SAND256_COL_HEIGHT(0)
+
+/* Broadcom UIF format
+ *
+ * This is the common format for the current Broadcom multimedia
+ * blocks, including V3D 3.x and newer, newer video codecs, and
+ * displays.
+ *
+ * The image consists of utiles (64b blocks), UIF blocks (2x2 utiles),
+ * and macroblocks (4x4 UIF blocks).  Those 4x4 UIF block groups are
+ * stored in columns, with padding between the columns to ensure that
+ * moving from one column to the next doesn't hit the same SDRAM page
+ * bank.
+ *
+ * To calculate the padding, it is assumed that each hardware block
+ * and the software driving it knows the platform's SDRAM page size,
+ * number of banks, and XOR address, and that it's identical between
+ * all blocks using the format.  This tiling modifier will use XOR as
+ * necessary to reduce the padding.  If a hardware block can't do XOR,
+ * the assumption is that a no-XOR tiling modifier will be created.
+ */
+#define DRM_FORMAT_MOD_BROADCOM_UIF fourcc_mod_code(BROADCOM, 6)
+
+/*
+ * Arm Framebuffer Compression (AFBC) modifiers
+ *
+ * AFBC is a proprietary lossless image compression protocol and format.
+ * It provides fine-grained random access and minimizes the amount of data
+ * transferred between IP blocks.
+ *
+ * AFBC has several features which may be supported and/or used, which are
+ * represented using bits in the modifier. Not all combinations are valid,
+ * and different devices or use-cases may support different combinations.
+ */
+#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode)	fourcc_mod_code(ARM, __afbc_mode)
+
+/*
+ * AFBC superblock size
+ *
+ * Indicates the superblock size(s) used for the AFBC buffer. The buffer
+ * size (in pixels) must be aligned to a multiple of the superblock size.
+ * Four lowest significant bits(LSBs) are reserved for block size.
+ */
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_MASK      0xf
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_16x16     (1ULL)
+#define AFBC_FORMAT_MOD_BLOCK_SIZE_32x8      (2ULL)
+
+/*
+ * AFBC lossless colorspace transform
+ *
+ * Indicates that the buffer makes use of the AFBC lossless colorspace
+ * transform.
+ */
+#define AFBC_FORMAT_MOD_YTR     (1ULL <<  4)
+
+/*
+ * AFBC block-split
+ *
+ * Indicates that the payload of each superblock is split. The second
+ * half of the payload is positioned at a predefined offset from the start
+ * of the superblock payload.
+ */
+#define AFBC_FORMAT_MOD_SPLIT   (1ULL <<  5)
+
+/*
+ * AFBC sparse layout
+ *
+ * This flag indicates that the payload of each superblock must be stored at a
+ * predefined position relative to the other superblocks in the same AFBC
+ * buffer. This order is the same order used by the header buffer. In this mode
+ * each superblock is given the same amount of space as an uncompressed
+ * superblock of the particular format would require, rounding up to the next
+ * multiple of 128 bytes in size.
+ */
+#define AFBC_FORMAT_MOD_SPARSE  (1ULL <<  6)
+
+/*
+ * AFBC copy-block restrict
+ *
+ * Buffers with this flag must obey the copy-block restriction. The restriction
+ * is such that there are no copy-blocks referring across the border of 8x8
+ * blocks. For the subsampled data the 8x8 limitation is also subsampled.
+ */
+#define AFBC_FORMAT_MOD_CBR     (1ULL <<  7)
+
+/*
+ * AFBC tiled layout
+ *
+ * The tiled layout groups superblocks in 8x8 or 4x4 tiles, where all
+ * superblocks inside a tile are stored together in memory. 8x8 tiles are used
+ * for pixel formats up to and including 32 bpp while 4x4 tiles are used for
+ * larger bpp formats. The order between the tiles is scan line.
+ * When the tiled layout is used, the buffer size (in pixels) must be aligned
+ * to the tile size.
+ */
+#define AFBC_FORMAT_MOD_TILED   (1ULL <<  8)
+
+/*
+ * AFBC solid color blocks
+ *
+ * Indicates that the buffer makes use of solid-color blocks, whereby bandwidth
+ * can be reduced if a whole superblock is a single color.
+ */
+#define AFBC_FORMAT_MOD_SC      (1ULL <<  9)
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/drm_mode.h b/include/drm-uapi/drm_mode.h
index 2c575794fb52..d3e0fe31efc5 100644
--- a/include/drm-uapi/drm_mode.h
+++ b/include/drm-uapi/drm_mode.h
@@ -93,6 +93,15 @@ extern "C" {
 #define DRM_MODE_PICTURE_ASPECT_NONE		0
 #define DRM_MODE_PICTURE_ASPECT_4_3		1
 #define DRM_MODE_PICTURE_ASPECT_16_9		2
+#define DRM_MODE_PICTURE_ASPECT_64_27		3
+#define DRM_MODE_PICTURE_ASPECT_256_135		4
+
+/* Content type options */
+#define DRM_MODE_CONTENT_TYPE_NO_DATA		0
+#define DRM_MODE_CONTENT_TYPE_GRAPHICS		1
+#define DRM_MODE_CONTENT_TYPE_PHOTO		2
+#define DRM_MODE_CONTENT_TYPE_CINEMA		3
+#define DRM_MODE_CONTENT_TYPE_GAME		4
 
 /* Aspect ratio flag bitmask (4 bits 22:19) */
 #define DRM_MODE_FLAG_PIC_AR_MASK		(0x0F<<19)
@@ -102,6 +111,10 @@ extern "C" {
 			(DRM_MODE_PICTURE_ASPECT_4_3<<19)
 #define  DRM_MODE_FLAG_PIC_AR_16_9 \
 			(DRM_MODE_PICTURE_ASPECT_16_9<<19)
+#define  DRM_MODE_FLAG_PIC_AR_64_27 \
+			(DRM_MODE_PICTURE_ASPECT_64_27<<19)
+#define  DRM_MODE_FLAG_PIC_AR_256_135 \
+			(DRM_MODE_PICTURE_ASPECT_256_135<<19)
 
 #define  DRM_MODE_FLAG_ALL	(DRM_MODE_FLAG_PHSYNC |		\
 				 DRM_MODE_FLAG_NHSYNC |		\
@@ -173,8 +186,9 @@ extern "C" {
 /*
  * DRM_MODE_REFLECT_<axis>
  *
- * Signals that the contents of a drm plane is reflected in the <axis> axis,
+ * Signals that the contents of a drm plane is reflected along the <axis> axis,
  * in the same way as mirroring.
+ * See kerneldoc chapter "Plane Composition Properties" for more details.
  *
  * This define is provided as a convenience, looking up the property id
  * using the name->prop id lookup is the preferred method.
@@ -338,6 +352,7 @@ enum drm_mode_subconnector {
 #define DRM_MODE_CONNECTOR_VIRTUAL      15
 #define DRM_MODE_CONNECTOR_DSI		16
 #define DRM_MODE_CONNECTOR_DPI		17
+#define DRM_MODE_CONNECTOR_WRITEBACK	18
 
 struct drm_mode_get_connector {
 
@@ -363,7 +378,7 @@ struct drm_mode_get_connector {
 	__u32 pad;
 };
 
-#define DRM_MODE_PROP_PENDING	(1<<0)
+#define DRM_MODE_PROP_PENDING	(1<<0) /* deprecated, do not use */
 #define DRM_MODE_PROP_RANGE	(1<<1)
 #define DRM_MODE_PROP_IMMUTABLE	(1<<2)
 #define DRM_MODE_PROP_ENUM	(1<<3) /* enumerated type with text strings */
@@ -598,8 +613,11 @@ struct drm_mode_crtc_lut {
 };
 
 struct drm_color_ctm {
-	/* Conversion matrix in S31.32 format. */
-	__s64 matrix[9];
+	/*
+	 * Conversion matrix in S31.32 sign-magnitude
+	 * (not two's complement!) format.
+	 */
+	__u64 matrix[9];
 };
 
 struct drm_color_lut {
diff --git a/include/drm-uapi/etnaviv_drm.h b/include/drm-uapi/etnaviv_drm.h
index e9b997a0ef27..0d5c49dc478c 100644
--- a/include/drm-uapi/etnaviv_drm.h
+++ b/include/drm-uapi/etnaviv_drm.h
@@ -55,6 +55,12 @@ struct drm_etnaviv_timespec {
 #define ETNAVIV_PARAM_GPU_FEATURES_4                0x07
 #define ETNAVIV_PARAM_GPU_FEATURES_5                0x08
 #define ETNAVIV_PARAM_GPU_FEATURES_6                0x09
+#define ETNAVIV_PARAM_GPU_FEATURES_7                0x0a
+#define ETNAVIV_PARAM_GPU_FEATURES_8                0x0b
+#define ETNAVIV_PARAM_GPU_FEATURES_9                0x0c
+#define ETNAVIV_PARAM_GPU_FEATURES_10               0x0d
+#define ETNAVIV_PARAM_GPU_FEATURES_11               0x0e
+#define ETNAVIV_PARAM_GPU_FEATURES_12               0x0f
 
 #define ETNAVIV_PARAM_GPU_STREAM_COUNT              0x10
 #define ETNAVIV_PARAM_GPU_REGISTER_MAX              0x11
diff --git a/include/drm-uapi/exynos_drm.h b/include/drm-uapi/exynos_drm.h
index a00116b5cc5c..7414cfd76419 100644
--- a/include/drm-uapi/exynos_drm.h
+++ b/include/drm-uapi/exynos_drm.h
@@ -135,6 +135,219 @@ struct drm_exynos_g2d_exec {
 	__u64					async;
 };
 
+/* Exynos DRM IPP v2 API */
+
+/**
+ * Enumerate available IPP hardware modules.
+ *
+ * @count_ipps: size of ipp_id array / number of ipp modules (set by driver)
+ * @reserved: padding
+ * @ipp_id_ptr: pointer to ipp_id array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_res {
+	__u32 count_ipps;
+	__u32 reserved;
+	__u64 ipp_id_ptr;
+};
+
+enum drm_exynos_ipp_format_type {
+	DRM_EXYNOS_IPP_FORMAT_SOURCE		= 0x01,
+	DRM_EXYNOS_IPP_FORMAT_DESTINATION	= 0x02,
+};
+
+struct drm_exynos_ipp_format {
+	__u32 fourcc;
+	__u32 type;
+	__u64 modifier;
+};
+
+enum drm_exynos_ipp_capability {
+	DRM_EXYNOS_IPP_CAP_CROP		= 0x01,
+	DRM_EXYNOS_IPP_CAP_ROTATE	= 0x02,
+	DRM_EXYNOS_IPP_CAP_SCALE	= 0x04,
+	DRM_EXYNOS_IPP_CAP_CONVERT	= 0x08,
+};
+
+/**
+ * Get IPP hardware capabilities and supported image formats.
+ *
+ * @ipp_id: id of IPP module to query
+ * @capabilities: bitmask of drm_exynos_ipp_capability (set by driver)
+ * @reserved: padding
+ * @formats_count: size of formats array (in entries) / number of filled
+ *		   formats (set by driver)
+ * @formats_ptr: pointer to formats array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_caps {
+	__u32 ipp_id;
+	__u32 capabilities;
+	__u32 reserved;
+	__u32 formats_count;
+	__u64 formats_ptr;
+};
+
+enum drm_exynos_ipp_limit_type {
+	/* size (horizontal/vertial) limits, in pixels (min, max, alignment) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SIZE		= 0x0001,
+	/* scale ratio (horizonta/vertial), 16.16 fixed point (min, max) */
+	DRM_EXYNOS_IPP_LIMIT_TYPE_SCALE		= 0x0002,
+
+	/* image buffer area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_BUFFER	= 0x0001 << 16,
+	/* src/dst rectangle area */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_AREA		= 0x0002 << 16,
+	/* src/dst rectangle area when rotation enabled */
+	DRM_EXYNOS_IPP_LIMIT_SIZE_ROTATED	= 0x0003 << 16,
+
+	DRM_EXYNOS_IPP_LIMIT_TYPE_MASK		= 0x000f,
+	DRM_EXYNOS_IPP_LIMIT_SIZE_MASK		= 0x000f << 16,
+};
+
+struct drm_exynos_ipp_limit_val {
+	__u32 min;
+	__u32 max;
+	__u32 align;
+	__u32 reserved;
+};
+
+/**
+ * IPP module limitation.
+ *
+ * @type: limit type (see drm_exynos_ipp_limit_type enum)
+ * @reserved: padding
+ * @h: horizontal limits
+ * @v: vertical limits
+ */
+struct drm_exynos_ipp_limit {
+	__u32 type;
+	__u32 reserved;
+	struct drm_exynos_ipp_limit_val h;
+	struct drm_exynos_ipp_limit_val v;
+};
+
+/**
+ * Get IPP limits for given image format.
+ *
+ * @ipp_id: id of IPP module to query
+ * @fourcc: image format code (see DRM_FORMAT_* in drm_fourcc.h)
+ * @modifier: image format modifier (see DRM_FORMAT_MOD_* in drm_fourcc.h)
+ * @type: source/destination identifier (drm_exynos_ipp_format_flag enum)
+ * @limits_count: size of limits array (in entries) / number of filled entries
+ *		 (set by driver)
+ * @limits_ptr: pointer to limits array or NULL
+ */
+struct drm_exynos_ioctl_ipp_get_limits {
+	__u32 ipp_id;
+	__u32 fourcc;
+	__u64 modifier;
+	__u32 type;
+	__u32 limits_count;
+	__u64 limits_ptr;
+};
+
+enum drm_exynos_ipp_task_id {
+	/* buffer described by struct drm_exynos_ipp_task_buffer */
+	DRM_EXYNOS_IPP_TASK_BUFFER		= 0x0001,
+	/* rectangle described by struct drm_exynos_ipp_task_rect */
+	DRM_EXYNOS_IPP_TASK_RECTANGLE		= 0x0002,
+	/* transformation described by struct drm_exynos_ipp_task_transform */
+	DRM_EXYNOS_IPP_TASK_TRANSFORM		= 0x0003,
+	/* alpha configuration described by struct drm_exynos_ipp_task_alpha */
+	DRM_EXYNOS_IPP_TASK_ALPHA		= 0x0004,
+
+	/* source image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_SOURCE		= 0x0001 << 16,
+	/* destination image data (for buffer and rectangle chunks) */
+	DRM_EXYNOS_IPP_TASK_TYPE_DESTINATION	= 0x0002 << 16,
+};
+
+/**
+ * Memory buffer with image data.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_BUFFER
+ * other parameters are same as for AddFB2 generic DRM ioctl
+ */
+struct drm_exynos_ipp_task_buffer {
+	__u32	id;
+	__u32	fourcc;
+	__u32	width, height;
+	__u32	gem_id[4];
+	__u32	offset[4];
+	__u32	pitch[4];
+	__u64	modifier;
+};
+
+/**
+ * Rectangle for processing.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_RECTANGLE
+ * @reserved: padding
+ * @x,@y: left corner in pixels
+ * @w,@h: width/height in pixels
+ */
+struct drm_exynos_ipp_task_rect {
+	__u32	id;
+	__u32	reserved;
+	__u32	x;
+	__u32	y;
+	__u32	w;
+	__u32	h;
+};
+
+/**
+ * Image tranformation description.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_TRANSFORM
+ * @rotation: DRM_MODE_ROTATE_* and DRM_MODE_REFLECT_* values
+ */
+struct drm_exynos_ipp_task_transform {
+	__u32	id;
+	__u32	rotation;
+};
+
+/**
+ * Image global alpha configuration for formats without alpha values.
+ *
+ * @id: must be DRM_EXYNOS_IPP_TASK_ALPHA
+ * @value: global alpha value (0-255)
+ */
+struct drm_exynos_ipp_task_alpha {
+	__u32	id;
+	__u32	value;
+};
+
+enum drm_exynos_ipp_flag {
+	/* generate DRM event after processing */
+	DRM_EXYNOS_IPP_FLAG_EVENT	= 0x01,
+	/* dry run, only check task parameters */
+	DRM_EXYNOS_IPP_FLAG_TEST_ONLY	= 0x02,
+	/* non-blocking processing */
+	DRM_EXYNOS_IPP_FLAG_NONBLOCK	= 0x04,
+};
+
+#define DRM_EXYNOS_IPP_FLAGS (DRM_EXYNOS_IPP_FLAG_EVENT |\
+		DRM_EXYNOS_IPP_FLAG_TEST_ONLY | DRM_EXYNOS_IPP_FLAG_NONBLOCK)
+
+/**
+ * Perform image processing described by array of drm_exynos_ipp_task_*
+ * structures (parameters array).
+ *
+ * @ipp_id: id of IPP module to run the task
+ * @flags: bitmask of drm_exynos_ipp_flag values
+ * @reserved: padding
+ * @params_size: size of parameters array (in bytes)
+ * @params_ptr: pointer to parameters array or NULL
+ * @user_data: (optional) data for drm event
+ */
+struct drm_exynos_ioctl_ipp_commit {
+	__u32 ipp_id;
+	__u32 flags;
+	__u32 reserved;
+	__u32 params_size;
+	__u64 params_ptr;
+	__u64 user_data;
+};
+
 #define DRM_EXYNOS_GEM_CREATE		0x00
 #define DRM_EXYNOS_GEM_MAP		0x01
 /* Reserved 0x03 ~ 0x05 for exynos specific gem ioctl */
@@ -147,6 +360,11 @@ struct drm_exynos_g2d_exec {
 #define DRM_EXYNOS_G2D_EXEC		0x22
 
 /* Reserved 0x30 ~ 0x33 for obsolete Exynos IPP ioctls */
+/* IPP - Image Post Processing */
+#define DRM_EXYNOS_IPP_GET_RESOURCES	0x40
+#define DRM_EXYNOS_IPP_GET_CAPS		0x41
+#define DRM_EXYNOS_IPP_GET_LIMITS	0x42
+#define DRM_EXYNOS_IPP_COMMIT		0x43
 
 #define DRM_IOCTL_EXYNOS_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_GEM_CREATE, struct drm_exynos_gem_create)
@@ -165,8 +383,20 @@ struct drm_exynos_g2d_exec {
 #define DRM_IOCTL_EXYNOS_G2D_EXEC		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_G2D_EXEC, struct drm_exynos_g2d_exec)
 
+#define DRM_IOCTL_EXYNOS_IPP_GET_RESOURCES	DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_RESOURCES, \
+		struct drm_exynos_ioctl_ipp_get_res)
+#define DRM_IOCTL_EXYNOS_IPP_GET_CAPS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_CAPS, struct drm_exynos_ioctl_ipp_get_caps)
+#define DRM_IOCTL_EXYNOS_IPP_GET_LIMITS		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_GET_LIMITS, \
+		struct drm_exynos_ioctl_ipp_get_limits)
+#define DRM_IOCTL_EXYNOS_IPP_COMMIT		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_IPP_COMMIT, struct drm_exynos_ioctl_ipp_commit)
+
 /* EXYNOS specific events */
 #define DRM_EXYNOS_G2D_EVENT		0x80000000
+#define DRM_EXYNOS_IPP_EVENT		0x80000002
 
 struct drm_exynos_g2d_event {
 	struct drm_event	base;
@@ -177,6 +407,16 @@ struct drm_exynos_g2d_event {
 	__u32			reserved;
 };
 
+struct drm_exynos_ipp_event {
+	struct drm_event	base;
+	__u64			user_data;
+	__u32			tv_sec;
+	__u32			tv_usec;
+	__u32			ipp_id;
+	__u32			sequence;
+	__u64			reserved;
+};
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d4..cf9dcf040321 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -110,9 +110,17 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3,
+	I915_SAMPLE_RUNNABLE = 4,
+	I915_SAMPLE_RUNNING = 5,
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1000)
+#define I915_SAMPLE_RUNNABLE_DIVISOR (1000)
+#define I915_SAMPLE_RUNNING_DIVISOR (1000)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +141,15 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
+#define I915_PMU_ENGINE_RUNNABLE(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNABLE)
+
+#define I915_PMU_ENGINE_RUNNING(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_RUNNING)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
@@ -412,6 +429,14 @@ typedef struct drm_i915_irq_wait {
 	int irq_seq;
 } drm_i915_irq_wait_t;
 
+/*
+ * Different modes of per-process Graphics Translation Table,
+ * see I915_PARAM_HAS_ALIASING_PPGTT
+ */
+#define I915_GEM_PPGTT_NONE	0
+#define I915_GEM_PPGTT_ALIASING	1
+#define I915_GEM_PPGTT_FULL	2
+
 /* Ioctl to query kernel params:
  */
 #define I915_PARAM_IRQ_ACTIVE            1
@@ -529,6 +554,28 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/*
+ * Once upon a time we supposed that writes through the GGTT would be
+ * immediately in physical memory (once flushed out of the CPU path). However,
+ * on a few different processors and chipsets, this is not necessarily the case
+ * as the writes appear to be buffered internally. Thus a read of the backing
+ * storage (physical memory) via a different path (with different physical tags
+ * to the indirect write via the GGTT) will see stale values from before
+ * the GGTT write. Inside the kernel, we can for the most part keep track of
+ * the different read/write domains in use (e.g. set-domain), but the assumption
+ * of coherency is baked into the ABI, hence reporting its true state in this
+ * parameter.
+ *
+ * Reports true when writes via mmap_gtt are immediately visible following an
+ * lfence to flush the WCB.
+ *
+ * Reports false when writes via mmap_gtt are indeterminately delayed in an in
+ * internal buffer and are _not_ immediately visible to third parties accessing
+ * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC
+ * communications channel when reporting false is strongly disadvised.
+ */
+#define I915_PARAM_MMAP_GTT_COHERENT	52
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
diff --git a/include/drm-uapi/msm_drm.h b/include/drm-uapi/msm_drm.h
index bbbaffad772d..c06d0a5bdd80 100644
--- a/include/drm-uapi/msm_drm.h
+++ b/include/drm-uapi/msm_drm.h
@@ -201,10 +201,12 @@ struct drm_msm_gem_submit_bo {
 #define MSM_SUBMIT_NO_IMPLICIT   0x80000000 /* disable implicit sync */
 #define MSM_SUBMIT_FENCE_FD_IN   0x40000000 /* enable input fence_fd */
 #define MSM_SUBMIT_FENCE_FD_OUT  0x20000000 /* enable output fence_fd */
+#define MSM_SUBMIT_SUDO          0x10000000 /* run submitted cmds from RB */
 #define MSM_SUBMIT_FLAGS                ( \
 		MSM_SUBMIT_NO_IMPLICIT   | \
 		MSM_SUBMIT_FENCE_FD_IN   | \
 		MSM_SUBMIT_FENCE_FD_OUT  | \
+		MSM_SUBMIT_SUDO          | \
 		0)
 
 /* Each cmdstream submit consists of a table of buffers involved, and
diff --git a/include/drm-uapi/tegra_drm.h b/include/drm-uapi/tegra_drm.h
index 12f9bf848db1..6c07919c04e9 100644
--- a/include/drm-uapi/tegra_drm.h
+++ b/include/drm-uapi/tegra_drm.h
@@ -32,143 +32,615 @@ extern "C" {
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
+/**
+ * struct drm_tegra_gem_create - parameters for the GEM object creation IOCTL
+ */
 struct drm_tegra_gem_create {
+	/**
+	 * @size:
+	 *
+	 * The size, in bytes, of the buffer object to be created.
+	 */
 	__u64 size;
+
+	/**
+	 * @flags:
+	 *
+	 * A bitmask of flags that influence the creation of GEM objects:
+	 *
+	 * DRM_TEGRA_GEM_CREATE_TILED
+	 *   Use the 16x16 tiling format for this buffer.
+	 *
+	 * DRM_TEGRA_GEM_CREATE_BOTTOM_UP
+	 *   The buffer has a bottom-up layout.
+	 */
 	__u32 flags;
+
+	/**
+	 * @handle:
+	 *
+	 * The handle of the created GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 handle;
 };
 
+/**
+ * struct drm_tegra_gem_mmap - parameters for the GEM mmap IOCTL
+ */
 struct drm_tegra_gem_mmap {
+	/**
+	 * @handle:
+	 *
+	 * Handle of the GEM object to obtain an mmap offset for.
+	 */
 	__u32 handle;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @offset:
+	 *
+	 * The mmap offset for the given GEM object. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u64 offset;
 };
 
+/**
+ * struct drm_tegra_syncpt_read - parameters for the read syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_read {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to read the current value from.
+	 */
 	__u32 id;
+
+	/**
+	 * @value:
+	 *
+	 * The current syncpoint value. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
+/**
+ * struct drm_tegra_syncpt_incr - parameters for the increment syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_incr {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to increment.
+	 */
 	__u32 id;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_syncpt_wait - parameters for the wait syncpoint IOCTL
+ */
 struct drm_tegra_syncpt_wait {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to wait on.
+	 */
 	__u32 id;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to wait.
+	 */
 	__u32 thresh;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, to wait.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @value:
+	 *
+	 * The new syncpoint value after the wait. Set by the kernel upon
+	 * successful completion of the IOCTL.
+	 */
 	__u32 value;
 };
 
 #define DRM_TEGRA_NO_TIMEOUT	(0xffffffff)
 
+/**
+ * struct drm_tegra_open_channel - parameters for the open channel IOCTL
+ */
 struct drm_tegra_open_channel {
+	/**
+	 * @client:
+	 *
+	 * The client ID for this channel.
+	 */
 	__u32 client;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
+
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. Set by the kernel upon
+	 * successful completion of the IOCTL. This context needs to be passed
+	 * to the DRM_TEGRA_CHANNEL_CLOSE or the DRM_TEGRA_SUBMIT IOCTLs.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_close_channel - parameters for the close channel IOCTL
+ */
 struct drm_tegra_close_channel {
+	/**
+	 * @context:
+	 *
+	 * The application context of this channel. This is obtained from the
+	 * DRM_TEGRA_OPEN_CHANNEL IOCTL.
+	 */
 	__u64 context;
 };
 
+/**
+ * struct drm_tegra_get_syncpt - parameters for the get syncpoint IOCTL
+ */
 struct drm_tegra_get_syncpt {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel for which to obtain
+	 * the syncpoint ID.
+	 */
 	__u64 context;
+
+	/**
+	 * @index:
+	 *
+	 * Index of the client syncpoint for which to obtain the ID.
+	 */
 	__u32 index;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the given syncpoint. Set by the kernel upon successful
+	 * completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_get_syncpt_base - parameters for the get wait base IOCTL
+ */
 struct drm_tegra_get_syncpt_base {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying for which channel to obtain the
+	 * wait base.
+	 */
 	__u64 context;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint for which to obtain the wait base.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @id:
+	 *
+	 * The ID of the wait base corresponding to the client syncpoint. Set
+	 * by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 id;
 };
 
+/**
+ * struct drm_tegra_syncpt - syncpoint increment operation
+ */
 struct drm_tegra_syncpt {
+	/**
+	 * @id:
+	 *
+	 * ID of the syncpoint to operate on.
+	 */
 	__u32 id;
+
+	/**
+	 * @incrs:
+	 *
+	 * Number of increments to perform for the syncpoint.
+	 */
 	__u32 incrs;
 };
 
+/**
+ * struct drm_tegra_cmdbuf - structure describing a command buffer
+ */
 struct drm_tegra_cmdbuf {
+	/**
+	 * @handle:
+	 *
+	 * Handle to a GEM object containing the command buffer.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, into the GEM object identified by @handle at
+	 * which the command buffer starts.
+	 */
 	__u32 offset;
+
+	/**
+	 * @words:
+	 *
+	 * Number of 32-bit words in this command buffer.
+	 */
 	__u32 words;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_reloc - GEM object relocation structure
+ */
 struct drm_tegra_reloc {
 	struct {
+		/**
+		 * @cmdbuf.handle:
+		 *
+		 * Handle to the GEM object containing the command buffer for
+		 * which to perform this GEM object relocation.
+		 */
 		__u32 handle;
+
+		/**
+		 * @cmdbuf.offset:
+		 *
+		 * Offset, in bytes, into the command buffer at which to
+		 * insert the relocated address.
+		 */
 		__u32 offset;
 	} cmdbuf;
 	struct {
+		/**
+		 * @target.handle:
+		 *
+		 * Handle to the GEM object to be relocated.
+		 */
 		__u32 handle;
+
+		/**
+		 * @target.offset:
+		 *
+		 * Offset, in bytes, into the target GEM object at which the
+		 * relocated data starts.
+		 */
 		__u32 offset;
 	} target;
+
+	/**
+	 * @shift:
+	 *
+	 * The number of bits by which to shift relocated addresses.
+	 */
 	__u32 shift;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_waitchk - wait check structure
+ */
 struct drm_tegra_waitchk {
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object containing a command stream on which to
+	 * perform the wait check.
+	 */
 	__u32 handle;
+
+	/**
+	 * @offset:
+	 *
+	 * Offset, in bytes, of the location in the command stream to perform
+	 * the wait check on.
+	 */
 	__u32 offset;
+
+	/**
+	 * @syncpt:
+	 *
+	 * ID of the syncpoint to wait check.
+	 */
 	__u32 syncpt;
+
+	/**
+	 * @thresh:
+	 *
+	 * Threshold value for which to check.
+	 */
 	__u32 thresh;
 };
 
+/**
+ * struct drm_tegra_submit - job submission structure
+ */
 struct drm_tegra_submit {
+	/**
+	 * @context:
+	 *
+	 * The application context identifying the channel to use for the
+	 * execution of this job.
+	 */
 	__u64 context;
+
+	/**
+	 * @num_syncpts:
+	 *
+	 * The number of syncpoints operated on by this job. This defines the
+	 * length of the array pointed to by @syncpts.
+	 */
 	__u32 num_syncpts;
+
+	/**
+	 * @num_cmdbufs:
+	 *
+	 * The number of command buffers to execute as part of this job. This
+	 * defines the length of the array pointed to by @cmdbufs.
+	 */
 	__u32 num_cmdbufs;
+
+	/**
+	 * @num_relocs:
+	 *
+	 * The number of relocations to perform before executing this job.
+	 * This defines the length of the array pointed to by @relocs.
+	 */
 	__u32 num_relocs;
+
+	/**
+	 * @num_waitchks:
+	 *
+	 * The number of wait checks to perform as part of this job. This
+	 * defines the length of the array pointed to by @waitchks.
+	 */
 	__u32 num_waitchks;
+
+	/**
+	 * @waitchk_mask:
+	 *
+	 * Bitmask of valid wait checks.
+	 */
 	__u32 waitchk_mask;
+
+	/**
+	 * @timeout:
+	 *
+	 * Timeout, in milliseconds, before this job is cancelled.
+	 */
 	__u32 timeout;
+
+	/**
+	 * @syncpts:
+	 *
+	 * A pointer to an array of &struct drm_tegra_syncpt structures that
+	 * specify the syncpoint operations performed as part of this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_syncpts.
+	 */
 	__u64 syncpts;
+
+	/**
+	 * @cmdbufs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_cmdbuf structures that
+	 * define the command buffers to execute as part of this job. The
+	 * number of elements in the array must be equal to the value given
+	 * by @num_syncpts.
+	 */
 	__u64 cmdbufs;
+
+	/**
+	 * @relocs:
+	 *
+	 * A pointer to an array of &struct drm_tegra_reloc structures that
+	 * specify the relocations that need to be performed before executing
+	 * this job. The number of elements in the array must be equal to the
+	 * value given by @num_relocs.
+	 */
 	__u64 relocs;
+
+	/**
+	 * @waitchks:
+	 *
+	 * A pointer to an array of &struct drm_tegra_waitchk structures that
+	 * specify the wait checks to be performed while executing this job.
+	 * The number of elements in the array must be equal to the value
+	 * given by @num_waitchks.
+	 */
 	__u64 waitchks;
-	__u32 fence;		/* Return value */
 
-	__u32 reserved[5];	/* future expansion */
+	/**
+	 * @fence:
+	 *
+	 * The threshold of the syncpoint associated with this job after it
+	 * has been completed. Set by the kernel upon successful completion of
+	 * the IOCTL. This can be used with the DRM_TEGRA_SYNCPT_WAIT IOCTL to
+	 * wait for this job to be finished.
+	 */
+	__u32 fence;
+
+	/**
+	 * @reserved:
+	 *
+	 * This field is reserved for future use. Must be 0.
+	 */
+	__u32 reserved[5];
 };
 
 #define DRM_TEGRA_GEM_TILING_MODE_PITCH 0
 #define DRM_TEGRA_GEM_TILING_MODE_TILED 1
 #define DRM_TEGRA_GEM_TILING_MODE_BLOCK 2
 
+/**
+ * struct drm_tegra_gem_set_tiling - parameters for the set tiling IOCTL
+ */
 struct drm_tegra_gem_set_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the tiling parameters.
+	 */
 	__u32 handle;
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode to set. Must be one of:
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_PITCH
+	 *   pitch linear format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_TILED
+	 *   16x16 tiling format
+	 *
+	 * DRM_TEGRA_GEM_TILING_MODE_BLOCK
+	 *   16Bx2 tiling format
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The value to set for the tiling mode parameter.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
+/**
+ * struct drm_tegra_gem_get_tiling - parameters for the get tiling IOCTL
+ */
 struct drm_tegra_gem_get_tiling {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the tiling parameters.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @mode:
+	 *
+	 * The tiling mode currently associated with the GEM object. Set by
+	 * the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 mode;
+
+	/**
+	 * @value:
+	 *
+	 * The tiling mode parameter currently associated with the GEM object.
+	 * Set by the kernel upon successful completion of the IOCTL.
+	 */
 	__u32 value;
+
+	/**
+	 * @pad:
+	 *
+	 * Structure padding that may be used in the future. Must be 0.
+	 */
 	__u32 pad;
 };
 
 #define DRM_TEGRA_GEM_BOTTOM_UP		(1 << 0)
 #define DRM_TEGRA_GEM_FLAGS		(DRM_TEGRA_GEM_BOTTOM_UP)
 
+/**
+ * struct drm_tegra_gem_set_flags - parameters for the set flags IOCTL
+ */
 struct drm_tegra_gem_set_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to set the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags to set for the GEM object.
+	 */
 	__u32 flags;
 };
 
+/**
+ * struct drm_tegra_gem_get_flags - parameters for the get flags IOCTL
+ */
 struct drm_tegra_gem_get_flags {
-	/* input */
+	/**
+	 * @handle:
+	 *
+	 * Handle to the GEM object for which to query the flags.
+	 */
 	__u32 handle;
-	/* output */
+
+	/**
+	 * @flags:
+	 *
+	 * The flags currently associated with the GEM object. Set by the
+	 * kernel upon successful completion of the IOCTL.
+	 */
 	__u32 flags;
 };
 
@@ -193,7 +665,7 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
 #define DRM_IOCTL_TEGRA_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_OPEN_CHANNEL, struct drm_tegra_open_channel)
-#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_open_channel)
+#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_close_channel)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT, struct drm_tegra_get_syncpt)
 #define DRM_IOCTL_TEGRA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SUBMIT, struct drm_tegra_submit)
 #define DRM_IOCTL_TEGRA_GET_SYNCPT_BASE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT_BASE, struct drm_tegra_get_syncpt_base)
diff --git a/include/drm-uapi/v3d_drm.h b/include/drm-uapi/v3d_drm.h
new file mode 100644
index 000000000000..7b6627783608
--- /dev/null
+++ b/include/drm-uapi/v3d_drm.h
@@ -0,0 +1,194 @@
+/*
+ * Copyright © 2014-2018 Broadcom
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef _V3D_DRM_H_
+#define _V3D_DRM_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_V3D_SUBMIT_CL                         0x00
+#define DRM_V3D_WAIT_BO                           0x01
+#define DRM_V3D_CREATE_BO                         0x02
+#define DRM_V3D_MMAP_BO                           0x03
+#define DRM_V3D_GET_PARAM                         0x04
+#define DRM_V3D_GET_BO_OFFSET                     0x05
+
+#define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
+#define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
+#define DRM_IOCTL_V3D_CREATE_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_CREATE_BO, struct drm_v3d_create_bo)
+#define DRM_IOCTL_V3D_MMAP_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
+#define DRM_IOCTL_V3D_GET_PARAM           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
+#define DRM_IOCTL_V3D_GET_BO_OFFSET       DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
+
+/**
+ * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
+ * engine.
+ *
+ * This asks the kernel to have the GPU execute an optional binner
+ * command list, and a render command list.
+ */
+struct drm_v3d_submit_cl {
+	/* Pointer to the binner command list.
+	 *
+	 * This is the first set of commands executed, which runs the
+	 * coordinate shader to determine where primitives land on the screen,
+	 * then writes out the state updates and draw calls necessary per tile
+	 * to the tile allocation BO.
+	 */
+	__u32 bcl_start;
+
+	 /** End address of the BCL (first byte after the BCL) */
+	__u32 bcl_end;
+
+	/* Offset of the render command list.
+	 *
+	 * This is the second set of commands executed, which will either
+	 * execute the tiles that have been set up by the BCL, or a fixed set
+	 * of tiles (in the case of RCL-only blits).
+	 */
+	__u32 rcl_start;
+
+	 /** End address of the RCL (first byte after the RCL) */
+	__u32 rcl_end;
+
+	/** An optional sync object to wait on before starting the BCL. */
+	__u32 in_sync_bcl;
+	/** An optional sync object to wait on before starting the RCL. */
+	__u32 in_sync_rcl;
+	/** An optional sync object to place the completion fence in. */
+	__u32 out_sync;
+
+	/* Offset of the tile alloc memory
+	 *
+	 * This is optional on V3D 3.3 (where the CL can set the value) but
+	 * required on V3D 4.1.
+	 */
+	__u32 qma;
+
+	/** Size of the tile alloc memory. */
+	__u32 qms;
+
+	/** Offset of the tile state data array. */
+	__u32 qts;
+
+	/* Pointer to a u32 array of the BOs that are referenced by the job.
+	 */
+	__u64 bo_handles;
+
+	/* Number of BO handles passed in (size is that times 4). */
+	__u32 bo_handle_count;
+
+	/* Pad, must be zero-filled. */
+	__u32 pad;
+};
+
+/**
+ * struct drm_v3d_wait_bo - ioctl argument for waiting for
+ * completion of the last DRM_V3D_SUBMIT_CL on a BO.
+ *
+ * This is useful for cases where multiple processes might be
+ * rendering to a BO and you want to wait for all rendering to be
+ * completed.
+ */
+struct drm_v3d_wait_bo {
+	__u32 handle;
+	__u32 pad;
+	__u64 timeout_ns;
+};
+
+/**
+ * struct drm_v3d_create_bo - ioctl argument for creating V3D BOs.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_create_bo {
+	__u32 size;
+	__u32 flags;
+	/** Returned GEM handle for the BO. */
+	__u32 handle;
+	/**
+	 * Returned offset for the BO in the V3D address space.  This offset
+	 * is private to the DRM fd and is valid for the lifetime of the GEM
+	 * handle.
+	 *
+	 * This offset value will always be nonzero, since various HW
+	 * units treat 0 specially.
+	 */
+	__u32 offset;
+};
+
+/**
+ * struct drm_v3d_mmap_bo - ioctl argument for mapping V3D BOs.
+ *
+ * This doesn't actually perform an mmap.  Instead, it returns the
+ * offset you need to use in an mmap on the DRM device node.  This
+ * means that tools like valgrind end up knowing about the mapped
+ * memory.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_v3d_mmap_bo {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 flags;
+	/** offset into the drm node to use for subsequent mmap call. */
+	__u64 offset;
+};
+
+enum drm_v3d_param {
+	DRM_V3D_PARAM_V3D_UIFCFG,
+	DRM_V3D_PARAM_V3D_HUB_IDENT1,
+	DRM_V3D_PARAM_V3D_HUB_IDENT2,
+	DRM_V3D_PARAM_V3D_HUB_IDENT3,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT0,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT1,
+	DRM_V3D_PARAM_V3D_CORE0_IDENT2,
+};
+
+struct drm_v3d_get_param {
+	__u32 param;
+	__u32 pad;
+	__u64 value;
+};
+
+/**
+ * Returns the offset for the BO in the V3D address space for this DRM fd.
+ * This is the same value returned by drm_v3d_create_bo, if that was called
+ * from this DRM fd.
+ */
+struct drm_v3d_get_bo_offset {
+	__u32 handle;
+	__u32 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* _V3D_DRM_H_ */
diff --git a/include/drm-uapi/vc4_drm.h b/include/drm-uapi/vc4_drm.h
index 4117117b4204..31f50de39acb 100644
--- a/include/drm-uapi/vc4_drm.h
+++ b/include/drm-uapi/vc4_drm.h
@@ -183,10 +183,17 @@ struct drm_vc4_submit_cl {
 	/* ID of the perfmon to attach to this job. 0 means no perfmon. */
 	__u32 perfmonid;
 
-	/* Unused field to align this struct on 64 bits. Must be set to 0.
-	 * If one ever needs to add an u32 field to this struct, this field
-	 * can be used.
+	/* Syncobj handle to wait on. If set, processing of this render job
+	 * will not start until the syncobj is signaled. 0 means ignore.
 	 */
+	__u32 in_sync;
+
+	/* Syncobj handle to export fence to. If set, the fence in the syncobj
+	 * will be replaced with a fence that signals upon completion of this
+	 * render job. 0 means ignore.
+	 */
+	__u32 out_sync;
+
 	__u32 pad2;
 };
 
diff --git a/include/drm-uapi/virtgpu_drm.h b/include/drm-uapi/virtgpu_drm.h
index 91a31ffed828..9a781f0611df 100644
--- a/include/drm-uapi/virtgpu_drm.h
+++ b/include/drm-uapi/virtgpu_drm.h
@@ -63,6 +63,7 @@ struct drm_virtgpu_execbuffer {
 };
 
 #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
+#define VIRTGPU_PARAM_CAPSET_QUERY_FIX 2 /* do we have the capset fix */
 
 struct drm_virtgpu_getparam {
 	__u64 param;
diff --git a/include/drm-uapi/vmwgfx_drm.h b/include/drm-uapi/vmwgfx_drm.h
index 0bc784f5e0db..399f58317cff 100644
--- a/include/drm-uapi/vmwgfx_drm.h
+++ b/include/drm-uapi/vmwgfx_drm.h
@@ -40,6 +40,7 @@ extern "C" {
 
 #define DRM_VMW_GET_PARAM            0
 #define DRM_VMW_ALLOC_DMABUF         1
+#define DRM_VMW_ALLOC_BO             1
 #define DRM_VMW_UNREF_DMABUF         2
 #define DRM_VMW_HANDLE_CLOSE         2
 #define DRM_VMW_CURSOR_BYPASS        3
@@ -68,6 +69,8 @@ extern "C" {
 #define DRM_VMW_GB_SURFACE_REF       24
 #define DRM_VMW_SYNCCPU              25
 #define DRM_VMW_CREATE_EXTENDED_CONTEXT 26
+#define DRM_VMW_GB_SURFACE_CREATE_EXT   27
+#define DRM_VMW_GB_SURFACE_REF_EXT      28
 
 /*************************************************************************/
 /**
@@ -79,6 +82,9 @@ extern "C" {
  *
  * DRM_VMW_PARAM_OVERLAY_IOCTL:
  * Does the driver support the overlay ioctl.
+ *
+ * DRM_VMW_PARAM_SM4_1
+ * SM4_1 support is enabled.
  */
 
 #define DRM_VMW_PARAM_NUM_STREAMS      0
@@ -94,6 +100,8 @@ extern "C" {
 #define DRM_VMW_PARAM_MAX_MOB_SIZE     10
 #define DRM_VMW_PARAM_SCREEN_TARGET    11
 #define DRM_VMW_PARAM_DX               12
+#define DRM_VMW_PARAM_HW_CAPS2         13
+#define DRM_VMW_PARAM_SM4_1            14
 
 /**
  * enum drm_vmw_handle_type - handle type for ref ioctls
@@ -356,9 +364,9 @@ struct drm_vmw_fence_rep {
 
 /*************************************************************************/
 /**
- * DRM_VMW_ALLOC_DMABUF
+ * DRM_VMW_ALLOC_BO
  *
- * Allocate a DMA buffer that is visible also to the host.
+ * Allocate a buffer object that is visible also to the host.
  * NOTE: The buffer is
  * identified by a handle and an offset, which are private to the guest, but
  * useable in the command stream. The guest kernel may translate these
@@ -366,27 +374,28 @@ struct drm_vmw_fence_rep {
  * be zero at all times, or it may disappear from the interface before it is
  * fixed.
  *
- * The DMA buffer may stay user-space mapped in the guest at all times,
+ * The buffer object may stay user-space mapped in the guest at all times,
  * and is thus suitable for sub-allocation.
  *
- * DMA buffers are mapped using the mmap() syscall on the drm device.
+ * Buffer objects are mapped using the mmap() syscall on the drm device.
  */
 
 /**
- * struct drm_vmw_alloc_dmabuf_req
+ * struct drm_vmw_alloc_bo_req
  *
  * @size: Required minimum size of the buffer.
  *
- * Input data to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Input data to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_alloc_dmabuf_req {
+struct drm_vmw_alloc_bo_req {
 	__u32 size;
 	__u32 pad64;
 };
+#define drm_vmw_alloc_dmabuf_req drm_vmw_alloc_bo_req
 
 /**
- * struct drm_vmw_dmabuf_rep
+ * struct drm_vmw_bo_rep
  *
  * @map_handle: Offset to use in the mmap() call used to map the buffer.
  * @handle: Handle unique to this buffer. Used for unreferencing.
@@ -395,50 +404,32 @@ struct drm_vmw_alloc_dmabuf_req {
  * @cur_gmr_offset: Offset to use in the command stream when this buffer is
  * referenced. See note above.
  *
- * Output data from the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Output data from the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-struct drm_vmw_dmabuf_rep {
+struct drm_vmw_bo_rep {
 	__u64 map_handle;
 	__u32 handle;
 	__u32 cur_gmr_id;
 	__u32 cur_gmr_offset;
 	__u32 pad64;
 };
+#define drm_vmw_dmabuf_rep drm_vmw_bo_rep
 
 /**
- * union drm_vmw_dmabuf_arg
+ * union drm_vmw_alloc_bo_arg
  *
  * @req: Input data as described above.
  * @rep: Output data as described above.
  *
- * Argument to the DRM_VMW_ALLOC_DMABUF Ioctl.
+ * Argument to the DRM_VMW_ALLOC_BO Ioctl.
  */
 
-union drm_vmw_alloc_dmabuf_arg {
-	struct drm_vmw_alloc_dmabuf_req req;
-	struct drm_vmw_dmabuf_rep rep;
-};
-
-/*************************************************************************/
-/**
- * DRM_VMW_UNREF_DMABUF - Free a DMA buffer.
- *
- */
-
-/**
- * struct drm_vmw_unref_dmabuf_arg
- *
- * @handle: Handle indicating what buffer to free. Obtained from the
- * DRM_VMW_ALLOC_DMABUF Ioctl.
- *
- * Argument to the DRM_VMW_UNREF_DMABUF Ioctl.
- */
-
-struct drm_vmw_unref_dmabuf_arg {
-	__u32 handle;
-	__u32 pad64;
+union drm_vmw_alloc_bo_arg {
+	struct drm_vmw_alloc_bo_req req;
+	struct drm_vmw_bo_rep rep;
 };
+#define drm_vmw_alloc_dmabuf_arg drm_vmw_alloc_bo_arg
 
 /*************************************************************************/
 /**
@@ -1103,9 +1094,8 @@ union drm_vmw_extended_context_arg {
  * DRM_VMW_HANDLE_CLOSE - Close a user-space handle and release its
  * underlying resource.
  *
- * Note that this ioctl is overlaid on the DRM_VMW_UNREF_DMABUF Ioctl.
- * The ioctl arguments therefore need to be identical in layout.
- *
+ * Note that this ioctl is overlaid on the deprecated DRM_VMW_UNREF_DMABUF
+ * Ioctl.
  */
 
 /**
@@ -1119,7 +1109,107 @@ struct drm_vmw_handle_close_arg {
 	__u32 handle;
 	__u32 pad64;
 };
+#define drm_vmw_unref_dmabuf_arg drm_vmw_handle_close_arg
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_CREATE_EXT - Create a host guest-backed surface.
+ *
+ * Allocates a surface handle and queues a create surface command
+ * for the host on the first use of the surface. The surface ID can
+ * be used as the surface ID in commands referencing the surface.
+ *
+ * This new command extends DRM_VMW_GB_SURFACE_CREATE by adding version
+ * parameter and 64 bit svga flag.
+ */
+
+/**
+ * enum drm_vmw_surface_version
+ *
+ * @drm_vmw_surface_gb_v1: Corresponds to current gb surface format with
+ * svga3d surface flags split into 2, upper half and lower half.
+ */
+enum drm_vmw_surface_version {
+	drm_vmw_gb_surface_v1
+};
+
+/**
+ * struct drm_vmw_gb_surface_create_ext_req
+ *
+ * @base: Surface create parameters.
+ * @version: Version of surface create ioctl.
+ * @svga3d_flags_upper_32_bits: Upper 32 bits of svga3d flags.
+ * @multisample_pattern: Multisampling pattern when msaa is supported.
+ * @quality_level: Precision settings for each sample.
+ * @must_be_zero: Reserved for future usage.
+ *
+ * Input argument to the  DRM_VMW_GB_SURFACE_CREATE_EXT Ioctl.
+ * Part of output argument for the DRM_VMW_GB_SURFACE_REF_EXT Ioctl.
+ */
+struct drm_vmw_gb_surface_create_ext_req {
+	struct drm_vmw_gb_surface_create_req base;
+	enum drm_vmw_surface_version version;
+	uint32_t svga3d_flags_upper_32_bits;
+	SVGA3dMSPattern multisample_pattern;
+	SVGA3dMSQualityLevel quality_level;
+	uint64_t must_be_zero;
+};
+
+/**
+ * union drm_vmw_gb_surface_create_ext_arg
+ *
+ * @req: Input argument as described above.
+ * @rep: Output argument as described above.
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
+union drm_vmw_gb_surface_create_ext_arg {
+	struct drm_vmw_gb_surface_create_rep rep;
+	struct drm_vmw_gb_surface_create_ext_req req;
+};
+
+/*************************************************************************/
+/**
+ * DRM_VMW_GB_SURFACE_REF_EXT - Reference a host surface.
+ *
+ * Puts a reference on a host surface with a given handle, as previously
+ * returned by the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ * A reference will make sure the surface isn't destroyed while we hold
+ * it and will allow the calling client to use the surface handle in
+ * the command stream.
+ *
+ * On successful return, the Ioctl returns the surface information given
+ * to and returned from the DRM_VMW_GB_SURFACE_CREATE_EXT ioctl.
+ */
 
+/**
+ * struct drm_vmw_gb_surface_ref_ext_rep
+ *
+ * @creq: The data used as input when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_ext_req"
+ * @crep: Additional data output when the surface was created, as described
+ *        above at "struct drm_vmw_gb_surface_create_rep"
+ *
+ * Output Argument to the DRM_VMW_GB_SURFACE_REF_EXT ioctl.
+ */
+struct drm_vmw_gb_surface_ref_ext_rep {
+	struct drm_vmw_gb_surface_create_ext_req creq;
+	struct drm_vmw_gb_surface_create_rep crep;
+};
+
+/**
+ * union drm_vmw_gb_surface_reference_ext_arg
+ *
+ * @req: Input data as described above at "struct drm_vmw_surface_arg"
+ * @rep: Output data as described above at
+ *       "struct drm_vmw_gb_surface_ref_ext_rep"
+ *
+ * Argument to the DRM_VMW_GB_SURFACE_REF Ioctl.
+ */
+union drm_vmw_gb_surface_reference_ext_arg {
+	struct drm_vmw_gb_surface_ref_ext_rep rep;
+	struct drm_vmw_surface_arg req;
+};
 
 #if defined(__cplusplus)
 }
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC i-g-t 2/6] intel-gpu-overlay: Add engine queue stats
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Use new PMU engine queue stats (queued, runnable and running) and display
them per engine.

v2:
 * Compact per engine stats. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 overlay/gpu-top.h | 11 +++++++++++
 overlay/overlay.c |  7 +++++++
 3 files changed, 60 insertions(+)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 61b8f62fd78c..22e9badb22c1 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -72,6 +72,18 @@ static int perf_init(struct gpu_top *gt)
 				 gt->fd) >= 0)
 		gt->have_sema = 1;
 
+	if (perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_queued = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_runnable = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_running = 1;
+
 	gt->ring[0].name = d->name;
 	gt->num_rings = 1;
 
@@ -93,6 +105,24 @@ static int perf_init(struct gpu_top *gt)
 				   gt->fd) < 0)
 			return -1;
 
+		if (gt->have_queued &&
+		    perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class,
+								d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_runnable &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class,
+								  d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_running &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class,
+								 d->inst),
+				   gt->fd) < 0)
+			return -1;
+
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
@@ -298,6 +328,12 @@ int gpu_top_update(struct gpu_top *gt)
 				s->wait[n] = sample[m++];
 			if (gt->have_sema)
 				s->sema[n] = sample[m++];
+			if (gt->have_queued)
+				s->queued[n] = sample[m++];
+			if (gt->have_runnable)
+				s->runnable[n] = sample[m++];
+			if (gt->have_running)
+				s->running[n] = sample[m++];
 		}
 
 		if (gt->count == 1)
@@ -310,6 +346,12 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
 			if (gt->have_sema)
 				gt->ring[n].u.u.sema = (100 * (s->sema[n] - d->sema[n]) + d_time/2) / d_time;
+			if (gt->have_queued)
+				gt->ring[n].queued = (double)((s->queued[n] - d->queued[n])) / I915_SAMPLE_QUEUED_DIVISOR * 1e9 / d_time;
+			if (gt->have_runnable)
+				gt->ring[n].runnable = (double)((s->runnable[n] - d->runnable[n])) / I915_SAMPLE_RUNNABLE_DIVISOR  * 1e9 / d_time;
+			if (gt->have_running)
+				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index d3cdd779760f..cb4310c82a94 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -36,6 +36,9 @@ struct gpu_top {
 	int num_rings;
 	int have_wait;
 	int have_sema;
+	int have_queued;
+	int have_runnable;
+	int have_running;
 
 	struct gpu_top_ring {
 		const char *name;
@@ -47,6 +50,10 @@ struct gpu_top {
 			} u;
 			uint32_t payload;
 		} u;
+
+		double queued;
+		double runnable;
+		double running;
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -54,7 +61,11 @@ struct gpu_top {
 		uint64_t busy[MAX_RINGS];
 		uint64_t wait[MAX_RINGS];
 		uint64_t sema[MAX_RINGS];
+		uint64_t queued[MAX_RINGS];
+		uint64_t runnable[MAX_RINGS];
+		uint64_t running[MAX_RINGS];
 	} stat[2];
+
 	int count;
 };
 
diff --git a/overlay/overlay.c b/overlay/overlay.c
index eae5ddfa8823..7087d5e5b24e 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -256,6 +256,13 @@ static void show_gpu_top(struct overlay_context *ctx, struct overlay_gpu_top *gt
 		len = sprintf(txt, "%s: %3d%% busy",
 			      gt->gpu_top.ring[n].name,
 			      gt->gpu_top.ring[n].u.u.busy);
+		if (gt->gpu_top.have_queued &&
+		    gt->gpu_top.have_runnable &&
+		    gt->gpu_top.have_running)
+			len += sprintf(txt + len, " (%.2f / %.2f / %.2f)",
+				       gt->gpu_top.ring[n].queued,
+				       gt->gpu_top.ring[n].runnable,
+				       gt->gpu_top.ring[n].running);
 		if (gt->gpu_top.ring[n].u.u.wait)
 			len += sprintf(txt + len, ", %d%% wait",
 				       gt->gpu_top.ring[n].u.u.wait);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 2/6] intel-gpu-overlay: Add engine queue stats
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Use new PMU engine queue stats (queued, runnable and running) and display
them per engine.

v2:
 * Compact per engine stats. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 overlay/gpu-top.h | 11 +++++++++++
 overlay/overlay.c |  7 +++++++
 3 files changed, 60 insertions(+)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 61b8f62fd78c..22e9badb22c1 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -72,6 +72,18 @@ static int perf_init(struct gpu_top *gt)
 				 gt->fd) >= 0)
 		gt->have_sema = 1;
 
+	if (perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_queued = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_runnable = 1;
+
+	if (perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class, d->inst),
+				 gt->fd) >= 0)
+		gt->have_running = 1;
+
 	gt->ring[0].name = d->name;
 	gt->num_rings = 1;
 
@@ -93,6 +105,24 @@ static int perf_init(struct gpu_top *gt)
 				   gt->fd) < 0)
 			return -1;
 
+		if (gt->have_queued &&
+		    perf_i915_open_group(I915_PMU_ENGINE_QUEUED(d->class,
+								d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_runnable &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNABLE(d->class,
+								  d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		if (gt->have_running &&
+		    perf_i915_open_group(I915_PMU_ENGINE_RUNNING(d->class,
+								 d->inst),
+				   gt->fd) < 0)
+			return -1;
+
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
@@ -298,6 +328,12 @@ int gpu_top_update(struct gpu_top *gt)
 				s->wait[n] = sample[m++];
 			if (gt->have_sema)
 				s->sema[n] = sample[m++];
+			if (gt->have_queued)
+				s->queued[n] = sample[m++];
+			if (gt->have_runnable)
+				s->runnable[n] = sample[m++];
+			if (gt->have_running)
+				s->running[n] = sample[m++];
 		}
 
 		if (gt->count == 1)
@@ -310,6 +346,12 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
 			if (gt->have_sema)
 				gt->ring[n].u.u.sema = (100 * (s->sema[n] - d->sema[n]) + d_time/2) / d_time;
+			if (gt->have_queued)
+				gt->ring[n].queued = (double)((s->queued[n] - d->queued[n])) / I915_SAMPLE_QUEUED_DIVISOR * 1e9 / d_time;
+			if (gt->have_runnable)
+				gt->ring[n].runnable = (double)((s->runnable[n] - d->runnable[n])) / I915_SAMPLE_RUNNABLE_DIVISOR  * 1e9 / d_time;
+			if (gt->have_running)
+				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index d3cdd779760f..cb4310c82a94 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -36,6 +36,9 @@ struct gpu_top {
 	int num_rings;
 	int have_wait;
 	int have_sema;
+	int have_queued;
+	int have_runnable;
+	int have_running;
 
 	struct gpu_top_ring {
 		const char *name;
@@ -47,6 +50,10 @@ struct gpu_top {
 			} u;
 			uint32_t payload;
 		} u;
+
+		double queued;
+		double runnable;
+		double running;
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -54,7 +61,11 @@ struct gpu_top {
 		uint64_t busy[MAX_RINGS];
 		uint64_t wait[MAX_RINGS];
 		uint64_t sema[MAX_RINGS];
+		uint64_t queued[MAX_RINGS];
+		uint64_t runnable[MAX_RINGS];
+		uint64_t running[MAX_RINGS];
 	} stat[2];
+
 	int count;
 };
 
diff --git a/overlay/overlay.c b/overlay/overlay.c
index eae5ddfa8823..7087d5e5b24e 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -256,6 +256,13 @@ static void show_gpu_top(struct overlay_context *ctx, struct overlay_gpu_top *gt
 		len = sprintf(txt, "%s: %3d%% busy",
 			      gt->gpu_top.ring[n].name,
 			      gt->gpu_top.ring[n].u.u.busy);
+		if (gt->gpu_top.have_queued &&
+		    gt->gpu_top.have_runnable &&
+		    gt->gpu_top.have_running)
+			len += sprintf(txt + len, " (%.2f / %.2f / %.2f)",
+				       gt->gpu_top.ring[n].queued,
+				       gt->gpu_top.ring[n].runnable,
+				       gt->gpu_top.ring[n].running);
 		if (gt->gpu_top.ring[n].u.u.wait)
 			len += sprintf(txt + len, ", %d%% wait",
 				       gt->gpu_top.ring[n].u.u.wait);
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC i-g-t 3/6] intel-gpu-overlay: Show 1s, 30s and 15m GPU load
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Show total GPU loads in the window banner.

Engine load is defined as total of runnable and running requests on an
engine.

Total, non-normalized, load is display. In other words if N engines are
busy with exactly one request, the load will be shown as N.

v2:
 * Different flavour of load avg. (Chris Wilson)
 * Simplify code. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 39 ++++++++++++++++++++++++++++++++++++++-
 overlay/gpu-top.h | 11 ++++++++++-
 overlay/overlay.c | 28 ++++++++++++++++++++++------
 3 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 22e9badb22c1..501429b86379 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -28,6 +28,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
+#include <math.h>
 #include <errno.h>
 #include <assert.h>
 
@@ -126,6 +127,10 @@ static int perf_init(struct gpu_top *gt)
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
+	gt->have_load_avg = gt->have_queued &&
+			    gt->have_runnable &&
+			    gt->have_running;
+
 	return 0;
 }
 
@@ -290,17 +295,32 @@ static void mmio_init(struct gpu_top *gt)
 	}
 }
 
-void gpu_top_init(struct gpu_top *gt)
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us)
 {
+	const double period = (double)period_us / 1e6;
+	const double load_period[NUM_LOADS] = { 1.0, 30.0, 900.0 };
+	const char *load_names[NUM_LOADS] = { "1s", "30s", "15m" };
+	unsigned int i;
+
 	memset(gt, 0, sizeof(*gt));
 	gt->fd = -1;
 
+	for (i = 0; i < NUM_LOADS; i++) {
+		gt->load_name[i] = load_names[i];
+		gt->exp[i] = exp(-period / load_period[i]);
+	}
+
 	if (perf_init(gt) == 0)
 		return;
 
 	mmio_init(gt);
 }
 
+static double update_load(double load, double exp, double val)
+{
+	return val + exp * (load - val);
+}
+
 int gpu_top_update(struct gpu_top *gt)
 {
 	uint32_t data[1024];
@@ -313,6 +333,8 @@ int gpu_top_update(struct gpu_top *gt)
 		struct gpu_top_stat *s = &gt->stat[gt->count++&1];
 		struct gpu_top_stat *d = &gt->stat[gt->count&1];
 		uint64_t *sample, d_time;
+		double gpu_qd = 0.0;
+		unsigned int i;
 		int n, m;
 
 		len = read(gt->fd, data, sizeof(data));
@@ -341,6 +363,8 @@ int gpu_top_update(struct gpu_top *gt)
 
 		d_time = s->time - d->time;
 		for (n = 0; n < gt->num_rings; n++) {
+			double qd = 0.0;
+
 			gt->ring[n].u.u.busy = (100 * (s->busy[n] - d->busy[n]) + d_time/2) / d_time;
 			if (gt->have_wait)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
@@ -353,6 +377,14 @@ int gpu_top_update(struct gpu_top *gt)
 			if (gt->have_running)
 				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
+			qd = gt->ring[n].runnable + gt->ring[n].running;
+			gpu_qd += qd;
+
+			for (i = 0; i < NUM_LOADS; i++)
+				gt->ring[n].load[i] =
+					update_load(gt->ring[n].load[i],
+						    gt->exp[i], qd);
+
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
 				gt->ring[n].u.u.busy = 100;
@@ -362,6 +394,11 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.sema = 100;
 		}
 
+		for (i = 0; i < NUM_LOADS; i++) {
+			gt->load[i] = update_load(gt->load[i], gt->exp[i],
+						  gpu_qd);
+			gt->norm_load[i] = gt->load[i] / gt->num_rings;
+		}
 		update = 1;
 	} else {
 		while ((len = read(gt->fd, data, sizeof(data))) > 0) {
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index cb4310c82a94..115ce8c482c1 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -26,6 +26,7 @@
 #define GPU_TOP_H
 
 #define MAX_RINGS 16
+#define NUM_LOADS 3
 
 #include <stdint.h>
 
@@ -39,6 +40,12 @@ struct gpu_top {
 	int have_queued;
 	int have_runnable;
 	int have_running;
+	int have_load_avg;
+
+	double exp[NUM_LOADS];
+	double load[NUM_LOADS];
+	double norm_load[NUM_LOADS];
+	const char *load_name[NUM_LOADS];
 
 	struct gpu_top_ring {
 		const char *name;
@@ -54,6 +61,8 @@ struct gpu_top {
 		double queued;
 		double runnable;
 		double running;
+
+		double load[NUM_LOADS];
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -69,7 +78,7 @@ struct gpu_top {
 	int count;
 };
 
-void gpu_top_init(struct gpu_top *gt);
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us);
 int gpu_top_update(struct gpu_top *gt);
 
 #endif /* GPU_TOP_H */
diff --git a/overlay/overlay.c b/overlay/overlay.c
index 7087d5e5b24e..1a1d01db082b 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -141,7 +141,8 @@ struct overlay_context {
 };
 
 static void init_gpu_top(struct overlay_context *ctx,
-			 struct overlay_gpu_top *gt)
+			 struct overlay_gpu_top *gt,
+			 unsigned int period_us)
 {
 	const double rgba[][4] = {
 		{ 1, 0.25, 0.25, 1 },
@@ -153,7 +154,7 @@ static void init_gpu_top(struct overlay_context *ctx,
 	int n;
 
 	cpu_top_init(&gt->cpu_top);
-	gpu_top_init(&gt->gpu_top);
+	gpu_top_init(&gt->gpu_top, period_us);
 
 	chart_init(&gt->cpu, "CPU", 120);
 	chart_set_position(&gt->cpu, PAD, PAD);
@@ -928,13 +929,13 @@ int main(int argc, char **argv)
 
 	debugfs_init();
 
-	init_gpu_top(&ctx, &ctx.gpu_top);
+	sample_period = get_sample_period(&config);
+
+	init_gpu_top(&ctx, &ctx.gpu_top, sample_period);
 	init_gpu_perf(&ctx, &ctx.gpu_perf);
 	init_gpu_freq(&ctx, &ctx.gpu_freq);
 	init_gem_objects(&ctx, &ctx.gem_objects);
 
-	sample_period = get_sample_period(&config);
-
 	i = 0;
 	while (1) {
 		ctx.time = time(NULL);
@@ -950,9 +951,24 @@ int main(int argc, char **argv)
 		show_gem_objects(&ctx, &ctx.gem_objects);
 
 		{
-			char buf[80];
+			struct gpu_top *gt = &ctx.gpu_top.gpu_top;
 			cairo_text_extents_t extents;
+			char buf[256];
+
 			gethostname(buf, sizeof(buf));
+
+			if (gt->have_load_avg) {
+				int len = strlen(buf);
+
+				snprintf(buf + len, sizeof(buf) - len,
+					 "%s; %u engines; load %s %.2f, %s %.2f, %s %.2f",
+					 buf,
+					 gt->num_rings,
+					 gt->load_name[0], gt->load[0],
+					 gt->load_name[1], gt->load[1],
+					 gt->load_name[2], gt->load[2]);
+			}
+
 			cairo_set_source_rgb(ctx.cr, .5, .5, .5);
 			cairo_set_font_size(ctx.cr, PAD-2);
 			cairo_text_extents(ctx.cr, buf, &extents);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 3/6] intel-gpu-overlay: Show 1s, 30s and 15m GPU load
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Show total GPU loads in the window banner.

Engine load is defined as total of runnable and running requests on an
engine.

Total, non-normalized, load is display. In other words if N engines are
busy with exactly one request, the load will be shown as N.

v2:
 * Different flavour of load avg. (Chris Wilson)
 * Simplify code. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gpu-top.c | 39 ++++++++++++++++++++++++++++++++++++++-
 overlay/gpu-top.h | 11 ++++++++++-
 overlay/overlay.c | 28 ++++++++++++++++++++++------
 3 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 22e9badb22c1..501429b86379 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -28,6 +28,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
+#include <math.h>
 #include <errno.h>
 #include <assert.h>
 
@@ -126,6 +127,10 @@ static int perf_init(struct gpu_top *gt)
 		gt->ring[gt->num_rings++].name = d->name;
 	}
 
+	gt->have_load_avg = gt->have_queued &&
+			    gt->have_runnable &&
+			    gt->have_running;
+
 	return 0;
 }
 
@@ -290,17 +295,32 @@ static void mmio_init(struct gpu_top *gt)
 	}
 }
 
-void gpu_top_init(struct gpu_top *gt)
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us)
 {
+	const double period = (double)period_us / 1e6;
+	const double load_period[NUM_LOADS] = { 1.0, 30.0, 900.0 };
+	const char *load_names[NUM_LOADS] = { "1s", "30s", "15m" };
+	unsigned int i;
+
 	memset(gt, 0, sizeof(*gt));
 	gt->fd = -1;
 
+	for (i = 0; i < NUM_LOADS; i++) {
+		gt->load_name[i] = load_names[i];
+		gt->exp[i] = exp(-period / load_period[i]);
+	}
+
 	if (perf_init(gt) == 0)
 		return;
 
 	mmio_init(gt);
 }
 
+static double update_load(double load, double exp, double val)
+{
+	return val + exp * (load - val);
+}
+
 int gpu_top_update(struct gpu_top *gt)
 {
 	uint32_t data[1024];
@@ -313,6 +333,8 @@ int gpu_top_update(struct gpu_top *gt)
 		struct gpu_top_stat *s = &gt->stat[gt->count++&1];
 		struct gpu_top_stat *d = &gt->stat[gt->count&1];
 		uint64_t *sample, d_time;
+		double gpu_qd = 0.0;
+		unsigned int i;
 		int n, m;
 
 		len = read(gt->fd, data, sizeof(data));
@@ -341,6 +363,8 @@ int gpu_top_update(struct gpu_top *gt)
 
 		d_time = s->time - d->time;
 		for (n = 0; n < gt->num_rings; n++) {
+			double qd = 0.0;
+
 			gt->ring[n].u.u.busy = (100 * (s->busy[n] - d->busy[n]) + d_time/2) / d_time;
 			if (gt->have_wait)
 				gt->ring[n].u.u.wait = (100 * (s->wait[n] - d->wait[n]) + d_time/2) / d_time;
@@ -353,6 +377,14 @@ int gpu_top_update(struct gpu_top *gt)
 			if (gt->have_running)
 				gt->ring[n].running = (double)((s->running[n] - d->running[n])) / I915_SAMPLE_RUNNING_DIVISOR * 1e9 / d_time;
 
+			qd = gt->ring[n].runnable + gt->ring[n].running;
+			gpu_qd += qd;
+
+			for (i = 0; i < NUM_LOADS; i++)
+				gt->ring[n].load[i] =
+					update_load(gt->ring[n].load[i],
+						    gt->exp[i], qd);
+
 			/* in case of rounding + sampling errors, fudge */
 			if (gt->ring[n].u.u.busy > 100)
 				gt->ring[n].u.u.busy = 100;
@@ -362,6 +394,11 @@ int gpu_top_update(struct gpu_top *gt)
 				gt->ring[n].u.u.sema = 100;
 		}
 
+		for (i = 0; i < NUM_LOADS; i++) {
+			gt->load[i] = update_load(gt->load[i], gt->exp[i],
+						  gpu_qd);
+			gt->norm_load[i] = gt->load[i] / gt->num_rings;
+		}
 		update = 1;
 	} else {
 		while ((len = read(gt->fd, data, sizeof(data))) > 0) {
diff --git a/overlay/gpu-top.h b/overlay/gpu-top.h
index cb4310c82a94..115ce8c482c1 100644
--- a/overlay/gpu-top.h
+++ b/overlay/gpu-top.h
@@ -26,6 +26,7 @@
 #define GPU_TOP_H
 
 #define MAX_RINGS 16
+#define NUM_LOADS 3
 
 #include <stdint.h>
 
@@ -39,6 +40,12 @@ struct gpu_top {
 	int have_queued;
 	int have_runnable;
 	int have_running;
+	int have_load_avg;
+
+	double exp[NUM_LOADS];
+	double load[NUM_LOADS];
+	double norm_load[NUM_LOADS];
+	const char *load_name[NUM_LOADS];
 
 	struct gpu_top_ring {
 		const char *name;
@@ -54,6 +61,8 @@ struct gpu_top {
 		double queued;
 		double runnable;
 		double running;
+
+		double load[NUM_LOADS];
 	} ring[MAX_RINGS];
 
 	struct gpu_top_stat {
@@ -69,7 +78,7 @@ struct gpu_top {
 	int count;
 };
 
-void gpu_top_init(struct gpu_top *gt);
+void gpu_top_init(struct gpu_top *gt, unsigned int period_us);
 int gpu_top_update(struct gpu_top *gt);
 
 #endif /* GPU_TOP_H */
diff --git a/overlay/overlay.c b/overlay/overlay.c
index 7087d5e5b24e..1a1d01db082b 100644
--- a/overlay/overlay.c
+++ b/overlay/overlay.c
@@ -141,7 +141,8 @@ struct overlay_context {
 };
 
 static void init_gpu_top(struct overlay_context *ctx,
-			 struct overlay_gpu_top *gt)
+			 struct overlay_gpu_top *gt,
+			 unsigned int period_us)
 {
 	const double rgba[][4] = {
 		{ 1, 0.25, 0.25, 1 },
@@ -153,7 +154,7 @@ static void init_gpu_top(struct overlay_context *ctx,
 	int n;
 
 	cpu_top_init(&gt->cpu_top);
-	gpu_top_init(&gt->gpu_top);
+	gpu_top_init(&gt->gpu_top, period_us);
 
 	chart_init(&gt->cpu, "CPU", 120);
 	chart_set_position(&gt->cpu, PAD, PAD);
@@ -928,13 +929,13 @@ int main(int argc, char **argv)
 
 	debugfs_init();
 
-	init_gpu_top(&ctx, &ctx.gpu_top);
+	sample_period = get_sample_period(&config);
+
+	init_gpu_top(&ctx, &ctx.gpu_top, sample_period);
 	init_gpu_perf(&ctx, &ctx.gpu_perf);
 	init_gpu_freq(&ctx, &ctx.gpu_freq);
 	init_gem_objects(&ctx, &ctx.gem_objects);
 
-	sample_period = get_sample_period(&config);
-
 	i = 0;
 	while (1) {
 		ctx.time = time(NULL);
@@ -950,9 +951,24 @@ int main(int argc, char **argv)
 		show_gem_objects(&ctx, &ctx.gem_objects);
 
 		{
-			char buf[80];
+			struct gpu_top *gt = &ctx.gpu_top.gpu_top;
 			cairo_text_extents_t extents;
+			char buf[256];
+
 			gethostname(buf, sizeof(buf));
+
+			if (gt->have_load_avg) {
+				int len = strlen(buf);
+
+				snprintf(buf + len, sizeof(buf) - len,
+					 "%s; %u engines; load %s %.2f, %s %.2f, %s %.2f",
+					 buf,
+					 gt->num_rings,
+					 gt->load_name[0], gt->load[0],
+					 gt->load_name[1], gt->load[1],
+					 gt->load_name[2], gt->load[2]);
+			}
+
 			cairo_set_source_rgb(ctx.cr, .5, .5, .5);
 			cairo_set_font_size(ctx.cr, PAD-2);
 			cairo_text_extents(ctx.cr, buf, &extents);
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC i-g-t 4/6] tests/perf_pmu: Add tests for engine queued/runnable/running stats
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Simple tests to check reported queue depths are correct.

v2:
 * Improvements similar to ones from i915_query.c.

v3:
 * Rebase for __igt_spin_batch_new.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 259 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 259 insertions(+)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 6ab2595b114d..6f064b605a4c 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -169,6 +169,7 @@ static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
 #define TEST_RUNTIME_PM (8)
 #define FLAG_LONG (16)
 #define FLAG_HANG (32)
+#define TEST_CONTEXTS (64)
 
 static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
 {
@@ -959,6 +960,224 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	assert_within_epsilon(val[1], perf_slept[1], tolerance);
 }
 
+static double calc_queued(uint64_t d_val, uint64_t d_ns)
+{
+	return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
+}
+
+static void
+queued(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int max_rq = 10;
+	double queued[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	uint32_t bo;
+	int fd;
+
+	igt_require_sw_sync();
+	if (flags & TEST_CONTEXTS)
+		gem_require_contexts(gem_fd);
+
+	memset(queued, 0, sizeof(queued));
+
+	bo = gem_create(gem_fd, 4096);
+	gem_write(gem_fd, bo, 4092, &bbe, sizeof(bbe));
+
+	fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		IGT_CORK_FENCE(cork);
+		int fence = -1;
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			obj.handle = bo;
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			if (flags & TEST_CONTEXTS)
+				eb.rsvd1 = gem_context_create(gem_fd);
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+
+			if (flags & TEST_CONTEXTS)
+				gem_context_destroy(gem_fd, eb.rsvd1);
+		}
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u queued=%.2f\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo);
+	}
+
+	close(fd);
+
+	gem_close(gem_fd, bo);
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(queued[i], i, tolerance);
+}
+
+static unsigned long __query_wait(igt_spin_t *spin, unsigned int n)
+{
+	struct timespec ts = { };
+	unsigned long t;
+
+	igt_nsec_elapsed(&ts);
+
+	if (spin->running) {
+		igt_spin_busywait_until_running(spin);
+	} else {
+		igt_debug("__spin_wait - usleep mode\n");
+		usleep(500e3); /* Better than nothing! */
+	}
+
+	t = igt_nsec_elapsed(&ts);
+
+	return spin->running ? t : 500e6 / n;
+}
+
+static void
+runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	bool contexts = gem_has_contexts(gem_fd);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(runnable, 0, sizeof(runnable));
+
+	if (contexts) {
+		for (i = 0; i < max_rq; i++)
+			ctx[i] = gem_context_create(gem_fd);
+	}
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			uint32_t ctx_ = contexts ? ctx[i] : 0;
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx_, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd,
+							       .ctx = ctx_,
+							       .engine = engine);
+		}
+
+		if (n)
+			usleep(__query_wait(spin[0], n) * n);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	if (contexts) {
+		for (i = 0; i < max_rq; i++)
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	close(fd);
+
+	assert_within_epsilon(runnable[0], 0, tolerance);
+	igt_assert(runnable[max_rq] > 0.0);
+
+	if (contexts)
+		assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1],
+				      1, tolerance);
+}
+
+static void
+running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double running[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd,
+							       .engine = engine);
+		}
+
+		if (n)
+			usleep(__query_wait(spin[0], n) * n);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u running=%.2f\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	close(fd);
+
+	assert_within_epsilon(running[0], 0, tolerance);
+	for (i = 1; i <= max_rq; i++)
+		igt_assert(running[i] > 0);
+}
+
 /**
  * Tests that i915 PMU corectly errors out in invalid initialization.
  * i915 PMU is uncore PMU, thus:
@@ -1709,6 +1928,15 @@ igt_main
 		igt_subtest_f("init-sema-%s", e->name)
 			init(fd, e, I915_SAMPLE_SEMA);
 
+		igt_subtest_f("init-queued-%s", e->name)
+			init(fd, e, I915_SAMPLE_QUEUED);
+
+		igt_subtest_f("init-runnable-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNABLE);
+
+		igt_subtest_f("init-running-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNING);
+
 		igt_subtest_group {
 			igt_fixture {
 				gem_require_engine(fd, e->class, e->instance);
@@ -1814,6 +2042,27 @@ igt_main
 
 			igt_subtest_f("busy-hang-%s", e->name)
 				single(fd, e, TEST_BUSY | FLAG_HANG);
+
+			/**
+			 * Test that queued metric works.
+			 */
+			igt_subtest_f("queued-%s", e->name)
+				queued(fd, e, 0);
+
+			igt_subtest_f("queued-contexts-%s", e->name)
+				queued(fd, e, TEST_CONTEXTS);
+
+			/**
+			 * Test that runnable metric works.
+			 */
+			igt_subtest_f("runnable-%s", e->name)
+				runnable(fd, e);
+
+			/**
+			 * Test that running metric works.
+			 */
+			igt_subtest_f("running-%s", e->name)
+				running(fd, e);
 		}
 
 		/**
@@ -1906,6 +2155,16 @@ igt_main
 					      e->name)
 					single(render_fd, e,
 					       TEST_BUSY | TEST_TRAILING_IDLE);
+				igt_subtest_f("render-node-queued-%s", e->name)
+					queued(render_fd, e, 0);
+				igt_subtest_f("render-node-queued-contexts-%s",
+					      e->name)
+					queued(render_fd, e, TEST_CONTEXTS);
+				igt_subtest_f("render-node-runnable-%s",
+					      e->name)
+					runnable(render_fd, e);
+				igt_subtest_f("render-node-running-%s", e->name)
+					running(render_fd, e);
 			}
 		}
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 4/6] tests/perf_pmu: Add tests for engine queued/runnable/running stats
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Simple tests to check reported queue depths are correct.

v2:
 * Improvements similar to ones from i915_query.c.

v3:
 * Rebase for __igt_spin_batch_new.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 259 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 259 insertions(+)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 6ab2595b114d..6f064b605a4c 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -169,6 +169,7 @@ static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
 #define TEST_RUNTIME_PM (8)
 #define FLAG_LONG (16)
 #define FLAG_HANG (32)
+#define TEST_CONTEXTS (64)
 
 static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
 {
@@ -959,6 +960,224 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	assert_within_epsilon(val[1], perf_slept[1], tolerance);
 }
 
+static double calc_queued(uint64_t d_val, uint64_t d_ns)
+{
+	return (double)d_val * 1e9 / I915_SAMPLE_QUEUED_DIVISOR / d_ns;
+}
+
+static void
+queued(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int max_rq = 10;
+	double queued[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	uint32_t bo;
+	int fd;
+
+	igt_require_sw_sync();
+	if (flags & TEST_CONTEXTS)
+		gem_require_contexts(gem_fd);
+
+	memset(queued, 0, sizeof(queued));
+
+	bo = gem_create(gem_fd, 4096);
+	gem_write(gem_fd, bo, 4092, &bbe, sizeof(bbe));
+
+	fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		IGT_CORK_FENCE(cork);
+		int fence = -1;
+
+		gem_quiescent_gpu(gem_fd);
+
+		if (n)
+			fence = igt_cork_plug(&cork, -1);
+
+		for (i = 0; i < n; i++) {
+			struct drm_i915_gem_exec_object2 obj = { };
+			struct drm_i915_gem_execbuffer2 eb = { };
+
+			obj.handle = bo;
+
+			eb.buffer_count = 1;
+			eb.buffers_ptr = to_user_pointer(&obj);
+
+			eb.flags = engine | I915_EXEC_FENCE_IN;
+			if (flags & TEST_CONTEXTS)
+				eb.rsvd1 = gem_context_create(gem_fd);
+			eb.rsvd2 = fence;
+
+			gem_execbuf(gem_fd, &eb);
+
+			if (flags & TEST_CONTEXTS)
+				gem_context_destroy(gem_fd, eb.rsvd1);
+		}
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		queued[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u queued=%.2f\n", n, queued[n]);
+
+		if (fence >= 0)
+			igt_cork_unplug(&cork);
+
+		for (i = 0; i < n; i++)
+			gem_sync(gem_fd, bo);
+	}
+
+	close(fd);
+
+	gem_close(gem_fd, bo);
+
+	for (i = 0; i <= max_rq; i++)
+		assert_within_epsilon(queued[i], i, tolerance);
+}
+
+static unsigned long __query_wait(igt_spin_t *spin, unsigned int n)
+{
+	struct timespec ts = { };
+	unsigned long t;
+
+	igt_nsec_elapsed(&ts);
+
+	if (spin->running) {
+		igt_spin_busywait_until_running(spin);
+	} else {
+		igt_debug("__spin_wait - usleep mode\n");
+		usleep(500e3); /* Better than nothing! */
+	}
+
+	t = igt_nsec_elapsed(&ts);
+
+	return spin->running ? t : 500e6 / n;
+}
+
+static void
+runnable(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	bool contexts = gem_has_contexts(gem_fd);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double runnable[max_rq + 1];
+	uint32_t ctx[max_rq];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(runnable, 0, sizeof(runnable));
+
+	if (contexts) {
+		for (i = 0; i < max_rq; i++)
+			ctx[i] = gem_context_create(gem_fd);
+	}
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNABLE(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			uint32_t ctx_ = contexts ? ctx[i] : 0;
+
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, ctx_, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd,
+							       .ctx = ctx_,
+							       .engine = engine);
+		}
+
+		if (n)
+			usleep(__query_wait(spin[0], n) * n);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		runnable[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u runnable=%.2f\n", n, runnable[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	if (contexts) {
+		for (i = 0; i < max_rq; i++)
+			gem_context_destroy(gem_fd, ctx[i]);
+	}
+
+	close(fd);
+
+	assert_within_epsilon(runnable[0], 0, tolerance);
+	igt_assert(runnable[max_rq] > 0.0);
+
+	if (contexts)
+		assert_within_epsilon(runnable[max_rq] - runnable[max_rq - 1],
+				      1, tolerance);
+}
+
+static void
+running(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	const unsigned long engine = e2ring(gem_fd, e);
+	const unsigned int max_rq = 10;
+	igt_spin_t *spin[max_rq + 1];
+	double running[max_rq + 1];
+	unsigned int n, i;
+	uint64_t val[2];
+	uint64_t ts[2];
+	int fd;
+
+	memset(running, 0, sizeof(running));
+	memset(spin, 0, sizeof(spin));
+
+	fd = open_pmu(I915_PMU_ENGINE_RUNNING(e->class, e->instance));
+
+	for (n = 0; n <= max_rq; n++) {
+		gem_quiescent_gpu(gem_fd);
+
+		for (i = 0; i < n; i++) {
+			if (i == 0)
+				spin[i] = __spin_poll(gem_fd, 0, engine);
+			else
+				spin[i] = __igt_spin_batch_new(gem_fd,
+							       .engine = engine);
+		}
+
+		if (n)
+			usleep(__query_wait(spin[0], n) * n);
+
+		val[0] = __pmu_read_single(fd, &ts[0]);
+		usleep(batch_duration_ns / 1000);
+		val[1] = __pmu_read_single(fd, &ts[1]);
+
+		running[n] = calc_queued(val[1] - val[0], ts[1] - ts[0]);
+		igt_info("n=%u running=%.2f\n", n, running[n]);
+
+		for (i = 0; i < n; i++) {
+			end_spin(gem_fd, spin[i], FLAG_SYNC);
+			igt_spin_batch_free(gem_fd, spin[i]);
+		}
+	}
+
+	close(fd);
+
+	assert_within_epsilon(running[0], 0, tolerance);
+	for (i = 1; i <= max_rq; i++)
+		igt_assert(running[i] > 0);
+}
+
 /**
  * Tests that i915 PMU corectly errors out in invalid initialization.
  * i915 PMU is uncore PMU, thus:
@@ -1709,6 +1928,15 @@ igt_main
 		igt_subtest_f("init-sema-%s", e->name)
 			init(fd, e, I915_SAMPLE_SEMA);
 
+		igt_subtest_f("init-queued-%s", e->name)
+			init(fd, e, I915_SAMPLE_QUEUED);
+
+		igt_subtest_f("init-runnable-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNABLE);
+
+		igt_subtest_f("init-running-%s", e->name)
+			init(fd, e, I915_SAMPLE_RUNNING);
+
 		igt_subtest_group {
 			igt_fixture {
 				gem_require_engine(fd, e->class, e->instance);
@@ -1814,6 +2042,27 @@ igt_main
 
 			igt_subtest_f("busy-hang-%s", e->name)
 				single(fd, e, TEST_BUSY | FLAG_HANG);
+
+			/**
+			 * Test that queued metric works.
+			 */
+			igt_subtest_f("queued-%s", e->name)
+				queued(fd, e, 0);
+
+			igt_subtest_f("queued-contexts-%s", e->name)
+				queued(fd, e, TEST_CONTEXTS);
+
+			/**
+			 * Test that runnable metric works.
+			 */
+			igt_subtest_f("runnable-%s", e->name)
+				runnable(fd, e);
+
+			/**
+			 * Test that running metric works.
+			 */
+			igt_subtest_f("running-%s", e->name)
+				running(fd, e);
 		}
 
 		/**
@@ -1906,6 +2155,16 @@ igt_main
 					      e->name)
 					single(render_fd, e,
 					       TEST_BUSY | TEST_TRAILING_IDLE);
+				igt_subtest_f("render-node-queued-%s", e->name)
+					queued(render_fd, e, 0);
+				igt_subtest_f("render-node-queued-contexts-%s",
+					      e->name)
+					queued(render_fd, e, TEST_CONTEXTS);
+				igt_subtest_f("render-node-runnable-%s",
+					      e->name)
+					runnable(render_fd, e);
+				igt_subtest_f("render-node-running-%s", e->name)
+					running(render_fd, e);
 			}
 		}
 
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC i-g-t 5/6] intel-gpu-top: Add queue depths and load average
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

With the driver now exporting various request queue depths for each
engine, we can display this information here. Both as raw counters,
and by adding a load average like metrics composed from number of
runnable and running requests in a given time period. 1s, 30s and 5m
periods are used for load average.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tools/Makefile.am     |   2 +-
 tools/intel_gpu_top.c | 122 +++++++++++++++++++++++++++++++++++++-----
 tools/meson.build     |   2 +-
 3 files changed, 110 insertions(+), 16 deletions(-)

diff --git a/tools/Makefile.am b/tools/Makefile.am
index e7de4d90241c..e03842afce8d 100644
--- a/tools/Makefile.am
+++ b/tools/Makefile.am
@@ -29,7 +29,7 @@ intel_aubdump_la_LDFLAGS = -module -avoid-version -no-undefined
 intel_aubdump_la_SOURCES = aubdump.c
 intel_aubdump_la_LIBADD = $(top_builddir)/lib/libintel_tools.la -ldl
 
-intel_gpu_top_LDADD = $(top_builddir)/lib/libigt_perf.la
+intel_gpu_top_LDADD = $(top_builddir)/lib/libigt_perf.la -lm
 
 bin_SCRIPTS = intel_aubdump
 CLEANFILES = $(bin_SCRIPTS)
diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index b923c3cfbe97..8990ef17b771 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -56,6 +56,10 @@ struct pmu_counter {
 	struct pmu_pair val;
 };
 
+#define NUM_QUEUES (3)
+
+#define NUM_LOADS (3)
+
 struct engine {
 	const char *name;
 	const char *display_name;
@@ -65,9 +69,13 @@ struct engine {
 
 	unsigned int num_counters;
 
+	double qd[3];
+	double load_avg[NUM_LOADS];
+
 	struct pmu_counter busy;
 	struct pmu_counter wait;
 	struct pmu_counter sema;
+	struct pmu_counter queue[NUM_QUEUES];
 };
 
 struct engines {
@@ -95,6 +103,11 @@ struct engines {
 	struct pmu_counter imc_reads;
 	struct pmu_counter imc_writes;
 
+	double qd_scale;
+
+	double load_exp[NUM_LOADS];
+	double load_avg[NUM_LOADS];
+
 	struct engine engine;
 };
 
@@ -320,6 +333,13 @@ static double filename_to_double(const char *filename)
 	return v;
 }
 
+#define I915_EVENT "/sys/devices/i915/events/"
+
+static double i915_qd_scale(void)
+{
+	return filename_to_double(I915_EVENT "rcs0-queued.scale");
+}
+
 #define RAPL_ROOT "/sys/devices/power/"
 #define RAPL_EVENT "/sys/devices/power/events/"
 
@@ -453,6 +473,8 @@ static int pmu_init(struct engines *engines)
 	engines->rc6.config = I915_PMU_RC6_RESIDENCY;
 	_open_pmu(engines->num_counters, &engines->rc6, engines->fd);
 
+	engines->qd_scale = i915_qd_scale();
+
 	for (i = 0; i < engines->num_engines; i++) {
 		struct engine *engine = engine_ptr(engines, i);
 		struct {
@@ -462,6 +484,9 @@ static int pmu_init(struct engines *engines)
 			{ .pmu = &engine->busy, .counter = "busy" },
 			{ .pmu = &engine->wait, .counter = "wait" },
 			{ .pmu = &engine->sema, .counter = "sema" },
+			{ .pmu = &engine->queue[0], .counter = "queued" },
+			{ .pmu = &engine->queue[1], .counter = "runnable" },
+			{ .pmu = &engine->queue[2], .counter = "running" },
 			{ .pmu = NULL, .counter = NULL },
 		};
 
@@ -576,12 +601,11 @@ static void fill_str(char *buf, unsigned int bufsz, char c, unsigned int num)
 	*buf = 0;
 }
 
-static void pmu_calc(struct pmu_counter *cnt,
-		     char *buf, unsigned int bufsz,
-		     unsigned int width, unsigned width_dec,
-		     double d, double t, double s)
+static void _pmu_calc(struct pmu_counter *cnt,
+		      char *buf, unsigned int bufsz,
+		      unsigned int width, unsigned width_dec,
+		      double val)
 {
-	double val;
 	int len;
 
 	assert(bufsz >= (width + width_dec + 1));
@@ -591,8 +615,6 @@ static void pmu_calc(struct pmu_counter *cnt,
 		return;
 	}
 
-	val = __pmu_calc(&cnt->val, d, t, s);
-
 	len = snprintf(buf, bufsz, "%*.*f", width + width_dec, width_dec, val);
 	if (len < 0 || len == bufsz) {
 		fill_str(buf, bufsz, 'X', width + width_dec);
@@ -600,6 +622,16 @@ static void pmu_calc(struct pmu_counter *cnt,
 	}
 }
 
+static void pmu_calc(struct pmu_counter *cnt,
+		     char *buf, unsigned int bufsz,
+		     unsigned int width, unsigned width_dec,
+		     double d, double t, double s)
+{
+	double val = __pmu_calc(&cnt->val, d, t, s);
+
+	_pmu_calc(cnt, buf, bufsz, width, width_dec, val);
+}
+
 static uint64_t __pmu_read_single(int fd, uint64_t *ts)
 {
 	uint64_t data[2] = { };
@@ -658,10 +690,14 @@ static void pmu_sample(struct engines *engines)
 
 	for (i = 0; i < engines->num_engines; i++) {
 		struct engine *engine = engine_ptr(engines, i);
+		unsigned int j;
 
 		update_sample(&engine->busy, val);
 		update_sample(&engine->sema, val);
 		update_sample(&engine->wait, val);
+
+		for (j = 0; j < NUM_QUEUES; j++)
+			update_sample(&engine->queue[j], val);
 	}
 }
 
@@ -702,12 +738,19 @@ usage(const char *appname)
 		appname, DEFAULT_PERIOD_MS);
 }
 
+static double update_load(double load, double exp, double val)
+{
+	return val + exp * (load - val);
+}
+
 int main(int argc, char **argv)
 {
 	unsigned int period_us = DEFAULT_PERIOD_MS * 1000;
+	const double load_period[NUM_LOADS] = { 1.0, 30.0, 900.0 };
 	int con_w = -1, con_h = -1;
 	struct engines *engines;
 	unsigned int i;
+	double period;
 	int ret, ch;
 
 	/* Parse options */
@@ -741,10 +784,15 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	/* Load average setup. */
+	period = (double)period_us / 1e6;
+	for (i = 0; i < NUM_LOADS; i++)
+		engines->load_exp[i] = exp(-period / load_period[i]);
+
 	pmu_sample(engines);
 
 	for (;;) {
-		double t;
+		double t, qd = 0;
 #define BUFSZ 16
 		char freq[BUFSZ];
 		char fact[BUFSZ];
@@ -754,6 +802,7 @@ int main(int argc, char **argv)
 		char reads[BUFSZ];
 		char writes[BUFSZ];
 		struct winsize ws;
+		unsigned int j;
 		int lines = 0;
 
 		/* Update terminal size. */
@@ -778,8 +827,44 @@ int main(int argc, char **argv)
 		pmu_calc(&engines->imc_writes, writes, BUFSZ, 6, 0, 1.0, t,
 			 engines->imc_writes_scale);
 
+               for (i = 0; i < engines->num_engines; i++) {
+                       struct engine *engine = engine_ptr(engines, i);
+
+			if (!engine->num_counters)
+				continue;
+
+			for (j = 0; j < NUM_QUEUES; j++) {
+				if (!engine->queue[j].present)
+					continue;
+
+				engine->qd[j] =
+					__pmu_calc(&engine->queue[j].val, 1, t,
+						   engines->qd_scale);
+			}
+
+			qd += engine->qd[1] + engine->qd[2];
+
+			for (j = 0; j < NUM_LOADS; j++) {
+				engine->load_avg[j] =
+					update_load(engine->load_avg[j],
+						    engines->load_exp[j],
+						    engine->qd[1] +
+						    engine->qd[2]);
+			}
+               }
+
+		for (j = 0; j < NUM_LOADS; j++) {
+			engines->load_avg[j] =
+				update_load(engines->load_avg[j],
+					    engines->load_exp[j],
+					    qd);
+		}
+
 		if (lines++ < con_h)
-			printf("intel-gpu-top - %s/%s MHz;  %s%% RC6; %s %s; %s irqs/s\n",
+			printf("intel-gpu-top - load avg %5.2f, %5.2f, %5.2f; %s/%s MHz;  %s%% RC6; %s %s; %s irqs/s\n",
+			       engines->load_avg[0],
+			       engines->load_avg[1],
+			       engines->load_avg[2],
 			       fact, freq, rc6, power, engines->rapl_unit, irq);
 
 		if (lines++ < con_h)
@@ -803,7 +888,7 @@ int main(int argc, char **argv)
 
 			if (engine->num_counters && lines < con_h) {
 				const char *a = "          ENGINE      BUSY ";
-				const char *b = " MI_SEMA MI_WAIT";
+				const char *b = "Q   r   R MI_SEMA MI_WAIT";
 
 				printf("\033[7m%s%*s%s\033[0m\n",
 				       a,
@@ -817,6 +902,7 @@ int main(int argc, char **argv)
 		for (i = 0; i < engines->num_engines && lines < con_h; i++) {
 			struct engine *engine = engine_ptr(engines, i);
 			unsigned int max_w = con_w - 1;
+			char qdbuf[NUM_LOADS][BUFSZ];
 			unsigned int len;
 			char sema[BUFSZ];
 			char wait[BUFSZ];
@@ -827,14 +913,22 @@ int main(int argc, char **argv)
 			if (!engine->num_counters)
 				continue;
 
+			for (j = 0; j < NUM_QUEUES; j++)
+				_pmu_calc(&engine->queue[j], qdbuf[j], BUFSZ,
+					  3, 0, engine->qd[j]);
+
 			pmu_calc(&engine->sema, sema, BUFSZ, 3, 0, 1e9, t, 100);
 			pmu_calc(&engine->wait, wait, BUFSZ, 3, 0, 1e9, t, 100);
-			len = snprintf(buf, sizeof(buf), "    %s%%    %s%%",
+
+			len = snprintf(buf, sizeof(buf),
+				       " %s %s %s    %s%%    %s%%",
+				       qdbuf[0], qdbuf[1], qdbuf[2],
 				       sema, wait);
 
-			pmu_calc(&engine->busy, busy, BUFSZ, 6, 2, 1e9, t,
-				 100);
-			len += printf("%16s %s%% ", engine->display_name, busy);
+			pmu_calc(&engine->busy, busy, BUFSZ, 6, 2, 1e9, t, 100);
+
+			len += printf("%16s %s%% ",
+				      engine->display_name, busy);
 
 			val = __pmu_calc(&engine->busy.val, 1e9, t, 100);
 			print_percentage_bar(val, max_w - len);
diff --git a/tools/meson.build b/tools/meson.build
index e4517d667299..c1dd71fada79 100644
--- a/tools/meson.build
+++ b/tools/meson.build
@@ -97,7 +97,7 @@ shared_library('intel_aubdump', 'aubdump.c',
 executable('intel_gpu_top', 'intel_gpu_top.c',
 	   install : true,
 	   install_rpath : bindir_rpathdir,
-	   dependencies : tool_deps + [ lib_igt_perf ])
+	   dependencies : tool_deps + [ lib_igt_perf, math ])
 
 conf_data = configuration_data()
 conf_data.set('prefix', prefix)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 5/6] intel-gpu-top: Add queue depths and load average
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

With the driver now exporting various request queue depths for each
engine, we can display this information here. Both as raw counters,
and by adding a load average like metrics composed from number of
runnable and running requests in a given time period. 1s, 30s and 5m
periods are used for load average.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tools/Makefile.am     |   2 +-
 tools/intel_gpu_top.c | 122 +++++++++++++++++++++++++++++++++++++-----
 tools/meson.build     |   2 +-
 3 files changed, 110 insertions(+), 16 deletions(-)

diff --git a/tools/Makefile.am b/tools/Makefile.am
index e7de4d90241c..e03842afce8d 100644
--- a/tools/Makefile.am
+++ b/tools/Makefile.am
@@ -29,7 +29,7 @@ intel_aubdump_la_LDFLAGS = -module -avoid-version -no-undefined
 intel_aubdump_la_SOURCES = aubdump.c
 intel_aubdump_la_LIBADD = $(top_builddir)/lib/libintel_tools.la -ldl
 
-intel_gpu_top_LDADD = $(top_builddir)/lib/libigt_perf.la
+intel_gpu_top_LDADD = $(top_builddir)/lib/libigt_perf.la -lm
 
 bin_SCRIPTS = intel_aubdump
 CLEANFILES = $(bin_SCRIPTS)
diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index b923c3cfbe97..8990ef17b771 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -56,6 +56,10 @@ struct pmu_counter {
 	struct pmu_pair val;
 };
 
+#define NUM_QUEUES (3)
+
+#define NUM_LOADS (3)
+
 struct engine {
 	const char *name;
 	const char *display_name;
@@ -65,9 +69,13 @@ struct engine {
 
 	unsigned int num_counters;
 
+	double qd[3];
+	double load_avg[NUM_LOADS];
+
 	struct pmu_counter busy;
 	struct pmu_counter wait;
 	struct pmu_counter sema;
+	struct pmu_counter queue[NUM_QUEUES];
 };
 
 struct engines {
@@ -95,6 +103,11 @@ struct engines {
 	struct pmu_counter imc_reads;
 	struct pmu_counter imc_writes;
 
+	double qd_scale;
+
+	double load_exp[NUM_LOADS];
+	double load_avg[NUM_LOADS];
+
 	struct engine engine;
 };
 
@@ -320,6 +333,13 @@ static double filename_to_double(const char *filename)
 	return v;
 }
 
+#define I915_EVENT "/sys/devices/i915/events/"
+
+static double i915_qd_scale(void)
+{
+	return filename_to_double(I915_EVENT "rcs0-queued.scale");
+}
+
 #define RAPL_ROOT "/sys/devices/power/"
 #define RAPL_EVENT "/sys/devices/power/events/"
 
@@ -453,6 +473,8 @@ static int pmu_init(struct engines *engines)
 	engines->rc6.config = I915_PMU_RC6_RESIDENCY;
 	_open_pmu(engines->num_counters, &engines->rc6, engines->fd);
 
+	engines->qd_scale = i915_qd_scale();
+
 	for (i = 0; i < engines->num_engines; i++) {
 		struct engine *engine = engine_ptr(engines, i);
 		struct {
@@ -462,6 +484,9 @@ static int pmu_init(struct engines *engines)
 			{ .pmu = &engine->busy, .counter = "busy" },
 			{ .pmu = &engine->wait, .counter = "wait" },
 			{ .pmu = &engine->sema, .counter = "sema" },
+			{ .pmu = &engine->queue[0], .counter = "queued" },
+			{ .pmu = &engine->queue[1], .counter = "runnable" },
+			{ .pmu = &engine->queue[2], .counter = "running" },
 			{ .pmu = NULL, .counter = NULL },
 		};
 
@@ -576,12 +601,11 @@ static void fill_str(char *buf, unsigned int bufsz, char c, unsigned int num)
 	*buf = 0;
 }
 
-static void pmu_calc(struct pmu_counter *cnt,
-		     char *buf, unsigned int bufsz,
-		     unsigned int width, unsigned width_dec,
-		     double d, double t, double s)
+static void _pmu_calc(struct pmu_counter *cnt,
+		      char *buf, unsigned int bufsz,
+		      unsigned int width, unsigned width_dec,
+		      double val)
 {
-	double val;
 	int len;
 
 	assert(bufsz >= (width + width_dec + 1));
@@ -591,8 +615,6 @@ static void pmu_calc(struct pmu_counter *cnt,
 		return;
 	}
 
-	val = __pmu_calc(&cnt->val, d, t, s);
-
 	len = snprintf(buf, bufsz, "%*.*f", width + width_dec, width_dec, val);
 	if (len < 0 || len == bufsz) {
 		fill_str(buf, bufsz, 'X', width + width_dec);
@@ -600,6 +622,16 @@ static void pmu_calc(struct pmu_counter *cnt,
 	}
 }
 
+static void pmu_calc(struct pmu_counter *cnt,
+		     char *buf, unsigned int bufsz,
+		     unsigned int width, unsigned width_dec,
+		     double d, double t, double s)
+{
+	double val = __pmu_calc(&cnt->val, d, t, s);
+
+	_pmu_calc(cnt, buf, bufsz, width, width_dec, val);
+}
+
 static uint64_t __pmu_read_single(int fd, uint64_t *ts)
 {
 	uint64_t data[2] = { };
@@ -658,10 +690,14 @@ static void pmu_sample(struct engines *engines)
 
 	for (i = 0; i < engines->num_engines; i++) {
 		struct engine *engine = engine_ptr(engines, i);
+		unsigned int j;
 
 		update_sample(&engine->busy, val);
 		update_sample(&engine->sema, val);
 		update_sample(&engine->wait, val);
+
+		for (j = 0; j < NUM_QUEUES; j++)
+			update_sample(&engine->queue[j], val);
 	}
 }
 
@@ -702,12 +738,19 @@ usage(const char *appname)
 		appname, DEFAULT_PERIOD_MS);
 }
 
+static double update_load(double load, double exp, double val)
+{
+	return val + exp * (load - val);
+}
+
 int main(int argc, char **argv)
 {
 	unsigned int period_us = DEFAULT_PERIOD_MS * 1000;
+	const double load_period[NUM_LOADS] = { 1.0, 30.0, 900.0 };
 	int con_w = -1, con_h = -1;
 	struct engines *engines;
 	unsigned int i;
+	double period;
 	int ret, ch;
 
 	/* Parse options */
@@ -741,10 +784,15 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	/* Load average setup. */
+	period = (double)period_us / 1e6;
+	for (i = 0; i < NUM_LOADS; i++)
+		engines->load_exp[i] = exp(-period / load_period[i]);
+
 	pmu_sample(engines);
 
 	for (;;) {
-		double t;
+		double t, qd = 0;
 #define BUFSZ 16
 		char freq[BUFSZ];
 		char fact[BUFSZ];
@@ -754,6 +802,7 @@ int main(int argc, char **argv)
 		char reads[BUFSZ];
 		char writes[BUFSZ];
 		struct winsize ws;
+		unsigned int j;
 		int lines = 0;
 
 		/* Update terminal size. */
@@ -778,8 +827,44 @@ int main(int argc, char **argv)
 		pmu_calc(&engines->imc_writes, writes, BUFSZ, 6, 0, 1.0, t,
 			 engines->imc_writes_scale);
 
+               for (i = 0; i < engines->num_engines; i++) {
+                       struct engine *engine = engine_ptr(engines, i);
+
+			if (!engine->num_counters)
+				continue;
+
+			for (j = 0; j < NUM_QUEUES; j++) {
+				if (!engine->queue[j].present)
+					continue;
+
+				engine->qd[j] =
+					__pmu_calc(&engine->queue[j].val, 1, t,
+						   engines->qd_scale);
+			}
+
+			qd += engine->qd[1] + engine->qd[2];
+
+			for (j = 0; j < NUM_LOADS; j++) {
+				engine->load_avg[j] =
+					update_load(engine->load_avg[j],
+						    engines->load_exp[j],
+						    engine->qd[1] +
+						    engine->qd[2]);
+			}
+               }
+
+		for (j = 0; j < NUM_LOADS; j++) {
+			engines->load_avg[j] =
+				update_load(engines->load_avg[j],
+					    engines->load_exp[j],
+					    qd);
+		}
+
 		if (lines++ < con_h)
-			printf("intel-gpu-top - %s/%s MHz;  %s%% RC6; %s %s; %s irqs/s\n",
+			printf("intel-gpu-top - load avg %5.2f, %5.2f, %5.2f; %s/%s MHz;  %s%% RC6; %s %s; %s irqs/s\n",
+			       engines->load_avg[0],
+			       engines->load_avg[1],
+			       engines->load_avg[2],
 			       fact, freq, rc6, power, engines->rapl_unit, irq);
 
 		if (lines++ < con_h)
@@ -803,7 +888,7 @@ int main(int argc, char **argv)
 
 			if (engine->num_counters && lines < con_h) {
 				const char *a = "          ENGINE      BUSY ";
-				const char *b = " MI_SEMA MI_WAIT";
+				const char *b = "Q   r   R MI_SEMA MI_WAIT";
 
 				printf("\033[7m%s%*s%s\033[0m\n",
 				       a,
@@ -817,6 +902,7 @@ int main(int argc, char **argv)
 		for (i = 0; i < engines->num_engines && lines < con_h; i++) {
 			struct engine *engine = engine_ptr(engines, i);
 			unsigned int max_w = con_w - 1;
+			char qdbuf[NUM_LOADS][BUFSZ];
 			unsigned int len;
 			char sema[BUFSZ];
 			char wait[BUFSZ];
@@ -827,14 +913,22 @@ int main(int argc, char **argv)
 			if (!engine->num_counters)
 				continue;
 
+			for (j = 0; j < NUM_QUEUES; j++)
+				_pmu_calc(&engine->queue[j], qdbuf[j], BUFSZ,
+					  3, 0, engine->qd[j]);
+
 			pmu_calc(&engine->sema, sema, BUFSZ, 3, 0, 1e9, t, 100);
 			pmu_calc(&engine->wait, wait, BUFSZ, 3, 0, 1e9, t, 100);
-			len = snprintf(buf, sizeof(buf), "    %s%%    %s%%",
+
+			len = snprintf(buf, sizeof(buf),
+				       " %s %s %s    %s%%    %s%%",
+				       qdbuf[0], qdbuf[1], qdbuf[2],
 				       sema, wait);
 
-			pmu_calc(&engine->busy, busy, BUFSZ, 6, 2, 1e9, t,
-				 100);
-			len += printf("%16s %s%% ", engine->display_name, busy);
+			pmu_calc(&engine->busy, busy, BUFSZ, 6, 2, 1e9, t, 100);
+
+			len += printf("%16s %s%% ",
+				      engine->display_name, busy);
 
 			val = __pmu_calc(&engine->busy.val, 1e9, t, 100);
 			print_percentage_bar(val, max_w - len);
diff --git a/tools/meson.build b/tools/meson.build
index e4517d667299..c1dd71fada79 100644
--- a/tools/meson.build
+++ b/tools/meson.build
@@ -97,7 +97,7 @@ shared_library('intel_aubdump', 'aubdump.c',
 executable('intel_gpu_top', 'intel_gpu_top.c',
 	   install : true,
 	   install_rpath : bindir_rpathdir,
-	   dependencies : tool_deps + [ lib_igt_perf ])
+	   dependencies : tool_deps + [ lib_igt_perf, math ])
 
 conf_data = configuration_data()
 conf_data.set('prefix', prefix)
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC i-g-t 6/6] intel-gpu-top: Support for client stats
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tools/intel_gpu_top.c | 360 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 351 insertions(+), 9 deletions(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 8990ef17b771..02c5a7d77614 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -20,9 +20,6 @@
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  * DEALINGS IN THE SOFTWARE.
  *
- * Authors:
- *    Eric Anholt <eric@anholt.net>
- *    Eugeni Dodonov <eugeni.dodonov@intel.com>
  */
 
 #include <stdio.h>
@@ -41,6 +38,8 @@
 #include <errno.h>
 #include <math.h>
 #include <locale.h>
+#include <limits.h>
+#include <signal.h>
 
 #include "igt_perf.h"
 
@@ -701,8 +700,300 @@ static void pmu_sample(struct engines *engines)
 	}
 }
 
+enum client_status {
+	FREE = 0, /* mbz */
+	ALIVE,
+	PROBE
+};
+
+struct client {
+	enum client_status status;
+	unsigned int id;
+	unsigned int pid;
+	char name[128];
+	unsigned int samples;
+	unsigned long total;
+	struct engines *engines;
+	unsigned long *val;
+	uint64_t *last;
+};
+
+#define SYSFS_ENABLE "/sys/class/drm/card0/clients/enable_stats"
+
+bool __stats_enabled;
+
+static int __set_stats(bool val)
+{
+	int fd, ret;
+
+	fd = open(SYSFS_ENABLE, O_WRONLY);
+	if (fd < 0)
+		return -errno;
+
+	ret = write(fd, val ? "1" : "0", 2);
+	if (ret < 0)
+		return -errno;
+	else if (ret < 2)
+		return 1;
+
+	close(fd);
+
+	return 0;
+}
+
+static void __restore_stats(void)
+{
+	int ret;
+
+	if (__stats_enabled)
+		return;
+
+	ret = __set_stats(false);
+	if (ret)
+		fprintf(stderr, "Failed to disable per-client stats! (%d)\n",
+			ret);
+}
+
+static void __restore_stats_signal(int sig)
+{
+	exit(0);
+}
+
+static void enable_stats(void)
+{
+	int fd, ret;
+
+	fd = open(SYSFS_ENABLE, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	close(fd);
+
+	__stats_enabled = filename_to_u64(SYSFS_ENABLE, 10);
+	if (__stats_enabled)
+		return;
+
+	ret = __set_stats(true);
+	if (!ret) {
+		if (atexit(__restore_stats))
+			fprintf(stderr, "Failed to register exit handler!");
+
+		if (signal(SIGINT, __restore_stats_signal))
+			fprintf(stderr, "Failed to register signal handler!");
+	} else {
+		fprintf(stderr, "Failed to enable per-client stats! (%d)\n",
+			ret);
+	}
+}
+
+#define SYSFS_CLIENTS "/sys/class/drm/card0/clients/"
+
+static struct client *clients;
+static unsigned int num_clients;
+
+#define for_each_client(c, tmp) \
+	for (tmp = num_clients, c = clients; tmp > 0; tmp--, c++)
+
+static uint64_t read_client_busy(unsigned int id, const char *engine)
+{
+	char buf[256];
+	ssize_t ret;
+
+	ret = snprintf(buf, sizeof(buf), SYSFS_CLIENTS "/%u/busy/%s",
+		       id, engine);
+	assert(ret > 0 && ret < sizeof(buf));
+	if (ret <= 0 || ret == sizeof(buf))
+		return 0;
+
+	return filename_to_u64(buf, 10);
+}
+
+static struct client *find_client(enum client_status status, unsigned int id)
+{
+	struct client *c;
+	unsigned int tmp;
+
+	for_each_client(c, tmp) {
+		if ((status == FREE && c->status == FREE) ||
+		    (status == c->status && c->id == id))
+			return c;
+	}
+
+	return NULL;
+}
+
+static void update_client(struct client *c, unsigned int pid, char *name)
+{
+	uint64_t val[c->engines->num_engines];
+	unsigned int i;
+
+	if (c->pid != pid)
+		c->pid = pid;
+
+	if (strncmp(c->name, name, sizeof(c->name)))
+		strncpy(c->name, name, sizeof(c->name));
+
+	for (i = 0; i < c->engines->num_engines; i++) {
+		struct engine *engine = engine_ptr(c->engines, i);
+
+		val[i] = read_client_busy(c->id, engine->name);
+	}
+
+	c->total = 0;
+
+	for (i = 0; i < c->engines->num_engines; i++) {
+		assert(val[i] >= c->last[i]);
+		c->val[i] = val[i] - c->last[i];
+		c->total += c->val[i];
+		c->last[i] = val[i];
+	}
+
+	c->samples++;
+	c->status = ALIVE;
+}
+
+static void
+add_client(unsigned int id, unsigned int pid, char *name,
+	   struct engines *engines)
+{
+	struct client *c;
+
+	assert(!find_client(ALIVE, id));
+
+	c = find_client(FREE, 0);
+	if (!c) {
+		unsigned int idx = num_clients;
+
+		num_clients += (num_clients + 2) / 2;
+		clients = realloc(clients, num_clients * sizeof(*c));
+		assert(clients);
+		c = &clients[idx];
+		memset(c, 0, (num_clients - idx) * sizeof(*c));
+	}
+
+	c->id = id;
+	c->engines = engines;
+	c->val = calloc(engines->num_engines, sizeof(c->val));
+	c->last = calloc(engines->num_engines, sizeof(c->last));
+	assert(c->val && c->last);
+
+	update_client(c, pid, name);
+}
+
+static void free_client(struct client *c)
+{
+	free(c->val);
+	free(c->last);
+	memset(c, 0, sizeof(*c));
+}
+
+static char *read_client_sysfs(unsigned int id, const char *field)
+{
+	char buf[256];
+	ssize_t ret;
+
+	ret = snprintf(buf, sizeof(buf), SYSFS_CLIENTS "/%u/%s", id, field);
+	assert(ret > 0 && ret < sizeof(buf));
+	if (ret <= 0 || ret == sizeof(buf))
+		return NULL;
+
+	ret = filename_to_buf(buf, buf, sizeof(buf));
+	assert(ret == 0);
+	if (ret)
+		return NULL;
+
+	return strdup(buf);
+}
+
+static void scan(struct engines *engines)
+{
+	struct dirent *dent;
+	struct client *c;
+	char *pid, *name;
+	unsigned int tmp;
+	unsigned int id;
+	DIR *d;
+
+	for_each_client(c, tmp) {
+		if (c->status == ALIVE)
+			c->status = PROBE;
+	}
+
+	d = opendir(SYSFS_CLIENTS);
+	if (!d)
+		return;
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!isdigit(dent->d_name[0]))
+			continue;
+
+		id = atoi(dent->d_name);
+
+		name = read_client_sysfs(id, "name");
+		assert(name);
+		if (!name)
+			continue;
+
+		pid = read_client_sysfs(id, "pid");
+		assert(pid);
+		if (!pid) {
+			free(name);
+			continue;
+		}
+
+		c = find_client(PROBE, id);
+		if (c) {
+			update_client(c, atoi(pid), name);
+			continue;
+		}
+
+		add_client(id, atoi(pid), name, engines);
+
+		free(name);
+		free(pid);
+	}
+
+	closedir(d);
+
+	for_each_client(c, tmp) {
+		if (c->status == PROBE)
+			free_client(c);
+	}
+}
+
+static int cmp(const void *_a, const void *_b)
+{
+	const struct client *a = _a;
+	const struct client *b = _b;
+	long tot_a = a->total;
+	long tot_b = b->total;
+
+	tot_a *= a->status == ALIVE && a->samples > 1;
+	tot_b *= b->status == ALIVE && b->samples > 1;
+
+	tot_b -= tot_a;
+
+	if (!tot_b)
+		return (int)b->id - a->id;
+
+	while (tot_b > INT_MAX || tot_b < INT_MIN)
+		tot_b /= 2;
+
+	return tot_b;
+}
+
 static const char *bars[] = { " ", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█" };
 
+static void n_spaces(const unsigned int n)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; i++)
+		putchar(' ');
+}
+
 static void
 print_percentage_bar(double percent, int max_len)
 {
@@ -716,8 +1007,10 @@ print_percentage_bar(double percent, int max_len)
 	if (i)
 		printf("%s", bars[i]);
 
-	for (i = 0; i < (max_len - 2 - (bar_len + 7) / 8); i++)
-		putchar(' ');
+	bar_len = max_len - 2 - (bar_len + 7) / 8;
+	if (bar_len > max_len)
+		bar_len = max_len;
+	n_spaces(bar_len);
 
 	putchar('|');
 }
@@ -784,12 +1077,15 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	enable_stats();
+
 	/* Load average setup. */
 	period = (double)period_us / 1e6;
 	for (i = 0; i < NUM_LOADS; i++)
 		engines->load_exp[i] = exp(-period / load_period[i]);
 
 	pmu_sample(engines);
+	scan(engines);
 
 	for (;;) {
 		double t, qd = 0;
@@ -802,7 +1098,8 @@ int main(int argc, char **argv)
 		char reads[BUFSZ];
 		char writes[BUFSZ];
 		struct winsize ws;
-		unsigned int j;
+		unsigned int len, engine_w, j;
+		struct client *c;
 		int lines = 0;
 
 		/* Update terminal size. */
@@ -812,9 +1109,10 @@ int main(int argc, char **argv)
 		}
 
 		pmu_sample(engines);
-		t = (double)(engines->ts.cur - engines->ts.prev) / 1e9;
+		scan(engines);
+		qsort(clients, num_clients, sizeof(*c), cmp);
 
-		printf("\033[H\033[J");
+		t = (double)(engines->ts.cur - engines->ts.prev) / 1e9;
 
 		pmu_calc(&engines->freq_req, freq, BUFSZ, 4, 0, 1.0, t, 1);
 		pmu_calc(&engines->freq_act, fact, BUFSZ, 4, 0, 1.0, t, 1);
@@ -860,6 +1158,8 @@ int main(int argc, char **argv)
 					    qd);
 		}
 
+		printf("\033[H\033[J");
+
 		if (lines++ < con_h)
 			printf("intel-gpu-top - load avg %5.2f, %5.2f, %5.2f; %s/%s MHz;  %s%% RC6; %s %s; %s irqs/s\n",
 			       engines->load_avg[0],
@@ -903,7 +1203,6 @@ int main(int argc, char **argv)
 			struct engine *engine = engine_ptr(engines, i);
 			unsigned int max_w = con_w - 1;
 			char qdbuf[NUM_LOADS][BUFSZ];
-			unsigned int len;
 			char sema[BUFSZ];
 			char wait[BUFSZ];
 			char busy[BUFSZ];
@@ -941,6 +1240,49 @@ int main(int argc, char **argv)
 		if (lines++ < con_h)
 			printf("\n");
 
+		if (lines++ < con_h) {
+			printf("\033[7m");
+			len = printf("%5s%16s", "PID", "NAME");
+
+			engine_w = (con_w - len) / engines->num_engines;
+			for (i = 0; i < engines->num_engines; i++) {
+				struct engine *engine = engine_ptr(engines, i);
+				unsigned int name_len =
+					strlen(engine->display_name);
+				unsigned int pad = (engine_w - name_len) / 2;
+
+
+				n_spaces(pad);
+				printf("%s", engine->display_name);
+				n_spaces(engine_w - pad - name_len);
+				len += pad + name_len + (engine_w - pad -
+				       name_len);
+			}
+			n_spaces(con_w - len);
+			printf("\033[0m\n");
+		}
+
+		for_each_client(c, i) {
+			if (lines++ > con_h)
+				break;
+
+			assert(c->status != PROBE);
+			if (c->status != ALIVE || c->samples < 2)
+				break;
+
+			printf("%5u%16s ", c->pid, c->name);
+
+			for (j = 0; j < engines->num_engines; j++) {
+				double pct;
+
+				pct = (double)c->val[j] / period_us / 1e3 * 100;
+
+				print_percentage_bar(pct, engine_w);
+			}
+
+			putchar('\n');
+		}
+
 		usleep(period_us);
 	}
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] [RFC i-g-t 6/6] intel-gpu-top: Support for client stats
@ 2018-10-03 12:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-10-03 12:07 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tools/intel_gpu_top.c | 360 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 351 insertions(+), 9 deletions(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 8990ef17b771..02c5a7d77614 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -20,9 +20,6 @@
  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  * DEALINGS IN THE SOFTWARE.
  *
- * Authors:
- *    Eric Anholt <eric@anholt.net>
- *    Eugeni Dodonov <eugeni.dodonov@intel.com>
  */
 
 #include <stdio.h>
@@ -41,6 +38,8 @@
 #include <errno.h>
 #include <math.h>
 #include <locale.h>
+#include <limits.h>
+#include <signal.h>
 
 #include "igt_perf.h"
 
@@ -701,8 +700,300 @@ static void pmu_sample(struct engines *engines)
 	}
 }
 
+enum client_status {
+	FREE = 0, /* mbz */
+	ALIVE,
+	PROBE
+};
+
+struct client {
+	enum client_status status;
+	unsigned int id;
+	unsigned int pid;
+	char name[128];
+	unsigned int samples;
+	unsigned long total;
+	struct engines *engines;
+	unsigned long *val;
+	uint64_t *last;
+};
+
+#define SYSFS_ENABLE "/sys/class/drm/card0/clients/enable_stats"
+
+bool __stats_enabled;
+
+static int __set_stats(bool val)
+{
+	int fd, ret;
+
+	fd = open(SYSFS_ENABLE, O_WRONLY);
+	if (fd < 0)
+		return -errno;
+
+	ret = write(fd, val ? "1" : "0", 2);
+	if (ret < 0)
+		return -errno;
+	else if (ret < 2)
+		return 1;
+
+	close(fd);
+
+	return 0;
+}
+
+static void __restore_stats(void)
+{
+	int ret;
+
+	if (__stats_enabled)
+		return;
+
+	ret = __set_stats(false);
+	if (ret)
+		fprintf(stderr, "Failed to disable per-client stats! (%d)\n",
+			ret);
+}
+
+static void __restore_stats_signal(int sig)
+{
+	exit(0);
+}
+
+static void enable_stats(void)
+{
+	int fd, ret;
+
+	fd = open(SYSFS_ENABLE, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	close(fd);
+
+	__stats_enabled = filename_to_u64(SYSFS_ENABLE, 10);
+	if (__stats_enabled)
+		return;
+
+	ret = __set_stats(true);
+	if (!ret) {
+		if (atexit(__restore_stats))
+			fprintf(stderr, "Failed to register exit handler!");
+
+		if (signal(SIGINT, __restore_stats_signal))
+			fprintf(stderr, "Failed to register signal handler!");
+	} else {
+		fprintf(stderr, "Failed to enable per-client stats! (%d)\n",
+			ret);
+	}
+}
+
+#define SYSFS_CLIENTS "/sys/class/drm/card0/clients/"
+
+static struct client *clients;
+static unsigned int num_clients;
+
+#define for_each_client(c, tmp) \
+	for (tmp = num_clients, c = clients; tmp > 0; tmp--, c++)
+
+static uint64_t read_client_busy(unsigned int id, const char *engine)
+{
+	char buf[256];
+	ssize_t ret;
+
+	ret = snprintf(buf, sizeof(buf), SYSFS_CLIENTS "/%u/busy/%s",
+		       id, engine);
+	assert(ret > 0 && ret < sizeof(buf));
+	if (ret <= 0 || ret == sizeof(buf))
+		return 0;
+
+	return filename_to_u64(buf, 10);
+}
+
+static struct client *find_client(enum client_status status, unsigned int id)
+{
+	struct client *c;
+	unsigned int tmp;
+
+	for_each_client(c, tmp) {
+		if ((status == FREE && c->status == FREE) ||
+		    (status == c->status && c->id == id))
+			return c;
+	}
+
+	return NULL;
+}
+
+static void update_client(struct client *c, unsigned int pid, char *name)
+{
+	uint64_t val[c->engines->num_engines];
+	unsigned int i;
+
+	if (c->pid != pid)
+		c->pid = pid;
+
+	if (strncmp(c->name, name, sizeof(c->name)))
+		strncpy(c->name, name, sizeof(c->name));
+
+	for (i = 0; i < c->engines->num_engines; i++) {
+		struct engine *engine = engine_ptr(c->engines, i);
+
+		val[i] = read_client_busy(c->id, engine->name);
+	}
+
+	c->total = 0;
+
+	for (i = 0; i < c->engines->num_engines; i++) {
+		assert(val[i] >= c->last[i]);
+		c->val[i] = val[i] - c->last[i];
+		c->total += c->val[i];
+		c->last[i] = val[i];
+	}
+
+	c->samples++;
+	c->status = ALIVE;
+}
+
+static void
+add_client(unsigned int id, unsigned int pid, char *name,
+	   struct engines *engines)
+{
+	struct client *c;
+
+	assert(!find_client(ALIVE, id));
+
+	c = find_client(FREE, 0);
+	if (!c) {
+		unsigned int idx = num_clients;
+
+		num_clients += (num_clients + 2) / 2;
+		clients = realloc(clients, num_clients * sizeof(*c));
+		assert(clients);
+		c = &clients[idx];
+		memset(c, 0, (num_clients - idx) * sizeof(*c));
+	}
+
+	c->id = id;
+	c->engines = engines;
+	c->val = calloc(engines->num_engines, sizeof(c->val));
+	c->last = calloc(engines->num_engines, sizeof(c->last));
+	assert(c->val && c->last);
+
+	update_client(c, pid, name);
+}
+
+static void free_client(struct client *c)
+{
+	free(c->val);
+	free(c->last);
+	memset(c, 0, sizeof(*c));
+}
+
+static char *read_client_sysfs(unsigned int id, const char *field)
+{
+	char buf[256];
+	ssize_t ret;
+
+	ret = snprintf(buf, sizeof(buf), SYSFS_CLIENTS "/%u/%s", id, field);
+	assert(ret > 0 && ret < sizeof(buf));
+	if (ret <= 0 || ret == sizeof(buf))
+		return NULL;
+
+	ret = filename_to_buf(buf, buf, sizeof(buf));
+	assert(ret == 0);
+	if (ret)
+		return NULL;
+
+	return strdup(buf);
+}
+
+static void scan(struct engines *engines)
+{
+	struct dirent *dent;
+	struct client *c;
+	char *pid, *name;
+	unsigned int tmp;
+	unsigned int id;
+	DIR *d;
+
+	for_each_client(c, tmp) {
+		if (c->status == ALIVE)
+			c->status = PROBE;
+	}
+
+	d = opendir(SYSFS_CLIENTS);
+	if (!d)
+		return;
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!isdigit(dent->d_name[0]))
+			continue;
+
+		id = atoi(dent->d_name);
+
+		name = read_client_sysfs(id, "name");
+		assert(name);
+		if (!name)
+			continue;
+
+		pid = read_client_sysfs(id, "pid");
+		assert(pid);
+		if (!pid) {
+			free(name);
+			continue;
+		}
+
+		c = find_client(PROBE, id);
+		if (c) {
+			update_client(c, atoi(pid), name);
+			continue;
+		}
+
+		add_client(id, atoi(pid), name, engines);
+
+		free(name);
+		free(pid);
+	}
+
+	closedir(d);
+
+	for_each_client(c, tmp) {
+		if (c->status == PROBE)
+			free_client(c);
+	}
+}
+
+static int cmp(const void *_a, const void *_b)
+{
+	const struct client *a = _a;
+	const struct client *b = _b;
+	long tot_a = a->total;
+	long tot_b = b->total;
+
+	tot_a *= a->status == ALIVE && a->samples > 1;
+	tot_b *= b->status == ALIVE && b->samples > 1;
+
+	tot_b -= tot_a;
+
+	if (!tot_b)
+		return (int)b->id - a->id;
+
+	while (tot_b > INT_MAX || tot_b < INT_MIN)
+		tot_b /= 2;
+
+	return tot_b;
+}
+
 static const char *bars[] = { " ", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█" };
 
+static void n_spaces(const unsigned int n)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; i++)
+		putchar(' ');
+}
+
 static void
 print_percentage_bar(double percent, int max_len)
 {
@@ -716,8 +1007,10 @@ print_percentage_bar(double percent, int max_len)
 	if (i)
 		printf("%s", bars[i]);
 
-	for (i = 0; i < (max_len - 2 - (bar_len + 7) / 8); i++)
-		putchar(' ');
+	bar_len = max_len - 2 - (bar_len + 7) / 8;
+	if (bar_len > max_len)
+		bar_len = max_len;
+	n_spaces(bar_len);
 
 	putchar('|');
 }
@@ -784,12 +1077,15 @@ int main(int argc, char **argv)
 		return 1;
 	}
 
+	enable_stats();
+
 	/* Load average setup. */
 	period = (double)period_us / 1e6;
 	for (i = 0; i < NUM_LOADS; i++)
 		engines->load_exp[i] = exp(-period / load_period[i]);
 
 	pmu_sample(engines);
+	scan(engines);
 
 	for (;;) {
 		double t, qd = 0;
@@ -802,7 +1098,8 @@ int main(int argc, char **argv)
 		char reads[BUFSZ];
 		char writes[BUFSZ];
 		struct winsize ws;
-		unsigned int j;
+		unsigned int len, engine_w, j;
+		struct client *c;
 		int lines = 0;
 
 		/* Update terminal size. */
@@ -812,9 +1109,10 @@ int main(int argc, char **argv)
 		}
 
 		pmu_sample(engines);
-		t = (double)(engines->ts.cur - engines->ts.prev) / 1e9;
+		scan(engines);
+		qsort(clients, num_clients, sizeof(*c), cmp);
 
-		printf("\033[H\033[J");
+		t = (double)(engines->ts.cur - engines->ts.prev) / 1e9;
 
 		pmu_calc(&engines->freq_req, freq, BUFSZ, 4, 0, 1.0, t, 1);
 		pmu_calc(&engines->freq_act, fact, BUFSZ, 4, 0, 1.0, t, 1);
@@ -860,6 +1158,8 @@ int main(int argc, char **argv)
 					    qd);
 		}
 
+		printf("\033[H\033[J");
+
 		if (lines++ < con_h)
 			printf("intel-gpu-top - load avg %5.2f, %5.2f, %5.2f; %s/%s MHz;  %s%% RC6; %s %s; %s irqs/s\n",
 			       engines->load_avg[0],
@@ -903,7 +1203,6 @@ int main(int argc, char **argv)
 			struct engine *engine = engine_ptr(engines, i);
 			unsigned int max_w = con_w - 1;
 			char qdbuf[NUM_LOADS][BUFSZ];
-			unsigned int len;
 			char sema[BUFSZ];
 			char wait[BUFSZ];
 			char busy[BUFSZ];
@@ -941,6 +1240,49 @@ int main(int argc, char **argv)
 		if (lines++ < con_h)
 			printf("\n");
 
+		if (lines++ < con_h) {
+			printf("\033[7m");
+			len = printf("%5s%16s", "PID", "NAME");
+
+			engine_w = (con_w - len) / engines->num_engines;
+			for (i = 0; i < engines->num_engines; i++) {
+				struct engine *engine = engine_ptr(engines, i);
+				unsigned int name_len =
+					strlen(engine->display_name);
+				unsigned int pad = (engine_w - name_len) / 2;
+
+
+				n_spaces(pad);
+				printf("%s", engine->display_name);
+				n_spaces(engine_w - pad - name_len);
+				len += pad + name_len + (engine_w - pad -
+				       name_len);
+			}
+			n_spaces(con_w - len);
+			printf("\033[0m\n");
+		}
+
+		for_each_client(c, i) {
+			if (lines++ > con_h)
+				break;
+
+			assert(c->status != PROBE);
+			if (c->status != ALIVE || c->samples < 2)
+				break;
+
+			printf("%5u%16s ", c->pid, c->name);
+
+			for (j = 0; j < engines->num_engines; j++) {
+				double pct;
+
+				pct = (double)c->val[j] / period_us / 1e3 * 100;
+
+				print_percentage_bar(pct, engine_w);
+			}
+
+			putchar('\n');
+		}
+
 		usleep(period_us);
 	}
 
-- 
2.17.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for 21st century intel_gpu_top & Co.
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  (?)
@ 2018-10-03 16:57 ` Patchwork
  -1 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-10-03 16:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: 21st century intel_gpu_top & Co.
URL   : https://patchwork.freedesktop.org/series/50499/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4918 -> IGTPW_1901 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/50499/revisions/1/mbox/

== Known issues ==

  Here are the changes found in IGTPW_1901 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_module_reload@basic-reload-inject:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106725, fdo#106248) +1

    igt@gem_wait@basic-busy-all:
      fi-skl-gvtdvm:      PASS -> DMESG-WARN (fdo#105541)

    igt@kms_cursor_legacy@basic-flip-before-cursor-atomic:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#105719)

    igt@kms_flip@basic-flip-vs-modeset:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106000)

    igt@kms_flip@basic-flip-vs-wf_vblank:
      fi-glk-j4005:       PASS -> FAIL (fdo#100368)

    igt@kms_flip@basic-plain-flip:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106097)

    igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
      fi-byt-clapper:     PASS -> FAIL (fdo#103191, fdo#107362)

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#105541 https://bugs.freedesktop.org/show_bug.cgi?id=105541
  fdo#105719 https://bugs.freedesktop.org/show_bug.cgi?id=105719
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248
  fdo#106725 https://bugs.freedesktop.org/show_bug.cgi?id=106725
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362


== Participating hosts (48 -> 43) ==

  Additional (1): fi-cfl-guc 
  Missing    (6): fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-u2 fi-snb-2520m fi-bdw-samus 


== Build changes ==

    * IGT: IGT_4662 -> IGTPW_1901

  CI_DRM_4918: f595aba3a6e2f6972bb158eb8434b58d22d0e5f0 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_1901: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1901/
  IGT_4662: ebf6a1dd1795e2f014ff3c47fe2eb4d5255845bd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools



== Testlist changes ==

+igt@perf_pmu@init-queued-bcs0
+igt@perf_pmu@init-queued-rcs0
+igt@perf_pmu@init-queued-vcs0
+igt@perf_pmu@init-queued-vcs1
+igt@perf_pmu@init-queued-vecs0
+igt@perf_pmu@init-runnable-bcs0
+igt@perf_pmu@init-runnable-rcs0
+igt@perf_pmu@init-runnable-vcs0
+igt@perf_pmu@init-runnable-vcs1
+igt@perf_pmu@init-runnable-vecs0
+igt@perf_pmu@init-running-bcs0
+igt@perf_pmu@init-running-rcs0
+igt@perf_pmu@init-running-vcs0
+igt@perf_pmu@init-running-vcs1
+igt@perf_pmu@init-running-vecs0
+igt@perf_pmu@queued-bcs0
+igt@perf_pmu@queued-contexts-bcs0
+igt@perf_pmu@queued-contexts-rcs0
+igt@perf_pmu@queued-contexts-vcs0
+igt@perf_pmu@queued-contexts-vcs1
+igt@perf_pmu@queued-contexts-vecs0
+igt@perf_pmu@queued-rcs0
+igt@perf_pmu@queued-vcs0
+igt@perf_pmu@queued-vcs1
+igt@perf_pmu@queued-vecs0
+igt@perf_pmu@render-node-queued-bcs0
+igt@perf_pmu@render-node-queued-contexts-bcs0
+igt@perf_pmu@render-node-queued-contexts-rcs0
+igt@perf_pmu@render-node-queued-contexts-vcs0
+igt@perf_pmu@render-node-queued-contexts-vcs1
+igt@perf_pmu@render-node-queued-contexts-vecs0
+igt@perf_pmu@render-node-queued-rcs0
+igt@perf_pmu@render-node-queued-vcs0
+igt@perf_pmu@render-node-queued-vcs1
+igt@perf_pmu@render-node-queued-vecs0
+igt@perf_pmu@render-node-runnable-bcs0
+igt@perf_pmu@render-node-runnable-rcs0
+igt@perf_pmu@render-node-runnable-vcs0
+igt@perf_pmu@render-node-runnable-vcs1
+igt@perf_pmu@render-node-runnable-vecs0
+igt@perf_pmu@render-node-running-bcs0
+igt@perf_pmu@render-node-running-rcs0
+igt@perf_pmu@render-node-running-vcs0
+igt@perf_pmu@render-node-running-vcs1
+igt@perf_pmu@render-node-running-vecs0
+igt@perf_pmu@runnable-bcs0
+igt@perf_pmu@runnable-rcs0
+igt@perf_pmu@runnable-vcs0
+igt@perf_pmu@runnable-vcs1
+igt@perf_pmu@runnable-vecs0
+igt@perf_pmu@running-bcs0
+igt@perf_pmu@running-rcs0
+igt@perf_pmu@running-vcs0
+igt@perf_pmu@running-vcs1
+igt@perf_pmu@running-vecs0

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1901/issues.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: failure for 21st century intel_gpu_top & Co.
  2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  (?)
@ 2018-10-04  7:40 ` Patchwork
  -1 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-10-04  7:40 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: 21st century intel_gpu_top & Co.
URL   : https://patchwork.freedesktop.org/series/50499/
State : failure

== Summary ==

= CI Bug Log - changes from IGT_4662_full -> IGTPW_1901_full =

== Summary - FAILURE ==

  Serious unknown changes coming with IGTPW_1901_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in IGTPW_1901_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/50499/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in IGTPW_1901_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@perf_pmu@queued-contexts-bcs0:
      shard-hsw:          NOTRUN -> FAIL +43

    igt@perf_pmu@render-node-runnable-vcs0:
      shard-apl:          NOTRUN -> FAIL +43

    igt@perf_pmu@runnable-bcs0:
      shard-snb:          NOTRUN -> FAIL +24

    igt@perf_pmu@running-vcs1:
      shard-kbl:          NOTRUN -> FAIL +54

    
== Known issues ==

  Here are the changes found in IGTPW_1901_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_suspend@shrink:
      shard-snb:          PASS -> INCOMPLETE (fdo#106886, fdo#105411)
      shard-kbl:          PASS -> INCOMPLETE (fdo#106886, fdo#103665)

    igt@kms_available_modes_crc@available_mode_test_crc:
      shard-apl:          PASS -> FAIL (fdo#106641)

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
      shard-snb:          NOTRUN -> DMESG-WARN (fdo#107956)

    igt@kms_color@pipe-a-ctm-max:
      shard-apl:          PASS -> FAIL (fdo#108147)

    igt@kms_cursor_crc@cursor-128x128-onscreen:
      shard-kbl:          PASS -> FAIL (fdo#103232)
      shard-apl:          PASS -> FAIL (fdo#103232)

    igt@kms_cursor_crc@cursor-64x64-suspend:
      shard-apl:          PASS -> FAIL (fdo#103191, fdo#103232)

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-wc:
      shard-apl:          PASS -> FAIL (fdo#103167) +1

    igt@kms_frontbuffer_tracking@fbc-1p-rte:
      shard-apl:          PASS -> DMESG-WARN (fdo#103558, fdo#108131)

    igt@kms_plane_multiple@atomic-pipe-c-tiling-x:
      shard-kbl:          PASS -> FAIL (fdo#103166) +1

    igt@kms_plane_multiple@atomic-pipe-c-tiling-y:
      shard-apl:          PASS -> FAIL (fdo#103166) +1

    igt@kms_setmode@basic:
      shard-apl:          PASS -> FAIL (fdo#99912)
      shard-kbl:          PASS -> FAIL (fdo#99912)

    igt@kms_vblank@pipe-c-ts-continuation-idle-hang:
      shard-apl:          PASS -> DMESG-WARN (fdo#105602, fdo#103558) +7

    
    ==== Possible fixes ====

    igt@gem_pwrite@big-gtt-fbr:
      shard-snb:          INCOMPLETE (fdo#105411) -> PASS

    igt@kms_available_modes_crc@available_mode_test_crc:
      shard-snb:          FAIL (fdo#106641) -> PASS

    igt@kms_color@pipe-a-degamma:
      shard-apl:          FAIL (fdo#108145) -> PASS

    igt@kms_cursor_crc@cursor-128x128-suspend:
      shard-apl:          FAIL (fdo#103191, fdo#103232) -> PASS
      shard-kbl:          FAIL (fdo#103191, fdo#103232) -> PASS

    igt@kms_cursor_crc@cursor-256x85-random:
      shard-apl:          FAIL (fdo#103232) -> PASS +4

    igt@kms_cursor_crc@cursor-64x21-onscreen:
      shard-kbl:          FAIL (fdo#103232) -> PASS +1

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-pwrite:
      shard-apl:          FAIL (fdo#103167) -> PASS +1
      shard-kbl:          FAIL (fdo#103167) -> PASS

    igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
      shard-apl:          FAIL (fdo#103166) -> PASS +1

    igt@kms_rotation_crc@sprite-rotation-180:
      shard-snb:          FAIL (fdo#103925) -> PASS

    igt@pm_rpm@system-suspend-modeset:
      shard-kbl:          INCOMPLETE (fdo#107807, fdo#103665) -> PASS

    
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#106641 https://bugs.freedesktop.org/show_bug.cgi?id=106641
  fdo#106886 https://bugs.freedesktop.org/show_bug.cgi?id=106886
  fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108131 https://bugs.freedesktop.org/show_bug.cgi?id=108131
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#108147 https://bugs.freedesktop.org/show_bug.cgi?id=108147
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (6 -> 4) ==

  Missing    (2): shard-skl shard-glk 


== Build changes ==

    * IGT: IGT_4662 -> IGTPW_1901
    * Linux: CI_DRM_4915 -> CI_DRM_4918

  CI_DRM_4915: 26e7a7d954a9c28b97af8ca7813f430fd9117232 @ git://anongit.freedesktop.org/gfx-ci/linux
  CI_DRM_4918: f595aba3a6e2f6972bb158eb8434b58d22d0e5f0 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_1901: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1901/
  IGT_4662: ebf6a1dd1795e2f014ff3c47fe2eb4d5255845bd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1901/shards.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-10-04  7:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-03 12:07 [RFC i-g-t 0/6] 21st century intel_gpu_top & Co Tvrtko Ursulin
2018-10-03 12:07 ` [igt-dev] " Tvrtko Ursulin
2018-10-03 12:07 ` [RFC i-g-t 1/6] include: DRM uAPI headers update Tvrtko Ursulin
2018-10-03 12:07   ` [igt-dev] " Tvrtko Ursulin
2018-10-03 12:07 ` [RFC i-g-t 2/6] intel-gpu-overlay: Add engine queue stats Tvrtko Ursulin
2018-10-03 12:07   ` [igt-dev] " Tvrtko Ursulin
2018-10-03 12:07 ` [RFC i-g-t 3/6] intel-gpu-overlay: Show 1s, 30s and 15m GPU load Tvrtko Ursulin
2018-10-03 12:07   ` [igt-dev] " Tvrtko Ursulin
2018-10-03 12:07 ` [RFC i-g-t 4/6] tests/perf_pmu: Add tests for engine queued/runnable/running stats Tvrtko Ursulin
2018-10-03 12:07   ` [igt-dev] " Tvrtko Ursulin
2018-10-03 12:07 ` [RFC i-g-t 5/6] intel-gpu-top: Add queue depths and load average Tvrtko Ursulin
2018-10-03 12:07   ` [igt-dev] " Tvrtko Ursulin
2018-10-03 12:07 ` [RFC i-g-t 6/6] intel-gpu-top: Support for client stats Tvrtko Ursulin
2018-10-03 12:07   ` [igt-dev] " Tvrtko Ursulin
2018-10-03 16:57 ` [igt-dev] ✓ Fi.CI.BAT: success for 21st century intel_gpu_top & Co Patchwork
2018-10-04  7:40 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.