* [PATCH v4 i-g-t 0/7] IGT PMU support
@ 2017-10-10  9:29 Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 1/9] intel-gpu-overlay: Move local perf implementation to a library Tvrtko Ursulin
                   ` (18 more replies)
  0 siblings, 19 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:29 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

1.
Fixes for intel-gpu-overlay to work on top of the proposed i915 PMU perf API.

2.
New test to exercise the same API.

3.
Update to gem_wsim and media-bench.pl to be able to use engine busyness via PMU
for making balancing decisions.

v2:
 * Added gem_wsim and media-bench.pl patches.
 * Comments and fixes for the perf_pmu test.

v3:
 * A bunch of review feedback implemented.

v4:
 * Tests for semaphore waits and event waits.
 * Review feedback.
 * RAPL PMU for intel-gpu-overlay.


Tvrtko Ursulin (7):
  intel-gpu-overlay: Move local perf implementation to a library
  intel-gpu-overlay: Consolidate perf PMU access to library
  intel-gpu-overlay: Fix interrupts PMU readout
  intel-gpu-overlay: Catch-up to new i915 PMU
  tests/perf_pmu: Tests for i915 PMU API
  gem_wsim: Busy stats balancers
  media-bench.pl: Add busy balancers to the list

 benchmarks/Makefile.am   |   2 +-
 benchmarks/gem_wsim.c    | 142 +++++++
 lib/Makefile.am          |   6 +-
 lib/igt_gt.c             |  50 +++
 lib/igt_gt.h             |  38 ++
 lib/igt_perf.c           |  58 +++
 lib/igt_perf.h           |  96 +++++
 overlay/Makefile.am      |   6 +-
 overlay/gem-interrupts.c |  25 +-
 overlay/gpu-freq.c       |  29 +-
 overlay/gpu-perf.c       |   3 +-
 overlay/gpu-top.c        |  87 ++---
 overlay/perf.c           |  26 --
 overlay/perf.h           |  64 ----
 overlay/power.c          |  22 +-
 overlay/rc6.c            |  41 +-
 scripts/media-bench.pl   |   5 +-
 tests/Makefile.am        |   1 +
 tests/Makefile.sources   |   1 +
 tests/perf_pmu.c         | 957 +++++++++++++++++++++++++++++++++++++++++++++++
 20 files changed, 1425 insertions(+), 234 deletions(-)
 create mode 100644 lib/igt_perf.c
 create mode 100644 lib/igt_perf.h
 delete mode 100644 overlay/perf.c
 delete mode 100644 overlay/perf.h
 create mode 100644 tests/perf_pmu.c

-- 
2.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH i-g-t 1/9] intel-gpu-overlay: Move local perf implementation to a library
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-11-21 19:34   ` [PATCH i-g-t v3 " Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 2/9] intel-gpu-overlay: Consolidate perf PMU access to library Tvrtko Ursulin
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The idea is to avoid duplication across multiple users in
upcoming patches.

v2: Commit message; use a separate library instead of piggybacking
    on libintel_tools. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 lib/Makefile.am                  | 6 +++++-
 overlay/perf.c => lib/igt_perf.c | 2 +-
 overlay/perf.h => lib/igt_perf.h | 2 ++
 overlay/Makefile.am              | 6 ++----
 overlay/gem-interrupts.c         | 3 ++-
 overlay/gpu-freq.c               | 3 ++-
 overlay/gpu-perf.c               | 3 ++-
 overlay/gpu-top.c                | 3 ++-
 overlay/power.c                  | 3 ++-
 overlay/rc6.c                    | 3 ++-
 10 files changed, 22 insertions(+), 12 deletions(-)
 rename overlay/perf.c => lib/igt_perf.c (94%)
 rename overlay/perf.h => lib/igt_perf.h (99%)

diff --git a/lib/Makefile.am b/lib/Makefile.am
index 30ddb92bd0bc..30423dbc8c21 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -7,7 +7,11 @@ include Makefile.sources
 
 libintel_tools_la_SOURCES = $(lib_source_list)
 
-noinst_LTLIBRARIES = libintel_tools.la
+libigt_perf_la_SOURCES = \
+	igt_perf.c	 \
+	igt_perf.h
+
+noinst_LTLIBRARIES = libintel_tools.la libigt_perf.la
 noinst_HEADERS = check-ndebug.h
 
 if HAVE_LIBDRM_VC4
diff --git a/overlay/perf.c b/lib/igt_perf.c
similarity index 94%
rename from overlay/perf.c
rename to lib/igt_perf.c
index b8fdc675c587..45cccff0ae53 100644
--- a/overlay/perf.c
+++ b/lib/igt_perf.c
@@ -3,7 +3,7 @@
 #include <unistd.h>
 #include <stdlib.h>
 
-#include "perf.h"
+#include "igt_perf.h"
 
 uint64_t i915_type_id(void)
 {
diff --git a/overlay/perf.h b/lib/igt_perf.h
similarity index 99%
rename from overlay/perf.h
rename to lib/igt_perf.h
index c44e65f9734c..a80b311cd1d1 100644
--- a/overlay/perf.h
+++ b/lib/igt_perf.h
@@ -1,6 +1,8 @@
 #ifndef I915_PERF_H
 #define I915_PERF_H
 
+#include <stdint.h>
+
 #include <linux/perf_event.h>
 
 #define I915_SAMPLE_BUSY	0
diff --git a/overlay/Makefile.am b/overlay/Makefile.am
index 39fbcc4ec3cf..cefde2d040f8 100644
--- a/overlay/Makefile.am
+++ b/overlay/Makefile.am
@@ -4,8 +4,8 @@ endif
 
 AM_CPPFLAGS = -I.
 AM_CFLAGS = $(DRM_CFLAGS) $(PCIACCESS_CFLAGS) $(CWARNFLAGS) \
-	$(CAIRO_CFLAGS) $(OVERLAY_CFLAGS) $(WERROR_CFLAGS)
-LDADD = $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS) $(OVERLAY_LIBS)
+	$(CAIRO_CFLAGS) $(OVERLAY_CFLAGS) $(WERROR_CFLAGS) -I$(srcdir)/../lib
+LDADD = $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS) $(OVERLAY_LIBS) $(top_builddir)/lib/libigt_perf.la
 
 intel_gpu_overlay_SOURCES = \
 	chart.h \
@@ -29,8 +29,6 @@ intel_gpu_overlay_SOURCES = \
 	igfx.c \
 	overlay.h \
 	overlay.c \
-	perf.h \
-	perf.c \
 	power.h \
 	power.c \
 	rc6.h \
diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index 0150a1d03825..7ba54fcd487d 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -31,9 +31,10 @@
 #include <string.h>
 #include <ctype.h>
 
+#include "igt_perf.h"
+
 #include "gem-interrupts.h"
 #include "debugfs.h"
-#include "perf.h"
 
 static int perf_open(void)
 {
diff --git a/overlay/gpu-freq.c b/overlay/gpu-freq.c
index 321c93882238..7f29b1aa986e 100644
--- a/overlay/gpu-freq.c
+++ b/overlay/gpu-freq.c
@@ -28,9 +28,10 @@
 #include <string.h>
 #include <stdio.h>
 
+#include "igt_perf.h"
+
 #include "gpu-freq.h"
 #include "debugfs.h"
-#include "perf.h"
 
 static int perf_i915_open(int config, int group)
 {
diff --git a/overlay/gpu-perf.c b/overlay/gpu-perf.c
index f557b9f06a17..3d4a9be91a94 100644
--- a/overlay/gpu-perf.c
+++ b/overlay/gpu-perf.c
@@ -34,7 +34,8 @@
 #include <fcntl.h>
 #include <errno.h>
 
-#include "perf.h"
+#include "igt_perf.h"
+
 #include "gpu-perf.h"
 #include "debugfs.h"
 
diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 891a7ea7c0b1..06f489dfdc83 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -31,7 +31,8 @@
 #include <errno.h>
 #include <assert.h>
 
-#include "perf.h"
+#include "igt_perf.h"
+
 #include "igfx.h"
 #include "gpu-top.h"
 
diff --git a/overlay/power.c b/overlay/power.c
index 2f1521b82cd6..84d860cae40c 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -31,7 +31,8 @@
 #include <time.h>
 #include <errno.h>
 
-#include "perf.h"
+#include "igt_perf.h"
+
 #include "power.h"
 #include "debugfs.h"
 
diff --git a/overlay/rc6.c b/overlay/rc6.c
index d7047c2f4880..3175bb22308f 100644
--- a/overlay/rc6.c
+++ b/overlay/rc6.c
@@ -31,8 +31,9 @@
 #include <time.h>
 #include <errno.h>
 
+#include "igt_perf.h"
+
 #include "rc6.h"
-#include "perf.h"
 
 static int perf_i915_open(int config, int group)
 {
-- 
2.9.5


* [PATCH i-g-t 2/9] intel-gpu-overlay: Consolidate perf PMU access to library
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 1/9] intel-gpu-overlay: Move local perf implementation to a library Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-10-10 12:21   ` Chris Wilson
  2017-10-10  9:30 ` [PATCH i-g-t 3/9] lib/perf: Fix data types and general tidy Tvrtko Ursulin
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Various tool modules implement their own PMU open wrapper, which
can be replaced by calling the library one.

v2:
 * Remove extra newline. (Chris Wilson)
 * Commit msg.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 lib/igt_perf.c           | 32 ++++++++++++++++++++++++++++++++
 lib/igt_perf.h           |  2 ++
 overlay/gem-interrupts.c | 16 +---------------
 overlay/gpu-freq.c       | 22 ++--------------------
 overlay/gpu-top.c        | 32 ++++++++------------------------
 overlay/power.c          | 17 +----------------
 overlay/rc6.c            | 24 +++---------------------
 7 files changed, 49 insertions(+), 96 deletions(-)

diff --git a/lib/igt_perf.c b/lib/igt_perf.c
index 45cccff0ae53..961a858af9e3 100644
--- a/lib/igt_perf.c
+++ b/lib/igt_perf.c
@@ -2,6 +2,8 @@
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdlib.h>
+#include <string.h>
+#include <errno.h>
 
 #include "igt_perf.h"
 
@@ -24,3 +26,33 @@ uint64_t i915_type_id(void)
 	return strtoull(buf, 0, 0);
 }
 
+static int _perf_open(int config, int group, int format)
+{
+	struct perf_event_attr attr;
+
+	memset(&attr, 0, sizeof (attr));
+
+	attr.type = i915_type_id();
+	if (attr.type == 0)
+		return -ENOENT;
+
+	attr.config = config;
+
+	if (group >= 0)
+		format &= ~PERF_FORMAT_GROUP;
+
+	attr.read_format = format;
+
+	return perf_event_open(&attr, -1, 0, group, 0);
+}
+
+int perf_i915_open(int config)
+{
+	return _perf_open(config, -1, PERF_FORMAT_TOTAL_TIME_ENABLED);
+}
+
+int perf_i915_open_group(int config, int group)
+{
+	return _perf_open(config, group,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP);
+}
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index a80b311cd1d1..8e674c3a3755 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -62,5 +62,7 @@ perf_event_open(struct perf_event_attr *attr,
 }
 
 uint64_t i915_type_id(void);
+int perf_i915_open(int config);
+int perf_i915_open_group(int config, int group);
 
 #endif /* I915_PERF_H */
diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index 7ba54fcd487d..a84aef0398a7 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -36,20 +36,6 @@
 #include "gem-interrupts.h"
 #include "debugfs.h"
 
-static int perf_open(void)
-{
-	struct perf_event_attr attr;
-
-	memset(&attr, 0, sizeof (attr));
-
-	attr.type = i915_type_id();
-	if (attr.type == 0)
-		return -ENOENT;
-	attr.config = I915_PERF_INTERRUPTS;
-
-	return perf_event_open(&attr, -1, 0, -1, 0);
-}
-
 static long long debugfs_read(void)
 {
 	char buf[8192], *b;
@@ -127,7 +113,7 @@ int gem_interrupts_init(struct gem_interrupts *irqs)
 {
 	memset(irqs, 0, sizeof(*irqs));
 
-	irqs->fd = perf_open();
+	irqs->fd = perf_i915_open(I915_PERF_INTERRUPTS);
 	if (irqs->fd < 0 && interrupts_read() < 0)
 		irqs->error = ENODEV;
 
diff --git a/overlay/gpu-freq.c b/overlay/gpu-freq.c
index 7f29b1aa986e..76c5ed9acfd1 100644
--- a/overlay/gpu-freq.c
+++ b/overlay/gpu-freq.c
@@ -33,30 +33,12 @@
 #include "gpu-freq.h"
 #include "debugfs.h"
 
-static int perf_i915_open(int config, int group)
-{
-	struct perf_event_attr attr;
-
-	memset(&attr, 0, sizeof (attr));
-
-	attr.type = i915_type_id();
-	if (attr.type == 0)
-		return -ENOENT;
-	attr.config = config;
-
-	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
-	if (group == -1)
-		attr.read_format |= PERF_FORMAT_GROUP;
-
-	return perf_event_open(&attr, -1, 0, group, 0);
-}
-
 static int perf_open(void)
 {
 	int fd;
 
-	fd = perf_i915_open(I915_PERF_ACTUAL_FREQUENCY, -1);
-	if (perf_i915_open(I915_PERF_REQUESTED_FREQUENCY, fd) < 0) {
+	fd = perf_i915_open_group(I915_PERF_ACTUAL_FREQUENCY, -1);
+	if (perf_i915_open_group(I915_PERF_REQUESTED_FREQUENCY, fd) < 0) {
 		close(fd);
 		fd = -1;
 	}
diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 06f489dfdc83..812f47d5aced 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -48,24 +48,6 @@
 #define I915_PERF_RING_WAIT(n) (__I915_PERF_RING(n) + 1)
 #define I915_PERF_RING_SEMA(n) (__I915_PERF_RING(n) + 2)
 
-static int perf_i915_open(int config, int group)
-{
-	struct perf_event_attr attr;
-
-	memset(&attr, 0, sizeof (attr));
-
-	attr.type = i915_type_id();
-	if (attr.type == 0)
-		return -ENOENT;
-	attr.config = config;
-
-	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
-	if (group == -1)
-		attr.read_format |= PERF_FORMAT_GROUP;
-
-	return perf_event_open(&attr, -1, 0, group, 0);
-}
-
 static int perf_init(struct gpu_top *gt)
 {
 	const char *names[] = {
@@ -77,27 +59,29 @@ static int perf_init(struct gpu_top *gt)
 	};
 	int n;
 
-	gt->fd = perf_i915_open(I915_PERF_RING_BUSY(0), -1);
+	gt->fd = perf_i915_open_group(I915_PERF_RING_BUSY(0), -1);
 	if (gt->fd < 0)
 		return -1;
 
-	if (perf_i915_open(I915_PERF_RING_WAIT(0), gt->fd) >= 0)
+	if (perf_i915_open_group(I915_PERF_RING_WAIT(0), gt->fd) >= 0)
 		gt->have_wait = 1;
 
-	if (perf_i915_open(I915_PERF_RING_SEMA(0), gt->fd) >= 0)
+	if (perf_i915_open_group(I915_PERF_RING_SEMA(0), gt->fd) >= 0)
 		gt->have_sema = 1;
 
 	gt->ring[0].name = names[0];
 	gt->num_rings = 1;
 
 	for (n = 1; names[n]; n++) {
-		if (perf_i915_open(I915_PERF_RING_BUSY(n), gt->fd) >= 0) {
+		if (perf_i915_open_group(I915_PERF_RING_BUSY(n), gt->fd) >= 0) {
 			if (gt->have_wait &&
-			    perf_i915_open(I915_PERF_RING_WAIT(n), gt->fd) < 0)
+			    perf_i915_open_group(I915_PERF_RING_WAIT(n),
+						 gt->fd) < 0)
 				return -1;
 
 			if (gt->have_sema &&
-			    perf_i915_open(I915_PERF_RING_SEMA(n), gt->fd) < 0)
+			    perf_i915_open_group(I915_PERF_RING_SEMA(n),
+						 gt->fd) < 0)
 				return -1;
 
 			gt->ring[gt->num_rings++].name = names[n];
diff --git a/overlay/power.c b/overlay/power.c
index 84d860cae40c..dd4aec6bffd9 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -38,21 +38,6 @@
 
 /* XXX Is this exposed through RAPL? */
 
-static int perf_open(void)
-{
-	struct perf_event_attr attr;
-
-	memset(&attr, 0, sizeof (attr));
-
-	attr.type = i915_type_id();
-	if (attr.type == 0)
-		return -1;
-	attr.config = I915_PERF_ENERGY;
-
-	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
-	return perf_event_open(&attr, -1, 0, -1, 0);
-}
-
 int power_init(struct power *power)
 {
 	char buf[4096];
@@ -60,7 +45,7 @@ int power_init(struct power *power)
 
 	memset(power, 0, sizeof(*power));
 
-	power->fd = perf_open();
+	power->fd = perf_i915_open(I915_PERF_ENERGY);
 	if (power->fd != -1)
 		return 0;
 
diff --git a/overlay/rc6.c b/overlay/rc6.c
index 3175bb22308f..46c975a557ff 100644
--- a/overlay/rc6.c
+++ b/overlay/rc6.c
@@ -35,24 +35,6 @@
 
 #include "rc6.h"
 
-static int perf_i915_open(int config, int group)
-{
-	struct perf_event_attr attr;
-
-	memset(&attr, 0, sizeof (attr));
-
-	attr.type = i915_type_id();
-	if (attr.type == 0)
-		return -ENOENT;
-	attr.config = config;
-
-	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
-	if (group == -1)
-		attr.read_format |= PERF_FORMAT_GROUP;
-
-	return perf_event_open(&attr, -1, 0, group, 0);
-}
-
 #define RC6	(1<<0)
 #define RC6p	(1<<1)
 #define RC6pp	(1<<2)
@@ -61,15 +43,15 @@ static int perf_open(unsigned *flags)
 {
 	int fd;
 
-	fd = perf_i915_open(I915_PERF_RC6_RESIDENCY, -1);
+	fd = perf_i915_open_group(I915_PERF_RC6_RESIDENCY, -1);
 	if (fd < 0)
 		return -1;
 
 	*flags |= RC6;
-	if (perf_i915_open(I915_PERF_RC6p_RESIDENCY, fd) >= 0)
+	if (perf_i915_open_group(I915_PERF_RC6p_RESIDENCY, fd) >= 0)
 		*flags |= RC6p;
 
-	if (perf_i915_open(I915_PERF_RC6pp_RESIDENCY, fd) >= 0)
+	if (perf_i915_open_group(I915_PERF_RC6pp_RESIDENCY, fd) >= 0)
 		*flags |= RC6pp;
 
 	return fd;
-- 
2.9.5
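
Editorial note on the group handling in _perf_open() above: the leader is
opened with group == -1 and keeps PERF_FORMAT_GROUP, while every member
opened against the leader's fd has that flag cleared. A minimal sketch of
just that masking step follows. effective_read_format() is a hypothetical
name used for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#include <linux/perf_event.h>

/*
 * Hypothetical helper (not in the patch): returns the read_format that
 * _perf_open() would store in perf_event_attr for a given group argument.
 */
static uint64_t effective_read_format(int group, uint64_t format)
{
	/* Callers pass either -1 (new leader) or the fd of an existing leader. */
	assert(group >= -1);

	/* Same rule as the patch: members drop PERF_FORMAT_GROUP. */
	if (group >= 0)
		format &= ~(uint64_t)PERF_FORMAT_GROUP;

	return format;
}
```

With this rule only the leader's fd is read in group format, which matches
how the overlay callers keep and read just the fd returned for the first
event.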


* [PATCH i-g-t 3/9] lib/perf: Fix data types and general tidy
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 1/9] intel-gpu-overlay: Move local perf implementation to a library Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 2/9] intel-gpu-overlay: Consolidate perf PMU access to library Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-10-10 12:22   ` Chris Wilson
  2017-10-10  9:30 ` [PATCH i-g-t 4/9] intel-gpu-overlay: Fix interrupts PMU readout Tvrtko Ursulin
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Configuration and format are uint64_t in the perf API.

Tidy some other details as well.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 lib/igt_perf.c | 40 +++++++++++++++++++---------------------
 lib/igt_perf.h |  4 ++--
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/lib/igt_perf.c b/lib/igt_perf.c
index 961a858af9e3..208474302fcc 100644
--- a/lib/igt_perf.c
+++ b/lib/igt_perf.c
@@ -9,49 +9,47 @@
 
 uint64_t i915_type_id(void)
 {
-	char buf[1024];
-	int fd, n;
-
-	fd = open("/sys/bus/event_source/devices/i915/type", 0);
-	if (fd < 0) {
-		n = -1;
-	} else {
-		n = read(fd, buf, sizeof(buf)-1);
-		close(fd);
-	}
-	if (n < 0)
+	char buf[64];
+	ssize_t ret;
+	int fd;
+
+	fd = open("/sys/bus/event_source/devices/i915/type", O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	ret = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (ret < 1)
 		return 0;
 
-	buf[n] = '\0';
-	return strtoull(buf, 0, 0);
+	buf[ret] = '\0';
+
+	return strtoull(buf, NULL, 0);
 }
 
-static int _perf_open(int config, int group, int format)
+static int _perf_open(uint64_t config, int group, uint64_t format)
 {
-	struct perf_event_attr attr;
-
-	memset(&attr, 0, sizeof (attr));
+	struct perf_event_attr attr = { };
 
 	attr.type = i915_type_id();
 	if (attr.type == 0)
 		return -ENOENT;
 
-	attr.config = config;
-
 	if (group >= 0)
 		format &= ~PERF_FORMAT_GROUP;
 
 	attr.read_format = format;
+	attr.config = config;
 
 	return perf_event_open(&attr, -1, 0, group, 0);
 }
 
-int perf_i915_open(int config)
+int perf_i915_open(uint64_t config)
 {
 	return _perf_open(config, -1, PERF_FORMAT_TOTAL_TIME_ENABLED);
 }
 
-int perf_i915_open_group(int config, int group)
+int perf_i915_open_group(uint64_t config, int group)
 {
 	return _perf_open(config, group,
 			  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP);
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index 8e674c3a3755..cc10cb300aaf 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -62,7 +62,7 @@ perf_event_open(struct perf_event_attr *attr,
 }
 
 uint64_t i915_type_id(void);
-int perf_i915_open(int config);
-int perf_i915_open_group(int config, int group);
+int perf_i915_open(uint64_t config);
+int perf_i915_open_group(uint64_t config, int group);
 
 #endif /* I915_PERF_H */
-- 
2.9.5
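
Editorial note on the tidied i915_type_id() above: the function reads a small
sysfs file and parses the PMU type number from it, returning 0 on any
failure. The sketch below mirrors that logic but takes the path as a
parameter so it can be tried against an ordinary file. read_type_id() and
its path argument are assumptions made for illustration; the patch itself
hardcodes the sysfs path.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical variant of i915_type_id() with the path passed in.
 * Returns the parsed id, or 0 if the file cannot be opened or is empty.
 */
static uint64_t read_type_id(const char *path)
{
	char buf[64];
	ssize_t ret;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;

	/* Leave room for the terminating NUL added below. */
	ret = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (ret < 1)
		return 0;

	buf[ret] = '\0';

	/* Base 0 accepts decimal, octal or hex; parsing stops at the newline. */
	return strtoull(buf, NULL, 0);
}
```

Since 0 doubles as the error value, callers such as _perf_open() can treat
attr.type == 0 as "PMU not available" and return -ENOENT.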


* [PATCH i-g-t 4/9] intel-gpu-overlay: Fix interrupts PMU readout
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 3/9] lib/perf: Fix data types and general tidy Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-10-10 12:23   ` Chris Wilson
  2017-10-10  9:30 ` [PATCH i-g-t 5/9] intel-gpu-overlay: Catch-up to new i915 PMU Tvrtko Ursulin
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 overlay/gem-interrupts.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index a84aef0398a7..3eda24f4d7eb 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -136,8 +136,12 @@ int gem_interrupts_update(struct gem_interrupts *irqs)
 		else
 			val = ret;
 	} else {
-		if (read(irqs->fd, &val, sizeof(val)) < 0)
+		uint64_t data[2];
+
+		if (read(irqs->fd, &data, sizeof(data)) < 0)
 			return irqs->error = errno;
+
+		val = data[0];
 	}
 
 	update = irqs->last_count == 0;
-- 
2.9.5
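
Editorial note on the readout fix above: the patch carries no description,
but the change is consistent with the read layout documented in
perf_event_open(2). An event opened with PERF_FORMAT_TOTAL_TIME_ENABLED and
without PERF_FORMAT_GROUP returns two u64 words, the counter value followed
by the enabled time, so the buffer must hold both. A minimal sketch of such
a read follows. struct single_counter and read_single_counter() are
hypothetical names, and treating a short read as EIO is a choice made for
this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read layout for one event with PERF_FORMAT_TOTAL_TIME_ENABLED set and
 * PERF_FORMAT_GROUP clear, as described in perf_event_open(2).
 */
struct single_counter {
	uint64_t value;
	uint64_t time_enabled;
};

/*
 * Hypothetical helper: returns 0 and fills *out on success, otherwise a
 * positive errno-style code.
 */
static int read_single_counter(int fd, struct single_counter *out)
{
	ssize_t ret;

	assert(out != NULL);

	ret = read(fd, out, sizeof(*out));
	if (ret < 0)
		return errno;
	if ((size_t)ret != sizeof(*out))
		return EIO;	/* short read: reported as an error here */

	return 0;
}
```

In the patch the same information is obtained with a plain uint64_t data[2]
array, where data[0] is the interrupt count.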


* [PATCH i-g-t 5/9] intel-gpu-overlay: Catch-up to new i915 PMU
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 4/9] intel-gpu-overlay: Fix interrupts PMU readout Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-11-21 18:20   ` Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 6/9] intel-gpu-overlay: Use RAPL PMU for power reading Tvrtko Ursulin
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

v2: Update for i915 changes.
v3: Use 1eN for large numbers. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_perf.h           | 89 +++++++++++++++++++++++++++++++++---------------
 overlay/gem-interrupts.c |  2 +-
 overlay/gpu-freq.c       |  8 ++---
 overlay/gpu-top.c        | 68 ++++++++++++++++++++----------------
 overlay/power.c          |  4 +--
 overlay/rc6.c            | 20 +++++------
 6 files changed, 116 insertions(+), 75 deletions(-)

diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index cc10cb300aaf..285823786324 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -1,3 +1,27 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
 #ifndef I915_PERF_H
 #define I915_PERF_H
 
@@ -5,41 +29,52 @@
 
 #include <linux/perf_event.h>
 
-#define I915_SAMPLE_BUSY	0
-#define I915_SAMPLE_WAIT	1
-#define I915_SAMPLE_SEMA	2
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_OTHER = 0,
+	I915_ENGINE_CLASS_RENDER = 1,
+	I915_ENGINE_CLASS_COPY = 2,
+	I915_ENGINE_CLASS_VIDEO = 3,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
+	I915_ENGINE_CLASS_MAX /* non-ABI */
+};
+
+enum drm_i915_pmu_engine_sample {
+	I915_SAMPLE_BUSY = 0,
+	I915_SAMPLE_WAIT = 1,
+	I915_SAMPLE_SEMA = 2,
+	I915_ENGINE_SAMPLE_MAX /* non-ABI */
+};
 
-#define I915_SAMPLE_RCS		0
-#define I915_SAMPLE_VCS		1
-#define I915_SAMPLE_BCS		2
-#define I915_SAMPLE_VECS	3
+#define I915_PMU_SAMPLE_BITS (4)
+#define I915_PMU_SAMPLE_MASK (0xf)
+#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
+#define I915_PMU_CLASS_SHIFT \
+	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)
 
-#define __I915_PERF_COUNT(ring, id) ((ring) << 4 | (id))
+#define __I915_PMU_ENGINE(class, instance, sample) \
+	((class) << I915_PMU_CLASS_SHIFT | \
+	(instance) << I915_PMU_SAMPLE_BITS | \
+	(sample))
 
-#define I915_PERF_COUNT_RCS_BUSY __I915_PERF_COUNT(I915_SAMPLE_RCS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_RCS_WAIT __I915_PERF_COUNT(I915_SAMPLE_RCS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_RCS_SEMA __I915_PERF_COUNT(I915_SAMPLE_RCS, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_BUSY(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_BUSY)
 
-#define I915_PERF_COUNT_VCS_BUSY __I915_PERF_COUNT(I915_SAMPLE_VCS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_VCS_WAIT __I915_PERF_COUNT(I915_SAMPLE_VCS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_VCS_SEMA __I915_PERF_COUNT(I915_SAMPLE_VCS, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_WAIT(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_WAIT)
 
-#define I915_PERF_COUNT_BCS_BUSY __I915_PERF_COUNT(I915_SAMPLE_BCS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_BCS_WAIT __I915_PERF_COUNT(I915_SAMPLE_BCS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_BCS_SEMA __I915_PERF_COUNT(I915_SAMPLE_BCS, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_SEMA(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
-#define I915_PERF_COUNT_VECS_BUSY __I915_PERF_COUNT(I915_SAMPLE_VECS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_VECS_WAIT __I915_PERF_COUNT(I915_SAMPLE_VECS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_VECS_SEMA __I915_PERF_COUNT(I915_SAMPLE_VECS, I915_SAMPLE_SEMA)
+#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
-#define I915_PERF_ACTUAL_FREQUENCY 32
-#define I915_PERF_REQUESTED_FREQUENCY 33
-#define I915_PERF_ENERGY 34
-#define I915_PERF_INTERRUPTS 35
+#define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
+#define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
+#define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(2)
+#define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(3)
+#define I915_PMU_RC6p_RESIDENCY		__I915_PMU_OTHER(4)
+#define I915_PMU_RC6pp_RESIDENCY	__I915_PMU_OTHER(5)
 
-#define I915_PERF_RC6_RESIDENCY		40
-#define I915_PERF_RC6p_RESIDENCY	41
-#define I915_PERF_RC6pp_RESIDENCY	42
+#define I915_PMU_LAST I915_PMU_RC6pp_RESIDENCY
 
 static inline int
 perf_event_open(struct perf_event_attr *attr,
diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index 3eda24f4d7eb..add4a9dfd725 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -113,7 +113,7 @@ int gem_interrupts_init(struct gem_interrupts *irqs)
 {
 	memset(irqs, 0, sizeof(*irqs));
 
-	irqs->fd = perf_i915_open(I915_PERF_INTERRUPTS);
+	irqs->fd = perf_i915_open(I915_PMU_INTERRUPTS);
 	if (irqs->fd < 0 && interrupts_read() < 0)
 		irqs->error = ENODEV;
 
diff --git a/overlay/gpu-freq.c b/overlay/gpu-freq.c
index 76c5ed9acfd1..0d8032592ef5 100644
--- a/overlay/gpu-freq.c
+++ b/overlay/gpu-freq.c
@@ -37,8 +37,8 @@ static int perf_open(void)
 {
 	int fd;
 
-	fd = perf_i915_open_group(I915_PERF_ACTUAL_FREQUENCY, -1);
-	if (perf_i915_open_group(I915_PERF_REQUESTED_FREQUENCY, fd) < 0) {
+	fd = perf_i915_open_group(I915_PMU_ACTUAL_FREQUENCY, -1);
+	if (perf_i915_open_group(I915_PMU_REQUESTED_FREQUENCY, fd) < 0) {
 		close(fd);
 		fd = -1;
 	}
@@ -176,8 +176,8 @@ int gpu_freq_update(struct gpu_freq *gf)
 			return EAGAIN;
 		}
 
-		gf->current = (s->act - d->act) / d_time;
-		gf->request = (s->req - d->req) / d_time;
+		gf->current = (s->act - d->act) * 1e9 / d_time;
+		gf->request = (s->req - d->req) * 1e9 / d_time;
 	}
 
 	return 0;
diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 812f47d5aced..61b8f62fd78c 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -43,49 +43,57 @@
 #define   RING_WAIT		(1<<11)
 #define   RING_WAIT_SEMAPHORE	(1<<10)
 
-#define __I915_PERF_RING(n) (4*n)
-#define I915_PERF_RING_BUSY(n) (__I915_PERF_RING(n) + 0)
-#define I915_PERF_RING_WAIT(n) (__I915_PERF_RING(n) + 1)
-#define I915_PERF_RING_SEMA(n) (__I915_PERF_RING(n) + 2)
-
 static int perf_init(struct gpu_top *gt)
 {
-	const char *names[] = {
-		"RCS",
-		"BCS",
-		"VCS0",
-		"VCS1",
-		NULL,
+	struct engine_desc {
+		unsigned class, inst;
+		const char *name;
+	} *d, engines[] = {
+		{ I915_ENGINE_CLASS_RENDER, 0, "rcs0" },
+		{ I915_ENGINE_CLASS_COPY, 0, "bcs0" },
+		{ I915_ENGINE_CLASS_VIDEO, 0, "vcs0" },
+		{ I915_ENGINE_CLASS_VIDEO, 1, "vcs1" },
+		{ I915_ENGINE_CLASS_VIDEO_ENHANCE, 0, "vecs0" },
+		{ 0, 0, NULL }
 	};
-	int n;
 
-	gt->fd = perf_i915_open_group(I915_PERF_RING_BUSY(0), -1);
+	d = &engines[0];
+
+	gt->fd = perf_i915_open_group(I915_PMU_ENGINE_BUSY(d->class, d->inst),
+				      -1);
 	if (gt->fd < 0)
 		return -1;
 
-	if (perf_i915_open_group(I915_PERF_RING_WAIT(0), gt->fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_ENGINE_WAIT(d->class, d->inst),
+				 gt->fd) >= 0)
 		gt->have_wait = 1;
 
-	if (perf_i915_open_group(I915_PERF_RING_SEMA(0), gt->fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_ENGINE_SEMA(d->class, d->inst),
+				 gt->fd) >= 0)
 		gt->have_sema = 1;
 
-	gt->ring[0].name = names[0];
+	gt->ring[0].name = d->name;
 	gt->num_rings = 1;
 
-	for (n = 1; names[n]; n++) {
-		if (perf_i915_open_group(I915_PERF_RING_BUSY(n), gt->fd) >= 0) {
-			if (gt->have_wait &&
-			    perf_i915_open_group(I915_PERF_RING_WAIT(n),
-						 gt->fd) < 0)
-				return -1;
-
-			if (gt->have_sema &&
-			    perf_i915_open_group(I915_PERF_RING_SEMA(n),
-						 gt->fd) < 0)
-				return -1;
-
-			gt->ring[gt->num_rings++].name = names[n];
-		}
+	for (d++; d->name; d++) {
+		if (perf_i915_open_group(I915_PMU_ENGINE_BUSY(d->class,
+							      d->inst),
+					gt->fd) < 0)
+			continue;
+
+		if (gt->have_wait &&
+		    perf_i915_open_group(I915_PMU_ENGINE_WAIT(d->class,
+							      d->inst),
+					 gt->fd) < 0)
+			return -1;
+
+		if (gt->have_sema &&
+		    perf_i915_open_group(I915_PMU_ENGINE_SEMA(d->class,
+							      d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		gt->ring[gt->num_rings++].name = d->name;
 	}
 
 	return 0;
diff --git a/overlay/power.c b/overlay/power.c
index dd4aec6bffd9..805f4ca7805c 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -45,9 +45,7 @@ int power_init(struct power *power)
 
 	memset(power, 0, sizeof(*power));
 
-	power->fd = perf_i915_open(I915_PERF_ENERGY);
-	if (power->fd != -1)
-		return 0;
+	power->fd = -1;
 
 	sprintf(buf, "%s/i915_energy_uJ", debugfs_dri_path);
 	fd = open(buf, 0);
diff --git a/overlay/rc6.c b/overlay/rc6.c
index 46c975a557ff..8977f0993095 100644
--- a/overlay/rc6.c
+++ b/overlay/rc6.c
@@ -43,15 +43,15 @@ static int perf_open(unsigned *flags)
 {
 	int fd;
 
-	fd = perf_i915_open_group(I915_PERF_RC6_RESIDENCY, -1);
+	fd = perf_i915_open_group(I915_PMU_RC6_RESIDENCY, -1);
 	if (fd < 0)
 		return -1;
 
 	*flags |= RC6;
-	if (perf_i915_open_group(I915_PERF_RC6p_RESIDENCY, fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd) >= 0)
 		*flags |= RC6p;
 
-	if (perf_i915_open_group(I915_PERF_RC6pp_RESIDENCY, fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd) >= 0)
 		*flags |= RC6pp;
 
 	return fd;
@@ -132,11 +132,11 @@ int rc6_update(struct rc6 *rc6)
 
 		len = 2;
 		if (rc6->flags & RC6)
-			s->rc6_residency = data[len++];
+			s->rc6_residency = data[len++] / 1e6;
 		if (rc6->flags & RC6p)
-			s->rc6p_residency = data[len++];
+			s->rc6p_residency = data[len++] / 1e6;
 		if (rc6->flags & RC6pp)
-			s->rc6pp_residency = data[len++];
+			s->rc6pp_residency = data[len++] / 1e6;
 	}
 
 	if (rc6->count == 1)
@@ -149,14 +149,14 @@ int rc6_update(struct rc6 *rc6)
 	}
 
 	d_rc6 = s->rc6_residency - d->rc6_residency;
-	rc6->rc6 = (100 * d_rc6 + d_time/2) / d_time;
+	rc6->rc6 = 100 * d_rc6 / d_time;
 
 	d_rc6p = s->rc6p_residency - d->rc6p_residency;
-	rc6->rc6p = (100 * d_rc6p + d_time/2) / d_time;
+	rc6->rc6p = 100 * d_rc6p / d_time;
 
 	d_rc6pp = s->rc6pp_residency - d->rc6pp_residency;
-	rc6->rc6pp = (100 * d_rc6pp + d_time/2) / d_time;
+	rc6->rc6pp = 100 * d_rc6pp / d_time;
 
-	rc6->rc6_combined = (100 * (d_rc6 + d_rc6p + d_rc6pp) + d_time/2) / d_time;
+	rc6->rc6_combined = 100 * (d_rc6 + d_rc6p + d_rc6pp) / d_time;
 	return 0;
 }
-- 
2.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH i-g-t 6/9] intel-gpu-overlay: Use RAPL PMU for power reading
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 5/9] intel-gpu-overlay: Catch-up to new i915 PMU Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-10-10 11:30   ` [PATCH i-g-t v2 " Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API Tvrtko Ursulin
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Wire up to the RAPL PMU for GPU energy readings.

The only complication is that we have to add code to parse:

 # cat /sys/devices/power/events/energy-gpu.scale
 2.3283064365386962890625e-10
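
For illustration only, the same value can also be parsed with the C
library's strtod(), which accepts this exponent notation directly. The
sketch below is not the code in this patch; parse_scale(),
read_scale_file() and apply_scale() are hypothetical names, and the
default "C" locale is assumed so that '.' is the decimal separator.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a floating point value such as
 * "2.3283064365386962890625e-10" from a string.
 * Returns NAN when no number could be parsed or it is out of range.
 */
static double parse_scale(const char *text)
{
	char *end;
	double val;

	assert(text != NULL);

	errno = 0;
	val = strtod(text, &end);
	if (end == text || errno == ERANGE)
		return NAN;

	return val;
}

/* Hypothetical helper: read the first line of a file and parse it. */
static double read_scale_file(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return NAN;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return NAN;
	}
	fclose(f);

	return parse_scale(buf);
}

/* Apply the scale to a raw counter value; units are those of the scale. */
static double apply_scale(uint64_t raw, double scale)
{
	return (double)raw * scale;
}
```

Because NAN never compares equal to anything, including itself, callers
of such a helper would have to test the result with isnan() rather than
with == or !=.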

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 lib/igt_perf.c  |  16 ++++--
 lib/igt_perf.h  |   1 +
 overlay/power.c | 156 ++++++++++++++++++++++++++++++++++++++++++--------------
 overlay/power.h |   2 +
 4 files changed, 133 insertions(+), 42 deletions(-)

diff --git a/lib/igt_perf.c b/lib/igt_perf.c
index 208474302fcc..0221461e918f 100644
--- a/lib/igt_perf.c
+++ b/lib/igt_perf.c
@@ -27,11 +27,12 @@ uint64_t i915_type_id(void)
 	return strtoull(buf, NULL, 0);
 }
 
-static int _perf_open(uint64_t config, int group, uint64_t format)
+static int
+_perf_open(uint64_t type, uint64_t config, int group, uint64_t format)
 {
 	struct perf_event_attr attr = { };
 
-	attr.type = i915_type_id();
+	attr.type = type;
 	if (attr.type == 0)
 		return -ENOENT;
 
@@ -46,11 +47,18 @@ static int _perf_open(uint64_t config, int group, uint64_t format)
 
 int perf_i915_open(uint64_t config)
 {
-	return _perf_open(config, -1, PERF_FORMAT_TOTAL_TIME_ENABLED);
+	return _perf_open(i915_type_id(), config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
 }
 
 int perf_i915_open_group(uint64_t config, int group)
 {
-	return _perf_open(config, group,
+	return _perf_open(i915_type_id(), config, group,
 			  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP);
 }
+
+int igt_perf_open(uint64_t type, uint64_t config)
+{
+	return _perf_open(type, config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
+}
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index 285823786324..b1f525739c69 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -99,5 +99,6 @@ perf_event_open(struct perf_event_attr *attr,
 uint64_t i915_type_id(void);
 int perf_i915_open(uint64_t config);
 int perf_i915_open_group(uint64_t config, int group);
+int igt_perf_open(uint64_t type, uint64_t config);
 
 #endif /* I915_PERF_H */
diff --git a/overlay/power.c b/overlay/power.c
index 805f4ca7805c..35e446e6bce5 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -30,60 +30,138 @@
 #include <fcntl.h>
 #include <time.h>
 #include <errno.h>
+#include <ctype.h>
+#include <math.h>
 
 #include "igt_perf.h"
 
 #include "power.h"
 #include "debugfs.h"
 
-/* XXX Is this exposed through RAPL? */
+static uint64_t filename_to_u64(const char *filename, int base)
+{
+	char buf[64], *b;
+	ssize_t ret;
+	int fd;
 
-int power_init(struct power *power)
+	fd = open(filename, O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	ret = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (ret < 1)
+		return 0;
+
+	buf[ret] = '\0';
+
+	b = buf;
+	while (*b && !isdigit(*b))
+		b++;
+
+	return strtoull(b, NULL, base);
+}
+
+static uint64_t debugfs_file_to_u64(const char *name)
 {
-	char buf[4096];
-	int fd, len;
+	char buf[1024];
 
-	memset(power, 0, sizeof(*power));
+	snprintf(buf, sizeof(buf), "%s/%s", debugfs_dri_path, name);
+
+	return filename_to_u64(buf, 0);
+}
 
-	power->fd = -1;
+static uint64_t rapl_type_id(void)
+{
+	return filename_to_u64("/sys/devices/power/type", 10);
+}
 
-	sprintf(buf, "%s/i915_energy_uJ", debugfs_dri_path);
-	fd = open(buf, 0);
+static uint64_t rapl_gpu_power(void)
+{
+	return filename_to_u64("/sys/devices/power/events/energy-gpu", 0);
+}
+
+static double filename_to_double(const char *filename)
+{
+	char *dot = NULL, *e = NULL;
+	unsigned long long int decimal;
+	char buf[64], *b;
+	long int val;
+	long int exponent;
+	double result;
+	ssize_t ret;
+	int fd;
+
+	fd = open(filename, O_RDONLY);
 	if (fd < 0)
-		return power->error = errno;
+		return NAN;
 
-	len = read(fd, buf, sizeof(buf));
+	ret = read(fd, buf, sizeof(buf) - 1);
 	close(fd);
+	if (ret < 1)
+		return NAN;
+
+	buf[ret] = '\0';
+
+	b = buf;
+	while (*b) {
+		if (*b == '.')
+			dot = b;
+		else if (*b == 'e')
+			e = b;
+		b++;
+	}
 
-	if (len < 0)
-		return power->error = errno;
+	if (!dot || !e)
+		return NAN;
 
-	buf[len] = '\0';
-	if (strtoull(buf, 0, 0) == 0)
-		return power->error = EINVAL;
+	*dot = '\0';
+	*e = '\0';
 
-	return 0;
+	/* Reduce precision to fit in long int. */
+	if ((e - dot) > 18)
+		dot[18] = '\0';
+
+	val = strtoll(buf, NULL, 10);
+	decimal = strtoull(++dot, NULL, 10);
+	exponent = strtoll(++e, NULL, 10);
+
+	result = (double)decimal;
+	result /= round(pow(10, strlen(dot)));
+	result += val;
+	result *= pow(10, exponent);
+
+	return result;
 }
 
-static uint64_t file_to_u64(const char *name)
+static double rapl_gpu_power_scale(void)
 {
-	char buf[4096];
-	int fd, len;
+	return filename_to_double("/sys/devices/power/events/energy-gpu.scale");
+}
 
-	sprintf(buf, "%s/%s", debugfs_dri_path, name);
-	fd = open(buf, 0);
-	if (fd < 0)
-		return 0;
+int power_init(struct power *power)
+{
+	uint64_t val;
 
-	len = read(fd, buf, sizeof(buf)-1);
-	close(fd);
+	memset(power, 0, sizeof(*power));
 
-	if (len < 0)
-		return 0;
+	power->fd = igt_perf_open(rapl_type_id(), rapl_gpu_power());
+	if (power->fd >= 0) {
+		power->rapl_scale = rapl_gpu_power_scale();
+
+		if (!isnan(power->rapl_scale)) {
+			power->rapl_scale *= 1e3; /* from nano to micro */
+			return 0;
+		}
+	}
 
-	buf[len] = '\0';
+	val = debugfs_file_to_u64("i915_energy_uJ");
+	if (val == -1)
+		return power->error = errno;
+	else if (val == 0)
+		return power->error = EINVAL;
 
-	return strtoull(buf, 0, 0);
+	return 0;
 }
 
 static uint64_t clock_ms_to_u64(void)
@@ -93,30 +171,30 @@ static uint64_t clock_ms_to_u64(void)
 	if (clock_gettime(CLOCK_MONOTONIC, &tv) < 0)
 		return 0;
 
-	return (uint64_t)tv.tv_sec * 1000 + tv.tv_nsec / 1000000;
+	return (uint64_t)tv.tv_sec * 1e3 + tv.tv_nsec / 1e6;
 }
 
 int power_update(struct power *power)
 {
-	struct power_stat *s = &power->stat[power->count++&1];
-	struct power_stat *d = &power->stat[power->count&1];
+	struct power_stat *s = &power->stat[power->count++ & 1];
+	struct power_stat *d = &power->stat[power->count & 1];
 	uint64_t d_time;
 
 	if (power->error)
 		return power->error;
 
-	if (power->fd != -1) {
+	if (power->fd >= 0) {
 		uint64_t data[2];
 		int len;
 
 		len = read(power->fd, data, sizeof(data));
-		if (len < 0)
+		if (len != sizeof(data))
 			return power->error = errno;
 
-		s->energy = data[0];
-		s->timestamp = data[1] / (1000*1000);
+		s->energy = llround((double)data[0] * power->rapl_scale);
+		s->timestamp = data[1] / 1e6;
 	} else {
-		s->energy = file_to_u64("i915_energy_uJ");
+		s->energy = debugfs_file_to_u64("i915_energy_uJ") / 1e3;
 		s->timestamp = clock_ms_to_u64();
 	}
 
@@ -124,7 +202,9 @@ int power_update(struct power *power)
 		return EAGAIN;
 
 	d_time = s->timestamp - d->timestamp;
-	power->power_mW = (s->energy - d->energy) / d_time;
+	power->power_mW = round((double)(s->energy - d->energy) *
+				(1e3f / d_time));
 	power->new_sample = 1;
+
 	return 0;
 }
diff --git a/overlay/power.h b/overlay/power.h
index bf8346ce46b4..28abfc32234b 100644
--- a/overlay/power.h
+++ b/overlay/power.h
@@ -39,6 +39,8 @@ struct power {
 	int new_sample;
 
 	uint64_t power_mW;
+
+	double rapl_scale;
 };
 
 int power_init(struct power *power);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 6/9] intel-gpu-overlay: Use RAPL PMU for power reading Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-10-10 12:37   ` Chris Wilson
  2017-10-10  9:30 ` [PATCH i-g-t 8/9] gem_wsim: Busy stats balancers Tvrtko Ursulin
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A bunch of tests for the new i915 PMU feature.

Parts of the code were initially sketched by Dmitry Rogozhkin.

v2: (Most suggestions by Chris Wilson)
 * Add new class/instance based engine list.
 * Add gem_has_engine/gem_require_engine to work with class/instance.
 * Use the above two throughout the test.
 * Shorten tests to 100ms busy batches, seems enough.
 * Add queued counter sanity checks.
 * Use igt_nsec_elapsed.
 * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
 * Fix multi ordering for busy accounting.
 * Use new guaranteed_usleep when sleep time is asserted on.
 * Check for no queued when idle/busy.
 * Add queued counter init test.
 * Add queued tests.
 * Consolidate and increase multiple busy engines tests to most-busy and
   all-busy tests.
 * Guarantee interrupts by using fences.
 * Test RC6 via forcewake.

v3:
 * Tweak assert in interrupts subtest.
 * Sprinkle of comments.
 * Fix multi-client test which got broken in v2.

v4:
 * Measured instead of guaranteed sleep.
 * Missing sync in no_sema.
 * Log busyness before asserts for debug.
 * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
 * Test frequency reporting via min/max setting instead of assuming.
   ^^ All above suggested by Chris Wilson. ^^
 * Drop queued subtests to match i915.
 * Use long batches with fences to ensure interrupts.
 * Test render node as well.
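
The idea behind "measured instead of guaranteed sleep" can be sketched
as below: sleep for at least the requested time, then hand back how
long the sleep really took so that asserts compare against the real
duration. This is a standalone illustration, not the helper from the
patch; it uses nanosleep() and clock_gettime() in place of usleep() and
igt_nsec_elapsed(), and now_ns() and measured_sleep_ns() are
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;

	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Sleep for at least usec microseconds and return the time actually
 * spent sleeping, in nanoseconds. An early wakeup (for example due to
 * a signal) only causes another, shorter sleep for the remainder.
 */
static uint64_t measured_sleep_ns(unsigned int usec)
{
	const uint64_t target = (uint64_t)usec * 1000;
	const uint64_t start = now_ns();
	uint64_t elapsed = 0;

	while (elapsed < target) {
		const uint64_t left = target - elapsed;
		struct timespec req = {
			.tv_sec = (time_t)(left / 1000000000ull),
			.tv_nsec = (long)(left % 1000000000ull),
		};

		nanosleep(&req, NULL);
		elapsed = now_ns() - start;
	}

	return elapsed;
}
```

Note that the remaining time is tracked in nanoseconds throughout, so
no unit conversion is needed when subtracting the time already slept.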

v5:
 * Add to meson build. (Petri Latvala)
 * Use 1eN constants. (Chris Wilson)
 * Add tests for semaphore and event waiting.
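
The semaphore wait subtest, like the busyness subtests, accepts a
counter value only when it lies within a relative tolerance of a
reference time. A minimal standalone sketch of that comparison follows,
assuming the 3% tolerance used by the test; within_epsilon() is a
hypothetical function name, the test itself uses a macro.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when x lies within +/- tolerance (a fraction, such as
 * 0.03 for 3%) of the reference value ref.
 */
static bool within_epsilon(double x, double ref, double tolerance)
{
	assert(tolerance >= 0.0);

	return x <= (1.0 + tolerance) * ref &&
	       x >= (1.0 - tolerance) * ref;
}
```

Since the bounds are relative to the reference, a reference of zero
leaves no slack at all: only an exact zero reading passes, which is how
the idle checks end up being strict.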

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---
 lib/igt_gt.c           |   50 ++
 lib/igt_gt.h           |   38 ++
 lib/igt_perf.h         |    9 +-
 tests/Makefile.am      |    1 +
 tests/Makefile.sources |    1 +
 tests/meson.build      |    1 +
 tests/perf_pmu.c       | 1238 ++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 1330 insertions(+), 8 deletions(-)
 create mode 100644 tests/perf_pmu.c

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index b3f3b3809eee..4c75811fb1b3 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -568,3 +568,53 @@ bool gem_can_store_dword(int fd, unsigned int engine)
 
 	return true;
 }
+
+const struct intel_execution_engine2 intel_execution_engines2[] = {
+	{ "rcs0", I915_ENGINE_CLASS_RENDER, 0 },
+	{ "bcs0", I915_ENGINE_CLASS_COPY, 0 },
+	{ "vcs0", I915_ENGINE_CLASS_VIDEO, 0 },
+	{ "vcs1", I915_ENGINE_CLASS_VIDEO, 1 },
+	{ "vecs0", I915_ENGINE_CLASS_VIDEO_ENHANCE, 0 },
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance)
+{
+	if (class != I915_ENGINE_CLASS_VIDEO)
+		igt_assert(instance == 0);
+	else
+		igt_assert(instance >= 0 && instance <= 1);
+
+	switch (class) {
+	case I915_ENGINE_CLASS_RENDER:
+		return I915_EXEC_RENDER;
+	case I915_ENGINE_CLASS_COPY:
+		return I915_EXEC_BLT;
+	case I915_ENGINE_CLASS_VIDEO:
+		if (instance == 0) {
+			if (gem_has_bsd2(gem_fd))
+				return I915_EXEC_BSD | I915_EXEC_BSD_RING1;
+			else
+				return I915_EXEC_BSD;
+
+		} else {
+			return I915_EXEC_BSD | I915_EXEC_BSD_RING2;
+		}
+	case I915_ENGINE_CLASS_VIDEO_ENHANCE:
+		return I915_EXEC_VEBOX;
+	case I915_ENGINE_CLASS_OTHER:
+	default:
+		igt_assert(0);
+	};
+}
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance)
+{
+	return gem_has_ring(gem_fd,
+			    gem_class_instance_to_eb_flags(gem_fd, class,
+							   instance));
+}
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index 2579cbd37be7..fb67ae1a7d1f 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -25,6 +25,7 @@
 #define IGT_GT_H
 
 #include "igt_debugfs.h"
+#include "igt_core.h"
 
 void igt_require_hang_ring(int fd, int ring);
 
@@ -80,4 +81,41 @@ extern const struct intel_execution_engine {
 
 bool gem_can_store_dword(int fd, unsigned int engine);
 
+extern const struct intel_execution_engine2 {
+	const char *name;
+	int class;
+	int instance;
+} intel_execution_engines2[];
+
+#define for_each_engine_class_instance(fd__, e__) \
+	for ((e__) = intel_execution_engines2;\
+	     (e__)->name; \
+	     (e__)++)
+
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_OTHER = 0,
+	I915_ENGINE_CLASS_RENDER = 1,
+	I915_ENGINE_CLASS_COPY = 2,
+	I915_ENGINE_CLASS_VIDEO = 3,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
+	I915_ENGINE_CLASS_MAX /* non-ABI */
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance);
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance);
+
+static inline
+void gem_require_engine(int gem_fd,
+			enum drm_i915_gem_engine_class class,
+			unsigned int instance)
+{
+	igt_require(gem_has_engine(gem_fd, class, instance));
+}
+
 #endif /* IGT_GT_H */
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index b1f525739c69..5428feb0c746 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -29,14 +29,7 @@
 
 #include <linux/perf_event.h>
 
-enum drm_i915_gem_engine_class {
-	I915_ENGINE_CLASS_OTHER = 0,
-	I915_ENGINE_CLASS_RENDER = 1,
-	I915_ENGINE_CLASS_COPY = 2,
-	I915_ENGINE_CLASS_VIDEO = 3,
-	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
-	I915_ENGINE_CLASS_MAX /* non-ABI */
-};
+#include "igt_gt.h"
 
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 89a970153992..17ee1be08d8a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -131,6 +131,7 @@ gen7_forcewake_mt_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gen7_forcewake_mt_LDADD = $(LDADD) -lpthread
 gem_userptr_blits_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_userptr_blits_LDADD = $(LDADD) -lpthread
+perf_pmu_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la
 
 gem_wait_LDADD = $(LDADD) -lrt
 kms_flip_LDADD = $(LDADD) -lrt -lpthread
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index c4d320ebc61b..744eeeab9ef4 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -217,6 +217,7 @@ TESTS_progs = \
 	kms_vblank \
 	meta_test \
 	perf \
+	perf_pmu \
 	pm_backlight \
 	pm_lpsp \
 	pm_rc6_residency \
diff --git a/tests/meson.build b/tests/meson.build
index 6cb3584a4dd9..12d5706faaeb 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -197,6 +197,7 @@ test_progs = [
 	'kms_vblank',
 	'meta_test',
 	'perf',
+	'perf_pmu',
 	'pm_backlight',
 	'pm_lpsp',
 	'pm_rc6_residency',
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
new file mode 100644
index 000000000000..ba3d124691ac
--- /dev/null
+++ b/tests/perf_pmu.c
@@ -0,0 +1,1238 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/times.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <time.h>
+#include <poll.h>
+
+#include "igt.h"
+#include "igt_core.h"
+#include "igt_perf.h"
+#include "igt_sysfs.h"
+
+IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
+
+const double tolerance = 0.03f;
+const unsigned long batch_duration_ns = 100 * 1000 * 1000;
+
+static int open_pmu(uint64_t config)
+{
+	int fd;
+
+	fd = perf_i915_open(config);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static int open_group(uint64_t config, int group)
+{
+	int fd;
+
+	fd = perf_i915_open_group(config, group);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static void
+init(int gem_fd, const struct intel_execution_engine2 *e, uint8_t sample)
+{
+	int fd;
+
+	fd = open_pmu(__I915_PMU_ENGINE(e->class, e->instance, sample));
+
+	close(fd);
+}
+
+static uint64_t pmu_read_single(int fd)
+{
+	uint64_t data[2];
+
+	igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
+
+	return data[0];
+}
+
+static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
+{
+	uint64_t buf[2 + num];
+	unsigned int i;
+
+	igt_assert_eq(read(fd, buf, sizeof(buf)), sizeof(buf));
+
+	for (i = 0; i < num; i++)
+		val[i] = buf[2 + i];
+}
+
+#define assert_within_epsilon(x, ref, tolerance) \
+	igt_assert_f((double)(x) <= (1.0 + tolerance) * (double)ref && \
+		     (double)(x) >= (1.0 - tolerance) * (double)ref, \
+		     "'%s' != '%s' (%f not within %f%% tolerance of %f)\n",\
+		     #x, #ref, (double)x, tolerance * 100.0, (double)ref)
+
+/*
+ * Helper for cases where we assert on time spent sleeping (directly or
+ * indirectly), so make it more robust by ensuring the system sleep time
+ * is within test tolerance to start with.
+ */
+static unsigned int measured_usleep(unsigned int usec)
+{
+	uint64_t slept = 0;
+
+	while (usec > 0) {
+		struct timespec start = { };
+		uint64_t this_sleep;
+
+		igt_nsec_elapsed(&start);
+		usleep(usec);
+		this_sleep = igt_nsec_elapsed(&start);
+		slept += this_sleep;
+		if (this_sleep > usec * 1000)
+			break;
+		usec -= this_sleep / 1000; /* this_sleep is in ns, usec in us */
+	}
+
+	return slept;
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	double ref = busy ? batch_duration_ns : 0.0f;
+	igt_spin_t *spin;
+	uint64_t val;
+	int fd;
+
+	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	val = pmu_read_single(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+}
+
+static void log_busy(int fd, unsigned int num_engines, uint64_t *val)
+{
+	char buf[1024];
+	int rem = sizeof(buf);
+	unsigned int i;
+	char *p = buf;
+
+	for (i = 0; i < num_engines; i++) {
+		int len;
+
+		len = snprintf(p, rem, "%u=%" PRIu64 "\n",  i, val[i]);
+		igt_assert(len > 0);
+		rem -= len;
+		p += len;
+	}
+
+	igt_info("%s", buf);
+}
+
+static void
+busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+	       const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin;
+	unsigned int busy_idx, i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+		else if (e == e_)
+			busy_idx = i;
+
+		fd[i++] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							  e_->instance),
+				     fd[0]);
+	}
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	assert_within_epsilon(val[busy_idx], batch_duration_ns, tolerance);
+	for (i = 0; i < num_engines; i++) {
+		if (i == busy_idx)
+			continue;
+		assert_within_epsilon(val[i], 0.0f, tolerance);
+	}
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+}
+
+static void
+most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+		    const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int idle_idx, i;
+
+	gem_require_engine(gem_fd, e->class, e->instance);
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							e_->instance),
+				   fd[0]);
+
+		if (e == e_) {
+			idle_idx = i;
+		} else {
+			spin[i] = igt_spin_batch_new(gem_fd, 0,
+						     e2ring(gem_fd, e_), 0);
+			igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+		}
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			gem_sync(gem_fd, spin[i]->handle);
+	}
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i == idle_idx)
+			assert_within_epsilon(val[i], 0.0f, tolerance);
+		else
+			assert_within_epsilon(val[i], batch_duration_ns,
+					      tolerance);
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			igt_spin_batch_free(gem_fd, spin[i]);
+	}
+	close(fd[0]);
+}
+
+static void
+all_busy_check_all(int gem_fd, const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e) {
+		if (!gem_has_engine(gem_fd, e->class, e->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e->class, e->instance),
+				   fd[0]);
+
+		spin[i] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++)
+		gem_sync(gem_fd, spin[i]->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++)
+		assert_within_epsilon(val[i], batch_duration_ns, tolerance);
+
+	for (i = 0; i < num_engines; i++)
+		igt_spin_batch_free(gem_fd, spin[i]);
+	close(fd[0]);
+}
+
+static void
+no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd;
+
+	fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
+	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, val);
+
+	assert_within_epsilon(val[0], 0.0f, tolerance);
+	assert_within_epsilon(val[1], 0.0f, tolerance);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+}
+
+#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
+#define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+
+static void
+sema_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	struct drm_i915_gem_exec_object2 obj[2];
+	uint32_t bb_handle, obj_handle;
+	unsigned long slept;
+	uint32_t *obj_ptr;
+	uint32_t batch[6];
+	uint64_t val[2];
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 8);
+
+	/**
+	 * Set up a batchbuffer with a polling semaphore wait command which
+	 * will wait on a value in a shared bo to change. This way we are able
+	 * to control how much time we will spend in this bb.
+	 */
+
+	bb_handle = gem_create(gem_fd, 4096);
+	obj_handle = gem_create(gem_fd, 4096);
+
+	obj_ptr = gem_mmap__wc(gem_fd, obj_handle, 0, 4096, PROT_WRITE);
+
+	batch[0] = MI_SEMAPHORE_WAIT |
+		   MI_SEMAPHORE_POLL |
+		   MI_SEMAPHORE_SAD_GTE_SDD;
+	batch[1] = 1;
+	batch[2] = 0x0;
+	batch[3] = 0x0;
+	batch[4] = MI_NOOP;
+	batch[5] = MI_BATCH_BUFFER_END;
+
+	gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+	reloc.target_handle = obj_handle;
+	reloc.offset = 2 * sizeof(uint32_t);
+	reloc.read_domains = I915_GEM_DOMAIN_RENDER;
+
+	memset(obj, 0, sizeof(obj));
+
+	obj[0].handle = obj_handle;
+
+	obj[1].handle = bb_handle;
+	obj[1].relocation_count = 1;
+	obj[1].relocs_ptr = to_user_pointer(&reloc);
+
+	eb.buffer_count = 2;
+	eb.buffers_ptr = to_user_pointer(obj);
+	eb.flags = e2ring(gem_fd, e);
+
+	/**
+	 * Start the semaphore wait PMU and after some known time let the above
+	 * semaphore wait command finish. Then check that the PMU is reporting
+	 * the expected time spent in semaphore wait state.
+	 */
+
+	fd = open_pmu(I915_PMU_ENGINE_SEMA(e->class, e->instance));
+
+	val[0] = pmu_read_single(fd);
+
+	gem_execbuf(gem_fd, &eb);
+
+	slept = measured_usleep(1e5);
+
+	*obj_ptr = 1;
+
+	gem_sync(gem_fd, bb_handle);
+
+	val[1] = pmu_read_single(fd);
+
+	munmap(obj_ptr, 4096);
+	gem_close(gem_fd, obj_handle);
+	gem_close(gem_fd, bb_handle);
+	close(fd);
+
+	assert_within_epsilon(val[1] - val[0], slept, tolerance);
+}
+
+#define   MI_WAIT_FOR_PIPE_C_VBLANK (1<<21)
+#define   MI_WAIT_FOR_PIPE_B_VBLANK (1<<11)
+#define   MI_WAIT_FOR_PIPE_A_VBLANK (1<<3)
+
+typedef struct {
+	igt_display_t display;
+	struct igt_fb primary_fb;
+	igt_output_t *output;
+	enum pipe pipe;
+} data_t;
+
+static void prepare_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	drmModeModeInfo *mode;
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+
+	/* create and set the primary plane fb */
+	mode = igt_output_get_mode(output);
+	igt_create_color_fb(fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888,
+			    LOCAL_DRM_FORMAT_MOD_NONE,
+			    0.0, 0.0, 0.0,
+			    &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, &data->primary_fb);
+
+	igt_display_commit(display);
+
+	igt_wait_for_vblank(fd, data->pipe);
+}
+
+static void cleanup_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	igt_remove_fb(fd, &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+}
+
+static int wait_vblank(int fd, union drm_wait_vblank *vbl)
+{
+	int err;
+
+	err = 0;
+	if (igt_ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl))
+		err = -errno;
+
+	return err;
+}
+
+static void
+event_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	data_t data;
+	igt_display_t *display = &data.display;
+	const uint32_t DERRMR = 0x44050;
+	unsigned int valid_tests = 0;
+	uint32_t batch[8], *b;
+	igt_output_t *output;
+	uint32_t bb_handle;
+	uint32_t reg;
+	enum pipe p;
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
+	igt_require(intel_register_access_init(intel_get_pci_device(),
+					       false, gem_fd) == 0);
+
+	/**
+	 * We will use the display to render event forwarding so we need to
+	 * program the DERRMR register and restore it at exit.
+	 *
+	 * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
+	 * have a background helper to indirectly enable vblank irqs, and
+	 * listen to the recorded time spent in engine wait state as reported
+	 * by the PMU.
+	 */
+	reg = intel_register_read(DERRMR);
+
+	kmstest_set_vt_graphics_mode();
+	igt_display_init(&data.display, gem_fd);
+
+	bb_handle = gem_create(gem_fd, 4096);
+
+	b = batch;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
+	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg;
+	*b++ = MI_BATCH_BUFFER_END;
+
+	obj.handle = bb_handle;
+
+	eb.buffer_count = 1;
+	eb.buffers_ptr = to_user_pointer(&obj);
+	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
+
+	for_each_pipe_with_valid_output(display, p, output) {
+		struct igt_helper_process waiter = { };
+		const unsigned int frames = 3;
+		unsigned int frame;
+		uint64_t val[2];
+
+		batch[3] = MI_WAIT_FOR_EVENT;
+		switch (p) {
+		case PIPE_A:
+			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			break;
+		case PIPE_B:
+			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			break;
+		case PIPE_C:
+			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			break;
+		default:
+			continue;
+		}
+
+		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+		data.pipe = p;
+		prepare_crtc(&data, gem_fd, output);
+
+		fd = open_pmu(I915_PMU_ENGINE_WAIT(e->class, e->instance));
+
+		val[0] = pmu_read_single(fd);
+
+		igt_fork_helper(&waiter) {
+			const uint32_t pipe_id_flag =
+					kmstest_get_vbl_flag(data.pipe);
+
+			for (;;) {
+				union drm_wait_vblank vbl = { };
+
+				vbl.request.type = DRM_VBLANK_RELATIVE;
+				vbl.request.type |= pipe_id_flag;
+				vbl.request.sequence = 1;
+				igt_assert_eq(wait_vblank(gem_fd, &vbl), 0);
+			}
+		}
+
+		for (frame = 0; frame < frames; frame++) {
+			gem_execbuf(gem_fd, &eb);
+			gem_sync(gem_fd, bb_handle);
+		}
+
+		igt_stop_helper(&waiter);
+
+		val[1] = pmu_read_single(fd);
+
+		close(fd);
+
+		cleanup_crtc(&data, gem_fd, output);
+		valid_tests++;
+
+		igt_assert(val[1] - val[0] > 0);
+	}
+
+	gem_close(gem_fd, bb_handle);
+
+	intel_register_access_fini();
+
+	igt_require_f(valid_tests,
+		      "no valid crtc/connector combinations found\n");
+}
+
+static void
+multi_client(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
+	unsigned int slept;
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd[2];
+
+	fd[0] = open_pmu(config);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	usleep(batch_duration_ns / 3000);
+
+	/*
+	 * Second PMU client which is initialized after the first one,
+	 * and exits before it, should not affect accounting as reported
+	 * in the first client.
+	 */
+	fd[1] = open_pmu(config);
+	slept = measured_usleep(batch_duration_ns / 3000);
+	val[1] = pmu_read_single(fd[1]);
+	close(fd[1]);
+
+	gem_sync(gem_fd, spin->handle);
+
+	val[0] = pmu_read_single(fd[0]);
+
+	assert_within_epsilon(val[0], batch_duration_ns, tolerance);
+	assert_within_epsilon(val[1], slept, tolerance);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+}
+
+/**
+ * Tests that i915 PMU correctly errors out on invalid initialization.
+ * i915 PMU is uncore PMU, thus:
+ *  - sampling period is not supported
+ *  - pid > 0 is not supported since we can't count per-process (we count
+ *    per whole system)
+ *  - cpu != 0 is not supported since i915 PMU exposes cpumask for CPU0
+ */
+static void invalid_init(void)
+{
+	struct perf_event_attr attr;
+	int pid, cpu;
+
+#define ATTR_INIT() \
+do { \
+	memset(&attr, 0, sizeof (attr)); \
+	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0); \
+	attr.type = i915_type_id(); \
+	igt_assert(attr.type != 0); \
+} while(0)
+
+	ATTR_INIT();
+	attr.sample_period = 100;
+	pid = -1;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = 0;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = -1;
+	cpu = 1;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, ENODEV);
+}
+
+static void init_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	close(fd);
+}
+
+static void read_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	(void)pmu_read_single(fd);
+
+	close(fd);
+}
+
+static bool cpu0_hotplug_support(void)
+{
+	return access("/sys/devices/system/cpu/cpu0/online", W_OK) == 0;
+}
+
+static void cpu_hotplug(int gem_fd)
+{
+	struct timespec start = { };
+	igt_spin_t *spin;
+	uint64_t val, ref;
+	int fd;
+
+	igt_require(cpu0_hotplug_support());
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
+	igt_assert(fd >= 0);
+
+	igt_nsec_elapsed(&start);
+
+	/*
+	 * Toggle online status of all the CPUs in a child process and ensure
+	 * this has not affected busyness stats in the parent.
+	 */
+	igt_fork(child, 1) {
+		int cpu = 0;
+
+		for (;;) {
+			char name[128];
+			int cpufd;
+
+			sprintf(name, "/sys/devices/system/cpu/cpu%d/online",
+				cpu);
+			cpufd = open(name, O_WRONLY);
+			if (cpufd == -1) {
+				igt_assert(cpu > 0);
+				break;
+			}
+			igt_assert_eq(write(cpufd, "0", 2), 2);
+
+			usleep(1000 * 1000);
+
+			igt_assert_eq(write(cpufd, "1", 2), 2);
+
+			close(cpufd);
+			cpu++;
+		}
+	}
+
+	igt_waitchildren();
+
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
+
+	ref = igt_nsec_elapsed(&start);
+	val = pmu_read_single(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+}
+
+static unsigned long calibrate_nop(int fd, const unsigned int calibration_us)
+{
+	const unsigned int cal_min_us = calibration_us * 3;
+	const unsigned int tolerance_pct = 10;
+	const uint32_t bbe = 0xa << 23;
+	const unsigned int loops = 17;
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	struct timespec t_begin = { };
+	long size, last_size;
+	unsigned long ns;
+
+	igt_nsec_elapsed(&t_begin);
+
+	size = 256 * 1024;
+	do {
+		struct timespec t_start = { };
+
+		obj.handle = gem_create(fd, size);
+		gem_write(fd, obj.handle, size - sizeof(bbe), &bbe,
+			  sizeof(bbe));
+		gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		igt_nsec_elapsed(&t_start);
+
+		for (int loop = 0; loop < loops; loop++)
+			gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		ns = igt_nsec_elapsed(&t_start);
+
+		gem_close(fd, obj.handle);
+
+		last_size = size;
+		size = calibration_us * 1000 * size * loops / ns;
+		size = ALIGN(size, sizeof(uint32_t));
+	} while (igt_nsec_elapsed(&t_begin) / 1000 < cal_min_us ||
+		 abs(size - last_size) > (size * tolerance_pct / 100));
+
+	return size / sizeof(uint32_t);
+}
+
+static int chain_nop(int gem_fd, unsigned long sz, int in_fence, bool sync)
+{
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	const uint32_t bbe = 0xa << 23;
+
+	sz = ALIGN(sz, sizeof(uint32_t));
+
+	obj.handle = gem_create(gem_fd, sz);
+	gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
+
+	eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
+
+	if (in_fence >= 0) {
+		eb.flags |= I915_EXEC_FENCE_IN;
+		eb.rsvd2 = in_fence;
+	}
+
+	gem_execbuf_wr(gem_fd, &eb);
+
+	if (sync)
+		gem_sync(gem_fd, obj.handle);
+
+	gem_close(gem_fd, obj.handle);
+	if (in_fence >= 0)
+		close(in_fence);
+
+	return eb.rsvd2 >> 32;
+}
+
+static void
+test_interrupts(int gem_fd)
+{
+	const unsigned int calibration_us = 250000;
+	const unsigned int batch_len_us = 100000;
+	const unsigned int batch_count = 3e6 / batch_len_us;
+	uint64_t idle, busy, prev;
+	unsigned long cal, sz;
+	int fd, fence = -1;
+	unsigned int i;
+
+	cal = calibrate_nop(gem_fd, calibration_us);
+	sz = batch_len_us * cal / calibration_us;
+
+	fd = open_pmu(I915_PMU_INTERRUPTS);
+
+	gem_quiescent_gpu(gem_fd);
+
+	/* Wait for idle state. */
+	prev = pmu_read_single(fd);
+	idle = prev + 1;
+	while (idle != prev) {
+		usleep(1e6);
+		prev = idle;
+		idle = pmu_read_single(fd);
+	}
+
+	igt_assert_eq(idle - prev, 0);
+
+	/* Send some no-op batches with chained fences to ensure interrupts. */
+	for (i = 1; i <= batch_count; i++)
+		fence = chain_nop(gem_fd, sz, fence,
+				  i < batch_count ? false : true);
+
+	close(fence);
+
+	/* Check that at least as many interrupts have been generated. */
+	busy = pmu_read_single(fd);
+	igt_assert(busy >= batch_count);
+
+	close(fd);
+}
+
+static void
+test_frequency(int gem_fd)
+{
+	const uint64_t duration_ns = 2e9;
+	uint32_t min_freq, max_freq, boost_freq;
+	uint64_t min[2], max[2], start[2];
+	igt_spin_t *spin;
+	int fd, sysfs;
+
+	sysfs = igt_sysfs_open(gem_fd, NULL);
+	igt_require(sysfs >= 0);
+
+	min_freq = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
+	max_freq = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
+	boost_freq = igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz");
+	igt_require(min_freq > 0 && max_freq > 0 && boost_freq > 0);
+	igt_require(max_freq > min_freq);
+	igt_require(boost_freq > min_freq);
+
+	fd = open_group(I915_PMU_REQUESTED_FREQUENCY, -1);
+	open_group(I915_PMU_ACTUAL_FREQUENCY, fd);
+
+	/*
+	 * Set GPU to min frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == min_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, min);
+	min[0] -= start[0];
+	min[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	usleep(1e6);
+
+	/*
+	 * Set GPU to max frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == max_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", boost_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == boost_freq);
+
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, max);
+	max[0] -= start[0];
+	max[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	/*
+	 * Restore min/max.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == min_freq);
+
+	close(fd);
+
+	igt_assert(min[0] < max[0]);
+	igt_assert(min[1] < max[1]);
+}
+
+static void
+test_rc6(int gem_fd)
+{
+	int64_t duration_ns = 2 * 1000 * 1000 * 1000;
+	uint64_t idle, busy, prev;
+	unsigned int slept;
+	int fd, fw;
+
+	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	prev = pmu_read_single(fd);
+	slept = measured_usleep(duration_ns / 1000);
+	idle = pmu_read_single(fd);
+
+	assert_within_epsilon(idle - prev, slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	prev = pmu_read_single(fd);
+	usleep(duration_ns / 1000);
+	busy = pmu_read_single(fd);
+
+	assert_within_epsilon(busy - prev, 0.0, tolerance);
+
+	close(fw);
+	close(fd);
+}
+
+static void
+test_rc6p(int gem_fd)
+{
+	int64_t duration_ns = 2 * 1000 * 1000 * 1000;
+	unsigned int num_pmu = 1;
+	uint64_t idle[3], busy[3], prev[3];
+	unsigned int slept, i;
+	int fd, ret, fw;
+
+	fd = open_group(I915_PMU_RC6_RESIDENCY, -1);
+	ret = perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd);
+	if (ret > 0) {
+		num_pmu++;
+		ret = perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd);
+		if (ret > 0)
+			num_pmu++;
+	}
+
+	igt_require(num_pmu == 3);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	pmu_read_multi(fd, num_pmu, prev);
+	slept = measured_usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, idle);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(idle[i] - prev[i], slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	pmu_read_multi(fd, num_pmu, prev);
+	usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, busy);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(busy[i] - prev[i], 0.0, tolerance);
+
+	close(fw);
+	close(fd);
+}
+
+igt_main
+{
+	const unsigned int num_other_metrics =
+				I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
+	unsigned int num_engines = 0;
+	int fd = -1;
+	const struct intel_execution_engine2 *e;
+	unsigned int i;
+
+	igt_fixture {
+		fd = drm_open_driver_master(DRIVER_INTEL);
+
+		igt_require_gem(fd);
+		igt_require(i915_type_id() > 0);
+
+		for_each_engine_class_instance(fd, e) {
+			if (gem_has_engine(fd, e->class, e->instance))
+				num_engines++;
+		}
+	}
+
+	/**
+	 * Test invalid access via perf API is rejected.
+	 */
+	igt_subtest("invalid-init")
+		invalid_init();
+
+	for_each_engine_class_instance(fd, e) {
+		/**
+		 * Test that a single engine metric can be initialized.
+		 */
+		igt_subtest_f("init-busy-%s", e->name)
+			init(fd, e, I915_SAMPLE_BUSY);
+
+		igt_subtest_f("init-wait-%s", e->name)
+			init(fd, e, I915_SAMPLE_WAIT);
+
+		igt_subtest_f("init-sema-%s", e->name)
+			init(fd, e, I915_SAMPLE_SEMA);
+
+		/**
+		 * Test that engines show no load when idle.
+		 */
+		igt_subtest_f("idle-%s", e->name)
+			single(fd, e, false);
+
+		/**
+		 * Test that a single engine reports load correctly.
+		 */
+		igt_subtest_f("busy-%s", e->name)
+			single(fd, e, true);
+
+		/**
+		 * Test that when one engine is loaded others report no load.
+		 */
+		igt_subtest_f("busy-check-all-%s", e->name)
+			busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that when all except one engine are loaded all loads
+		 * are correctly reported.
+		 */
+		igt_subtest_f("most-busy-check-all-%s", e->name)
+			most_busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that semaphore counters report no activity on idle
+		 * or busy engines.
+		 */
+		igt_subtest_f("idle-no-semaphores-%s", e->name)
+			no_sema(fd, e, false);
+
+		igt_subtest_f("busy-no-semaphores-%s", e->name)
+			no_sema(fd, e, true);
+
+		/**
+		 * Test that semaphore waits are correctly reported.
+		 */
+		igt_subtest_f("semaphore-wait-%s", e->name)
+			sema_wait(fd, e);
+
+		/**
+		 * Test that event waits are correctly reported.
+		 */
+		if (e->class == I915_ENGINE_CLASS_RENDER)
+			igt_subtest_f("event-wait-%s", e->name)
+				event_wait(fd, e);
+
+		/**
+		 * Check that two perf clients do not influence each other's
+		 * observations.
+		 */
+		igt_subtest_f("multi-client-%s", e->name)
+			multi_client(fd, e);
+	}
+
+	/**
+	 * Test that when all engines are loaded all loads are
+	 * correctly reported.
+	 */
+	igt_subtest("all-busy-check-all")
+		all_busy_check_all(fd, num_engines);
+
+	/**
+	 * Test that non-engine counters can be initialized and read, apart
+	 * from the invalid metric, which should fail.
+	 */
+	for (i = 0; i < num_other_metrics + 1; i++) {
+		igt_subtest_f("other-init-%u", i)
+			init_other(i, i < num_other_metrics);
+
+		igt_subtest_f("other-read-%u", i)
+			read_other(i, i < num_other_metrics);
+	}
+
+	/**
+	 * Test counters are not affected by CPU offline/online events.
+	 */
+	igt_subtest("cpu-hotplug")
+		cpu_hotplug(fd);
+
+	/**
+	 * Test GPU frequency.
+	 */
+	igt_subtest("frequency")
+		test_frequency(fd);
+
+	/**
+	 * Test interrupt count reporting.
+	 */
+	igt_subtest("interrupts")
+		test_interrupts(fd);
+
+	/**
+	 * Test RC6 residency reporting.
+	 */
+	igt_subtest("rc6")
+		test_rc6(fd);
+
+	/**
+	 * Test RC6p residency reporting.
+	 */
+	igt_subtest("rc6p")
+		test_rc6p(fd);
+
+	/**
+	 * Check render nodes are counted.
+	 */
+	igt_subtest_group {
+		int render_fd;
+
+		igt_fixture {
+			render_fd = drm_open_driver_render(DRIVER_INTEL);
+			igt_require_gem(render_fd);
+
+			gem_quiescent_gpu(fd);
+		}
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_f("render-node-busy-%s", e->name)
+				single(render_fd, e, true);
+		}
+
+		igt_fixture {
+			close(render_fd);
+		}
+	}
+}
-- 
2.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH i-g-t 8/9] gem_wsim: Busy stats balancers
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-11-21 19:37   ` [PATCH i-g-t v3 " Tvrtko Ursulin
  2017-10-10  9:30 ` [PATCH i-g-t 9/9] media-bench.pl: Add busy balancers to the list Tvrtko Ursulin
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add busy and busy-avg balancers which make balancing decisions by looking
at engine busyness via the i915 PMU.

They are thus able to base decisions on the actual instantaneous load of
the system, rather than on metrics that lag behind by a batch or two. In
doing so, each client should be able to greedily maximise their own
usage of the system, leading to improved load balancing even in the face
of other uncooperative clients. On the other hand, we are only using the
instantaneous load without coupling in the predictive factor for dispatch
and execution length.
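
As a rough illustration of the busyness figure such a balancer works from,
the sketch below turns two successive counter samples into a percentage.
The struct and function names are hypothetical and not part of this patch;
it only assumes that both the enabled time and the busy counter are
reported in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sample layout (an assumption for illustration): the total
 * time the event has been enabled and one engine busy counter, both in ns.
 */
struct busy_sample {
	uint64_t time_enabled_ns;
	uint64_t busy_ns;
};

/* Percentage of the interval between two samples that the engine was busy. */
static double busy_percent(const struct busy_sample *prev,
			   const struct busy_sample *cur)
{
	uint64_t dt = cur->time_enabled_ns - prev->time_enabled_ns;

	/* Busy counters are expected to be monotonic. */
	assert(cur->busy_ns >= prev->busy_ns);

	/* No time elapsed between samples: report idle, avoid dividing by 0. */
	if (dt == 0)
		return 0.0;

	return (double)(cur->busy_ns - prev->busy_ns) * 100.0 / (double)dt;
}
```

A balancer built on this would sample once per decision and pick the engine
with the lowest percentage; the first sample has no predecessor and so
yields no usable value.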

v2:
 * Commit text. (Chris Wilson)
 * Rename get_stats to get_pmu_stats. (Chris Wilson)
 * Fix PMU readout in VCS remap mode.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 benchmarks/Makefile.am |   2 +-
 benchmarks/gem_wsim.c  | 142 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/benchmarks/Makefile.am b/benchmarks/Makefile.am
index d066112a32a2..a81a55e01697 100644
--- a/benchmarks/Makefile.am
+++ b/benchmarks/Makefile.am
@@ -21,7 +21,7 @@ gem_latency_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_latency_LDADD = $(LDADD) -lpthread
 gem_syslatency_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_syslatency_LDADD = $(LDADD) -lpthread -lrt
-gem_wsim_LDADD = $(LDADD) -lpthread
+gem_wsim_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la -lpthread
 
 EXTRA_DIST= \
 	README \
diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 82fe6ba9ec5f..8b2cd90659a9 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -50,6 +50,7 @@
 #include "intel_io.h"
 #include "igt_aux.h"
 #include "igt_rand.h"
+#include "igt_perf.h"
 #include "sw_sync.h"
 
 #include "ewma.h"
@@ -188,6 +189,16 @@ struct workload
 			uint32_t last[NUM_ENGINES];
 		} rt;
 	};
+
+	struct busy_balancer {
+		int fd;
+		bool first;
+		unsigned int num_engines;
+		unsigned int engine_map[5];
+		uint64_t t_prev;
+		uint64_t prev[5];
+		double busy[5];
+	} busy_balancer;
 };
 
 static const unsigned int nop_calibration_us = 1000;
@@ -993,6 +1004,8 @@ struct workload_balancer {
 	unsigned int flags;
 	unsigned int min_gen;
 
+	int (*init)(const struct workload_balancer *balancer,
+		    struct workload *wrk);
 	unsigned int (*get_qd)(const struct workload_balancer *balancer,
 			       struct workload *wrk,
 			       enum intel_engine_id engine);
@@ -1242,6 +1255,108 @@ context_balance(const struct workload_balancer *balancer,
 	return get_vcs_engine(wrk->ctx_list[w->context].static_vcs);
 }
 
+static unsigned int
+get_engine_busy(const struct workload_balancer *balancer,
+		struct workload *wrk, enum intel_engine_id engine)
+{
+	struct busy_balancer *bb = &wrk->busy_balancer;
+
+	if (engine == VCS2 && (wrk->flags & VCS2REMAP))
+		engine = BCS;
+
+	return bb->busy[bb->engine_map[engine]];
+}
+
+static void
+get_pmu_stats(const struct workload_balancer *b, struct workload *wrk)
+{
+	struct busy_balancer *bb = &wrk->busy_balancer;
+	uint64_t val[7];
+	unsigned int i;
+
+	igt_assert_eq(read(bb->fd, val, sizeof(val)),
+		      (2 + bb->num_engines) * sizeof(uint64_t));
+
+	if (!bb->first) {
+		for (i = 0; i < bb->num_engines; i++) {
+			double d;
+
+			d = (val[2 + i] - bb->prev[i]) * 100;
+			d /= val[1] - bb->t_prev;
+			bb->busy[i] = d;
+		}
+	}
+
+	for (i = 0; i < bb->num_engines; i++)
+		bb->prev[i] = val[2 + i];
+
+	bb->t_prev = val[1];
+	bb->first = false;
+}
+
+static enum intel_engine_id
+busy_avg_balance(const struct workload_balancer *balancer,
+		 struct workload *wrk, struct w_step *w)
+{
+	get_pmu_stats(balancer, wrk);
+
+	return qdavg_balance(balancer, wrk, w);
+}
+
+static enum intel_engine_id
+busy_balance(const struct workload_balancer *balancer,
+	     struct workload *wrk, struct w_step *w)
+{
+	get_pmu_stats(balancer, wrk);
+
+	return qd_balance(balancer, wrk, w);
+}
+
+static int
+busy_init(const struct workload_balancer *balancer, struct workload *wrk)
+{
+	struct busy_balancer *bb = &wrk->busy_balancer;
+	struct engine_desc {
+		unsigned class, inst;
+		enum intel_engine_id id;
+	} *d, engines[] = {
+		{ I915_ENGINE_CLASS_RENDER, 0, RCS },
+		{ I915_ENGINE_CLASS_COPY, 0, BCS },
+		{ I915_ENGINE_CLASS_VIDEO, 0, VCS1 },
+		{ I915_ENGINE_CLASS_VIDEO, 1, VCS2 },
+		{ I915_ENGINE_CLASS_VIDEO_ENHANCE, 0, VECS },
+		{ 0, 0, VCS }
+	};
+
+	bb->num_engines = 0;
+	bb->first = true;
+	bb->fd = -1;
+
+	for (d = &engines[0]; d->id != VCS; d++) {
+		int pfd;
+
+		pfd = perf_i915_open_group(I915_PMU_ENGINE_BUSY(d->class,
+							        d->inst),
+					   bb->fd);
+		if (pfd < 0) {
+			if (d->id != VCS2)
+				return -(10 + bb->num_engines);
+			else
+				continue;
+		}
+
+		if (bb->num_engines == 0)
+			bb->fd = pfd;
+
+		bb->engine_map[d->id] = bb->num_engines++;
+	}
+
+	if (bb->num_engines < 5 && !(wrk->flags & VCS2REMAP))
+		return -1;
+
+	return 0;
+}
+
 static const struct workload_balancer all_balancers[] = {
 	{
 		.id = 0,
@@ -1315,6 +1430,22 @@ static const struct workload_balancer all_balancers[] = {
 		.desc = "Static round-robin VCS assignment at context creation.",
 		.balance = context_balance,
 	},
+	{
+		.id = 9,
+		.name = "busy",
+		.desc = "Engine busyness based balancing.",
+		.init = busy_init,
+		.get_qd = get_engine_busy,
+		.balance = busy_balance,
+	},
+	{
+		.id = 10,
+		.name = "busy-avg",
+		.desc = "Average engine busyness based balancing.",
+		.init = busy_init,
+		.get_qd = get_engine_busy,
+		.balance = busy_avg_balance,
+	},
 };
 
 static unsigned int
@@ -2226,6 +2357,17 @@ int main(int argc, char **argv)
 				    (verbose > 0 && master_workload == i);
 
 		prepare_workload(i, w[i], flags_);
+
+		if (balancer && balancer->init) {
+			int ret = balancer->init(balancer, w[i]);
+			if (ret) {
+				if (verbose)
+					fprintf(stderr,
+						"Failed to initialize balancing! (%u=%d)\n",
+						i, ret);
+				return 1;
+			}
+		}
 	}
 
 	gem_quiescent_gpu(fd);
-- 
2.9.5


* [PATCH i-g-t 9/9] media-bench.pl: Add busy balancers to the list
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 8/9] gem_wsim: Busy stats balancers Tvrtko Ursulin
@ 2017-10-10  9:30 ` Tvrtko Ursulin
  2017-11-21 11:51   ` Chris Wilson
  2017-10-10  9:42 ` ✗ Fi.CI.BAT: failure for IGT PMU support (rev7) Patchwork
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10  9:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 scripts/media-bench.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/media-bench.pl b/scripts/media-bench.pl
index 0956ef0a0621..78f45199e95d 100755
--- a/scripts/media-bench.pl
+++ b/scripts/media-bench.pl
@@ -47,8 +47,9 @@ my $nop;
 my %opts;
 
 my @balancers = ( 'rr', 'rand', 'qd', 'qdr', 'qdavg', 'rt', 'rtr', 'rtavg',
-		  'context' );
-my %bal_skip_H = ( 'rr' => 1, 'rand' => 1, 'context' => 1 );
+		  'context', 'busy', 'busy-avg' );
+my %bal_skip_H = ( 'rr' => 1, 'rand' => 1, 'context' => 1, 'busy' => 1,
+		   'busy-avg' => 1 );
 my %bal_skip_R = ( 'context' => 1 );
 
 my @workloads = (
-- 
2.9.5


* ✗ Fi.CI.BAT: failure for IGT PMU support (rev7)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  2017-10-10  9:30 ` [PATCH i-g-t 9/9] media-bench.pl: Add busy balancers to the list Tvrtko Ursulin
@ 2017-10-10  9:42 ` Patchwork
  2017-10-10 12:06 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev8) Patchwork
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-10  9:42 UTC (permalink / raw)
  To: Petri Latvala; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev7)
URL   : https://patchwork.freedesktop.org/series/28253/
State : failure

== Summary ==

IGT patchset build failed on latest successful build
d7c88290ab6a8393dc341b30c7fb5e27d2952901 syncobj: Add a test for SYNCOBJ_CREATE_SIGNALED

make  all-recursive
Making all in lib
make  all-recursive
Making all in .
Making all in tests
make[4]: Nothing to be done for 'all'.
Making all in man
make[2]: Nothing to be done for 'all'.
Making all in tools
Making all in null_state_gen
make[3]: Nothing to be done for 'all'.
Making all in registers
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.
Making all in scripts
make[2]: Nothing to be done for 'all'.
Making all in benchmarks
Making all in wsim
make[3]: Nothing to be done for 'all'.
Making all in ezbench.d
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.
Making all in tests
Making all in intel-ci
make[3]: Nothing to be done for 'all'.
make[3]: Nothing to be done for 'all-am'.
Making all in assembler
make  all-recursive
Making all in doc
make[4]: Nothing to be done for 'all'.
Making all in test
make[4]: Nothing to be done for 'all'.
make[4]: Nothing to be done for 'all-am'.
Making all in overlay
  CCLD     intel-gpu-overlay
/usr/bin/ld: power.o: undefined reference to symbol 'llround@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libm.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Makefile:564: recipe for target 'intel-gpu-overlay' failed
make[2]: *** [intel-gpu-overlay] Error 1
Makefile:533: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
Makefile:465: recipe for target 'all' failed
make: *** [all] Error 2


* [PATCH i-g-t v2 6/9] intel-gpu-overlay: Use RAPL PMU for power reading
  2017-10-10  9:30 ` [PATCH i-g-t 6/9] intel-gpu-overlay: Use RAPL PMU for power reading Tvrtko Ursulin
@ 2017-10-10 11:30   ` Tvrtko Ursulin
  2017-10-10 12:05     ` [PATCH i-g-t v3 " Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10 11:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Wire up to the RAPL PMU for GPU energy readings.

The only complication is that we have to add code to parse:

 # cat /sys/devices/power/events/energy-gpu.scale
 2.3283064365386962890625e-10
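
For comparison only, a value in this format can also be parsed with the
standard strtod(). This is an alternative sketch and not what the patch
does: parse_scale is a hypothetical helper, and strtod() honours the
current locale's decimal separator, so the sketch assumes the "C" locale.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse text such as
 * "2.3283064365386962890625e-10\n" into a double.
 * Returns 0 on success and -1 if no number was found or it is out of range.
 */
static int parse_scale(const char *text, double *out)
{
	char *end;
	double v;

	errno = 0;
	v = strtod(text, &end);
	if (end == text || errno == ERANGE)
		return -1;

	*out = v;
	return 0;
}
```

The trailing newline that sysfs appends is simply left unparsed, so it
needs no special handling.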

v2: Link with -lm.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 lib/igt_perf.c      |  16 ++++--
 lib/igt_perf.h      |   1 +
 overlay/Makefile.am |   2 +-
 overlay/power.c     | 156 +++++++++++++++++++++++++++++++++++++++-------------
 overlay/power.h     |   2 +
 5 files changed, 134 insertions(+), 43 deletions(-)

diff --git a/lib/igt_perf.c b/lib/igt_perf.c
index 208474302fcc..0221461e918f 100644
--- a/lib/igt_perf.c
+++ b/lib/igt_perf.c
@@ -27,11 +27,12 @@ uint64_t i915_type_id(void)
 	return strtoull(buf, NULL, 0);
 }
 
-static int _perf_open(uint64_t config, int group, uint64_t format)
+static int
+_perf_open(uint64_t type, uint64_t config, int group, uint64_t format)
 {
 	struct perf_event_attr attr = { };
 
-	attr.type = i915_type_id();
+	attr.type = type;
 	if (attr.type == 0)
 		return -ENOENT;
 
@@ -46,11 +47,18 @@ static int _perf_open(uint64_t config, int group, uint64_t format)
 
 int perf_i915_open(uint64_t config)
 {
-	return _perf_open(config, -1, PERF_FORMAT_TOTAL_TIME_ENABLED);
+	return _perf_open(i915_type_id(), config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
 }
 
 int perf_i915_open_group(uint64_t config, int group)
 {
-	return _perf_open(config, group,
+	return _perf_open(i915_type_id(), config, group,
 			  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP);
 }
+
+int igt_perf_open(uint64_t type, uint64_t config)
+{
+	return _perf_open(type, config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
+}
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index 285823786324..b1f525739c69 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -99,5 +99,6 @@ perf_event_open(struct perf_event_attr *attr,
 uint64_t i915_type_id(void);
 int perf_i915_open(uint64_t config);
 int perf_i915_open_group(uint64_t config, int group);
+int igt_perf_open(uint64_t type, uint64_t config);
 
 #endif /* I915_PERF_H */
diff --git a/overlay/Makefile.am b/overlay/Makefile.am
index cefde2d040f8..f49f54ac3590 100644
--- a/overlay/Makefile.am
+++ b/overlay/Makefile.am
@@ -63,7 +63,7 @@ intel_gpu_overlay_SOURCES += \
 
 intel_gpu_overlay_SOURCES += $(both_x11_sources)
 
-intel_gpu_overlay_LDADD = $(LDADD) -lrt
+intel_gpu_overlay_LDADD = $(LDADD) -lrt -lm
 
 EXTRA_DIST= \
 	README \
diff --git a/overlay/power.c b/overlay/power.c
index 805f4ca7805c..35e446e6bce5 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -30,60 +30,138 @@
 #include <fcntl.h>
 #include <time.h>
 #include <errno.h>
+#include <ctype.h>
+#include <math.h>
 
 #include "igt_perf.h"
 
 #include "power.h"
 #include "debugfs.h"
 
-/* XXX Is this exposed through RAPL? */
+static uint64_t filename_to_u64(const char *filename, int base)
+{
+	char buf[64], *b;
+	ssize_t ret;
+	int fd;
 
-int power_init(struct power *power)
+	fd = open(filename, O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	ret = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (ret < 1)
+		return 0;
+
+	buf[ret] = '\0';
+
+	b = buf;
+	while (*b && !isdigit(*b))
+		b++;
+
+	return strtoull(b, NULL, base);
+}
+
+static uint64_t debugfs_file_to_u64(const char *name)
 {
-	char buf[4096];
-	int fd, len;
+	char buf[1024];
 
-	memset(power, 0, sizeof(*power));
+	snprintf(buf, sizeof(buf), "%s/%s", debugfs_dri_path, name);
+
+	return filename_to_u64(buf, 0);
+}
 
-	power->fd = -1;
+static uint64_t rapl_type_id(void)
+{
+	return filename_to_u64("/sys/devices/power/type", 10);
+}
 
-	sprintf(buf, "%s/i915_energy_uJ", debugfs_dri_path);
-	fd = open(buf, 0);
+static uint64_t rapl_gpu_power(void)
+{
+	return filename_to_u64("/sys/devices/power/events/energy-gpu", 0);
+}
+
+static double filename_to_double(const char *filename)
+{
+	char *dot = NULL, *e = NULL;
+	unsigned long long int decimal;
+	char buf[64], *b;
+	long int val;
+	long int exponent;
+	double result;
+	ssize_t ret;
+	int fd;
+
+	fd = open(filename, O_RDONLY);
 	if (fd < 0)
-		return power->error = errno;
+		return NAN;
 
-	len = read(fd, buf, sizeof(buf));
+	ret = read(fd, buf, sizeof(buf) - 1);
 	close(fd);
+	if (ret < 1)
+		return NAN;
+
+	buf[ret] = '\0';
+
+	b = buf;
+	while (*b) {
+		if (*b == '.')
+			dot = b;
+		else if (*b == 'e')
+			e = b;
+		b++;
+	}
 
-	if (len < 0)
-		return power->error = errno;
+	if (!dot || !e)
+		return NAN;
 
-	buf[len] = '\0';
-	if (strtoull(buf, 0, 0) == 0)
-		return power->error = EINVAL;
+	*dot = '\0';
+	*e = '\0';
 
-	return 0;
+	/* Reduce precision to fit in long int. */
+	if ((e - dot) > 18)
+		dot[18] = '\0';
+
+	val = strtoll(buf, NULL, 10);
+	decimal = strtoull(++dot, NULL, 10);
+	exponent = strtoll(++e, NULL, 10);
+
+	result = (double)decimal;
+	result /= round(pow(10, strlen(dot)));
+	result += val;
+	result *= pow(10, exponent);
+
+	return result;
 }
 
-static uint64_t file_to_u64(const char *name)
+static double rapl_gpu_power_scale(void)
 {
-	char buf[4096];
-	int fd, len;
+	return filename_to_double("/sys/devices/power/events/energy-gpu.scale");
+}
 
-	sprintf(buf, "%s/%s", debugfs_dri_path, name);
-	fd = open(buf, 0);
-	if (fd < 0)
-		return 0;
+int power_init(struct power *power)
+{
+	uint64_t val;
 
-	len = read(fd, buf, sizeof(buf)-1);
-	close(fd);
+	memset(power, 0, sizeof(*power));
 
-	if (len < 0)
-		return 0;
+	power->fd = igt_perf_open(rapl_type_id(), rapl_gpu_power());
+	if (power->fd >= 0) {
+		power->rapl_scale = rapl_gpu_power_scale();
+
+		if (!isnan(power->rapl_scale)) {
+			power->rapl_scale *= 1e3; /* from nano to micro */
+			return 0;
+		}
+	}
 
-	buf[len] = '\0';
+	val = debugfs_file_to_u64("i915_energy_uJ");
+	if (val == -1)
+		return power->error = errno;
+	else if (val == 0)
+		return power->error = EINVAL;
 
-	return strtoull(buf, 0, 0);
+	return 0;
 }
 
 static uint64_t clock_ms_to_u64(void)
@@ -93,30 +171,30 @@ static uint64_t clock_ms_to_u64(void)
 	if (clock_gettime(CLOCK_MONOTONIC, &tv) < 0)
 		return 0;
 
-	return (uint64_t)tv.tv_sec * 1000 + tv.tv_nsec / 1000000;
+	return (uint64_t)tv.tv_sec * 1e3 + tv.tv_nsec / 1e6;
 }
 
 int power_update(struct power *power)
 {
-	struct power_stat *s = &power->stat[power->count++&1];
-	struct power_stat *d = &power->stat[power->count&1];
+	struct power_stat *s = &power->stat[power->count++ & 1];
+	struct power_stat *d = &power->stat[power->count & 1];
 	uint64_t d_time;
 
 	if (power->error)
 		return power->error;
 
-	if (power->fd != -1) {
+	if (power->fd >= 0) {
 		uint64_t data[2];
 		int len;
 
 		len = read(power->fd, data, sizeof(data));
-		if (len < 0)
+		if (len != sizeof(data))
 			return power->error = errno;
 
-		s->energy = data[0];
-		s->timestamp = data[1] / (1000*1000);
+		s->energy = llround((double)data[0] * power->rapl_scale);
+		s->timestamp = data[1] / 1e6;
 	} else {
-		s->energy = file_to_u64("i915_energy_uJ");
+		s->energy = debugfs_file_to_u64("i915_energy_uJ") / 1e3;
 		s->timestamp = clock_ms_to_u64();
 	}
 
@@ -124,7 +202,9 @@ int power_update(struct power *power)
 		return EAGAIN;
 
 	d_time = s->timestamp - d->timestamp;
-	power->power_mW = (s->energy - d->energy) / d_time;
+	power->power_mW = round((double)(s->energy - d->energy) *
+				(1e3f / d_time));
 	power->new_sample = 1;
+
 	return 0;
 }
diff --git a/overlay/power.h b/overlay/power.h
index bf8346ce46b4..28abfc32234b 100644
--- a/overlay/power.h
+++ b/overlay/power.h
@@ -39,6 +39,8 @@ struct power {
 	int new_sample;
 
 	uint64_t power_mW;
+
+	double rapl_scale;
 };
 
 int power_init(struct power *power);
-- 
2.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH i-g-t v3 6/9] intel-gpu-overlay: Use RAPL PMU for power reading
  2017-10-10 11:30   ` [PATCH i-g-t v2 " Tvrtko Ursulin
@ 2017-10-10 12:05     ` Tvrtko Ursulin
  2017-10-10 12:25       ` Chris Wilson
  2017-11-21 19:35       ` [PATCH i-g-t v4 " Tvrtko Ursulin
  0 siblings, 2 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10 12:05 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Wire up to the RAPL PMU for GPU energy readings.

The only complication is that we have to add code to parse:

 # cat /sys/devices/power/events/energy-gpu.scale
 2.3283064365386962890625e-10

v2: Link with -lm.
v3: strtod can handle scientific notation, even though my initial
    reading of the man page did not spot that. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 lib/igt_perf.c      |  16 +++++--
 lib/igt_perf.h      |   1 +
 overlay/Makefile.am |   2 +-
 overlay/power.c     | 127 ++++++++++++++++++++++++++++++++++++----------------
 overlay/power.h     |   2 +
 5 files changed, 104 insertions(+), 44 deletions(-)

diff --git a/lib/igt_perf.c b/lib/igt_perf.c
index 208474302fcc..0221461e918f 100644
--- a/lib/igt_perf.c
+++ b/lib/igt_perf.c
@@ -27,11 +27,12 @@ uint64_t i915_type_id(void)
 	return strtoull(buf, NULL, 0);
 }
 
-static int _perf_open(uint64_t config, int group, uint64_t format)
+static int
+_perf_open(uint64_t type, uint64_t config, int group, uint64_t format)
 {
 	struct perf_event_attr attr = { };
 
-	attr.type = i915_type_id();
+	attr.type = type;
 	if (attr.type == 0)
 		return -ENOENT;
 
@@ -46,11 +47,18 @@ static int _perf_open(uint64_t config, int group, uint64_t format)
 
 int perf_i915_open(uint64_t config)
 {
-	return _perf_open(config, -1, PERF_FORMAT_TOTAL_TIME_ENABLED);
+	return _perf_open(i915_type_id(), config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
 }
 
 int perf_i915_open_group(uint64_t config, int group)
 {
-	return _perf_open(config, group,
+	return _perf_open(i915_type_id(), config, group,
 			  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP);
 }
+
+int igt_perf_open(uint64_t type, uint64_t config)
+{
+	return _perf_open(type, config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
+}
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index 285823786324..b1f525739c69 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -99,5 +99,6 @@ perf_event_open(struct perf_event_attr *attr,
 uint64_t i915_type_id(void);
 int perf_i915_open(uint64_t config);
 int perf_i915_open_group(uint64_t config, int group);
+int igt_perf_open(uint64_t type, uint64_t config);
 
 #endif /* I915_PERF_H */
diff --git a/overlay/Makefile.am b/overlay/Makefile.am
index cefde2d040f8..f49f54ac3590 100644
--- a/overlay/Makefile.am
+++ b/overlay/Makefile.am
@@ -63,7 +63,7 @@ intel_gpu_overlay_SOURCES += \
 
 intel_gpu_overlay_SOURCES += $(both_x11_sources)
 
-intel_gpu_overlay_LDADD = $(LDADD) -lrt
+intel_gpu_overlay_LDADD = $(LDADD) -lrt -lm
 
 EXTRA_DIST= \
 	README \
diff --git a/overlay/power.c b/overlay/power.c
index 805f4ca7805c..9ac90fde8786 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -30,60 +30,107 @@
 #include <fcntl.h>
 #include <time.h>
 #include <errno.h>
+#include <ctype.h>
+#include <math.h>
 
 #include "igt_perf.h"
 
 #include "power.h"
 #include "debugfs.h"
 
-/* XXX Is this exposed through RAPL? */
-
-int power_init(struct power *power)
+static int
+filename_to_buf(const char *filename, char *buf, unsigned int bufsize)
 {
-	char buf[4096];
-	int fd, len;
-
-	memset(power, 0, sizeof(*power));
-
-	power->fd = -1;
+	int fd;
+	ssize_t ret;
 
-	sprintf(buf, "%s/i915_energy_uJ", debugfs_dri_path);
-	fd = open(buf, 0);
+	fd = open(filename, O_RDONLY);
 	if (fd < 0)
-		return power->error = errno;
+		return -1;
 
-	len = read(fd, buf, sizeof(buf));
+	ret = read(fd, buf, bufsize - 1);
 	close(fd);
+	if (ret < 1)
+		return -1;
 
-	if (len < 0)
-		return power->error = errno;
-
-	buf[len] = '\0';
-	if (strtoull(buf, 0, 0) == 0)
-		return power->error = EINVAL;
+	buf[ret] = '\0';
 
 	return 0;
 }
 
-static uint64_t file_to_u64(const char *name)
+static uint64_t filename_to_u64(const char *filename, int base)
 {
-	char buf[4096];
-	int fd, len;
+	char buf[64], *b;
 
-	sprintf(buf, "%s/%s", debugfs_dri_path, name);
-	fd = open(buf, 0);
-	if (fd < 0)
+	if (filename_to_buf(filename, buf, sizeof(buf)))
 		return 0;
 
-	len = read(fd, buf, sizeof(buf)-1);
-	close(fd);
+	/*
+	 * Handle both single integer and key=value formats by skipping
+	 * leading non-digits.
+	 */
+	b = buf;
+	while (*b && !isdigit(*b))
+		b++;
+
+	return strtoull(b, NULL, base);
+}
+
+static uint64_t debugfs_file_to_u64(const char *name)
+{
+	char buf[1024];
+
+	snprintf(buf, sizeof(buf), "%s/%s", debugfs_dri_path, name);
+
+	return filename_to_u64(buf, 0);
+}
+
+static uint64_t rapl_type_id(void)
+{
+	return filename_to_u64("/sys/devices/power/type", 10);
+}
+
+static uint64_t rapl_gpu_power(void)
+{
+	return filename_to_u64("/sys/devices/power/events/energy-gpu", 0);
+}
 
-	if (len < 0)
+static double filename_to_double(const char *filename)
+{
+	char buf[64];
+
+	if (filename_to_buf(filename, buf, sizeof(buf)))
 		return 0;
 
-	buf[len] = '\0';
+	return strtod(buf, NULL);
+}
+
+static double rapl_gpu_power_scale(void)
+{
+	return filename_to_double("/sys/devices/power/events/energy-gpu.scale");
+}
+
+int power_init(struct power *power)
+{
+	uint64_t val;
+
+	memset(power, 0, sizeof(*power));
+
+	power->fd = igt_perf_open(rapl_type_id(), rapl_gpu_power());
+	if (power->fd >= 0) {
+		power->rapl_scale = rapl_gpu_power_scale();
+
+		if (power->rapl_scale > 0.0) {
+			power->rapl_scale *= 1e3; /* from nano to micro */
+			return 0;
+		}
+	}
+
+	val = debugfs_file_to_u64("i915_energy_uJ");
+	if (val == 0)
+		return power->error = EINVAL;
 
-	return strtoull(buf, 0, 0);
+	return 0;
 }
 
 static uint64_t clock_ms_to_u64(void)
@@ -93,30 +140,30 @@ static uint64_t clock_ms_to_u64(void)
 	if (clock_gettime(CLOCK_MONOTONIC, &tv) < 0)
 		return 0;
 
-	return (uint64_t)tv.tv_sec * 1000 + tv.tv_nsec / 1000000;
+	return (uint64_t)tv.tv_sec * 1e3 + tv.tv_nsec / 1e6;
 }
 
 int power_update(struct power *power)
 {
-	struct power_stat *s = &power->stat[power->count++&1];
-	struct power_stat *d = &power->stat[power->count&1];
+	struct power_stat *s = &power->stat[power->count++ & 1];
+	struct power_stat *d = &power->stat[power->count & 1];
 	uint64_t d_time;
 
 	if (power->error)
 		return power->error;
 
-	if (power->fd != -1) {
+	if (power->fd >= 0) {
 		uint64_t data[2];
 		int len;
 
 		len = read(power->fd, data, sizeof(data));
-		if (len < 0)
+		if (len != sizeof(data))
 			return power->error = errno;
 
-		s->energy = data[0];
-		s->timestamp = data[1] / (1000*1000);
+		s->energy = llround((double)data[0] * power->rapl_scale);
+		s->timestamp = data[1] / 1e6;
 	} else {
-		s->energy = file_to_u64("i915_energy_uJ");
+		s->energy = debugfs_file_to_u64("i915_energy_uJ") / 1e3;
 		s->timestamp = clock_ms_to_u64();
 	}
 
@@ -124,7 +171,9 @@ int power_update(struct power *power)
 		return EAGAIN;
 
 	d_time = s->timestamp - d->timestamp;
-	power->power_mW = (s->energy - d->energy) / d_time;
+	power->power_mW = round((double)(s->energy - d->energy) *
+				(1e3f / d_time));
 	power->new_sample = 1;
+
 	return 0;
 }
diff --git a/overlay/power.h b/overlay/power.h
index bf8346ce46b4..28abfc32234b 100644
--- a/overlay/power.h
+++ b/overlay/power.h
@@ -39,6 +39,8 @@ struct power {
 	int new_sample;
 
 	uint64_t power_mW;
+
+	double rapl_scale;
 };
 
 int power_init(struct power *power);
-- 
2.9.5
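
For readers following the unit handling in power_update() above, the conversion from two raw energy samples to an average power figure can be sketched as below. This is a minimal illustration and not code from the patch: energy_to_mw() and its parameters are hypothetical names, and it assumes the scale is expressed in joules per counter increment and the timestamps in nanoseconds. The patch folds its unit conversions into rapl_scale, whereas the sketch keeps each step explicit.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: average power in milliwatts
 * between two samples of an energy counter.
 *
 * raw0, raw1   - counter values at the first and second sample
 * scale        - assumed to be joules per counter increment
 * t0_ns, t1_ns - sample times in nanoseconds
 */
static double energy_to_mw(uint64_t raw0, uint64_t raw1, double scale,
			   uint64_t t0_ns, uint64_t t1_ns)
{
	double joules, seconds;

	if (t1_ns <= t0_ns)
		return 0.0; /* no elapsed time, nothing to report */

	joules = (double)(raw1 - raw0) * scale;
	seconds = (double)(t1_ns - t0_ns) / 1e9;

	return joules / seconds * 1e3; /* W -> mW */
}
```

With the scale quoted in the commit message, a counter delta of 4294967296 over one second corresponds to roughly one joule per second under these assumptions.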

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* ✓ Fi.CI.BAT: success for IGT PMU support (rev8)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  2017-10-10  9:42 ` ✗ Fi.CI.BAT: failure for IGT PMU support (rev7) Patchwork
@ 2017-10-10 12:06 ` Patchwork
  2017-10-10 13:48 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev9) Patchwork
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-10 12:06 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev8)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
d7c88290ab6a8393dc341b30c7fb5e27d2952901 syncobj: Add a test for SYNCOBJ_CREATE_SIGNALED

with latest DRM-Tip kernel build CI_DRM_3202
6bcaf2275e52 drm-tip: 2017y-10m-10d-10h-57m-51s UTC integration manifest

Testlist changes:
+igt@perf_pmu@all-busy-check-all
+igt@perf_pmu@busy-bcs0
+igt@perf_pmu@busy-check-all-bcs0
+igt@perf_pmu@busy-check-all-rcs0
+igt@perf_pmu@busy-check-all-vcs0
+igt@perf_pmu@busy-check-all-vcs1
+igt@perf_pmu@busy-check-all-vecs0
+igt@perf_pmu@busy-no-semaphores-bcs0
+igt@perf_pmu@busy-no-semaphores-rcs0
+igt@perf_pmu@busy-no-semaphores-vcs0
+igt@perf_pmu@busy-no-semaphores-vcs1
+igt@perf_pmu@busy-no-semaphores-vecs0
+igt@perf_pmu@busy-rcs0
+igt@perf_pmu@busy-vcs0
+igt@perf_pmu@busy-vcs1
+igt@perf_pmu@busy-vecs0
+igt@perf_pmu@cpu-hotplug
+igt@perf_pmu@event-wait-rcs0
+igt@perf_pmu@frequency
+igt@perf_pmu@idle-bcs0
+igt@perf_pmu@idle-no-semaphores-bcs0
+igt@perf_pmu@idle-no-semaphores-rcs0
+igt@perf_pmu@idle-no-semaphores-vcs0
+igt@perf_pmu@idle-no-semaphores-vcs1
+igt@perf_pmu@idle-no-semaphores-vecs0
+igt@perf_pmu@idle-rcs0
+igt@perf_pmu@idle-vcs0
+igt@perf_pmu@idle-vcs1
+igt@perf_pmu@idle-vecs0
+igt@perf_pmu@init-busy-bcs0
+igt@perf_pmu@init-busy-rcs0
+igt@perf_pmu@init-busy-vcs0
+igt@perf_pmu@init-busy-vcs1
+igt@perf_pmu@init-busy-vecs0
+igt@perf_pmu@init-sema-bcs0
+igt@perf_pmu@init-sema-rcs0
+igt@perf_pmu@init-sema-vcs0
+igt@perf_pmu@init-sema-vcs1
+igt@perf_pmu@init-sema-vecs0
+igt@perf_pmu@init-wait-bcs0
+igt@perf_pmu@init-wait-rcs0
+igt@perf_pmu@init-wait-vcs0
+igt@perf_pmu@init-wait-vcs1
+igt@perf_pmu@init-wait-vecs0
+igt@perf_pmu@interrupts
+igt@perf_pmu@invalid-init
+igt@perf_pmu@most-busy-check-all-bcs0
+igt@perf_pmu@most-busy-check-all-rcs0
+igt@perf_pmu@most-busy-check-all-vcs0
+igt@perf_pmu@most-busy-check-all-vcs1
+igt@perf_pmu@most-busy-check-all-vecs0
+igt@perf_pmu@multi-client-bcs0
+igt@perf_pmu@multi-client-rcs0
+igt@perf_pmu@multi-client-vcs0
+igt@perf_pmu@multi-client-vcs1
+igt@perf_pmu@multi-client-vecs0
+igt@perf_pmu@other-init-0
+igt@perf_pmu@other-init-1
+igt@perf_pmu@other-init-2
+igt@perf_pmu@other-init-3
+igt@perf_pmu@other-init-4
+igt@perf_pmu@other-init-5
+igt@perf_pmu@other-init-6
+igt@perf_pmu@other-read-0
+igt@perf_pmu@other-read-1
+igt@perf_pmu@other-read-2
+igt@perf_pmu@other-read-3
+igt@perf_pmu@other-read-4
+igt@perf_pmu@other-read-5
+igt@perf_pmu@other-read-6
+igt@perf_pmu@rc6
+igt@perf_pmu@rc6p
+igt@perf_pmu@render-node-busy-bcs0
+igt@perf_pmu@render-node-busy-rcs0
+igt@perf_pmu@render-node-busy-vcs0
+igt@perf_pmu@render-node-busy-vcs1
+igt@perf_pmu@render-node-busy-vecs0
+igt@perf_pmu@semaphore-wait-bcs0
+igt@perf_pmu@semaphore-wait-rcs0
+igt@perf_pmu@semaphore-wait-vcs0
+igt@perf_pmu@semaphore-wait-vcs1
+igt@perf_pmu@semaphore-wait-vecs0

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-byt-n2820) fdo#101705

fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:461s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:479s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:396s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:570s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:286s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:523s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:524s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:534s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:536s
fi-cfl-s         total:289  pass:256  dwarn:1   dfail:0   fail:0   skip:32  time:564s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:634s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:427s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:600s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:445s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:422s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:466s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:510s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:478s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:500s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:584s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:490s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:589s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:659s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:474s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:660s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:540s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:517s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:475s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:583s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:438s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_310/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 2/9] intel-gpu-overlay: Consolidate perf PMU access to library
  2017-10-10  9:30 ` [PATCH i-g-t 2/9] intel-gpu-overlay: Consolidate perf PMU access to library Tvrtko Ursulin
@ 2017-10-10 12:21   ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 12:21 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 10:30:01)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Various tool modules implement their own PMU open wrapper which
> can be replaced by calling the library one.
> 
> v2:
>  * Remove extra newline. (Chris Wilson)
>  * Commit msg.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 3/9] lib/perf: Fix data types and general tidy
  2017-10-10  9:30 ` [PATCH i-g-t 3/9] lib/perf: Fix data types and general tidy Tvrtko Ursulin
@ 2017-10-10 12:22   ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 12:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 10:30:02)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Configuration and format are uint64_t in the perf API.

Planning for a busy few years? ;)

> Tidy some other details as well.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 4/9] intel-gpu-overlay: Fix interrupts PMU readout
  2017-10-10  9:30 ` [PATCH i-g-t 4/9] intel-gpu-overlay: Fix interrupts PMU readout Tvrtko Ursulin
@ 2017-10-10 12:23   ` Chris Wilson
  2017-10-10 14:17     ` [PATCH i-g-t v2 " Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 12:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 10:30:03)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  overlay/gem-interrupts.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
> index a84aef0398a7..3eda24f4d7eb 100644
> --- a/overlay/gem-interrupts.c
> +++ b/overlay/gem-interrupts.c
> @@ -136,8 +136,12 @@ int gem_interrupts_update(struct gem_interrupts *irqs)
>                 else
>                         val = ret;
>         } else {
> -               if (read(irqs->fd, &val, sizeof(val)) < 0)
> +               uint64_t data[2];
> +
> +               if (read(irqs->fd, &data, sizeof(data)) < 0)

s/&data/data/

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t v3 6/9] intel-gpu-overlay: Use RAPL PMU for power reading
  2017-10-10 12:05     ` [PATCH i-g-t v3 " Tvrtko Ursulin
@ 2017-10-10 12:25       ` Chris Wilson
  2017-11-21 19:35       ` [PATCH i-g-t v4 " Tvrtko Ursulin
  1 sibling, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 12:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 13:05:40)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Wire up to the RAPL PMU for GPU energy readings.
> 
> The only complication is that we have to add code to parse:
> 
>  # cat /sys/devices/power/events/energy-gpu.scale
>  2.3283064365386962890625e-10
> 
> v2: Link with -lm.
> v3: strtod can handle scientific notation, even though my initial
>     reading of the man page did not spot that. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
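
As an illustration of the v3 note, the two kinds of sysfs strings the patch reads can be parsed roughly as follows. This is a sketch under stated assumptions, not the patch's code: parse_u64_attr() and parse_scale_attr() are hypothetical names, and the "event=0x04" form used below is only an illustrative example of a key=value attribute. strtod() accepts scientific notation directly, and skipping leading non-digits handles both bare integers and key=value strings.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "23", "0x04" or "event=0x04" style strings. */
static uint64_t parse_u64_attr(const char *s, int base)
{
	/* Skip a leading "key=" prefix, or anything else that is not a digit. */
	while (*s && !isdigit((unsigned char)*s))
		s++;

	return strtoull(s, NULL, base);
}

/* Hypothetical helper: parse a scale such as "2.3283064365386962890625e-10". */
static double parse_scale_attr(const char *s)
{
	char *end;
	double v = strtod(s, &end);

	/* No characters consumed means the string was not a number. */
	return end == s ? 0.0 : v;
}
```

One caveat with this approach: strtod() honours the decimal separator of the current locale, so a program that calls setlocale() may need to parse such files under the "C" locale.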

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10  9:30 ` [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API Tvrtko Ursulin
@ 2017-10-10 12:37   ` Chris Wilson
  2017-10-10 13:38     ` Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 12:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 10:30:06)
> +static void
> +event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> +{
> +       struct drm_i915_gem_exec_object2 obj = { };
> +       struct drm_i915_gem_execbuffer2 eb = { };
> +       data_t data;
> +       igt_display_t *display = &data.display;
> +       const uint32_t DERRMR = 0x44050;
> +       unsigned int valid_tests = 0;
> +       uint32_t batch[8], *b;
> +       igt_output_t *output;
> +       uint32_t bb_handle;
> +       uint32_t reg;
> +       enum pipe p;
> +       int fd;
> +
> +       igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
> +       igt_require(intel_register_access_init(intel_get_pci_device(),
> +                                              false, gem_fd) == 0);
> +
> +       /**
> +        * We will use the display to render event forwarding so need to
> +        * program the DERRMR register and restore it at exit.

DERRMR is always masked until we need it. If you really wanted to
preserve the old value, SRM, LRM around the test. Not that fussed, just
a general dislike of direct poking of registers.

> +        *
> +        * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
> +        * have a background helper to indirectly enable vblank irqs, and
> +        * listen to the recorded time spent in engine wait state as reported
> +        * by the PMU.
> +        */
> +       reg = intel_register_read(DERRMR);
> +
> +       kmstest_set_vt_graphics_mode();
> +       igt_display_init(&data.display, gem_fd);
> +
> +       bb_handle = gem_create(gem_fd, 4096);
> +
> +       b = batch;
> +       *b++ = MI_LOAD_REGISTER_IMM;
> +       *b++ = DERRMR;
> +       *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> +       *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> +       *b++ = MI_LOAD_REGISTER_IMM;
> +       *b++ = DERRMR;
> +       *b++ = reg;
> +       *b++ = MI_BATCH_BUFFER_END;

> +static int chain_nop(int gem_fd, unsigned long sz, int in_fence, bool sync)
> +{
> +       struct drm_i915_gem_exec_object2 obj = {};
> +       struct drm_i915_gem_execbuffer2 eb =
> +               { .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
> +       const uint32_t bbe = 0xa << 23;
> +
> +       sz = ALIGN(sz, sizeof(uint32_t));
> +
> +       obj.handle = gem_create(gem_fd, sz);
> +       gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
> +
> +       eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
> +
> +       if (in_fence >= 0) {
> +               eb.flags |= I915_EXEC_FENCE_IN;
> +               eb.rsvd2 = in_fence;
> +       }
> +
> +       gem_execbuf_wr(gem_fd, &eb);

On the same ctx/engine, this shouldn't be generating interrupts between
the batches. The fence should be resolved to an i915 request and then we
see that the requests are naturally ordered so the fence is elided.
> +
> +       if (sync)
> +               gem_sync(gem_fd, obj.handle);

So it looks like this will remain the only interrupt generator.

If we exported the out-fence and then polled that, that is currently
hooked up to an interrupt only path.


I don't have anything else to say! Afaict you have everything covered,
so the only way to find what's not is by letting it go live!

Give or take more tuning of the interrupt test,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
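
The suggestion above about exporting the out-fence and polling it could look roughly like the sketch below. It is an assumption-laden illustration, not the test's code: out_fence_fd() and fence_wait() are hypothetical names. It relies on the i915 convention that, with I915_EXEC_FENCE_OUT, the fence fd comes back in the upper 32 bits of rsvd2, and on a fence fd becoming readable once the fence has signalled. Whether this path actually generates the interrupts the test wants to count is the claim made in the review, not something the sketch demonstrates.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>

/* Extract the out-fence fd from the upper half of execbuf's rsvd2 field. */
static int out_fence_fd(uint64_t rsvd2)
{
	return (int)(rsvd2 >> 32);
}

/*
 * Wait for a fence fd to become readable.
 * Returns 1 if it did, 0 otherwise (timeout or only error events),
 * or a negative errno if poll() itself failed.
 */
static int fence_wait(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int ret;

	/* Restart the wait if a signal interrupts it. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;

	return ret > 0 && (pfd.revents & POLLIN);
}
```

A caller would pass the rsvd2 value left behind by the execbuf call to out_fence_fd(), wait on the result with fence_wait(), and close the fd afterwards.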

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10 12:37   ` Chris Wilson
@ 2017-10-10 13:38     ` Tvrtko Ursulin
  2017-10-10 13:46       ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10 13:38 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx


On 10/10/2017 13:37, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-10-10 10:30:06)
>> +static void
>> +event_wait(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       struct drm_i915_gem_exec_object2 obj = { };
>> +       struct drm_i915_gem_execbuffer2 eb = { };
>> +       data_t data;
>> +       igt_display_t *display = &data.display;
>> +       const uint32_t DERRMR = 0x44050;
>> +       unsigned int valid_tests = 0;
>> +       uint32_t batch[8], *b;
>> +       igt_output_t *output;
>> +       uint32_t bb_handle;
>> +       uint32_t reg;
>> +       enum pipe p;
>> +       int fd;
>> +
>> +       igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
>> +       igt_require(intel_register_access_init(intel_get_pci_device(),
>> +                                              false, gem_fd) == 0);
>> +
>> +       /**
>> +        * We will use the display to render event forwarding so need to
>> +        * program the DERRMR register and restore it at exit.
> 
> DERRMR is always masked until we need it. If you really wanted to
> preserve the old value, SRM, LRM around the test. Not that fussed, just
> a general dislike of direct poking of registers.

Thought I can get away with it since the handle is drm master and also 
there doesn't seem to be a patch in i915 which touches this register.

>> +        *
>> +        * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
>> +        * have a background helper to indirectly enable vblank irqs, and
>> +        * listen to the recorded time spent in engine wait state as reported
>> +        * by the PMU.
>> +        */
>> +       reg = intel_register_read(DERRMR);
>> +
>> +       kmstest_set_vt_graphics_mode();
>> +       igt_display_init(&data.display, gem_fd);
>> +
>> +       bb_handle = gem_create(gem_fd, 4096);
>> +
>> +       b = batch;
>> +       *b++ = MI_LOAD_REGISTER_IMM;
>> +       *b++ = DERRMR;
>> +       *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
>> +       *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
>> +       *b++ = MI_LOAD_REGISTER_IMM;
>> +       *b++ = DERRMR;
>> +       *b++ = reg;
>> +       *b++ = MI_BATCH_BUFFER_END;
> 
>> +static int chain_nop(int gem_fd, unsigned long sz, int in_fence, bool sync)
>> +{
>> +       struct drm_i915_gem_exec_object2 obj = {};
>> +       struct drm_i915_gem_execbuffer2 eb =
>> +               { .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
>> +       const uint32_t bbe = 0xa << 23;
>> +
>> +       sz = ALIGN(sz, sizeof(uint32_t));
>> +
>> +       obj.handle = gem_create(gem_fd, sz);
>> +       gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
>> +
>> +       eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
>> +
>> +       if (in_fence >= 0) {
>> +               eb.flags |= I915_EXEC_FENCE_IN;
>> +               eb.rsvd2 = in_fence;
>> +       }
>> +
>> +       gem_execbuf_wr(gem_fd, &eb);
> 
> On the same ctx/engine, this shouldn't be generating interrupts between
> the batches. The fence should be resolved to an i915 request and then we
> see that the requests are naturally ordered so the fence is elided.
>> +
>> +       if (sync)
>> +               gem_sync(gem_fd, obj.handle);
> 
> So it looks like this will remain the only interrupt generator.
> 
> If we exported the out-fence and then polled that, that is currently
> hooked up to an interrupt only path.

So you think I'm counting ctx switches and other stuff? Maybe we should 
have separate PMU counters for all the different interrupts. :)

What do you mean by exporting the out fence? Looping it through via 
something external to hide the source? Not sure how to do it. Dup the 
fd? Or create a swfence and merge the out fence to it? (Talking from 
memory here.)

P.S.
I did forget to subtract the previous count before the assert.

Regards,

Tvrtko

> 
> 
> I don't have anything else to say! Afaict you have everything covered,
> so the only way to find what's not is by letting it go live!
> 
> Give or take more tuning of the interrupt test,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10 13:38     ` Tvrtko Ursulin
@ 2017-10-10 13:46       ` Chris Wilson
  2017-10-10 14:17         ` [PATCH i-g-t v6 " Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 13:46 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 14:38:10)
> 
> On 10/10/2017 13:37, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-10-10 10:30:06)
> >> +static void
> >> +event_wait(int gem_fd, const struct intel_execution_engine2 *e)
> >> +{
> >> +       struct drm_i915_gem_exec_object2 obj = { };
> >> +       struct drm_i915_gem_execbuffer2 eb = { };
> >> +       data_t data;
> >> +       igt_display_t *display = &data.display;
> >> +       const uint32_t DERRMR = 0x44050;
> >> +       unsigned int valid_tests = 0;
> >> +       uint32_t batch[8], *b;
> >> +       igt_output_t *output;
> >> +       uint32_t bb_handle;
> >> +       uint32_t reg;
> >> +       enum pipe p;
> >> +       int fd;
> >> +
> >> +       igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
> >> +       igt_require(intel_register_access_init(intel_get_pci_device(),
> >> +                                              false, gem_fd) == 0);
> >> +
> >> +       /**
> >> +        * We will use the display to render event forwarding so need to
> >> +        * program the DERRMR register and restore it at exit.
> > 
> > DERRMR is always masked until we need it. If you really wanted to
> > preserve the old value, SRM, LRM around the test. Not that fussed, just
> > a general dislike of direct poking of registers.
> 
> Thought I can get away with it since the handle is drm master and also 
> there doesn't seem to be a patch in i915 which touches this register.

Oh, we used to set it around pageflips. Good point about drmMaster.
That's worth an igt_require(igt_set_master()) to document the requirement
that we need it here for SECURE dispatch.

I'm actually a bit concerned that DERRMR isn't ~0u by default. (Though
maybe they only assert the bits that are connected to events).

> >> +        *
> >> +        * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
> >> +        * have a background helper to indirectly enable vblank irqs, and
> >> +        * listen to the recorded time spent in engine wait state as reported
> >> +        * by the PMU.
> >> +        */
> >> +       reg = intel_register_read(DERRMR);
> >> +
> >> +       kmstest_set_vt_graphics_mode();
> >> +       igt_display_init(&data.display, gem_fd);
> >> +
> >> +       bb_handle = gem_create(gem_fd, 4096);
> >> +
> >> +       b = batch;
> >> +       *b++ = MI_LOAD_REGISTER_IMM;
> >> +       *b++ = DERRMR;
> >> +       *b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
> >> +       *b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
> >> +       *b++ = MI_LOAD_REGISTER_IMM;
> >> +       *b++ = DERRMR;
> >> +       *b++ = reg;
> >> +       *b++ = MI_BATCH_BUFFER_END;
> > 
> >> +static int chain_nop(int gem_fd, unsigned long sz, int in_fence, bool sync)
> >> +{
> >> +       struct drm_i915_gem_exec_object2 obj = {};
> >> +       struct drm_i915_gem_execbuffer2 eb =
> >> +               { .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
> >> +       const uint32_t bbe = 0xa << 23;
> >> +
> >> +       sz = ALIGN(sz, sizeof(uint32_t));
> >> +
> >> +       obj.handle = gem_create(gem_fd, sz);
> >> +       gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
> >> +
> >> +       eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
> >> +
> >> +       if (in_fence >= 0) {
> >> +               eb.flags |= I915_EXEC_FENCE_IN;
> >> +               eb.rsvd2 = in_fence;
> >> +       }
> >> +
> >> +       gem_execbuf_wr(gem_fd, &eb);
> > 
> > On the same ctx/engine, this shouldn't be generating interrupts between
> > the batches. The fence should be resolved to an i915 request and then we
> > see that the requests are naturally ordered so the fence is elided.
> >> +
> >> +       if (sync)
> >> +               gem_sync(gem_fd, obj.handle);
> > 
> > So it looks like this will remain the only interrupt generator.
> > 
> > If we exported the out-fence and then polled that, that is currently
> > hooked up to an interrupt only path.
> 
> So you think I'm counting ctx switches and other stuff? Maybe we should 
> have separate PMU counters for all the different interrupts. :)

Yeah, each request will be generating a new lite-restore. Hmm, how about
if we guarded that with i915_gem_request_started(). Interesting...
 
> What do you mean by exporting the out fence? Looping it through via 
> something external to hide the source? Not sure how to do it. Dup the 
> fd? Or create a swfence and merge the out fence to it? (Talking from 
> memory here.)

We can just use poll(&(struct pollfd){fd, POLLIN}, 1, -1). Instead of
"exporting the out_fence" read "having exported the out_fence", i.e. the
sync_file currently forces use of the interrupt (so long as it is not
already completed!).
-Chris
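
A minimal sketch of the polling idea discussed above, not code from the
series. It assumes the out-fence has already been obtained as a file
descriptor; the helper name wait_fd_readable is hypothetical. A sync_file
fd reports POLLIN once its fence has signaled, so a plain poll(2) on it is
enough to wait from the "outside".

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable. For a sync_file fence fd, POLLIN is
 * reported once the fence has signaled.
 *
 * Returns 1 if readable, 0 on timeout, -errno on error.
 * A negative timeout_ms blocks indefinitely.
 */
static int wait_fd_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(fd >= 0);

	/* Restart the wait if a signal interrupts it. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;

	return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

In the test, the fd would presumably be the out-fence returned by the
execbuf (the upper 32 bits of eb.rsvd2 when I915_EXEC_FENCE_OUT is set),
waited on with wait_fd_readable(fence_fd, -1).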
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✓ Fi.CI.BAT: success for IGT PMU support (rev9)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (10 preceding siblings ...)
  2017-10-10 12:06 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev8) Patchwork
@ 2017-10-10 13:48 ` Patchwork
  2017-10-10 15:19 ` ✗ Fi.CI.IGT: failure for IGT PMU support (rev8) Patchwork
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-10 13:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev9)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
d7c88290ab6a8393dc341b30c7fb5e27d2952901 syncobj: Add a test for SYNCOBJ_CREATE_SIGNALED

with latest DRM-Tip kernel build CI_DRM_3202
6bcaf2275e52 drm-tip: 2017y-10m-10d-10h-57m-51s UTC integration manifest

Testlist changes:
+igt@perf_pmu@all-busy-check-all
+igt@perf_pmu@busy-bcs0
+igt@perf_pmu@busy-check-all-bcs0
+igt@perf_pmu@busy-check-all-rcs0
+igt@perf_pmu@busy-check-all-vcs0
+igt@perf_pmu@busy-check-all-vcs1
+igt@perf_pmu@busy-check-all-vecs0
+igt@perf_pmu@busy-no-semaphores-bcs0
+igt@perf_pmu@busy-no-semaphores-rcs0
+igt@perf_pmu@busy-no-semaphores-vcs0
+igt@perf_pmu@busy-no-semaphores-vcs1
+igt@perf_pmu@busy-no-semaphores-vecs0
+igt@perf_pmu@busy-rcs0
+igt@perf_pmu@busy-vcs0
+igt@perf_pmu@busy-vcs1
+igt@perf_pmu@busy-vecs0
+igt@perf_pmu@cpu-hotplug
+igt@perf_pmu@event-wait-rcs0
+igt@perf_pmu@frequency
+igt@perf_pmu@idle-bcs0
+igt@perf_pmu@idle-no-semaphores-bcs0
+igt@perf_pmu@idle-no-semaphores-rcs0
+igt@perf_pmu@idle-no-semaphores-vcs0
+igt@perf_pmu@idle-no-semaphores-vcs1
+igt@perf_pmu@idle-no-semaphores-vecs0
+igt@perf_pmu@idle-rcs0
+igt@perf_pmu@idle-vcs0
+igt@perf_pmu@idle-vcs1
+igt@perf_pmu@idle-vecs0
+igt@perf_pmu@init-busy-bcs0
+igt@perf_pmu@init-busy-rcs0
+igt@perf_pmu@init-busy-vcs0
+igt@perf_pmu@init-busy-vcs1
+igt@perf_pmu@init-busy-vecs0
+igt@perf_pmu@init-sema-bcs0
+igt@perf_pmu@init-sema-rcs0
+igt@perf_pmu@init-sema-vcs0
+igt@perf_pmu@init-sema-vcs1
+igt@perf_pmu@init-sema-vecs0
+igt@perf_pmu@init-wait-bcs0
+igt@perf_pmu@init-wait-rcs0
+igt@perf_pmu@init-wait-vcs0
+igt@perf_pmu@init-wait-vcs1
+igt@perf_pmu@init-wait-vecs0
+igt@perf_pmu@interrupts
+igt@perf_pmu@invalid-init
+igt@perf_pmu@most-busy-check-all-bcs0
+igt@perf_pmu@most-busy-check-all-rcs0
+igt@perf_pmu@most-busy-check-all-vcs0
+igt@perf_pmu@most-busy-check-all-vcs1
+igt@perf_pmu@most-busy-check-all-vecs0
+igt@perf_pmu@multi-client-bcs0
+igt@perf_pmu@multi-client-rcs0
+igt@perf_pmu@multi-client-vcs0
+igt@perf_pmu@multi-client-vcs1
+igt@perf_pmu@multi-client-vecs0
+igt@perf_pmu@other-init-0
+igt@perf_pmu@other-init-1
+igt@perf_pmu@other-init-2
+igt@perf_pmu@other-init-3
+igt@perf_pmu@other-init-4
+igt@perf_pmu@other-init-5
+igt@perf_pmu@other-init-6
+igt@perf_pmu@other-read-0
+igt@perf_pmu@other-read-1
+igt@perf_pmu@other-read-2
+igt@perf_pmu@other-read-3
+igt@perf_pmu@other-read-4
+igt@perf_pmu@other-read-5
+igt@perf_pmu@other-read-6
+igt@perf_pmu@rc6
+igt@perf_pmu@rc6p
+igt@perf_pmu@render-node-busy-bcs0
+igt@perf_pmu@render-node-busy-rcs0
+igt@perf_pmu@render-node-busy-vcs0
+igt@perf_pmu@render-node-busy-vcs1
+igt@perf_pmu@render-node-busy-vecs0
+igt@perf_pmu@semaphore-wait-bcs0
+igt@perf_pmu@semaphore-wait-rcs0
+igt@perf_pmu@semaphore-wait-vcs0
+igt@perf_pmu@semaphore-wait-vcs1
+igt@perf_pmu@semaphore-wait-vecs0

Test gem_exec_suspend:
        Subgroup basic-s3:
                dmesg-warn -> PASS       (fi-cfl-s) fdo#103026
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-byt-n2820) fdo#101705

fdo#103026 https://bugs.freedesktop.org/show_bug.cgi?id=103026
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:455s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:475s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:397s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:562s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:286s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:523s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:528s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:538s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:528s
fi-cfl-s         total:289  pass:257  dwarn:0   dfail:0   fail:0   skip:32  time:570s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:630s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:438s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:599s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:441s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:416s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:464s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:510s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:480s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:506s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:586s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:490s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:596s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:660s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:467s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:656s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:532s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:522s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:477s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:585s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:435s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_311/

* [PATCH i-g-t v2 4/9] intel-gpu-overlay: Fix interrupts PMU readout
  2017-10-10 12:23   ` Chris Wilson
@ 2017-10-10 14:17     ` Tvrtko Ursulin
  0 siblings, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10 14:17 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

v2: Use correct address of. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 overlay/gem-interrupts.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index a84aef0398a7..5bd8656e0e63 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -136,8 +136,12 @@ int gem_interrupts_update(struct gem_interrupts *irqs)
 		else
 			val = ret;
 	} else {
-		if (read(irqs->fd, &val, sizeof(val)) < 0)
+		uint64_t data[2];
+
+		if (read(irqs->fd, data, sizeof(data)) < 0)
 			return irqs->error = errno;
+
+		val = data[0];
 	}
 
 	update = irqs->last_count == 0;
-- 
2.9.5
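
For context on why the read buffer grows to two words, here is a rough
sketch of how the size of a perf read(2) result follows from the event's
read_format. It assumes the overlay opens its events with
PERF_FORMAT_TOTAL_TIME_ENABLED set, which is inferred from the two-word
read and not shown in this patch. The helper name perf_read_words is
hypothetical, and newer format bits such as PERF_FORMAT_LOST are ignored.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of 64-bit words returned by read(2) on a perf event fd.
 *
 * Single event: value, [time_enabled], [time_running], [id]
 * Group leader: nr, [time_enabled], [time_running],
 *               then { value, [id] } for each event in the group
 */
static size_t perf_read_words(uint64_t read_format, unsigned int nr_events)
{
	size_t header = 0;	/* optional time fields */
	size_t per_event = 1;	/* the counter value itself */

	assert(nr_events > 0);

	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
		header++;
	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
		header++;
	if (read_format & PERF_FORMAT_ID)
		per_event++;

	if (read_format & PERF_FORMAT_GROUP)
		return 1 + header + per_event * nr_events;

	return header + per_event;
}
```

With only PERF_FORMAT_TOTAL_TIME_ENABLED set, a single event yields two
words, which matches the uint64_t data[2] buffer above with the counter
in data[0].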


* [PATCH i-g-t v6 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10 13:46       ` Chris Wilson
@ 2017-10-10 14:17         ` Tvrtko Ursulin
  2017-10-10 16:39           ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-10 14:17 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A bunch of tests for the new i915 PMU feature.

Parts of the code were initially sketched by Dmitry Rogozhkin.

v2: (Most suggestions by Chris Wilson)
 * Add new class/instance based engine list.
 * Add gem_has_engine/gem_require_engine to work with class/instance.
 * Use the above two throughout the test.
 * Shorten tests to 100ms busy batches, seems enough.
 * Add queued counter sanity checks.
 * Use igt_nsec_elapsed.
 * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
 * Fix multi ordering for busy accounting.
 * Use new guaranteed_usleep when sleep time is asserted on.
 * Check for no queued when idle/busy.
 * Add queued counter init test.
 * Add queued tests.
 * Consolidate and increase multiple busy engines tests to most-busy and
   all-busy tests.
 * Guarantee interrupts by using fences.
 * Test RC6 via forcewake.

v3:
 * Tweak assert in interrupts subtest.
 * Sprinkle of comments.
 * Fix multi-client test which got broken in v2.

v4:
 * Measured instead of guaranteed sleep.
 * Missing sync in no_sema.
 * Log busyness before asserts for debug.
 * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
 * Test frequency reporting via min/max setting instead of assuming.
   ^^ All above suggested by Chris Wilson. ^^
 * Drop queued subtests to match i915.
 * Use long batches with fences to ensure interrupts.
 * Test render node as well.

v5:
 * Add to meson build. (Petri Latvala)
 * Use 1eN constants. (Chris Wilson)
 * Add tests for semaphore and event waiting.

v6:
 * Fix interrupts subtest by polling the fence from the "outside".
   (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---
 lib/igt_gt.c           |   50 ++
 lib/igt_gt.h           |   38 ++
 lib/igt_perf.h         |    9 +-
 tests/Makefile.am      |    1 +
 tests/Makefile.sources |    1 +
 tests/meson.build      |    1 +
 tests/perf_pmu.c       | 1238 ++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 1330 insertions(+), 8 deletions(-)
 create mode 100644 tests/perf_pmu.c

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index b3f3b3809eee..4c75811fb1b3 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -568,3 +568,53 @@ bool gem_can_store_dword(int fd, unsigned int engine)
 
 	return true;
 }
+
+const struct intel_execution_engine2 intel_execution_engines2[] = {
+	{ "rcs0", I915_ENGINE_CLASS_RENDER, 0 },
+	{ "bcs0", I915_ENGINE_CLASS_COPY, 0 },
+	{ "vcs0", I915_ENGINE_CLASS_VIDEO, 0 },
+	{ "vcs1", I915_ENGINE_CLASS_VIDEO, 1 },
+	{ "vecs0", I915_ENGINE_CLASS_VIDEO_ENHANCE, 0 },
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance)
+{
+	if (class != I915_ENGINE_CLASS_VIDEO)
+		igt_assert(instance == 0);
+	else
+		igt_assert(instance >= 0 && instance <= 1);
+
+	switch (class) {
+	case I915_ENGINE_CLASS_RENDER:
+		return I915_EXEC_RENDER;
+	case I915_ENGINE_CLASS_COPY:
+		return I915_EXEC_BLT;
+	case I915_ENGINE_CLASS_VIDEO:
+		if (instance == 0) {
+			if (gem_has_bsd2(gem_fd))
+				return I915_EXEC_BSD | I915_EXEC_BSD_RING1;
+			else
+				return I915_EXEC_BSD;
+
+		} else {
+			return I915_EXEC_BSD | I915_EXEC_BSD_RING2;
+		}
+	case I915_ENGINE_CLASS_VIDEO_ENHANCE:
+		return I915_EXEC_VEBOX;
+	case I915_ENGINE_CLASS_OTHER:
+	default:
+		igt_assert(0);
+	};
+}
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance)
+{
+	return gem_has_ring(gem_fd,
+			    gem_class_instance_to_eb_flags(gem_fd, class,
+							   instance));
+}
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index 2579cbd37be7..fb67ae1a7d1f 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -25,6 +25,7 @@
 #define IGT_GT_H
 
 #include "igt_debugfs.h"
+#include "igt_core.h"
 
 void igt_require_hang_ring(int fd, int ring);
 
@@ -80,4 +81,41 @@ extern const struct intel_execution_engine {
 
 bool gem_can_store_dword(int fd, unsigned int engine);
 
+extern const struct intel_execution_engine2 {
+	const char *name;
+	int class;
+	int instance;
+} intel_execution_engines2[];
+
+#define for_each_engine_class_instance(fd__, e__) \
+	for ((e__) = intel_execution_engines2;\
+	     (e__)->name; \
+	     (e__)++)
+
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_OTHER = 0,
+	I915_ENGINE_CLASS_RENDER = 1,
+	I915_ENGINE_CLASS_COPY = 2,
+	I915_ENGINE_CLASS_VIDEO = 3,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
+	I915_ENGINE_CLASS_MAX /* non-ABI */
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance);
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance);
+
+static inline
+void gem_require_engine(int gem_fd,
+			enum drm_i915_gem_engine_class class,
+			unsigned int instance)
+{
+	igt_require(gem_has_engine(gem_fd, class, instance));
+}
+
 #endif /* IGT_GT_H */
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index b1f525739c69..5428feb0c746 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -29,14 +29,7 @@
 
 #include <linux/perf_event.h>
 
-enum drm_i915_gem_engine_class {
-	I915_ENGINE_CLASS_OTHER = 0,
-	I915_ENGINE_CLASS_RENDER = 1,
-	I915_ENGINE_CLASS_COPY = 2,
-	I915_ENGINE_CLASS_VIDEO = 3,
-	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
-	I915_ENGINE_CLASS_MAX /* non-ABI */
-};
+#include "igt_gt.h"
 
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 89a970153992..17ee1be08d8a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -131,6 +131,7 @@ gen7_forcewake_mt_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gen7_forcewake_mt_LDADD = $(LDADD) -lpthread
 gem_userptr_blits_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_userptr_blits_LDADD = $(LDADD) -lpthread
+perf_pmu_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la
 
 gem_wait_LDADD = $(LDADD) -lrt
 kms_flip_LDADD = $(LDADD) -lrt -lpthread
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index c4d320ebc61b..744eeeab9ef4 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -217,6 +217,7 @@ TESTS_progs = \
 	kms_vblank \
 	meta_test \
 	perf \
+	perf_pmu \
 	pm_backlight \
 	pm_lpsp \
 	pm_rc6_residency \
diff --git a/tests/meson.build b/tests/meson.build
index 6cb3584a4dd9..12d5706faaeb 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -197,6 +197,7 @@ test_progs = [
 	'kms_vblank',
 	'meta_test',
 	'perf',
+	'perf_pmu',
 	'pm_backlight',
 	'pm_lpsp',
 	'pm_rc6_residency',
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
new file mode 100644
index 000000000000..f2645bfd2a8d
--- /dev/null
+++ b/tests/perf_pmu.c
@@ -0,0 +1,1238 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/times.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <time.h>
+#include <poll.h>
+
+#include "igt.h"
+#include "igt_core.h"
+#include "igt_perf.h"
+#include "igt_sysfs.h"
+
+IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
+
+const double tolerance = 0.03f;
+const unsigned long batch_duration_ns = 100 * 1000 * 1000;
+
+static int open_pmu(uint64_t config)
+{
+	int fd;
+
+	fd = perf_i915_open(config);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static int open_group(uint64_t config, int group)
+{
+	int fd;
+
+	fd = perf_i915_open_group(config, group);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static void
+init(int gem_fd, const struct intel_execution_engine2 *e, uint8_t sample)
+{
+	int fd;
+
+	fd = open_pmu(__I915_PMU_ENGINE(e->class, e->instance, sample));
+
+	close(fd);
+}
+
+static uint64_t pmu_read_single(int fd)
+{
+	uint64_t data[2];
+
+	igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
+
+	return data[0];
+}
+
+static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
+{
+	uint64_t buf[2 + num];
+	unsigned int i;
+
+	igt_assert_eq(read(fd, buf, sizeof(buf)), sizeof(buf));
+
+	for (i = 0; i < num; i++)
+		val[i] = buf[2 + i];
+}
+
+#define assert_within_epsilon(x, ref, tolerance) \
+	igt_assert_f((double)(x) <= (1.0 + tolerance) * (double)ref && \
+		     (double)(x) >= (1.0 - tolerance) * (double)ref, \
+		     "'%s' != '%s' (%f not within %f%% tolerance of %f)\n",\
+		     #x, #ref, (double)x, tolerance * 100.0, (double)ref)
+
+/*
+ * Helper for cases where we assert on time spent sleeping (directly or
+ * indirectly), so make it more robust by ensuring the system sleep time
+ * is within test tolerance to start with.
+ */
+static unsigned int measured_usleep(unsigned int usec)
+{
+	uint64_t slept = 0;
+
+	while (usec > 0) {
+		struct timespec start = { };
+		uint64_t this_sleep;
+
+		igt_nsec_elapsed(&start);
+		usleep(usec);
+		this_sleep = igt_nsec_elapsed(&start);
+		slept += this_sleep;
+		if (this_sleep > usec * 1000)
+			break;
+		usec -= this_sleep / 1000;
+	}
+
+	return slept;
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	double ref = busy ? batch_duration_ns : 0.0f;
+	igt_spin_t *spin;
+	uint64_t val;
+	int fd;
+
+	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	val = pmu_read_single(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+}
+
+static void log_busy(int fd, unsigned int num_engines, uint64_t *val)
+{
+	char buf[1024];
+	int rem = sizeof(buf);
+	unsigned int i;
+	char *p = buf;
+
+	for (i = 0; i < num_engines; i++) {
+		int len;
+
+		len = snprintf(p, rem, "%u=%" PRIu64 "\n",  i, val[i]);
+		igt_assert(len > 0);
+		rem -= len;
+		p += len;
+	}
+
+	igt_info("%s", buf);
+}
+
+static void
+busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+	       const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin;
+	unsigned int busy_idx, i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+		else if (e == e_)
+			busy_idx = i;
+
+		fd[i++] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							  e_->instance),
+				     fd[0]);
+	}
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	assert_within_epsilon(val[busy_idx], batch_duration_ns, tolerance);
+	for (i = 0; i < num_engines; i++) {
+		if (i == busy_idx)
+			continue;
+		assert_within_epsilon(val[i], 0.0f, tolerance);
+	}
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+}
+
+static void
+most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+		    const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int idle_idx, i;
+
+	gem_require_engine(gem_fd, e->class, e->instance);
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							e_->instance),
+				   fd[0]);
+
+		if (e == e_) {
+			idle_idx = i;
+		} else {
+			spin[i] = igt_spin_batch_new(gem_fd, 0,
+						     e2ring(gem_fd, e_), 0);
+			igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+		}
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			gem_sync(gem_fd, spin[i]->handle);
+	}
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i == idle_idx)
+			assert_within_epsilon(val[i], 0.0f, tolerance);
+		else
+			assert_within_epsilon(val[i], batch_duration_ns,
+					      tolerance);
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			igt_spin_batch_free(gem_fd, spin[i]);
+	}
+	close(fd[0]);
+}
+
+static void
+all_busy_check_all(int gem_fd, const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e) {
+		if (!gem_has_engine(gem_fd, e->class, e->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e->class, e->instance),
+				   fd[0]);
+
+		spin[i] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++)
+		gem_sync(gem_fd, spin[i]->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++)
+		assert_within_epsilon(val[i], batch_duration_ns, tolerance);
+
+	for (i = 0; i < num_engines; i++)
+		igt_spin_batch_free(gem_fd, spin[i]);
+	close(fd[0]);
+}
+
+static void
+no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd;
+
+	fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
+	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, val);
+
+	assert_within_epsilon(val[0], 0.0f, tolerance);
+	assert_within_epsilon(val[1], 0.0f, tolerance);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+}
+
+#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
+#define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+
+static void
+sema_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	struct drm_i915_gem_exec_object2 obj[2];
+	uint32_t bb_handle, obj_handle;
+	unsigned long slept;
+	uint32_t *obj_ptr;
+	uint32_t batch[6];
+	uint64_t val[2];
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 8);
+
+	/**
+	 * Set up a batchbuffer with a polling semaphore wait command which
+	 * will wait on a value in a shared bo to change. This way we are able
+	 * to control how much time we will spend in this bb.
+	 */
+
+	bb_handle = gem_create(gem_fd, 4096);
+	obj_handle = gem_create(gem_fd, 4096);
+
+	obj_ptr = gem_mmap__wc(gem_fd, obj_handle, 0, 4096, PROT_WRITE);
+
+	batch[0] = MI_SEMAPHORE_WAIT |
+		   MI_SEMAPHORE_POLL |
+		   MI_SEMAPHORE_SAD_GTE_SDD;
+	batch[1] = 1;
+	batch[2] = 0x0;
+	batch[3] = 0x0;
+	batch[4] = MI_NOOP;
+	batch[5] = MI_BATCH_BUFFER_END;
+
+	gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+	reloc.target_handle = obj_handle;
+	reloc.offset = 2 * sizeof(uint32_t);
+	reloc.read_domains = I915_GEM_DOMAIN_RENDER;
+
+	memset(obj, 0, sizeof(obj));
+
+	obj[0].handle = obj_handle;
+
+	obj[1].handle = bb_handle;
+	obj[1].relocation_count = 1;
+	obj[1].relocs_ptr = to_user_pointer(&reloc);
+
+	eb.buffer_count = 2;
+	eb.buffers_ptr = to_user_pointer(obj);
+	eb.flags = e2ring(gem_fd, e);
+
+	/**
+	 * Start the semaphore wait PMU and after some known time let the above
+	 * semaphore wait command finish. Then check that the PMU is reporting
+	 * the expected time spent in semaphore wait state.
+	 */
+
+	fd = open_pmu(I915_PMU_ENGINE_SEMA(e->class, e->instance));
+
+	val[0] = pmu_read_single(fd);
+
+	gem_execbuf(gem_fd, &eb);
+
+	slept = measured_usleep(1e5);
+
+	*obj_ptr = 1;
+
+	gem_sync(gem_fd, bb_handle);
+
+	val[1] = pmu_read_single(fd);
+
+	munmap(obj_ptr, 4096);
+	gem_close(gem_fd, obj_handle);
+	gem_close(gem_fd, bb_handle);
+	close(fd);
+
+	assert_within_epsilon(val[1] - val[0], slept, tolerance);
+}
+
+#define   MI_WAIT_FOR_PIPE_C_VBLANK (1<<21)
+#define   MI_WAIT_FOR_PIPE_B_VBLANK (1<<11)
+#define   MI_WAIT_FOR_PIPE_A_VBLANK (1<<3)
+
+typedef struct {
+	igt_display_t display;
+	struct igt_fb primary_fb;
+	igt_output_t *output;
+	enum pipe pipe;
+} data_t;
+
+static void prepare_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	drmModeModeInfo *mode;
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+
+	/* create and set the primary plane fb */
+	mode = igt_output_get_mode(output);
+	igt_create_color_fb(fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888,
+			    LOCAL_DRM_FORMAT_MOD_NONE,
+			    0.0, 0.0, 0.0,
+			    &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, &data->primary_fb);
+
+	igt_display_commit(display);
+
+	igt_wait_for_vblank(fd, data->pipe);
+}
+
+static void cleanup_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	igt_remove_fb(fd, &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+}
+
+static int wait_vblank(int fd, union drm_wait_vblank *vbl)
+{
+	int err;
+
+	err = 0;
+	if (igt_ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl))
+		err = -errno;
+
+	return err;
+}
+
+static void
+event_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	data_t data;
+	igt_display_t *display = &data.display;
+	const uint32_t DERRMR = 0x44050;
+	unsigned int valid_tests = 0;
+	uint32_t batch[8], *b;
+	igt_output_t *output;
+	uint32_t bb_handle;
+	uint32_t reg;
+	enum pipe p;
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
+	igt_require(intel_register_access_init(intel_get_pci_device(),
+					       false, gem_fd) == 0);
+
+	/**
+	 * We will use the display to render event forwarding so need to
+	 * program the DERRMR register and restore it at exit.
+	 *
+	 * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
+	 * have a background helper to indirectly enable vblank irqs, and
+	 * listen to the recorded time spent in engine wait state as reported
+	 * by the PMU.
+	 */
+	reg = intel_register_read(DERRMR);
+
+	kmstest_set_vt_graphics_mode();
+	igt_display_init(&data.display, gem_fd);
+
+	bb_handle = gem_create(gem_fd, 4096);
+
+	b = batch;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
+	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg;
+	*b++ = MI_BATCH_BUFFER_END;
+
+	obj.handle = bb_handle;
+
+	eb.buffer_count = 1;
+	eb.buffers_ptr = to_user_pointer(&obj);
+	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
+
+	for_each_pipe_with_valid_output(display, p, output) {
+		struct igt_helper_process waiter = { };
+		const unsigned int frames = 3;
+		unsigned int frame;
+		uint64_t val[2];
+
+		batch[3] = MI_WAIT_FOR_EVENT;
+		switch (p) {
+		case PIPE_A:
+			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			break;
+		case PIPE_B:
+			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			break;
+		case PIPE_C:
+			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			break;
+		default:
+			continue;
+		}
+
+		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+		data.pipe = p;
+		prepare_crtc(&data, gem_fd, output);
+
+		fd = open_pmu(I915_PMU_ENGINE_WAIT(e->class, e->instance));
+
+		val[0] = pmu_read_single(fd);
+
+		igt_fork_helper(&waiter) {
+			const uint32_t pipe_id_flag =
+					kmstest_get_vbl_flag(data.pipe);
+
+			for (;;) {
+				union drm_wait_vblank vbl = { };
+
+				vbl.request.type = DRM_VBLANK_RELATIVE;
+				vbl.request.type |= pipe_id_flag;
+				vbl.request.sequence = 1;
+				igt_assert_eq(wait_vblank(gem_fd, &vbl), 0);
+			}
+		}
+
+		for (frame = 0; frame < frames; frame++) {
+			gem_execbuf(gem_fd, &eb);
+			gem_sync(gem_fd, bb_handle);
+		}
+
+		igt_stop_helper(&waiter);
+
+		val[1] = pmu_read_single(fd);
+
+		close(fd);
+
+		cleanup_crtc(&data, gem_fd, output);
+		valid_tests++;
+
+		igt_assert(val[1] - val[0] > 0);
+	}
+
+	gem_close(gem_fd, bb_handle);
+
+	intel_register_access_fini();
+
+	igt_require_f(valid_tests,
+		      "no valid crtc/connector combinations found\n");
+}
+
+static void
+multi_client(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
+	unsigned int slept;
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd[2];
+
+	fd[0] = open_pmu(config);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	usleep(batch_duration_ns / 3000);
+
+	/*
+	 * Second PMU client which is initialized after the first one,
+	 * and exits before it, should not affect accounting as reported
+	 * in the first client.
+	 */
+	fd[1] = open_pmu(config);
+	slept = measured_usleep(batch_duration_ns / 3000);
+	val[1] = pmu_read_single(fd[1]);
+	close(fd[1]);
+
+	gem_sync(gem_fd, spin->handle);
+
+	val[0] = pmu_read_single(fd[0]);
+
+	assert_within_epsilon(val[0], batch_duration_ns, tolerance);
+	assert_within_epsilon(val[1], slept, tolerance);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+}
+
+/**
+ * Tests that i915 PMU correctly errors out on invalid initialization.
+ * i915 PMU is uncore PMU, thus:
+ *  - sampling period is not supported
+ *  - pid > 0 is not supported since we can't count per-process (we count
+ *    per whole system)
+ *  - cpu != 0 is not supported since i915 PMU exposes cpumask for CPU0
+ */
+static void invalid_init(void)
+{
+	struct perf_event_attr attr;
+	int pid, cpu;
+
+#define ATTR_INIT() \
+do { \
+	memset(&attr, 0, sizeof (attr)); \
+	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0); \
+	attr.type = i915_type_id(); \
+	igt_assert(attr.type != 0); \
+} while(0)
+
+	ATTR_INIT();
+	attr.sample_period = 100;
+	pid = -1;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = 0;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = -1;
+	cpu = 1;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, ENODEV);
+}
+
+static void init_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	close(fd);
+}
+
+static void read_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	(void)pmu_read_single(fd);
+
+	close(fd);
+}
+
+static bool cpu0_hotplug_support(void)
+{
+	return access("/sys/devices/system/cpu/cpu0/online", W_OK) == 0;
+}
+
+static void cpu_hotplug(int gem_fd)
+{
+	struct timespec start = { };
+	igt_spin_t *spin;
+	uint64_t val, ref;
+	int fd;
+
+	igt_require(cpu0_hotplug_support());
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
+	igt_assert(fd >= 0);
+
+	igt_nsec_elapsed(&start);
+
+	/*
+	 * Toggle online status of all the CPUs in a child process and ensure
+	 * this has not affected busyness stats in the parent.
+	 */
+	igt_fork(child, 1) {
+		int cpu = 0;
+
+		for (;;) {
+			char name[128];
+			int cpufd;
+
+			sprintf(name, "/sys/devices/system/cpu/cpu%d/online",
+				cpu);
+			cpufd = open(name, O_WRONLY);
+			if (cpufd == -1) {
+				igt_assert(cpu > 0);
+				break;
+			}
+			igt_assert_eq(write(cpufd, "0", 2), 2);
+
+			usleep(1000 * 1000);
+
+			igt_assert_eq(write(cpufd, "1", 2), 2);
+
+			close(cpufd);
+			cpu++;
+		}
+	}
+
+	igt_waitchildren();
+
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
+
+	ref = igt_nsec_elapsed(&start);
+	val = pmu_read_single(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+}
+
+static unsigned long calibrate_nop(int fd, const unsigned int calibration_us)
+{
+	const unsigned int cal_min_us = calibration_us * 3;
+	const unsigned int tolerance_pct = 10;
+	const uint32_t bbe = 0xa << 23;
+	const unsigned int loops = 17;
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	struct timespec t_begin = { };
+	long size, last_size;
+	unsigned long ns;
+
+	igt_nsec_elapsed(&t_begin);
+
+	size = 256 * 1024;
+	do {
+		struct timespec t_start = { };
+
+		obj.handle = gem_create(fd, size);
+		gem_write(fd, obj.handle, size - sizeof(bbe), &bbe,
+			  sizeof(bbe));
+		gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		igt_nsec_elapsed(&t_start);
+
+		for (int loop = 0; loop < loops; loop++)
+			gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		ns = igt_nsec_elapsed(&t_start);
+
+		gem_close(fd, obj.handle);
+
+		last_size = size;
+		size = calibration_us * 1000 * size * loops / ns;
+		size = ALIGN(size, sizeof(uint32_t));
+	} while (igt_nsec_elapsed(&t_begin) / 1000 < cal_min_us ||
+		 abs(size - last_size) > (size * tolerance_pct / 100));
+
+	return size / sizeof(uint32_t);
+}
+
+static void exec_nop(int gem_fd, unsigned long sz)
+{
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct pollfd pfd;
+	int fence;
+
+	sz = ALIGN(sz, sizeof(uint32_t));
+
+	obj.handle = gem_create(gem_fd, sz);
+	gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
+
+	eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
+
+	gem_execbuf_wr(gem_fd, &eb);
+	fence = eb.rsvd2 >> 32;
+
+	/*
+	 * Poll on the output fence to ensure user interrupts will be
+	 * generated and listened to.
+	 */
+	pfd.fd = fence;
+	pfd.events = POLLIN;
+	igt_assert_eq(poll(&pfd, 1, -1), 1);
+
+	close(fence);
+	gem_close(gem_fd, obj.handle);
+}
+
+static void
+test_interrupts(int gem_fd)
+{
+	const unsigned int calibration_us = 250000;
+	const unsigned int batch_len_us = 100000;
+	const unsigned int batch_count = 3e6 / batch_len_us;
+	uint64_t idle, busy, prev;
+	unsigned long cal, sz;
+	unsigned int i;
+	int fd;
+
+	cal = calibrate_nop(gem_fd, calibration_us);
+	sz = batch_len_us * cal / calibration_us;
+
+	fd = open_pmu(I915_PMU_INTERRUPTS);
+
+	gem_quiescent_gpu(gem_fd);
+
+	/* Wait for idle state. */
+	prev = pmu_read_single(fd);
+	idle = prev + 1;
+	while (idle != prev) {
+		usleep(1e6);
+		prev = idle;
+		idle = pmu_read_single(fd);
+	}
+
+	igt_assert_eq(idle - prev, 0);
+
+	/*
+	 * Send some no-op batches waiting on output fences to
+	 * ensure interrupts.
+	 */
+	for (i = 0; i < batch_count; i++)
+		exec_nop(gem_fd, sz);
+
+	/* Check at least as many interrupts have been generated. */
+	busy = pmu_read_single(fd) - idle;
+	igt_assert(busy >= batch_count);
+
+	close(fd);
+}
+
+static void
+test_frequency(int gem_fd)
+{
+	const uint64_t duration_ns = 2e9;
+	uint32_t min_freq, max_freq, boost_freq;
+	uint64_t min[2], max[2], start[2];
+	igt_spin_t *spin;
+	int fd, sysfs;
+
+	sysfs = igt_sysfs_open(gem_fd, NULL);
+	igt_require(sysfs >= 0);
+
+	min_freq = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
+	max_freq = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
+	boost_freq = igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz");
+	igt_require(min_freq > 0 && max_freq > 0 && boost_freq > 0);
+	igt_require(max_freq > min_freq);
+	igt_require(boost_freq > min_freq);
+
+	fd = open_group(I915_PMU_REQUESTED_FREQUENCY, -1);
+	open_group(I915_PMU_ACTUAL_FREQUENCY, fd);
+
+	/*
+	 * Set GPU to min frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == min_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, min);
+	min[0] -= start[0];
+	min[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	usleep(1e6);
+
+	/*
+	 * Set GPU to max frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == max_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", boost_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == boost_freq);
+
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, max);
+	max[0] -= start[0];
+	max[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	/*
+	 * Restore min/max.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == min_freq);
+
+	close(fd);
+
+	igt_assert(min[0] < max[0]);
+	igt_assert(min[1] < max[1]);
+}
+
+static void
+test_rc6(int gem_fd)
+{
+	int64_t duration_ns = 2 * 1000 * 1000 * 1000;
+	uint64_t idle, busy, prev;
+	unsigned int slept;
+	int fd, fw;
+
+	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	prev = pmu_read_single(fd);
+	slept = measured_usleep(duration_ns / 1000);
+	idle = pmu_read_single(fd);
+
+	assert_within_epsilon(idle - prev, slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	prev = pmu_read_single(fd);
+	usleep(duration_ns / 1000);
+	busy = pmu_read_single(fd);
+
+	assert_within_epsilon(busy - prev, 0.0, tolerance);
+
+	close(fw);
+	close(fd);
+}
+
+static void
+test_rc6p(int gem_fd)
+{
+	int64_t duration_ns = 2 * 1000 * 1000 * 1000;
+	unsigned int num_pmu = 1;
+	uint64_t idle[3], busy[3], prev[3];
+	unsigned int slept, i;
+	int fd, ret, fw;
+
+	fd = open_group(I915_PMU_RC6_RESIDENCY, -1);
+	ret = perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd);
+	if (ret > 0) {
+		num_pmu++;
+		ret = perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd);
+		if (ret > 0)
+			num_pmu++;
+	}
+
+	igt_require(num_pmu == 3);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	pmu_read_multi(fd, num_pmu, prev);
+	slept = measured_usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, idle);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(idle[i] - prev[i], slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	pmu_read_multi(fd, num_pmu, prev);
+	usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, busy);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(busy[i] - prev[i], 0.0, tolerance);
+
+	close(fw);
+	close(fd);
+}
+
+igt_main
+{
+	const unsigned int num_other_metrics =
+				I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
+	unsigned int num_engines = 0;
+	int fd = -1;
+	const struct intel_execution_engine2 *e;
+	unsigned int i;
+
+	igt_fixture {
+		fd = drm_open_driver_master(DRIVER_INTEL);
+
+		igt_require_gem(fd);
+		igt_require(i915_type_id() > 0);
+
+		for_each_engine_class_instance(fd, e) {
+			if (gem_has_engine(fd, e->class, e->instance))
+				num_engines++;
+		}
+	}
+
+	/**
+	 * Test invalid access via perf API is rejected.
+	 */
+	igt_subtest("invalid-init")
+		invalid_init();
+
+	for_each_engine_class_instance(fd, e) {
+		/**
+		 * Test that a single engine metric can be initialized.
+		 */
+		igt_subtest_f("init-busy-%s", e->name)
+			init(fd, e, I915_SAMPLE_BUSY);
+
+		igt_subtest_f("init-wait-%s", e->name)
+			init(fd, e, I915_SAMPLE_WAIT);
+
+		igt_subtest_f("init-sema-%s", e->name)
+			init(fd, e, I915_SAMPLE_SEMA);
+
+		/**
+		 * Test that engines show no load when idle.
+		 */
+		igt_subtest_f("idle-%s", e->name)
+			single(fd, e, false);
+
+		/**
+		 * Test that a single engine reports load correctly.
+		 */
+		igt_subtest_f("busy-%s", e->name)
+			single(fd, e, true);
+
+		/**
+		 * Test that when one engine is loaded the others report no load.
+		 */
+		igt_subtest_f("busy-check-all-%s", e->name)
+			busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that when all except one engine are loaded all loads
+		 * are correctly reported.
+		 */
+		igt_subtest_f("most-busy-check-all-%s", e->name)
+			most_busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that semaphore counters report no activity on idle
+		 * or busy engines.
+		 */
+		igt_subtest_f("idle-no-semaphores-%s", e->name)
+			no_sema(fd, e, false);
+
+		igt_subtest_f("busy-no-semaphores-%s", e->name)
+			no_sema(fd, e, true);
+
+		/**
+		 * Test that semaphore waits are correctly reported.
+		 */
+		igt_subtest_f("semaphore-wait-%s", e->name)
+			sema_wait(fd, e);
+
+		/**
+		 * Test that event waits are correctly reported.
+		 */
+		if (e->class == I915_ENGINE_CLASS_RENDER)
+			igt_subtest_f("event-wait-%s", e->name)
+				event_wait(fd, e);
+
+		/**
+		 * Check that two perf clients do not influence each other's
+		 * observations.
+		 */
+		igt_subtest_f("multi-client-%s", e->name)
+			multi_client(fd, e);
+	}
+
+	/**
+	 * Test that when all engines are loaded all loads are
+	 * correctly reported.
+	 */
+	igt_subtest("all-busy-check-all")
+		all_busy_check_all(fd, num_engines);
+
+	/**
+	 * Test that non-engine counters can be initialized and read. Apart
+	 * from the invalid metric which should fail.
+	 */
+	for (i = 0; i < num_other_metrics + 1; i++) {
+		igt_subtest_f("other-init-%u", i)
+			init_other(i, i < num_other_metrics);
+
+		igt_subtest_f("other-read-%u", i)
+			read_other(i, i < num_other_metrics);
+	}
+
+	/**
+	 * Test counters are not affected by CPU offline/online events.
+	 */
+	igt_subtest("cpu-hotplug")
+		cpu_hotplug(fd);
+
+	/**
+	 * Test GPU frequency.
+	 */
+	igt_subtest("frequency")
+		test_frequency(fd);
+
+	/**
+	 * Test interrupt count reporting.
+	 */
+	igt_subtest("interrupts")
+		test_interrupts(fd);
+
+	/**
+	 * Test RC6 residency reporting.
+	 */
+	igt_subtest("rc6")
+		test_rc6(fd);
+
+	/**
+	 * Test RC6p residency reporting.
+	 */
+	igt_subtest("rc6p")
+		test_rc6p(fd);
+
+	/**
+	 * Check render nodes are counted.
+	 */
+	igt_subtest_group {
+		int render_fd;
+
+		igt_fixture {
+			render_fd = drm_open_driver_render(DRIVER_INTEL);
+			igt_require_gem(render_fd);
+
+			gem_quiescent_gpu(fd);
+		}
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_f("render-node-busy-%s", e->name)
+				single(fd, e, true);
+		}
+
+		igt_fixture {
+			close(render_fd);
+		}
+	}
+}
-- 
2.9.5
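
For readers following calibrate_nop() in the patch above: each iteration
times `loops` executions of a `size`-byte batch and then rescales `size` so
that one batch should take about `calibration_us`. A minimal sketch of that
arithmetic, assuming 64-bit integers; the helper names rescale_batch_size()
and batch_size_converged() are hypothetical and do not exist in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch. Given that `loops` runs of a
 * `size`-byte no-op batch took `elapsed_ns`, estimate the size that would
 * take `target_us` per batch, rounded up to a whole dword.
 * Note: the intermediate product can overflow for very large inputs.
 */
static uint64_t rescale_batch_size(uint64_t size, unsigned int loops,
				   uint64_t target_us, uint64_t elapsed_ns)
{
	uint64_t next = target_us * 1000 * size * loops / elapsed_ns;

	return (next + sizeof(uint32_t) - 1) & ~(uint64_t)(sizeof(uint32_t) - 1);
}

/*
 * Hypothetical helper: the loop in the patch keeps iterating while the new
 * estimate differs from the previous one by more than tolerance_pct percent.
 */
static bool batch_size_converged(uint64_t size, uint64_t last_size,
				 unsigned int tolerance_pct)
{
	uint64_t diff = size > last_size ? size - last_size : last_size - size;

	return diff <= size * tolerance_pct / 100;
}
```

If a 256 KiB batch takes 125 ms and the target is 250 ms, the estimate
doubles to 512 KiB, which is outside a 10% band, so another iteration runs.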

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* ✗ Fi.CI.IGT: failure for IGT PMU support (rev8)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  2017-10-10 13:48 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev9) Patchwork
@ 2017-10-10 15:19 ` Patchwork
  2017-10-10 18:42 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev11) Patchwork
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-10 15:19 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev8)
URL   : https://patchwork.freedesktop.org/series/28253/
State : failure

== Summary ==

Test gem_eio:
        Subgroup wait:
                pass       -> DMESG-WARN (shard-hsw) fdo#102886 +2
Test drv_module_reload:
        Subgroup basic-no-display:
                pass       -> DMESG-WARN (shard-hsw) fdo#102707
Test kms_flip_tiling:
        Subgroup flip-to-Yf-tiled:
                skip       -> INCOMPLETE (shard-hsw)

fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707

shard-hsw        total:2634 pass:1402 dwarn:7   dfail:0   fail:13  skip:1159 time:9342s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_310/shards.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t v6 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10 14:17         ` [PATCH i-g-t v6 " Tvrtko Ursulin
@ 2017-10-10 16:39           ` Chris Wilson
  2017-10-11 12:54             ` [PATCH i-g-t v7 " Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2017-10-10 16:39 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 15:17:54)
> +static void
> +busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
> +              const unsigned int num_engines)
> +{
> +       const struct intel_execution_engine2 *e_;
> +       uint64_t val[num_engines];
> +       int fd[num_engines];
> +       igt_spin_t *spin;
> +       unsigned int busy_idx, i;
> +
> +       i = 0;
> +       fd[0] = -1;
> +       for_each_engine_class_instance(fd, e_) {
> +               if (!gem_has_engine(gem_fd, e_->class, e_->instance))
> +                       continue;
> +               else if (e == e_)
> +                       busy_idx = i;
> +
> +               fd[i++] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
> +                                                         e_->instance),
> +                                    fd[0]);
> +       }
igt_assert(i == num_engines);

Feels like a bug waiting to happen; a trap.
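
To illustrate the trap: the arrays are sized from a count computed
elsewhere, while the loop filters engines on its own, so the two can drift
apart silently. A minimal sketch of the guarded pattern, assuming a plain
array of "present" flags stands in for the engine list; fill_engine_slots()
is a hypothetical name, not an IGT function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical illustration only. Stores the index of every present entry
 * into `slots`, which the caller sized for `expected` entries, and checks
 * that the loop found exactly that many.
 */
static size_t fill_engine_slots(const bool *present, size_t total,
				int *slots, size_t expected)
{
	size_t i = 0;

	for (size_t n = 0; n < total; n++) {
		if (!present[n])
			continue;

		assert(i < expected);	/* never write past the array */
		slots[i++] = (int)n;
	}

	assert(i == expected);		/* no uninitialised tail either */

	return i;
}
```

The second assert is the one suggested above; the first is an extra guard
against overrunning the array if the count was too small.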

> +static void
> +test_frequency(int gem_fd)
> +{
> +       const uint64_t duration_ns = 2e9;
> +       uint32_t min_freq, max_freq, boost_freq;
> +       uint64_t min[2], max[2], start[2];
> +       igt_spin_t *spin;
> +       int fd, sysfs;
> +
> +       sysfs = igt_sysfs_open(gem_fd, NULL);
> +       igt_require(sysfs >= 0);
> +
> +       min_freq = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
> +       max_freq = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
> +       boost_freq = igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz");
> +       igt_require(min_freq > 0 && max_freq > 0 && boost_freq > 0);
> +       igt_require(max_freq > min_freq);
> +       igt_require(boost_freq > min_freq);
> +
> +       fd = open_group(I915_PMU_REQUESTED_FREQUENCY, -1);
> +       open_group(I915_PMU_ACTUAL_FREQUENCY, fd);
> +
> +       /*
> +        * Set GPU to min frequency and read PMU counters.
> +        */
> +       igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", min_freq));
> +       igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == min_freq);
> +       igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
> +       igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
> +
> +       pmu_read_multi(fd, 2, start);
> +
> +       spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
> +       igt_spin_batch_set_timeout(spin, duration_ns);
> +       gem_sync(gem_fd, spin->handle);
> +
> +       pmu_read_multi(fd, 2, min);
> +       min[0] -= start[0];
> +       min[1] -= start[1];
> +
> +       igt_spin_batch_free(gem_fd, spin);
> +
> +       usleep(1e6);
> +
> +       /*
> +        * Set GPU to max frequency and read PMU counters.
> +        */
> +       igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", max_freq));
> +       igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == max_freq);
> +       igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", boost_freq));
> +       igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == boost_freq);
> +
> +       igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
> +       igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
> +
> +       pmu_read_multi(fd, 2, start);
> +
> +       spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
> +       igt_spin_batch_set_timeout(spin, duration_ns);
> +       gem_sync(gem_fd, spin->handle);
> +
> +       pmu_read_multi(fd, 2, max);
> +       max[0] -= start[0];
> +       max[1] -= start[1];
> +
> +       igt_spin_batch_free(gem_fd, spin);
> +
> +       /*
> +        * Restore min/max.
> +        */
> +       igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq));
> +       igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == min_freq);

The test is done at this point, and you are just being neat and tidy for
the next user. We don't need to do anything but warn:

igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq);
if (igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") != min_freq)
	igt_warn("Unable to restore min frequency to saved value [%d MHz], now %d MHz\n",
		 min_freq, igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz"));
> +
> +       close(fd);

Add to the list of subtests that want a destructor (for clean error
paths).
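
A rough sketch of what such a clean error path could look like, assuming a
simulated setting in place of the sysfs file; struct knob, knob_set() and
run_with_knob() are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a sysfs tunable such as a frequency limit. */
struct knob {
	unsigned int value;
	bool read_only;
};

static bool knob_set(struct knob *k, unsigned int v)
{
	if (k->read_only)
		return false;

	k->value = v;
	return true;
}

/*
 * Runs a pretend test body with the knob changed. The saved value is put
 * back on every exit path, and a failed restore only produces a warning
 * because the measurement itself is already complete by then.
 */
static int run_with_knob(struct knob *k, unsigned int test_value,
			 bool fail_midway)
{
	const unsigned int saved = k->value;
	int ret = -1;

	if (!knob_set(k, test_value))
		return ret;	/* nothing was changed, nothing to undo */

	if (fail_midway)
		goto out;	/* error path still goes through cleanup */

	ret = 0;
out:
	if (!knob_set(k, saved) || k->value != saved)
		fprintf(stderr, "Unable to restore saved value %u, now %u\n",
			saved, k->value);

	return ret;
}
```

In the real test the equivalent cleanup would also need to close the PMU fd
and free the spin batch.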

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* ✓ Fi.CI.BAT: success for IGT PMU support (rev11)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (12 preceding siblings ...)
  2017-10-10 15:19 ` ✗ Fi.CI.IGT: failure for IGT PMU support (rev8) Patchwork
@ 2017-10-10 18:42 ` Patchwork
  2017-10-11  1:28 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-10 18:42 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev11)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
d7c88290ab6a8393dc341b30c7fb5e27d2952901 syncobj: Add a test for SYNCOBJ_CREATE_SIGNALED

with latest DRM-Tip kernel build CI_DRM_3206
cc58e6d2bc38 drm-tip: 2017y-10m-10d-15h-40m-22s UTC integration manifest

Testlist changes:
+igt@perf_pmu@all-busy-check-all
+igt@perf_pmu@busy-bcs0
+igt@perf_pmu@busy-check-all-bcs0
+igt@perf_pmu@busy-check-all-rcs0
+igt@perf_pmu@busy-check-all-vcs0
+igt@perf_pmu@busy-check-all-vcs1
+igt@perf_pmu@busy-check-all-vecs0
+igt@perf_pmu@busy-no-semaphores-bcs0
+igt@perf_pmu@busy-no-semaphores-rcs0
+igt@perf_pmu@busy-no-semaphores-vcs0
+igt@perf_pmu@busy-no-semaphores-vcs1
+igt@perf_pmu@busy-no-semaphores-vecs0
+igt@perf_pmu@busy-rcs0
+igt@perf_pmu@busy-vcs0
+igt@perf_pmu@busy-vcs1
+igt@perf_pmu@busy-vecs0
+igt@perf_pmu@cpu-hotplug
+igt@perf_pmu@event-wait-rcs0
+igt@perf_pmu@frequency
+igt@perf_pmu@idle-bcs0
+igt@perf_pmu@idle-no-semaphores-bcs0
+igt@perf_pmu@idle-no-semaphores-rcs0
+igt@perf_pmu@idle-no-semaphores-vcs0
+igt@perf_pmu@idle-no-semaphores-vcs1
+igt@perf_pmu@idle-no-semaphores-vecs0
+igt@perf_pmu@idle-rcs0
+igt@perf_pmu@idle-vcs0
+igt@perf_pmu@idle-vcs1
+igt@perf_pmu@idle-vecs0
+igt@perf_pmu@init-busy-bcs0
+igt@perf_pmu@init-busy-rcs0
+igt@perf_pmu@init-busy-vcs0
+igt@perf_pmu@init-busy-vcs1
+igt@perf_pmu@init-busy-vecs0
+igt@perf_pmu@init-sema-bcs0
+igt@perf_pmu@init-sema-rcs0
+igt@perf_pmu@init-sema-vcs0
+igt@perf_pmu@init-sema-vcs1
+igt@perf_pmu@init-sema-vecs0
+igt@perf_pmu@init-wait-bcs0
+igt@perf_pmu@init-wait-rcs0
+igt@perf_pmu@init-wait-vcs0
+igt@perf_pmu@init-wait-vcs1
+igt@perf_pmu@init-wait-vecs0
+igt@perf_pmu@interrupts
+igt@perf_pmu@invalid-init
+igt@perf_pmu@most-busy-check-all-bcs0
+igt@perf_pmu@most-busy-check-all-rcs0
+igt@perf_pmu@most-busy-check-all-vcs0
+igt@perf_pmu@most-busy-check-all-vcs1
+igt@perf_pmu@most-busy-check-all-vecs0
+igt@perf_pmu@multi-client-bcs0
+igt@perf_pmu@multi-client-rcs0
+igt@perf_pmu@multi-client-vcs0
+igt@perf_pmu@multi-client-vcs1
+igt@perf_pmu@multi-client-vecs0
+igt@perf_pmu@other-init-0
+igt@perf_pmu@other-init-1
+igt@perf_pmu@other-init-2
+igt@perf_pmu@other-init-3
+igt@perf_pmu@other-init-4
+igt@perf_pmu@other-init-5
+igt@perf_pmu@other-init-6
+igt@perf_pmu@other-read-0
+igt@perf_pmu@other-read-1
+igt@perf_pmu@other-read-2
+igt@perf_pmu@other-read-3
+igt@perf_pmu@other-read-4
+igt@perf_pmu@other-read-5
+igt@perf_pmu@other-read-6
+igt@perf_pmu@rc6
+igt@perf_pmu@rc6p
+igt@perf_pmu@render-node-busy-bcs0
+igt@perf_pmu@render-node-busy-rcs0
+igt@perf_pmu@render-node-busy-vcs0
+igt@perf_pmu@render-node-busy-vcs1
+igt@perf_pmu@render-node-busy-vecs0
+igt@perf_pmu@semaphore-wait-bcs0
+igt@perf_pmu@semaphore-wait-rcs0
+igt@perf_pmu@semaphore-wait-vcs0
+igt@perf_pmu@semaphore-wait-vcs1
+igt@perf_pmu@semaphore-wait-vecs0

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (fi-kbl-7560u) fdo#102846 +1
Test drv_module_reload:
        Subgroup basic-reload-inject:
                dmesg-warn -> INCOMPLETE (fi-cfl-s) fdo#103022

fdo#102846 https://bugs.freedesktop.org/show_bug.cgi?id=102846
fdo#103022 https://bugs.freedesktop.org/show_bug.cgi?id=103022

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:458s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:478s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:396s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:578s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:284s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:529s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:527s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:537s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:530s
fi-cfl-s         total:288  pass:253  dwarn:3   dfail:0   fail:0   skip:31 
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:634s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:436s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:606s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:440s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:421s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:464s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:508s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:476s
fi-kbl-7500u     total:289  pass:263  dwarn:1   dfail:0   fail:1   skip:24  time:495s
fi-kbl-7560u     total:247  pass:230  dwarn:0   dfail:0   fail:0   skip:16 
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:486s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:589s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:665s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:476s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:664s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:535s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:516s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:471s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:586s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:440s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_314/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* ✓ Fi.CI.IGT: success for IGT PMU support (rev11)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (13 preceding siblings ...)
  2017-10-10 18:42 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev11) Patchwork
@ 2017-10-11  1:28 ` Patchwork
  2017-10-11 14:09 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev12) Patchwork
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-11  1:28 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev11)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

Test kms_flip:
        Subgroup flip-vs-fences-interruptible:
                pass       -> FAIL       (shard-hsw) fdo#102946
Test drv_module_reload:
        Subgroup basic-no-display:
                pass       -> DMESG-WARN (shard-hsw) fdo#102707
Test gem_eio:
        Subgroup in-flight-contexts:
                pass       -> DMESG-WARN (shard-hsw) fdo#102886 +2
Test gem_flink_race:
        Subgroup flink_close:
                pass       -> FAIL       (shard-hsw) fdo#102655

fdo#102946 https://bugs.freedesktop.org/show_bug.cgi?id=102946
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886
fdo#102655 https://bugs.freedesktop.org/show_bug.cgi?id=102655

shard-hsw        total:2634 pass:1427 dwarn:7   dfail:0   fail:15  skip:1185 time:9631s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_314/shards.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH i-g-t v7 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-10 16:39           ` Chris Wilson
@ 2017-10-11 12:54             ` Tvrtko Ursulin
  2017-11-21 11:50               ` Chris Wilson
  2017-11-21 18:21               ` [PATCH i-g-t v8 " Tvrtko Ursulin
  0 siblings, 2 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-10-11 12:54 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A bunch of tests for the new i915 PMU feature.

Parts of the code were initially sketched by Dmitry Rogozhkin.

v2: (Most suggestions by Chris Wilson)
 * Add new class/instance based engine list.
 * Add gem_has_engine/gem_require_engine to work with class/instance.
 * Use the above two throughout the test.
 * Shorten tests to 100ms busy batches, seems enough.
 * Add queued counter sanity checks.
 * Use igt_nsec_elapsed.
 * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
 * Fix multi ordering for busy accounting.
 * Use new guaranteed_usleep when sleep time is asserted on.
 * Check for no queued when idle/busy.
 * Add queued counter init test.
 * Add queued tests.
 * Consolidate and increase multiple busy engines tests to most-busy and
   all-busy tests.
 * Guarantee interrupts by using fences.
 * Test RC6 via forcewake.

v3:
 * Tweak assert in interrupts subtest.
 * Sprinkle of comments.
 * Fix multi-client test which got broken in v2.

v4:
 * Measured instead of guaranteed sleep.
 * Missing sync in no_sema.
 * Log busyness before asserts for debug.
 * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
 * Test frequency reporting via min/max setting instead of assuming.
   ^^ All above suggested by Chris Wilson. ^^
 * Drop queued subtests to match i915.
 * Use long batches with fences to ensure interrupts.
 * Test render node as well.

v5:
 * Add to meson build. (Petri Latvala)
 * Use 1eN constants. (Chris Wilson)
 * Add tests for semaphore and event waiting.

v6:
 * Fix interrupts subtest by polling the fence from the "outside".
   (Chris Wilson)

v7:
 * Assert number of initialized engines matches the expectation.
   (Chris Wilson)
 * Warn instead of skipping if we couldn't restore the initial
   frequency. (Chris Wilson)
 * Move all asserts to after the test cleanup (just a tidy).
 * More 1eN notation for timeouts.
 * Bump the tolerance to 5% since I saw a few noisy runs with
   sampling counters.
 * Always start the PMU before submitting batches to lower
   reliance on i915 doing the delayed engine busy stats disable.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v6)
---
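A note on the tolerance bump in v7: the check is relative to the reference
value, so a 5% tolerance on a 100ms busy batch accepts anything between 95ms
and 105ms, while a reference of zero leaves no slack at all and the counter
must read exactly zero. The following is only a stand-alone sketch of that
arithmetic, not the macro from the patch; within_epsilon() and
check_tolerance_examples() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-alone version of a relative tolerance check:
 * x passes when it lies in [(1 - tol) * ref, (1 + tol) * ref].
 */
static bool within_epsilon(double x, double ref, double tol)
{
	return x <= (1.0 + tol) * ref && x >= (1.0 - tol) * ref;
}

/* Hypothetical self-check walking through the cases described above. */
static void check_tolerance_examples(void)
{
	const double tol = 0.05;	/* 5%, as in v7 */
	const double ref_ns = 100e6;	/* 100ms busy batch */

	/* Inside the 95ms..105ms window. */
	assert(within_epsilon(104e6, ref_ns, tol));
	assert(within_epsilon(96e6, ref_ns, tol));

	/* Outside the window on either side. */
	assert(!within_epsilon(106e6, ref_ns, tol));
	assert(!within_epsilon(94e6, ref_ns, tol));

	/* With ref == 0 the window collapses to exactly zero. */
	assert(within_epsilon(0.0, 0.0, tol));
	assert(!within_epsilon(1.0, 0.0, tol));
}
```

In other words, the idle-side assertions in this test are exact-zero checks
regardless of the tolerance value, which may be worth keeping in mind when
reading noisy results from sampling counters.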
 lib/igt_gt.c           |   50 ++
 lib/igt_gt.h           |   38 ++
 lib/igt_perf.h         |    9 +-
 tests/Makefile.am      |    1 +
 tests/Makefile.sources |    1 +
 tests/meson.build      |    1 +
 tests/perf_pmu.c       | 1242 ++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 1334 insertions(+), 8 deletions(-)
 create mode 100644 tests/perf_pmu.c

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index b3f3b3809eee..4c75811fb1b3 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -568,3 +568,53 @@ bool gem_can_store_dword(int fd, unsigned int engine)
 
 	return true;
 }
+
+const struct intel_execution_engine2 intel_execution_engines2[] = {
+	{ "rcs0", I915_ENGINE_CLASS_RENDER, 0 },
+	{ "bcs0", I915_ENGINE_CLASS_COPY, 0 },
+	{ "vcs0", I915_ENGINE_CLASS_VIDEO, 0 },
+	{ "vcs1", I915_ENGINE_CLASS_VIDEO, 1 },
+	{ "vecs0", I915_ENGINE_CLASS_VIDEO_ENHANCE, 0 },
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance)
+{
+	if (class != I915_ENGINE_CLASS_VIDEO)
+		igt_assert(instance == 0);
+	else
+		igt_assert(instance >= 0 && instance <= 1);
+
+	switch (class) {
+	case I915_ENGINE_CLASS_RENDER:
+		return I915_EXEC_RENDER;
+	case I915_ENGINE_CLASS_COPY:
+		return I915_EXEC_BLT;
+	case I915_ENGINE_CLASS_VIDEO:
+		if (instance == 0) {
+			if (gem_has_bsd2(gem_fd))
+				return I915_EXEC_BSD | I915_EXEC_BSD_RING1;
+			else
+				return I915_EXEC_BSD;
+
+		} else {
+			return I915_EXEC_BSD | I915_EXEC_BSD_RING2;
+		}
+	case I915_ENGINE_CLASS_VIDEO_ENHANCE:
+		return I915_EXEC_VEBOX;
+	case I915_ENGINE_CLASS_OTHER:
+	default:
+		igt_assert(0);
+	};
+}
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance)
+{
+	return gem_has_ring(gem_fd,
+			    gem_class_instance_to_eb_flags(gem_fd, class,
+							   instance));
+}
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index 2579cbd37be7..fb67ae1a7d1f 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -25,6 +25,7 @@
 #define IGT_GT_H
 
 #include "igt_debugfs.h"
+#include "igt_core.h"
 
 void igt_require_hang_ring(int fd, int ring);
 
@@ -80,4 +81,41 @@ extern const struct intel_execution_engine {
 
 bool gem_can_store_dword(int fd, unsigned int engine);
 
+extern const struct intel_execution_engine2 {
+	const char *name;
+	int class;
+	int instance;
+} intel_execution_engines2[];
+
+#define for_each_engine_class_instance(fd__, e__) \
+	for ((e__) = intel_execution_engines2;\
+	     (e__)->name; \
+	     (e__)++)
+
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_OTHER = 0,
+	I915_ENGINE_CLASS_RENDER = 1,
+	I915_ENGINE_CLASS_COPY = 2,
+	I915_ENGINE_CLASS_VIDEO = 3,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
+	I915_ENGINE_CLASS_MAX /* non-ABI */
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance);
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance);
+
+static inline
+void gem_require_engine(int gem_fd,
+			enum drm_i915_gem_engine_class class,
+			unsigned int instance)
+{
+	igt_require(gem_has_engine(gem_fd, class, instance));
+}
+
 #endif /* IGT_GT_H */
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index b1f525739c69..5428feb0c746 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -29,14 +29,7 @@
 
 #include <linux/perf_event.h>
 
-enum drm_i915_gem_engine_class {
-	I915_ENGINE_CLASS_OTHER = 0,
-	I915_ENGINE_CLASS_RENDER = 1,
-	I915_ENGINE_CLASS_COPY = 2,
-	I915_ENGINE_CLASS_VIDEO = 3,
-	I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
-	I915_ENGINE_CLASS_MAX /* non-ABI */
-};
+#include "igt_gt.h"
 
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 89a970153992..17ee1be08d8a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -131,6 +131,7 @@ gen7_forcewake_mt_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gen7_forcewake_mt_LDADD = $(LDADD) -lpthread
 gem_userptr_blits_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_userptr_blits_LDADD = $(LDADD) -lpthread
+perf_pmu_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la
 
 gem_wait_LDADD = $(LDADD) -lrt
 kms_flip_LDADD = $(LDADD) -lrt -lpthread
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index bb6652e2fa3b..9830472cbbba 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -217,6 +217,7 @@ TESTS_progs = \
 	kms_vblank \
 	meta_test \
 	perf \
+	perf_pmu \
 	pm_backlight \
 	pm_lpsp \
 	pm_rc6_residency \
diff --git a/tests/meson.build b/tests/meson.build
index 6cb3584a4dd9..12d5706faaeb 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -197,6 +197,7 @@ test_progs = [
 	'kms_vblank',
 	'meta_test',
 	'perf',
+	'perf_pmu',
 	'pm_backlight',
 	'pm_lpsp',
 	'pm_rc6_residency',
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
new file mode 100644
index 000000000000..8585ed7bcee8
--- /dev/null
+++ b/tests/perf_pmu.c
@@ -0,0 +1,1242 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/times.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <time.h>
+#include <poll.h>
+
+#include "igt.h"
+#include "igt_core.h"
+#include "igt_perf.h"
+#include "igt_sysfs.h"
+
+IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
+
+const double tolerance = 0.05f;
+const unsigned long batch_duration_ns = 100e6;
+
+static int open_pmu(uint64_t config)
+{
+	int fd;
+
+	fd = perf_i915_open(config);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static int open_group(uint64_t config, int group)
+{
+	int fd;
+
+	fd = perf_i915_open_group(config, group);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static void
+init(int gem_fd, const struct intel_execution_engine2 *e, uint8_t sample)
+{
+	int fd;
+
+	fd = open_pmu(__I915_PMU_ENGINE(e->class, e->instance, sample));
+
+	close(fd);
+}
+
+static uint64_t pmu_read_single(int fd)
+{
+	uint64_t data[2];
+
+	igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
+
+	return data[0];
+}
+
+static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
+{
+	uint64_t buf[2 + num];
+	unsigned int i;
+
+	igt_assert_eq(read(fd, buf, sizeof(buf)), sizeof(buf));
+
+	for (i = 0; i < num; i++)
+		val[i] = buf[2 + i];
+}
+
+#define assert_within_epsilon(x, ref, tolerance) \
+	igt_assert_f((double)(x) <= (1.0 + tolerance) * (double)ref && \
+		     (double)(x) >= (1.0 - tolerance) * (double)ref, \
+		     "'%s' != '%s' (%f not within %f%% tolerance of %f)\n",\
+		     #x, #ref, (double)x, tolerance * 100.0, (double)ref)
+
+/*
+ * Helper for cases where we assert on time spent sleeping (directly or
+ * indirectly), so make it more robust by ensuring the system sleep time
+ * is within test tolerance to start with.
+ */
+static unsigned int measured_usleep(unsigned int usec)
+{
+	uint64_t slept = 0;
+
+	while (usec > 0) {
+		struct timespec start = { };
+		uint64_t this_sleep;
+
+		igt_nsec_elapsed(&start);
+		usleep(usec);
+		this_sleep = igt_nsec_elapsed(&start);
+		slept += this_sleep;
+		if (this_sleep > usec * 1000)
+			break;
+		usec -= this_sleep / 1000;
+	}
+
+	return slept;
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	double ref = busy ? batch_duration_ns : 0.0f;
+	igt_spin_t *spin;
+	uint64_t val;
+	int fd;
+
+	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	val = pmu_read_single(fd);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+}
+
+static void log_busy(int fd, unsigned int num_engines, uint64_t *val)
+{
+	char buf[1024];
+	int rem = sizeof(buf);
+	unsigned int i;
+	char *p = buf;
+
+	for (i = 0; i < num_engines; i++) {
+		int len;
+
+		len = snprintf(p, rem, "%u=%" PRIu64 "\n",  i, val[i]);
+		igt_assert(len > 0);
+		rem -= len;
+		p += len;
+	}
+
+	igt_info("%s", buf);
+}
+
+static void
+busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+	       const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin;
+	unsigned int busy_idx, i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+		else if (e == e_)
+			busy_idx = i;
+
+		fd[i++] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							  e_->instance),
+				     fd[0]);
+	}
+
+	igt_assert_eq(i, num_engines);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+
+	assert_within_epsilon(val[busy_idx], batch_duration_ns, tolerance);
+	for (i = 0; i < num_engines; i++) {
+		if (i == busy_idx)
+			continue;
+		assert_within_epsilon(val[i], 0.0f, tolerance);
+	}
+
+}
+
+static void
+most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+		    const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int idle_idx, i;
+
+	gem_require_engine(gem_fd, e->class, e->instance);
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							e_->instance),
+				   fd[0]);
+
+		if (e == e_) {
+			idle_idx = i;
+		} else {
+			spin[i] = igt_spin_batch_new(gem_fd, 0,
+						     e2ring(gem_fd, e_), 0);
+			igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+		}
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			gem_sync(gem_fd, spin[i]->handle);
+	}
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			igt_spin_batch_free(gem_fd, spin[i]);
+	}
+	close(fd[0]);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i == idle_idx)
+			assert_within_epsilon(val[i], 0.0f, tolerance);
+		else
+			assert_within_epsilon(val[i], batch_duration_ns,
+					      tolerance);
+	}
+}
+
+static void
+all_busy_check_all(int gem_fd, const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e) {
+		if (!gem_has_engine(gem_fd, e->class, e->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e->class, e->instance),
+				   fd[0]);
+
+		spin[i] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++)
+		gem_sync(gem_fd, spin[i]->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++)
+		igt_spin_batch_free(gem_fd, spin[i]);
+	close(fd[0]);
+
+	for (i = 0; i < num_engines; i++)
+		assert_within_epsilon(val[i], batch_duration_ns, tolerance);
+}
+
+static void
+no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd;
+
+	fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
+	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, val);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val[0], 0.0f, tolerance);
+	assert_within_epsilon(val[1], 0.0f, tolerance);
+}
+
+#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
+#define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+
+static void
+sema_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	struct drm_i915_gem_exec_object2 obj[2];
+	uint32_t bb_handle, obj_handle;
+	unsigned long slept;
+	uint32_t *obj_ptr;
+	uint32_t batch[6];
+	uint64_t val[2];
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 8);
+
+	/**
+	 * Set up a batchbuffer with a polling semaphore wait command which
+	 * will wait on a value in a shared bo to change. This way we are able
+	 * to control how much time we will spend in this bb.
+	 */
+
+	bb_handle = gem_create(gem_fd, 4096);
+	obj_handle = gem_create(gem_fd, 4096);
+
+	obj_ptr = gem_mmap__wc(gem_fd, obj_handle, 0, 4096, PROT_WRITE);
+
+	batch[0] = MI_SEMAPHORE_WAIT |
+		   MI_SEMAPHORE_POLL |
+		   MI_SEMAPHORE_SAD_GTE_SDD;
+	batch[1] = 1;
+	batch[2] = 0x0;
+	batch[3] = 0x0;
+	batch[4] = MI_NOOP;
+	batch[5] = MI_BATCH_BUFFER_END;
+
+	gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+	reloc.target_handle = obj_handle;
+	reloc.offset = 2 * sizeof(uint32_t);
+	reloc.read_domains = I915_GEM_DOMAIN_RENDER;
+
+	memset(obj, 0, sizeof(obj));
+
+	obj[0].handle = obj_handle;
+
+	obj[1].handle = bb_handle;
+	obj[1].relocation_count = 1;
+	obj[1].relocs_ptr = to_user_pointer(&reloc);
+
+	eb.buffer_count = 2;
+	eb.buffers_ptr = to_user_pointer(obj);
+	eb.flags = e2ring(gem_fd, e);
+
+	/**
+	 * Start the semaphore wait PMU and after some known time let the above
+	 * semaphore wait command finish. Then check that the PMU is reporting
+	 * the expected time spent in semaphore wait state.
+	 */
+
+	fd = open_pmu(I915_PMU_ENGINE_SEMA(e->class, e->instance));
+
+	val[0] = pmu_read_single(fd);
+
+	gem_execbuf(gem_fd, &eb);
+
+	slept = measured_usleep(100e3);
+
+	*obj_ptr = 1;
+
+	gem_sync(gem_fd, bb_handle);
+
+	val[1] = pmu_read_single(fd);
+
+	munmap(obj_ptr, 4096);
+	gem_close(gem_fd, obj_handle);
+	gem_close(gem_fd, bb_handle);
+	close(fd);
+
+	assert_within_epsilon(val[1] - val[0], slept, tolerance);
+}
+
+#define   MI_WAIT_FOR_PIPE_C_VBLANK (1<<21)
+#define   MI_WAIT_FOR_PIPE_B_VBLANK (1<<11)
+#define   MI_WAIT_FOR_PIPE_A_VBLANK (1<<3)
+
+typedef struct {
+	igt_display_t display;
+	struct igt_fb primary_fb;
+	igt_output_t *output;
+	enum pipe pipe;
+} data_t;
+
+static void prepare_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	drmModeModeInfo *mode;
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+
+	/* create and set the primary plane fb */
+	mode = igt_output_get_mode(output);
+	igt_create_color_fb(fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888,
+			    LOCAL_DRM_FORMAT_MOD_NONE,
+			    0.0, 0.0, 0.0,
+			    &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, &data->primary_fb);
+
+	igt_display_commit(display);
+
+	igt_wait_for_vblank(fd, data->pipe);
+}
+
+static void cleanup_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	igt_remove_fb(fd, &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+}
+
+static int wait_vblank(int fd, union drm_wait_vblank *vbl)
+{
+	int err;
+
+	err = 0;
+	if (igt_ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl))
+		err = -errno;
+
+	return err;
+}
+
+static void
+event_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	data_t data;
+	igt_display_t *display = &data.display;
+	const uint32_t DERRMR = 0x44050;
+	unsigned int valid_tests = 0;
+	uint32_t batch[8], *b;
+	igt_output_t *output;
+	uint32_t bb_handle;
+	uint32_t reg;
+	enum pipe p;
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
+	igt_require(intel_register_access_init(intel_get_pci_device(),
+					       false, gem_fd) == 0);
+
+	/**
+	 * We will use the display to render event forwarding, so we need to
+	 * program the DERRMR register and restore it at exit.
+	 *
+	 * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
+	 * have a background helper to indirectly enable vblank irqs, and
+	 * listen to the recorded time spent in engine wait state as reported
+	 * by the PMU.
+	 */
+	reg = intel_register_read(DERRMR);
+
+	kmstest_set_vt_graphics_mode();
+	igt_display_init(&data.display, gem_fd);
+
+	bb_handle = gem_create(gem_fd, 4096);
+
+	b = batch;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
+	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg;
+	*b++ = MI_BATCH_BUFFER_END;
+
+	obj.handle = bb_handle;
+
+	eb.buffer_count = 1;
+	eb.buffers_ptr = to_user_pointer(&obj);
+	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
+
+	for_each_pipe_with_valid_output(display, p, output) {
+		struct igt_helper_process waiter = { };
+		const unsigned int frames = 3;
+		unsigned int frame;
+		uint64_t val[2];
+
+		batch[3] = MI_WAIT_FOR_EVENT;
+		switch (p) {
+		case PIPE_A:
+			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			break;
+		case PIPE_B:
+			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			break;
+		case PIPE_C:
+			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			break;
+		default:
+			continue;
+		}
+
+		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+		data.pipe = p;
+		prepare_crtc(&data, gem_fd, output);
+
+		fd = open_pmu(I915_PMU_ENGINE_WAIT(e->class, e->instance));
+
+		val[0] = pmu_read_single(fd);
+
+		igt_fork_helper(&waiter) {
+			const uint32_t pipe_id_flag =
+					kmstest_get_vbl_flag(data.pipe);
+
+			for (;;) {
+				union drm_wait_vblank vbl = { };
+
+				vbl.request.type = DRM_VBLANK_RELATIVE;
+				vbl.request.type |= pipe_id_flag;
+				vbl.request.sequence = 1;
+				igt_assert_eq(wait_vblank(gem_fd, &vbl), 0);
+			}
+		}
+
+		for (frame = 0; frame < frames; frame++) {
+			gem_execbuf(gem_fd, &eb);
+			gem_sync(gem_fd, bb_handle);
+		}
+
+		igt_stop_helper(&waiter);
+
+		val[1] = pmu_read_single(fd);
+
+		close(fd);
+
+		cleanup_crtc(&data, gem_fd, output);
+		valid_tests++;
+
+		igt_assert(val[1] - val[0] > 0);
+	}
+
+	gem_close(gem_fd, bb_handle);
+
+	intel_register_access_fini();
+
+	igt_require_f(valid_tests,
+		      "no valid crtc/connector combinations found\n");
+}
+
+static void
+multi_client(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
+	unsigned int slept;
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd[2];
+
+	fd[0] = open_pmu(config);
+
+	/*
+	 * Second PMU client which is initialized after the first one,
+	 * and exits before it, should not affect accounting as reported
+	 * in the first client.
+	 */
+	fd[1] = open_pmu(config);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	slept = measured_usleep(batch_duration_ns / 3000);
+	val[1] = pmu_read_single(fd[1]);
+	close(fd[1]);
+
+	gem_sync(gem_fd, spin->handle);
+
+	val[0] = pmu_read_single(fd[0]);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+
+	assert_within_epsilon(val[0], batch_duration_ns, tolerance);
+	assert_within_epsilon(val[1], slept, tolerance);
+}
+
+/**
+ * Tests that i915 PMU correctly errors out on invalid initialization.
+ * i915 PMU is uncore PMU, thus:
+ *  - sampling period is not supported
+ *  - pid > 0 is not supported since we can't count per-process (we count
+ *    per whole system)
+ *  - cpu != 0 is not supported since i915 PMU exposes cpumask for CPU0
+ */
+static void invalid_init(void)
+{
+	struct perf_event_attr attr;
+	int pid, cpu;
+
+#define ATTR_INIT() \
+do { \
+	memset(&attr, 0, sizeof (attr)); \
+	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0); \
+	attr.type = i915_type_id(); \
+	igt_assert(attr.type != 0); \
+} while(0)
+
+	ATTR_INIT();
+	attr.sample_period = 100;
+	pid = -1;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = 0;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = -1;
+	cpu = 1;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, ENODEV);
+}
+
+static void init_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	close(fd);
+}
+
+static void read_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	(void)pmu_read_single(fd);
+
+	close(fd);
+}
+
+static bool cpu0_hotplug_support(void)
+{
+	return access("/sys/devices/system/cpu/cpu0/online", W_OK) == 0;
+}
+
+static void cpu_hotplug(int gem_fd)
+{
+	struct timespec start = { };
+	igt_spin_t *spin;
+	uint64_t val, ref;
+	int fd;
+
+	igt_require(cpu0_hotplug_support());
+
+	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
+	igt_assert(fd >= 0);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+
+	igt_nsec_elapsed(&start);
+
+	/*
+	 * Toggle online status of all the CPUs in a child process and ensure
+	 * this has not affected busyness stats in the parent.
+	 */
+	igt_fork(child, 1) {
+		int cpu = 0;
+
+		for (;;) {
+			char name[128];
+			int cpufd;
+
+			sprintf(name, "/sys/devices/system/cpu/cpu%d/online",
+				cpu);
+			cpufd = open(name, O_WRONLY);
+			if (cpufd == -1) {
+				igt_assert(cpu > 0);
+				break;
+			}
+			igt_assert_eq(write(cpufd, "0", 2), 2);
+
+			usleep(1e6);
+
+			igt_assert_eq(write(cpufd, "1", 2), 2);
+
+			close(cpufd);
+			cpu++;
+		}
+	}
+
+	igt_waitchildren();
+
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
+
+	ref = igt_nsec_elapsed(&start);
+	val = pmu_read_single(fd);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+}
+
+static unsigned long calibrate_nop(int fd, const unsigned int calibration_us)
+{
+	const unsigned int cal_min_us = calibration_us * 3;
+	const unsigned int tolerance_pct = 10;
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int loops = 17;
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	struct timespec t_begin = { };
+	long size, last_size;
+	unsigned long ns;
+
+	igt_nsec_elapsed(&t_begin);
+
+	size = 256 * 1024;
+	do {
+		struct timespec t_start = { };
+
+		obj.handle = gem_create(fd, size);
+		gem_write(fd, obj.handle, size - sizeof(bbe), &bbe,
+			  sizeof(bbe));
+		gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		igt_nsec_elapsed(&t_start);
+
+		for (int loop = 0; loop < loops; loop++)
+			gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		ns = igt_nsec_elapsed(&t_start);
+
+		gem_close(fd, obj.handle);
+
+		last_size = size;
+		size = calibration_us * 1000 * size * loops / ns;
+		size = ALIGN(size, sizeof(uint32_t));
+	} while (igt_nsec_elapsed(&t_begin) / 1000 < cal_min_us ||
+		 abs(size - last_size) > (size * tolerance_pct / 100));
+
+	return size / sizeof(uint32_t);
+}
+
+static void exec_nop(int gem_fd, unsigned long sz)
+{
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct pollfd pfd;
+	int fence;
+
+	sz = ALIGN(sz, sizeof(uint32_t));
+
+	obj.handle = gem_create(gem_fd, sz);
+	gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
+
+	eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
+
+	gem_execbuf_wr(gem_fd, &eb);
+	fence = eb.rsvd2 >> 32;
+
+	/*
+	 * Poll on the output fence to ensure user interrupts will be
+	 * generated and listened to.
+	 */
+	pfd.fd = fence;
+	pfd.events = POLLIN;
+	igt_assert_eq(poll(&pfd, 1, -1), 1);
+
+	close(fence);
+	gem_close(gem_fd, obj.handle);
+}
+
+static void
+test_interrupts(int gem_fd)
+{
+	const unsigned int calibration_us = 250000;
+	const unsigned int batch_len_us = 100000;
+	const unsigned int batch_count = 3e6 / batch_len_us;
+	uint64_t idle, busy, prev;
+	unsigned long cal, sz;
+	unsigned int i;
+	int fd;
+
+	cal = calibrate_nop(gem_fd, calibration_us);
+	sz = batch_len_us * cal / calibration_us;
+
+	fd = open_pmu(I915_PMU_INTERRUPTS);
+
+	gem_quiescent_gpu(gem_fd);
+
+	/* Wait for idle state. */
+	prev = pmu_read_single(fd);
+	idle = prev + 1;
+	while (idle != prev) {
+		usleep(1e6);
+		prev = idle;
+		idle = pmu_read_single(fd);
+	}
+
+	igt_assert_eq(idle - prev, 0);
+
+	/*
+	 * Send some no-op batches waiting on output fences to
+	 * ensure interrupts.
+	 */
+	for (i = 0; i < batch_count; i++)
+		exec_nop(gem_fd, sz);
+
+	/* Check at least as many interrupts have been generated. */
+	busy = pmu_read_single(fd) - idle;
+	close(fd);
+
+	igt_assert(busy >= batch_count);
+}
+
+static void
+test_frequency(int gem_fd)
+{
+	const uint64_t duration_ns = 2e9;
+	uint32_t min_freq, max_freq, boost_freq;
+	uint64_t min[2], max[2], start[2];
+	igt_spin_t *spin;
+	int fd, sysfs;
+
+	sysfs = igt_sysfs_open(gem_fd, NULL);
+	igt_require(sysfs >= 0);
+
+	min_freq = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
+	max_freq = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
+	boost_freq = igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz");
+	igt_require(min_freq > 0 && max_freq > 0 && boost_freq > 0);
+	igt_require(max_freq > min_freq);
+	igt_require(boost_freq > min_freq);
+
+	fd = open_group(I915_PMU_REQUESTED_FREQUENCY, -1);
+	open_group(I915_PMU_ACTUAL_FREQUENCY, fd);
+
+	/*
+	 * Set GPU to min frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == min_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, min);
+	min[0] -= start[0];
+	min[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	usleep(1e6);
+
+	/*
+	 * Set GPU to max frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == max_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", boost_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == boost_freq);
+
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, max);
+	max[0] -= start[0];
+	max[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	/*
+	 * Restore min/max.
+	 */
+	igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq);
+	if (igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") != min_freq)
+		igt_warn("Unable to restore min frequency to saved value [%u MHz], now %u MHz\n",
+			 min_freq, igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz"));
+	close(fd);
+
+	igt_assert(min[0] < max[0]);
+	igt_assert(min[1] < max[1]);
+}
+
+static void
+test_rc6(int gem_fd)
+{
+	int64_t duration_ns = 2e9;
+	uint64_t idle, busy, prev;
+	unsigned int slept;
+	int fd, fw;
+
+	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	prev = pmu_read_single(fd);
+	slept = measured_usleep(duration_ns / 1000);
+	idle = pmu_read_single(fd);
+
+	assert_within_epsilon(idle - prev, slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	prev = pmu_read_single(fd);
+	usleep(duration_ns / 1000);
+	busy = pmu_read_single(fd);
+
+	close(fw);
+	close(fd);
+
+	assert_within_epsilon(busy - prev, 0.0, tolerance);
+}
+
+static void
+test_rc6p(int gem_fd)
+{
+	int64_t duration_ns = 2e9;
+	unsigned int num_pmu = 1;
+	uint64_t idle[3], busy[3], prev[3];
+	unsigned int slept, i;
+	int fd, ret, fw;
+
+	fd = open_group(I915_PMU_RC6_RESIDENCY, -1);
+	ret = perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd);
+	if (ret > 0) {
+		num_pmu++;
+		ret = perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd);
+		if (ret > 0)
+			num_pmu++;
+	}
+
+	igt_require(num_pmu == 3);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	pmu_read_multi(fd, num_pmu, prev);
+	slept = measured_usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, idle);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(idle[i] - prev[i], slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	pmu_read_multi(fd, num_pmu, prev);
+	usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, busy);
+
+	close(fw);
+	close(fd);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(busy[i] - prev[i], 0.0, tolerance);
+}
+
+igt_main
+{
+	const unsigned int num_other_metrics =
+				I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
+	unsigned int num_engines = 0;
+	int fd = -1;
+	const struct intel_execution_engine2 *e;
+	unsigned int i;
+
+	igt_fixture {
+		fd = drm_open_driver_master(DRIVER_INTEL);
+
+		igt_require_gem(fd);
+		igt_require(i915_type_id() > 0);
+
+		for_each_engine_class_instance(fd, e) {
+			if (gem_has_engine(fd, e->class, e->instance))
+				num_engines++;
+		}
+	}
+
+	/**
+	 * Test invalid access via perf API is rejected.
+	 */
+	igt_subtest("invalid-init")
+		invalid_init();
+
+	for_each_engine_class_instance(fd, e) {
+		/**
+		 * Test that a single engine metric can be initialized.
+		 */
+		igt_subtest_f("init-busy-%s", e->name)
+			init(fd, e, I915_SAMPLE_BUSY);
+
+		igt_subtest_f("init-wait-%s", e->name)
+			init(fd, e, I915_SAMPLE_WAIT);
+
+		igt_subtest_f("init-sema-%s", e->name)
+			init(fd, e, I915_SAMPLE_SEMA);
+
+		/**
+		 * Test that engines show no load when idle.
+		 */
+		igt_subtest_f("idle-%s", e->name)
+			single(fd, e, false);
+
+		/**
+		 * Test that a single engine reports load correctly.
+		 */
+		igt_subtest_f("busy-%s", e->name)
+			single(fd, e, true);
+
+		/**
+		 * Test that when one engine is loaded the others report no load.
+		 */
+		igt_subtest_f("busy-check-all-%s", e->name)
+			busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that when all except one engine are loaded all loads
+		 * are correctly reported.
+		 */
+		igt_subtest_f("most-busy-check-all-%s", e->name)
+			most_busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that semaphore counters report no activity on idle
+		 * or busy engines.
+		 */
+		igt_subtest_f("idle-no-semaphores-%s", e->name)
+			no_sema(fd, e, false);
+
+		igt_subtest_f("busy-no-semaphores-%s", e->name)
+			no_sema(fd, e, true);
+
+		/**
+		 * Test that semaphore waits are correctly reported.
+		 */
+		igt_subtest_f("semaphore-wait-%s", e->name)
+			sema_wait(fd, e);
+
+		/**
+		 * Test that event waits are correctly reported.
+		 */
+		if (e->class == I915_ENGINE_CLASS_RENDER)
+			igt_subtest_f("event-wait-%s", e->name)
+				event_wait(fd, e);
+
+		/**
+		 * Check that two perf clients do not influence each other's
+		 * observations.
+		 */
+		igt_subtest_f("multi-client-%s", e->name)
+			multi_client(fd, e);
+	}
+
+	/**
+	 * Test that when all engines are loaded all loads are
+	 * correctly reported.
+	 */
+	igt_subtest("all-busy-check-all")
+		all_busy_check_all(fd, num_engines);
+
+	/**
+	 * Test that non-engine counters can be initialized and read. Apart
+	 * from the invalid metric which should fail.
+	 */
+	for (i = 0; i < num_other_metrics + 1; i++) {
+		igt_subtest_f("other-init-%u", i)
+			init_other(i, i < num_other_metrics);
+
+		igt_subtest_f("other-read-%u", i)
+			read_other(i, i < num_other_metrics);
+	}
+
+	/**
+	 * Test counters are not affected by CPU offline/online events.
+	 */
+	igt_subtest("cpu-hotplug")
+		cpu_hotplug(fd);
+
+	/**
+	 * Test GPU frequency.
+	 */
+	igt_subtest("frequency")
+		test_frequency(fd);
+
+	/**
+	 * Test interrupt count reporting.
+	 */
+	igt_subtest("interrupts")
+		test_interrupts(fd);
+
+	/**
+	 * Test RC6 residency reporting.
+	 */
+	igt_subtest("rc6")
+		test_rc6(fd);
+
+	/**
+	 * Test RC6p residency reporting.
+	 */
+	igt_subtest("rc6p")
+		test_rc6p(fd);
+
+	/**
+	 * Check render nodes are counted.
+	 */
+	igt_subtest_group {
+		int render_fd;
+
+		igt_fixture {
+			render_fd = drm_open_driver_render(DRIVER_INTEL);
+			igt_require_gem(render_fd);
+
+			gem_quiescent_gpu(fd);
+		}
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_f("render-node-busy-%s", e->name)
+				single(render_fd, e, true);
+		}
+
+		igt_fixture {
+			close(render_fd);
+		}
+	}
+}
-- 
2.9.5
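
The "other" (non-engine) metric loop in igt_main above can be checked by hand. The following is a minimal sketch, assuming the __I915_PMU_OTHER definitions from lib/igt_perf.h as posted in patch 5/9 of this thread; the helper functions are hypothetical and exist only for illustration. It shows why the loop runs over num_other_metrics + 1 indices, with the last one expected to be rejected:

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from lib/igt_perf.h as posted in patch 5/9 of this thread. */
#define I915_PMU_SAMPLE_BITS (4)
#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
#define I915_PMU_CLASS_SHIFT \
	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)

#define __I915_PMU_ENGINE(class, instance, sample) \
	((class) << I915_PMU_CLASS_SHIFT | \
	(instance) << I915_PMU_SAMPLE_BITS | \
	(sample))

#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))

#define I915_PMU_RC6pp_RESIDENCY	__I915_PMU_OTHER(5)
#define I915_PMU_LAST I915_PMU_RC6pp_RESIDENCY

/* Hypothetical helper: same expression as num_other_metrics in igt_main. */
static unsigned int num_other_metrics(void)
{
	return I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
}

/* Hypothetical helper: the "valid" flag passed to init_other/read_other. */
static bool other_metric_valid(unsigned int i)
{
	return i < num_other_metrics();
}

/* Hypothetical helper: perf config value for the i-th non-engine metric. */
static uint64_t other_metric_config(unsigned int i)
{
	return __I915_PMU_OTHER(i);
}
```

With these definitions there are six valid metrics (indices 0 to 5), and index 6 is the deliberately invalid one. That is consistent with the other-init-0 to other-init-6 subtests listed in the CI testlist changes.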

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* ✓ Fi.CI.BAT: success for IGT PMU support (rev12)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (14 preceding siblings ...)
  2017-10-11  1:28 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-10-11 14:09 ` Patchwork
  2017-10-11 20:16 ` ✗ Fi.CI.IGT: failure " Patchwork
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-11 14:09 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev12)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
136100c2f00b590bc9485100cce012282c1217cf igt/syncobj_wait: Don't close the timeline early in wait_snapshot

with latest DRM-Tip kernel build CI_DRM_3213
36e0e803d3d7 drm-tip: 2017y-10m-11d-11h-31m-33s UTC integration manifest

Testlist changes:
+igt@perf_pmu@all-busy-check-all
+igt@perf_pmu@busy-bcs0
+igt@perf_pmu@busy-check-all-bcs0
+igt@perf_pmu@busy-check-all-rcs0
+igt@perf_pmu@busy-check-all-vcs0
+igt@perf_pmu@busy-check-all-vcs1
+igt@perf_pmu@busy-check-all-vecs0
+igt@perf_pmu@busy-no-semaphores-bcs0
+igt@perf_pmu@busy-no-semaphores-rcs0
+igt@perf_pmu@busy-no-semaphores-vcs0
+igt@perf_pmu@busy-no-semaphores-vcs1
+igt@perf_pmu@busy-no-semaphores-vecs0
+igt@perf_pmu@busy-rcs0
+igt@perf_pmu@busy-vcs0
+igt@perf_pmu@busy-vcs1
+igt@perf_pmu@busy-vecs0
+igt@perf_pmu@cpu-hotplug
+igt@perf_pmu@event-wait-rcs0
+igt@perf_pmu@frequency
+igt@perf_pmu@idle-bcs0
+igt@perf_pmu@idle-no-semaphores-bcs0
+igt@perf_pmu@idle-no-semaphores-rcs0
+igt@perf_pmu@idle-no-semaphores-vcs0
+igt@perf_pmu@idle-no-semaphores-vcs1
+igt@perf_pmu@idle-no-semaphores-vecs0
+igt@perf_pmu@idle-rcs0
+igt@perf_pmu@idle-vcs0
+igt@perf_pmu@idle-vcs1
+igt@perf_pmu@idle-vecs0
+igt@perf_pmu@init-busy-bcs0
+igt@perf_pmu@init-busy-rcs0
+igt@perf_pmu@init-busy-vcs0
+igt@perf_pmu@init-busy-vcs1
+igt@perf_pmu@init-busy-vecs0
+igt@perf_pmu@init-sema-bcs0
+igt@perf_pmu@init-sema-rcs0
+igt@perf_pmu@init-sema-vcs0
+igt@perf_pmu@init-sema-vcs1
+igt@perf_pmu@init-sema-vecs0
+igt@perf_pmu@init-wait-bcs0
+igt@perf_pmu@init-wait-rcs0
+igt@perf_pmu@init-wait-vcs0
+igt@perf_pmu@init-wait-vcs1
+igt@perf_pmu@init-wait-vecs0
+igt@perf_pmu@interrupts
+igt@perf_pmu@invalid-init
+igt@perf_pmu@most-busy-check-all-bcs0
+igt@perf_pmu@most-busy-check-all-rcs0
+igt@perf_pmu@most-busy-check-all-vcs0
+igt@perf_pmu@most-busy-check-all-vcs1
+igt@perf_pmu@most-busy-check-all-vecs0
+igt@perf_pmu@multi-client-bcs0
+igt@perf_pmu@multi-client-rcs0
+igt@perf_pmu@multi-client-vcs0
+igt@perf_pmu@multi-client-vcs1
+igt@perf_pmu@multi-client-vecs0
+igt@perf_pmu@other-init-0
+igt@perf_pmu@other-init-1
+igt@perf_pmu@other-init-2
+igt@perf_pmu@other-init-3
+igt@perf_pmu@other-init-4
+igt@perf_pmu@other-init-5
+igt@perf_pmu@other-init-6
+igt@perf_pmu@other-read-0
+igt@perf_pmu@other-read-1
+igt@perf_pmu@other-read-2
+igt@perf_pmu@other-read-3
+igt@perf_pmu@other-read-4
+igt@perf_pmu@other-read-5
+igt@perf_pmu@other-read-6
+igt@perf_pmu@rc6
+igt@perf_pmu@rc6p
+igt@perf_pmu@render-node-busy-bcs0
+igt@perf_pmu@render-node-busy-rcs0
+igt@perf_pmu@render-node-busy-vcs0
+igt@perf_pmu@render-node-busy-vcs1
+igt@perf_pmu@render-node-busy-vecs0
+igt@perf_pmu@semaphore-wait-bcs0
+igt@perf_pmu@semaphore-wait-rcs0
+igt@perf_pmu@semaphore-wait-vcs0
+igt@perf_pmu@semaphore-wait-vcs1
+igt@perf_pmu@semaphore-wait-vecs0

Test chamelium:
        Subgroup dp-crc-fast:
                pass       -> FAIL       (fi-kbl-7500u) fdo#102514

fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:455s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:475s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:389s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:574s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:287s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:532s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:526s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:537s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:522s
fi-cfl-s         total:289  pass:253  dwarn:4   dfail:0   fail:0   skip:32  time:562s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:633s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:437s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:274s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:597s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:440s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:475s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:505s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:474s
fi-kbl-7500u     total:289  pass:263  dwarn:1   dfail:0   fail:1   skip:24  time:500s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:586s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:488s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:597s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:661s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:474s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:652s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:539s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:512s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:473s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:583s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:429s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_329/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* ✗ Fi.CI.IGT: failure for IGT PMU support (rev12)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (15 preceding siblings ...)
  2017-10-11 14:09 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev12) Patchwork
@ 2017-10-11 20:16 ` Patchwork
  2017-11-22 11:41 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev18) Patchwork
  2017-11-22 14:31 ` ✓ Fi.CI.IGT: " Patchwork
  18 siblings, 0 replies; 46+ messages in thread
From: Patchwork @ 2017-10-11 20:16 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev12)
URL   : https://patchwork.freedesktop.org/series/28253/
State : failure

== Summary ==

Test gem_eio:
        Subgroup in-flight:
                dmesg-warn -> PASS       (shard-hsw) fdo#102886 +1
Test gem_tiled_swapping:
        Subgroup non-threaded:
                pass       -> DMESG-WARN (shard-hsw)
Test kms_setmode:
        Subgroup basic:
                fail       -> PASS       (shard-hsw) fdo#99912
Test kms_atomic:
        Subgroup atomic_invalid_params:
                pass       -> SKIP       (shard-hsw)
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-indfb-fliptrack:
                pass       -> SKIP       (shard-hsw)
Test kms_chv_cursor_fail:
        Subgroup pipe-C-64x64-bottom-edge:
                pass       -> SKIP       (shard-hsw)
Test gem_flink_race:
        Subgroup flink_close:
                pass       -> FAIL       (shard-hsw) fdo#102655
Test gem_exec_store:
        Subgroup pages-render:
                pass       -> FAIL       (shard-hsw)
Test kms_flip:
        Subgroup flip-vs-rmfb:
                pass       -> DMESG-WARN (shard-hsw)

fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
fdo#102655 https://bugs.freedesktop.org/show_bug.cgi?id=102655

shard-hsw        total:2581 pass:1396 dwarn:7   dfail:0   fail:10  skip:1168 time:9317s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_329/shards.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t v7 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-11 12:54             ` [PATCH i-g-t v7 " Tvrtko Ursulin
@ 2017-11-21 11:50               ` Chris Wilson
  2017-11-21 18:21               ` [PATCH i-g-t v8 " Tvrtko Ursulin
  1 sibling, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2017-11-21 11:50 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-11 13:54:18)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A bunch of tests for the new i915 PMU feature.
> 
> Parts of the code were initially sketched by Dmitry Rogozhkin.
> 
> v2: (Most suggestions by Chris Wilson)
>  * Add new class/instance based engine list.
>  * Add gem_has_engine/gem_require_engine to work with class/instance.
>  * Use the above two throughout the test.
>  * Shorten tests to 100ms busy batches, seems enough.
>  * Add queued counter sanity checks.
>  * Use igt_nsec_elapsed.
>  * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
>  * Fix multi ordering for busy accounting.
>  * Use new guranteed_usleep when sleep time is asserted on.
>  * Check for no queued when idle/busy.
>  * Add queued counter init test.
>  * Add queued tests.
>  * Consolidate and increase multiple busy engines tests to most-busy and
>    all-busy tests.
>  * Guarantee interrupts by using fences.
>  * Test RC6 via forcewake.
> 
> v3:
>  * Tweak assert in interrupts subtest.
>  * Sprinkle of comments.
>  * Fix multi-client test which got broken in v2.
> 
> v4:
>  * Measured instead of guaranteed sleep.
>  * Missing sync in no_sema.
>  * Log busyness before asserts for debug.
>  * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
>  * Test frequency reporting via min/max setting instead of assuming.
>    ^^ All above suggested by Chris Wilson. ^^
>  * Drop queued subtests to match i915.
>  * Use long batches with fences to ensure interrupts.
>  * Test render node as well.
> 
> v5:
>  * Add to meson build. (Petri Latvala)
>  * Use 1eN constants. (Chris Wilson)
>  * Add tests for semaphore and event waiting.
> 
> v6:
>  * Fix interrupts subtest by polling the fence from the "outside".
>    (Chris Wilson)
> 
> v7:
>  * Assert number of initialized engines matches the expectation.
>    (Chris Wilson)
>  * Warn instead of skipping if we couldn't restore the initial
>    frequency. (Chris Wilson)
>  * Move all asserts to after the test cleanup (just a tidy).
>  * More 1eN notation for timeouts.
>  * Bump the tolerance to 5% since I saw a few noisy runs with
>    sampling counters.
>  * Always start the PMU before submitting batches to lower
>    reliance on i915 doing the delayed engine busy stats disable.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v6)

I think the proof of the pudding is in the eating...
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH i-g-t 9/9] media-bench.pl: Add busy balancers to the list
  2017-10-10  9:30 ` [PATCH i-g-t 9/9] media-bench.pl: Add busy balancers to the list Tvrtko Ursulin
@ 2017-11-21 11:51   ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2017-11-21 11:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

Quoting Tvrtko Ursulin (2017-10-10 10:30:08)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH i-g-t 5/9] intel-gpu-overlay: Catch-up to new i915 PMU
  2017-10-10  9:30 ` [PATCH i-g-t 5/9] intel-gpu-overlay: Catch-up to new i915 PMU Tvrtko Ursulin
@ 2017-11-21 18:20   ` Tvrtko Ursulin
  0 siblings, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-21 18:20 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

v2: Update for i915 changes.
v3: Use 1eN for large numbers. (Chris Wilson)
v4: Update for upstream engine class enum.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_perf.h           | 89 +++++++++++++++++++++++++++++++++---------------
 overlay/gem-interrupts.c |  2 +-
 overlay/gpu-freq.c       |  8 ++---
 overlay/gpu-top.c        | 68 ++++++++++++++++++++----------------
 overlay/power.c          |  4 +--
 overlay/rc6.c            | 20 +++++------
 6 files changed, 116 insertions(+), 75 deletions(-)

diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index cc10cb300aaf..db07b00a7b6b 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -1,3 +1,27 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
 #ifndef I915_PERF_H
 #define I915_PERF_H
 
@@ -5,41 +29,52 @@
 
 #include <linux/perf_event.h>
 
-#define I915_SAMPLE_BUSY	0
-#define I915_SAMPLE_WAIT	1
-#define I915_SAMPLE_SEMA	2
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_RENDER 	= 0,
+	I915_ENGINE_CLASS_COPY		= 1,
+	I915_ENGINE_CLASS_VIDEO		= 2,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE	= 3,
+
+	I915_ENGINE_CLASS_INVALID	= -1
+};
+
+enum drm_i915_pmu_engine_sample {
+	I915_SAMPLE_BUSY = 0,
+	I915_SAMPLE_WAIT = 1,
+	I915_SAMPLE_SEMA = 2,
+	I915_ENGINE_SAMPLE_MAX /* non-ABI */
+};
 
-#define I915_SAMPLE_RCS		0
-#define I915_SAMPLE_VCS		1
-#define I915_SAMPLE_BCS		2
-#define I915_SAMPLE_VECS	3
+#define I915_PMU_SAMPLE_BITS (4)
+#define I915_PMU_SAMPLE_MASK (0xf)
+#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
+#define I915_PMU_CLASS_SHIFT \
+	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)
 
-#define __I915_PERF_COUNT(ring, id) ((ring) << 4 | (id))
+#define __I915_PMU_ENGINE(class, instance, sample) \
+	((class) << I915_PMU_CLASS_SHIFT | \
+	(instance) << I915_PMU_SAMPLE_BITS | \
+	(sample))
 
-#define I915_PERF_COUNT_RCS_BUSY __I915_PERF_COUNT(I915_SAMPLE_RCS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_RCS_WAIT __I915_PERF_COUNT(I915_SAMPLE_RCS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_RCS_SEMA __I915_PERF_COUNT(I915_SAMPLE_RCS, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_BUSY(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_BUSY)
 
-#define I915_PERF_COUNT_VCS_BUSY __I915_PERF_COUNT(I915_SAMPLE_VCS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_VCS_WAIT __I915_PERF_COUNT(I915_SAMPLE_VCS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_VCS_SEMA __I915_PERF_COUNT(I915_SAMPLE_VCS, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_WAIT(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_WAIT)
 
-#define I915_PERF_COUNT_BCS_BUSY __I915_PERF_COUNT(I915_SAMPLE_BCS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_BCS_WAIT __I915_PERF_COUNT(I915_SAMPLE_BCS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_BCS_SEMA __I915_PERF_COUNT(I915_SAMPLE_BCS, I915_SAMPLE_SEMA)
+#define I915_PMU_ENGINE_SEMA(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
-#define I915_PERF_COUNT_VECS_BUSY __I915_PERF_COUNT(I915_SAMPLE_VECS, I915_SAMPLE_BUSY)
-#define I915_PERF_COUNT_VECS_WAIT __I915_PERF_COUNT(I915_SAMPLE_VECS, I915_SAMPLE_WAIT)
-#define I915_PERF_COUNT_VECS_SEMA __I915_PERF_COUNT(I915_SAMPLE_VECS, I915_SAMPLE_SEMA)
+#define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
-#define I915_PERF_ACTUAL_FREQUENCY 32
-#define I915_PERF_REQUESTED_FREQUENCY 33
-#define I915_PERF_ENERGY 34
-#define I915_PERF_INTERRUPTS 35
+#define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
+#define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
+#define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(2)
+#define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(3)
+#define I915_PMU_RC6p_RESIDENCY		__I915_PMU_OTHER(4)
+#define I915_PMU_RC6pp_RESIDENCY	__I915_PMU_OTHER(5)
 
-#define I915_PERF_RC6_RESIDENCY		40
-#define I915_PERF_RC6p_RESIDENCY	41
-#define I915_PERF_RC6pp_RESIDENCY	42
+#define I915_PMU_LAST I915_PMU_RC6pp_RESIDENCY
 
 static inline int
 perf_event_open(struct perf_event_attr *attr,
diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index 5bd8656e0e63..0233fbb0514b 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -113,7 +113,7 @@ int gem_interrupts_init(struct gem_interrupts *irqs)
 {
 	memset(irqs, 0, sizeof(*irqs));
 
-	irqs->fd = perf_i915_open(I915_PERF_INTERRUPTS);
+	irqs->fd = perf_i915_open(I915_PMU_INTERRUPTS);
 	if (irqs->fd < 0 && interrupts_read() < 0)
 		irqs->error = ENODEV;
 
diff --git a/overlay/gpu-freq.c b/overlay/gpu-freq.c
index 76c5ed9acfd1..0d8032592ef5 100644
--- a/overlay/gpu-freq.c
+++ b/overlay/gpu-freq.c
@@ -37,8 +37,8 @@ static int perf_open(void)
 {
 	int fd;
 
-	fd = perf_i915_open_group(I915_PERF_ACTUAL_FREQUENCY, -1);
-	if (perf_i915_open_group(I915_PERF_REQUESTED_FREQUENCY, fd) < 0) {
+	fd = perf_i915_open_group(I915_PMU_ACTUAL_FREQUENCY, -1);
+	if (perf_i915_open_group(I915_PMU_REQUESTED_FREQUENCY, fd) < 0) {
 		close(fd);
 		fd = -1;
 	}
@@ -176,8 +176,8 @@ int gpu_freq_update(struct gpu_freq *gf)
 			return EAGAIN;
 		}
 
-		gf->current = (s->act - d->act) / d_time;
-		gf->request = (s->req - d->req) / d_time;
+		gf->current = (s->act - d->act) * 1e9 / d_time;
+		gf->request = (s->req - d->req) * 1e9 / d_time;
 	}
 
 	return 0;
diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 812f47d5aced..61b8f62fd78c 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -43,49 +43,57 @@
 #define   RING_WAIT		(1<<11)
 #define   RING_WAIT_SEMAPHORE	(1<<10)
 
-#define __I915_PERF_RING(n) (4*n)
-#define I915_PERF_RING_BUSY(n) (__I915_PERF_RING(n) + 0)
-#define I915_PERF_RING_WAIT(n) (__I915_PERF_RING(n) + 1)
-#define I915_PERF_RING_SEMA(n) (__I915_PERF_RING(n) + 2)
-
 static int perf_init(struct gpu_top *gt)
 {
-	const char *names[] = {
-		"RCS",
-		"BCS",
-		"VCS0",
-		"VCS1",
-		NULL,
+	struct engine_desc {
+		unsigned class, inst;
+		const char *name;
+	} *d, engines[] = {
+		{ I915_ENGINE_CLASS_RENDER, 0, "rcs0" },
+		{ I915_ENGINE_CLASS_COPY, 0, "bcs0" },
+		{ I915_ENGINE_CLASS_VIDEO, 0, "vcs0" },
+		{ I915_ENGINE_CLASS_VIDEO, 1, "vcs1" },
+		{ I915_ENGINE_CLASS_VIDEO_ENHANCE, 0, "vecs0" },
+		{ 0, 0, NULL }
 	};
-	int n;
 
-	gt->fd = perf_i915_open_group(I915_PERF_RING_BUSY(0), -1);
+	d = &engines[0];
+
+	gt->fd = perf_i915_open_group(I915_PMU_ENGINE_BUSY(d->class, d->inst),
+				      -1);
 	if (gt->fd < 0)
 		return -1;
 
-	if (perf_i915_open_group(I915_PERF_RING_WAIT(0), gt->fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_ENGINE_WAIT(d->class, d->inst),
+				 gt->fd) >= 0)
 		gt->have_wait = 1;
 
-	if (perf_i915_open_group(I915_PERF_RING_SEMA(0), gt->fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_ENGINE_SEMA(d->class, d->inst),
+				 gt->fd) >= 0)
 		gt->have_sema = 1;
 
-	gt->ring[0].name = names[0];
+	gt->ring[0].name = d->name;
 	gt->num_rings = 1;
 
-	for (n = 1; names[n]; n++) {
-		if (perf_i915_open_group(I915_PERF_RING_BUSY(n), gt->fd) >= 0) {
-			if (gt->have_wait &&
-			    perf_i915_open_group(I915_PERF_RING_WAIT(n),
-						 gt->fd) < 0)
-				return -1;
-
-			if (gt->have_sema &&
-			    perf_i915_open_group(I915_PERF_RING_SEMA(n),
-						 gt->fd) < 0)
-				return -1;
-
-			gt->ring[gt->num_rings++].name = names[n];
-		}
+	for (d++; d->name; d++) {
+		if (perf_i915_open_group(I915_PMU_ENGINE_BUSY(d->class,
+							      d->inst),
+					gt->fd) < 0)
+			continue;
+
+		if (gt->have_wait &&
+		    perf_i915_open_group(I915_PMU_ENGINE_WAIT(d->class,
+							      d->inst),
+					 gt->fd) < 0)
+			return -1;
+
+		if (gt->have_sema &&
+		    perf_i915_open_group(I915_PMU_ENGINE_SEMA(d->class,
+							      d->inst),
+				   gt->fd) < 0)
+			return -1;
+
+		gt->ring[gt->num_rings++].name = d->name;
 	}
 
 	return 0;
diff --git a/overlay/power.c b/overlay/power.c
index dd4aec6bffd9..805f4ca7805c 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -45,9 +45,7 @@ int power_init(struct power *power)
 
 	memset(power, 0, sizeof(*power));
 
-	power->fd = perf_i915_open(I915_PERF_ENERGY);
-	if (power->fd != -1)
-		return 0;
+	power->fd = -1;
 
 	sprintf(buf, "%s/i915_energy_uJ", debugfs_dri_path);
 	fd = open(buf, 0);
diff --git a/overlay/rc6.c b/overlay/rc6.c
index 46c975a557ff..8977f0993095 100644
--- a/overlay/rc6.c
+++ b/overlay/rc6.c
@@ -43,15 +43,15 @@ static int perf_open(unsigned *flags)
 {
 	int fd;
 
-	fd = perf_i915_open_group(I915_PERF_RC6_RESIDENCY, -1);
+	fd = perf_i915_open_group(I915_PMU_RC6_RESIDENCY, -1);
 	if (fd < 0)
 		return -1;
 
 	*flags |= RC6;
-	if (perf_i915_open_group(I915_PERF_RC6p_RESIDENCY, fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd) >= 0)
 		*flags |= RC6p;
 
-	if (perf_i915_open_group(I915_PERF_RC6pp_RESIDENCY, fd) >= 0)
+	if (perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd) >= 0)
 		*flags |= RC6pp;
 
 	return fd;
@@ -132,11 +132,11 @@ int rc6_update(struct rc6 *rc6)
 
 		len = 2;
 		if (rc6->flags & RC6)
-			s->rc6_residency = data[len++];
+			s->rc6_residency = data[len++] / 1e6;
 		if (rc6->flags & RC6p)
-			s->rc6p_residency = data[len++];
+			s->rc6p_residency = data[len++] / 1e6;
 		if (rc6->flags & RC6pp)
-			s->rc6pp_residency = data[len++];
+			s->rc6pp_residency = data[len++] / 1e6;
 	}
 
 	if (rc6->count == 1)
@@ -149,14 +149,14 @@ int rc6_update(struct rc6 *rc6)
 	}
 
 	d_rc6 = s->rc6_residency - d->rc6_residency;
-	rc6->rc6 = (100 * d_rc6 + d_time/2) / d_time;
+	rc6->rc6 = 100 * d_rc6 / d_time;
 
 	d_rc6p = s->rc6p_residency - d->rc6p_residency;
-	rc6->rc6p = (100 * d_rc6p + d_time/2) / d_time;
+	rc6->rc6p = 100 * d_rc6p / d_time;
 
 	d_rc6pp = s->rc6pp_residency - d->rc6pp_residency;
-	rc6->rc6pp = (100 * d_rc6pp + d_time/2) / d_time;
+	rc6->rc6pp = 100 * d_rc6pp / d_time;
 
-	rc6->rc6_combined = (100 * (d_rc6 + d_rc6p + d_rc6pp) + d_time/2) / d_time;
+	rc6->rc6_combined = 100 * (d_rc6 + d_rc6p + d_rc6pp) / d_time;
 	return 0;
 }
-- 
2.14.1
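
As a worked illustration of the new class/instance/sample config layout in lib/igt_perf.h, here is a minimal sketch. The macros are copied from the patch above; the struct and the encode/decode helpers are hypothetical names introduced only for this example and are not part of the series:

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from lib/igt_perf.h in the patch above. */
#define I915_PMU_SAMPLE_BITS (4)
#define I915_PMU_SAMPLE_MASK (0xf)
#define I915_PMU_SAMPLE_INSTANCE_BITS (8)
#define I915_PMU_CLASS_SHIFT \
	(I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS)

#define __I915_PMU_ENGINE(class, instance, sample) \
	((class) << I915_PMU_CLASS_SHIFT | \
	(instance) << I915_PMU_SAMPLE_BITS | \
	(sample))

/* Hypothetical container for a decoded engine config. */
struct pmu_engine_config {
	unsigned int engine_class;
	unsigned int instance;
	unsigned int sample;
};

/* Hypothetical helper: pack class/instance/sample into a perf config. */
static uint64_t pmu_engine_encode(unsigned int engine_class,
				  unsigned int instance,
				  unsigned int sample)
{
	/* The sample occupies 4 bits and the instance 8 bits. */
	assert(sample <= I915_PMU_SAMPLE_MASK);
	assert(instance < (1u << I915_PMU_SAMPLE_INSTANCE_BITS));

	return __I915_PMU_ENGINE((uint64_t)engine_class,
				 (uint64_t)instance,
				 (uint64_t)sample);
}

/* Hypothetical helper: split a perf config back into its fields. */
static struct pmu_engine_config pmu_engine_decode(uint64_t config)
{
	struct pmu_engine_config c;

	c.sample = config & I915_PMU_SAMPLE_MASK;
	c.instance = (config >> I915_PMU_SAMPLE_BITS) &
		     ((1u << I915_PMU_SAMPLE_INSTANCE_BITS) - 1);
	c.engine_class = config >> I915_PMU_CLASS_SHIFT;

	return c;
}
```

For example, the wait counter of vcs1 (I915_ENGINE_CLASS_VIDEO = 2, instance 1, I915_SAMPLE_WAIT = 1) encodes as 2 << 12 | 1 << 4 | 1 = 0x2011, and the busy counter of rcs0 encodes as 0.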


^ permalink raw reply related	[flat|nested] 46+ messages in thread
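
The unit conversions in the overlay/gpu-freq.c and overlay/rc6.c hunks of the patch above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes (as the 1e9 and 1e6 factors suggest, but the patch does not state) that the sampling interval is in nanoseconds for the frequency path and in milliseconds for the RC6 path, with raw RC6 residency counted in nanoseconds. It also uses double throughout, which may differ from the types the overlay uses:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring "(s->act - d->act) * 1e9 / d_time":
 * scales a counter delta taken over d_time_ns nanoseconds to a
 * per-second average.
 */
static double avg_per_second(uint64_t prev, uint64_t curr, uint64_t d_time_ns)
{
	assert(d_time_ns > 0);

	return (curr - prev) * 1e9 / d_time_ns;
}

/*
 * Hypothetical helper mirroring the rc6.c hunks: raw residency samples
 * are divided by 1e6 (ns to ms under the assumption above), and the
 * delta is expressed as a percentage of the sampling interval.
 */
static double residency_percent(uint64_t prev_ns, uint64_t curr_ns,
				uint64_t d_time_ms)
{
	double prev_ms, curr_ms;

	assert(d_time_ms > 0);

	prev_ms = prev_ns / 1e6;
	curr_ms = curr_ns / 1e6;

	return 100 * (curr_ms - prev_ms) / d_time_ms;
}
```

Under these assumptions, a counter delta of 600 over 2 seconds averages to 300 per second, and 1.5 seconds of residency over a 2000 ms interval comes out as 75 percent.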

* [PATCH i-g-t v8 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-10-11 12:54             ` [PATCH i-g-t v7 " Tvrtko Ursulin
  2017-11-21 11:50               ` Chris Wilson
@ 2017-11-21 18:21               ` Tvrtko Ursulin
  2017-11-21 19:36                 ` [PATCH i-g-t v9 " Tvrtko Ursulin
  1 sibling, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-21 18:21 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A bunch of tests for the new i915 PMU feature.

Parts of the code were initially sketched by Dmitry Rogozhkin.

v2: (Most suggestions by Chris Wilson)
 * Add new class/instance based engine list.
 * Add gem_has_engine/gem_require_engine to work with class/instance.
 * Use the above two throughout the test.
 * Shorten tests to 100ms busy batches, seems enough.
 * Add queued counter sanity checks.
 * Use igt_nsec_elapsed.
 * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
 * Fix multi ordering for busy accounting.
 * Use new guranteed_usleep when sleep time is asserted on.
 * Check for no queued when idle/busy.
 * Add queued counter init test.
 * Add queued tests.
 * Consolidate and increase multiple busy engines tests to most-busy and
   all-busy tests.
 * Guarantee interrupts by using fences.
 * Test RC6 via forcewake.

v3:
 * Tweak assert in interrupts subtest.
 * Sprinkle of comments.
 * Fix multi-client test which got broken in v2.

v4:
 * Measured instead of guaranteed sleep.
 * Missing sync in no_sema.
 * Log busyness before asserts for debug.
 * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
 * Test frequency reporting via min/max setting instead of assuming.
   ^^ All above suggested by Chris Wilson. ^^
 * Drop queued subtests to match i915.
 * Use long batches with fences to ensure interrupts.
 * Test render node as well.

v5:
 * Add to meson build. (Petri Latvala)
 * Use 1eN constants. (Chris Wilson)
 * Add tests for semaphore and event waiting.

v6:
 * Fix interrupts subtest by polling the fence from the "outside".
   (Chris Wilson)

v7:
 * Assert number of initialized engines matches the expectation.
   (Chris Wilson)
 * Warn instead of skipping if we couldn't restore the initial
   frequency. (Chris Wilson)
 * Move all asserts to after the test cleanup (just a tidy).
 * More 1eN notation for timeouts.
 * Bump the tolerance to 5% since I saw a few noisy runs with
   sampling counters.
 * Always start the PMU before submitting batches to lower
   reliance on i915 doing the delayed engine busy stats disable.
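
For readers unfamiliar with how the tolerance is applied, the check used
throughout the test can be sketched as a plain function. This mirrors the
comparison performed by the assert_within_epsilon macro in the patch, but
the function name within_epsilon is hypothetical and the sketch drops the
IGT assert machinery.

```c
#include <stdbool.h>

/*
 * Return true if x lies within +/- tolerance (a fraction, e.g. 0.05 for
 * 5%) of the reference value. The window is relative to ref, so with
 * ref == 0 only x == 0 passes.
 */
static bool within_epsilon(double x, double ref, double tolerance)
{
	return x <= (1.0 + tolerance) * ref &&
	       x >= (1.0 - tolerance) * ref;
}
```

One consequence of a relative window is that checks against a zero
reference (idle engines, no RC6 residency) accept only an exact zero
reading, regardless of the tolerance value.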

v8:
 * Update for upstream engine class enum.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_gt.c           |   50 ++
 lib/igt_gt.h           |   38 ++
 lib/igt_perf.h         |    9 +-
 tests/Makefile.am      |    1 +
 tests/Makefile.sources |    1 +
 tests/meson.build      |    1 +
 tests/perf_pmu.c       | 1242 ++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 1334 insertions(+), 8 deletions(-)
 create mode 100644 tests/perf_pmu.c

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 89727d22dd5c..63a06611f16d 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -608,3 +608,53 @@ bool gem_can_store_dword(int fd, unsigned int engine)
 
 	return true;
 }
+
+const struct intel_execution_engine2 intel_execution_engines2[] = {
+	{ "rcs0", I915_ENGINE_CLASS_RENDER, 0 },
+	{ "bcs0", I915_ENGINE_CLASS_COPY, 0 },
+	{ "vcs0", I915_ENGINE_CLASS_VIDEO, 0 },
+	{ "vcs1", I915_ENGINE_CLASS_VIDEO, 1 },
+	{ "vecs0", I915_ENGINE_CLASS_VIDEO_ENHANCE, 0 },
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance)
+{
+	if (class != I915_ENGINE_CLASS_VIDEO)
+		igt_assert(instance == 0);
+	else
+		igt_assert(instance >= 0 && instance <= 1);
+
+	switch (class) {
+	case I915_ENGINE_CLASS_RENDER:
+		return I915_EXEC_RENDER;
+	case I915_ENGINE_CLASS_COPY:
+		return I915_EXEC_BLT;
+	case I915_ENGINE_CLASS_VIDEO:
+		if (instance == 0) {
+			if (gem_has_bsd2(gem_fd))
+				return I915_EXEC_BSD | I915_EXEC_BSD_RING1;
+			else
+				return I915_EXEC_BSD;
+
+		} else {
+			return I915_EXEC_BSD | I915_EXEC_BSD_RING2;
+		}
+	case I915_ENGINE_CLASS_VIDEO_ENHANCE:
+		return I915_EXEC_VEBOX;
+	case I915_ENGINE_CLASS_INVALID:
+	default:
+		igt_assert(0);
+	};
+}
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance)
+{
+	return gem_has_ring(gem_fd,
+			    gem_class_instance_to_eb_flags(gem_fd, class,
+							   instance));
+}
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index 2579cbd37be7..48ed48af8117 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -25,6 +25,7 @@
 #define IGT_GT_H
 
 #include "igt_debugfs.h"
+#include "igt_core.h"
 
 void igt_require_hang_ring(int fd, int ring);
 
@@ -80,4 +81,41 @@ extern const struct intel_execution_engine {
 
 bool gem_can_store_dword(int fd, unsigned int engine);
 
+extern const struct intel_execution_engine2 {
+	const char *name;
+	int class;
+	int instance;
+} intel_execution_engines2[];
+
+#define for_each_engine_class_instance(fd__, e__) \
+	for ((e__) = intel_execution_engines2;\
+	     (e__)->name; \
+	     (e__)++)
+
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_RENDER 	= 0,
+	I915_ENGINE_CLASS_COPY		= 1,
+	I915_ENGINE_CLASS_VIDEO		= 2,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE	= 3,
+
+	I915_ENGINE_CLASS_INVALID	= -1
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance);
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance);
+
+static inline
+void gem_require_engine(int gem_fd,
+			enum drm_i915_gem_engine_class class,
+			unsigned int instance)
+{
+	igt_require(gem_has_engine(gem_fd, class, instance));
+}
+
 #endif /* IGT_GT_H */
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index 938d548891c5..5428feb0c746 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -29,14 +29,7 @@
 
 #include <linux/perf_event.h>
 
-enum drm_i915_gem_engine_class {
-	I915_ENGINE_CLASS_RENDER 	= 0,
-	I915_ENGINE_CLASS_COPY		= 1,
-	I915_ENGINE_CLASS_VIDEO		= 2,
-	I915_ENGINE_CLASS_VIDEO_ENHANCE	= 3,
-
-	I915_ENGINE_CLASS_INVALID	= -1
-};
+#include "igt_gt.h"
 
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
diff --git a/tests/Makefile.am b/tests/Makefile.am
index db360523dad6..1bc1c57a7452 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -131,6 +131,7 @@ gen7_forcewake_mt_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gen7_forcewake_mt_LDADD = $(LDADD) -lpthread
 gem_userptr_blits_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_userptr_blits_LDADD = $(LDADD) -lpthread
+perf_pmu_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la
 
 gem_wait_LDADD = $(LDADD) -lrt
 kms_flip_LDADD = $(LDADD) -lrt -lpthread
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 2313c12b508c..2c22242e0113 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -215,6 +215,7 @@ TESTS_progs = \
 	kms_vblank \
 	meta_test \
 	perf \
+	perf_pmu \
 	pm_backlight \
 	pm_lpsp \
 	pm_rc6_residency \
diff --git a/tests/meson.build b/tests/meson.build
index 20ff79dcb15f..d9fd94e01c31 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -195,6 +195,7 @@ test_progs = [
 	'kms_vblank',
 	'meta_test',
 	'perf',
+	'perf_pmu',
 	'pm_backlight',
 	'pm_lpsp',
 	'pm_rc6_residency',
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
new file mode 100644
index 000000000000..8585ed7bcee8
--- /dev/null
+++ b/tests/perf_pmu.c
@@ -0,0 +1,1242 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/times.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <time.h>
+#include <poll.h>
+
+#include "igt.h"
+#include "igt_core.h"
+#include "igt_perf.h"
+#include "igt_sysfs.h"
+
+IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
+
+const double tolerance = 0.05f;
+const unsigned long batch_duration_ns = 100e6;
+
+static int open_pmu(uint64_t config)
+{
+	int fd;
+
+	fd = perf_i915_open(config);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static int open_group(uint64_t config, int group)
+{
+	int fd;
+
+	fd = perf_i915_open_group(config, group);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static void
+init(int gem_fd, const struct intel_execution_engine2 *e, uint8_t sample)
+{
+	int fd;
+
+	fd = open_pmu(__I915_PMU_ENGINE(e->class, e->instance, sample));
+
+	close(fd);
+}
+
+static uint64_t pmu_read_single(int fd)
+{
+	uint64_t data[2];
+
+	igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
+
+	return data[0];
+}
+
+static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
+{
+	uint64_t buf[2 + num];
+	unsigned int i;
+
+	igt_assert_eq(read(fd, buf, sizeof(buf)), sizeof(buf));
+
+	for (i = 0; i < num; i++)
+		val[i] = buf[2 + i];
+}
+
+#define assert_within_epsilon(x, ref, tolerance) \
+	igt_assert_f((double)(x) <= (1.0 + tolerance) * (double)ref && \
+		     (double)(x) >= (1.0 - tolerance) * (double)ref, \
+		     "'%s' != '%s' (%f not within %f%% tolerance of %f)\n",\
+		     #x, #ref, (double)x, tolerance * 100.0, (double)ref)
+
+/*
+ * Helper for cases where we assert on time spent sleeping (directly or
+ * indirectly), so make it more robust by ensuring the system sleep time
+ * is within test tolerance to start with.
+ */
+static unsigned int measured_usleep(unsigned int usec)
+{
+	uint64_t slept = 0;
+
+	while (usec > 0) {
+		struct timespec start = { };
+		uint64_t this_sleep;
+
+		igt_nsec_elapsed(&start);
+		usleep(usec);
+		this_sleep = igt_nsec_elapsed(&start);
+		slept += this_sleep;
+		if (this_sleep > usec * 1000)
+			break;
+		usec -= this_sleep / 1000;
+	}
+
+	return slept;
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	double ref = busy ? batch_duration_ns : 0.0f;
+	igt_spin_t *spin;
+	uint64_t val;
+	int fd;
+
+	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	val = pmu_read_single(fd);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+}
+
+static void log_busy(int fd, unsigned int num_engines, uint64_t *val)
+{
+	char buf[1024];
+	int rem = sizeof(buf);
+	unsigned int i;
+	char *p = buf;
+
+	for (i = 0; i < num_engines; i++) {
+		int len;
+
+		len = snprintf(p, rem, "%u=%" PRIu64 "\n",  i, val[i]);
+		igt_assert(len > 0);
+		rem -= len;
+		p += len;
+	}
+
+	igt_info("%s", buf);
+}
+
+static void
+busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+	       const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin;
+	unsigned int busy_idx, i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+		else if (e == e_)
+			busy_idx = i;
+
+		fd[i++] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							  e_->instance),
+				     fd[0]);
+	}
+
+	igt_assert_eq(i, num_engines);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+
+	assert_within_epsilon(val[busy_idx], batch_duration_ns, tolerance);
+	for (i = 0; i < num_engines; i++) {
+		if (i == busy_idx)
+			continue;
+		assert_within_epsilon(val[i], 0.0f, tolerance);
+	}
+
+}
+
+static void
+most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+		    const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int idle_idx, i;
+
+	gem_require_engine(gem_fd, e->class, e->instance);
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							e_->instance),
+				   fd[0]);
+
+		if (e == e_) {
+			idle_idx = i;
+		} else {
+			spin[i] = igt_spin_batch_new(gem_fd, 0,
+						     e2ring(gem_fd, e_), 0);
+			igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+		}
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			gem_sync(gem_fd, spin[i]->handle);
+	}
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			igt_spin_batch_free(gem_fd, spin[i]);
+	}
+	close(fd[0]);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i == idle_idx)
+			assert_within_epsilon(val[i], 0.0f, tolerance);
+		else
+			assert_within_epsilon(val[i], batch_duration_ns,
+					      tolerance);
+	}
+}
+
+static void
+all_busy_check_all(int gem_fd, const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e) {
+		if (!gem_has_engine(gem_fd, e->class, e->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e->class, e->instance),
+				   fd[0]);
+
+		spin[i] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++)
+		gem_sync(gem_fd, spin[i]->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++)
+		igt_spin_batch_free(gem_fd, spin[i]);
+	close(fd[0]);
+
+	for (i = 0; i < num_engines; i++)
+		assert_within_epsilon(val[i], batch_duration_ns, tolerance);
+}
+
+static void
+no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd;
+
+	fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
+	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, val);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val[0], 0.0f, tolerance);
+	assert_within_epsilon(val[1], 0.0f, tolerance);
+}
+
+#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
+#define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+
+static void
+sema_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	struct drm_i915_gem_exec_object2 obj[2];
+	uint32_t bb_handle, obj_handle;
+	unsigned long slept;
+	uint32_t *obj_ptr;
+	uint32_t batch[6];
+	uint64_t val[2];
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 8);
+
+	/**
+	 * Set up a batchbuffer with a polling semaphore wait command which
+	 * will wait on a value in a shared bo to change. This way we are able
+	 * to control how much time we will spend in this bb.
+	 */
+
+	bb_handle = gem_create(gem_fd, 4096);
+	obj_handle = gem_create(gem_fd, 4096);
+
+	obj_ptr = gem_mmap__wc(gem_fd, obj_handle, 0, 4096, PROT_WRITE);
+
+	batch[0] = MI_SEMAPHORE_WAIT |
+		   MI_SEMAPHORE_POLL |
+		   MI_SEMAPHORE_SAD_GTE_SDD;
+	batch[1] = 1;
+	batch[2] = 0x0;
+	batch[3] = 0x0;
+	batch[4] = MI_NOOP;
+	batch[5] = MI_BATCH_BUFFER_END;
+
+	gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+	reloc.target_handle = obj_handle;
+	reloc.offset = 2 * sizeof(uint32_t);
+	reloc.read_domains = I915_GEM_DOMAIN_RENDER;
+
+	memset(obj, 0, sizeof(obj));
+
+	obj[0].handle = obj_handle;
+
+	obj[1].handle = bb_handle;
+	obj[1].relocation_count = 1;
+	obj[1].relocs_ptr = to_user_pointer(&reloc);
+
+	eb.buffer_count = 2;
+	eb.buffers_ptr = to_user_pointer(obj);
+	eb.flags = e2ring(gem_fd, e);
+
+	/**
+	 * Start the semaphore wait PMU and after some known time let the above
+	 * semaphore wait command finish. Then check that the PMU is reporting
+	 * the expected time spent in semaphore wait state.
+	 */
+
+	fd = open_pmu(I915_PMU_ENGINE_SEMA(e->class, e->instance));
+
+	val[0] = pmu_read_single(fd);
+
+	gem_execbuf(gem_fd, &eb);
+
+	slept = measured_usleep(100e3);
+
+	*obj_ptr = 1;
+
+	gem_sync(gem_fd, bb_handle);
+
+	val[1] = pmu_read_single(fd);
+
+	munmap(obj_ptr, 4096);
+	gem_close(gem_fd, obj_handle);
+	gem_close(gem_fd, bb_handle);
+	close(fd);
+
+	assert_within_epsilon(val[1] - val[0], slept, tolerance);
+}
+
+#define   MI_WAIT_FOR_PIPE_C_VBLANK (1<<21)
+#define   MI_WAIT_FOR_PIPE_B_VBLANK (1<<11)
+#define   MI_WAIT_FOR_PIPE_A_VBLANK (1<<3)
+
+typedef struct {
+	igt_display_t display;
+	struct igt_fb primary_fb;
+	igt_output_t *output;
+	enum pipe pipe;
+} data_t;
+
+static void prepare_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	drmModeModeInfo *mode;
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+
+	/* create and set the primary plane fb */
+	mode = igt_output_get_mode(output);
+	igt_create_color_fb(fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888,
+			    LOCAL_DRM_FORMAT_MOD_NONE,
+			    0.0, 0.0, 0.0,
+			    &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, &data->primary_fb);
+
+	igt_display_commit(display);
+
+	igt_wait_for_vblank(fd, data->pipe);
+}
+
+static void cleanup_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	igt_remove_fb(fd, &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+}
+
+static int wait_vblank(int fd, union drm_wait_vblank *vbl)
+{
+	int err;
+
+	err = 0;
+	if (igt_ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl))
+		err = -errno;
+
+	return err;
+}
+
+static void
+event_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	data_t data;
+	igt_display_t *display = &data.display;
+	const uint32_t DERRMR = 0x44050;
+	unsigned int valid_tests = 0;
+	uint32_t batch[8], *b;
+	igt_output_t *output;
+	uint32_t bb_handle;
+	uint32_t reg;
+	enum pipe p;
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
+	igt_require(intel_register_access_init(intel_get_pci_device(),
+					       false, gem_fd) == 0);
+
+	/**
+	 * We will use the display to render event forwarding so need to
+	 * program the DERRMR register and restore it at exit.
+	 *
+	 * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
+	 * have a background helper to indirectly enable vblank irqs, and
+	 * listen to the recorded time spent in engine wait state as reported
+	 * by the PMU.
+	 */
+	reg = intel_register_read(DERRMR);
+
+	kmstest_set_vt_graphics_mode();
+	igt_display_init(&data.display, gem_fd);
+
+	bb_handle = gem_create(gem_fd, 4096);
+
+	b = batch;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
+	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg;
+	*b++ = MI_BATCH_BUFFER_END;
+
+	obj.handle = bb_handle;
+
+	eb.buffer_count = 1;
+	eb.buffers_ptr = to_user_pointer(&obj);
+	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
+
+	for_each_pipe_with_valid_output(display, p, output) {
+		struct igt_helper_process waiter = { };
+		const unsigned int frames = 3;
+		unsigned int frame;
+		uint64_t val[2];
+
+		batch[3] = MI_WAIT_FOR_EVENT;
+		switch (p) {
+		case PIPE_A:
+			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			break;
+		case PIPE_B:
+			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			break;
+		case PIPE_C:
+			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			break;
+		default:
+			continue;
+		}
+
+		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+		data.pipe = p;
+		prepare_crtc(&data, gem_fd, output);
+
+		fd = open_pmu(I915_PMU_ENGINE_WAIT(e->class, e->instance));
+
+		val[0] = pmu_read_single(fd);
+
+		igt_fork_helper(&waiter) {
+			const uint32_t pipe_id_flag =
+					kmstest_get_vbl_flag(data.pipe);
+
+			for (;;) {
+				union drm_wait_vblank vbl = { };
+
+				vbl.request.type = DRM_VBLANK_RELATIVE;
+				vbl.request.type |= pipe_id_flag;
+				vbl.request.sequence = 1;
+				igt_assert_eq(wait_vblank(gem_fd, &vbl), 0);
+			}
+		}
+
+		for (frame = 0; frame < frames; frame++) {
+			gem_execbuf(gem_fd, &eb);
+			gem_sync(gem_fd, bb_handle);
+		}
+
+		igt_stop_helper(&waiter);
+
+		val[1] = pmu_read_single(fd);
+
+		close(fd);
+
+		cleanup_crtc(&data, gem_fd, output);
+		valid_tests++;
+
+		igt_assert(val[1] - val[0] > 0);
+	}
+
+	gem_close(gem_fd, bb_handle);
+
+	intel_register_access_fini();
+
+	igt_require_f(valid_tests,
+		      "no valid crtc/connector combinations found\n");
+}
+
+static void
+multi_client(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
+	unsigned int slept;
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd[2];
+
+	fd[0] = open_pmu(config);
+
+	/*
+	 * Second PMU client which is initialized after the first one,
+	 * and exits before it, should not affect accounting as reported
+	 * in the first client.
+	 */
+	fd[1] = open_pmu(config);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	slept = measured_usleep(batch_duration_ns / 3000);
+	val[1] = pmu_read_single(fd[1]);
+	close(fd[1]);
+
+	gem_sync(gem_fd, spin->handle);
+
+	val[0] = pmu_read_single(fd[0]);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+
+	assert_within_epsilon(val[0], batch_duration_ns, tolerance);
+	assert_within_epsilon(val[1], slept, tolerance);
+}
+
+/**
+ * Tests that i915 PMU correctly errors out on invalid initialization.
+ * i915 PMU is uncore PMU, thus:
+ *  - sampling period is not supported
+ *  - pid > 0 is not supported since we can't count per-process (we count
+ *    per whole system)
+ *  - cpu != 0 is not supported since i915 PMU exposes cpumask for CPU0
+ */
+static void invalid_init(void)
+{
+	struct perf_event_attr attr;
+	int pid, cpu;
+
+#define ATTR_INIT() \
+do { \
+	memset(&attr, 0, sizeof (attr)); \
+	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0); \
+	attr.type = i915_type_id(); \
+	igt_assert(attr.type != 0); \
+} while(0)
+
+	ATTR_INIT();
+	attr.sample_period = 100;
+	pid = -1;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = 0;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = -1;
+	cpu = 1;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, ENODEV);
+}
+
+static void init_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	close(fd);
+}
+
+static void read_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	(void)pmu_read_single(fd);
+
+	close(fd);
+}
+
+static bool cpu0_hotplug_support(void)
+{
+	return access("/sys/devices/system/cpu/cpu0/online", W_OK) == 0;
+}
+
+static void cpu_hotplug(int gem_fd)
+{
+	struct timespec start = { };
+	igt_spin_t *spin;
+	uint64_t val, ref;
+	int fd;
+
+	igt_require(cpu0_hotplug_support());
+
+	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
+	igt_assert(fd >= 0);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+
+	igt_nsec_elapsed(&start);
+
+	/*
+	 * Toggle online status of all the CPUs in a child process and ensure
+	 * this has not affected busyness stats in the parent.
+	 */
+	igt_fork(child, 1) {
+		int cpu = 0;
+
+		for (;;) {
+			char name[128];
+			int cpufd;
+
+			sprintf(name, "/sys/devices/system/cpu/cpu%d/online",
+				cpu);
+			cpufd = open(name, O_WRONLY);
+			if (cpufd == -1) {
+				igt_assert(cpu > 0);
+				break;
+			}
+			igt_assert_eq(write(cpufd, "0", 2), 2);
+
+			usleep(1e6);
+
+			igt_assert_eq(write(cpufd, "1", 2), 2);
+
+			close(cpufd);
+			cpu++;
+		}
+	}
+
+	igt_waitchildren();
+
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
+
+	ref = igt_nsec_elapsed(&start);
+	val = pmu_read_single(fd);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+}
+
+static unsigned long calibrate_nop(int fd, const unsigned int calibration_us)
+{
+	const unsigned int cal_min_us = calibration_us * 3;
+	const unsigned int tolerance_pct = 10;
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int loops = 17;
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	struct timespec t_begin = { };
+	long size, last_size;
+	unsigned long ns;
+
+	igt_nsec_elapsed(&t_begin);
+
+	size = 256 * 1024;
+	do {
+		struct timespec t_start = { };
+
+		obj.handle = gem_create(fd, size);
+		gem_write(fd, obj.handle, size - sizeof(bbe), &bbe,
+			  sizeof(bbe));
+		gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		igt_nsec_elapsed(&t_start);
+
+		for (int loop = 0; loop < loops; loop++)
+			gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		ns = igt_nsec_elapsed(&t_start);
+
+		gem_close(fd, obj.handle);
+
+		last_size = size;
+		size = calibration_us * 1000 * size * loops / ns;
+		size = ALIGN(size, sizeof(uint32_t));
+	} while (igt_nsec_elapsed(&t_begin) / 1000 < cal_min_us ||
+		 abs(size - last_size) > (size * tolerance_pct / 100));
+
+	return size / sizeof(uint32_t);
+}
+
+static void exec_nop(int gem_fd, unsigned long sz)
+{
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct pollfd pfd;
+	int fence;
+
+	sz = ALIGN(sz, sizeof(uint32_t));
+
+	obj.handle = gem_create(gem_fd, sz);
+	gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
+
+	eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
+
+	gem_execbuf_wr(gem_fd, &eb);
+	fence = eb.rsvd2 >> 32;
+
+	/*
+	 * Poll on the output fence to ensure user interrupts will be
+	 * generated and listened to.
+	 */
+	pfd.fd = fence;
+	pfd.events = POLLIN;
+	igt_assert_eq(poll(&pfd, 1, -1), 1);
+
+	close(fence);
+	gem_close(gem_fd, obj.handle);
+}
+
+static void
+test_interrupts(int gem_fd)
+{
+	const unsigned int calibration_us = 250000;
+	const unsigned int batch_len_us = 100000;
+	const unsigned int batch_count = 3e6 / batch_len_us;
+	uint64_t idle, busy, prev;
+	unsigned long cal, sz;
+	unsigned int i;
+	int fd;
+
+	cal = calibrate_nop(gem_fd, calibration_us);
+	sz = batch_len_us * cal / calibration_us;
+
+	fd = open_pmu(I915_PMU_INTERRUPTS);
+
+	gem_quiescent_gpu(gem_fd);
+
+	/* Wait for idle state. */
+	prev = pmu_read_single(fd);
+	idle = prev + 1;
+	while (idle != prev) {
+		usleep(1e6);
+		prev = idle;
+		idle = pmu_read_single(fd);
+	}
+
+	igt_assert_eq(idle - prev, 0);
+
+	/*
+	 * Send some no-op batches waiting on output fences to
+	 * ensure interrupts.
+	 */
+	for (i = 0; i < batch_count; i++)
+		exec_nop(gem_fd, sz);
+
+	/* Check at least as many interrupts have been generated. */
+	busy = pmu_read_single(fd) - idle;
+	close(fd);
+
+	igt_assert(busy >= batch_count);
+}
+
+static void
+test_frequency(int gem_fd)
+{
+	const uint64_t duration_ns = 2e9;
+	uint32_t min_freq, max_freq, boost_freq;
+	uint64_t min[2], max[2], start[2];
+	igt_spin_t *spin;
+	int fd, sysfs;
+
+	sysfs = igt_sysfs_open(gem_fd, NULL);
+	igt_require(sysfs >= 0);
+
+	min_freq = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
+	max_freq = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
+	boost_freq = igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz");
+	igt_require(min_freq > 0 && max_freq > 0 && boost_freq > 0);
+	igt_require(max_freq > min_freq);
+	igt_require(boost_freq > min_freq);
+
+	fd = open_group(I915_PMU_REQUESTED_FREQUENCY, -1);
+	open_group(I915_PMU_ACTUAL_FREQUENCY, fd);
+
+	/*
+	 * Set GPU to min frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == min_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, min);
+	min[0] -= start[0];
+	min[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	usleep(1e6);
+
+	/*
+	 * Set GPU to max frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == max_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", boost_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == boost_freq);
+
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, max);
+	max[0] -= start[0];
+	max[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	/*
+	 * Restore min/max.
+	 */
+	igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq);
+	if (igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") != min_freq)
+		igt_warn("Unable to restore min frequency to saved value [%u MHz], now %u MHz\n",
+			 min_freq, igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz"));
+	close(fd);
+
+	igt_assert(min[0] < max[0]);
+	igt_assert(min[1] < max[1]);
+}
+
+static void
+test_rc6(int gem_fd)
+{
+	int64_t duration_ns = 2e9;
+	uint64_t idle, busy, prev;
+	unsigned int slept;
+	int fd, fw;
+
+	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	prev = pmu_read_single(fd);
+	slept = measured_usleep(duration_ns / 1000);
+	idle = pmu_read_single(fd);
+
+	assert_within_epsilon(idle - prev, slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	prev = pmu_read_single(fd);
+	usleep(duration_ns / 1000);
+	busy = pmu_read_single(fd);
+
+	close(fw);
+	close(fd);
+
+	assert_within_epsilon(busy - prev, 0.0, tolerance);
+}
+
+static void
+test_rc6p(int gem_fd)
+{
+	int64_t duration_ns = 2e9;
+	unsigned int num_pmu = 1;
+	uint64_t idle[3], busy[3], prev[3];
+	unsigned int slept, i;
+	int fd, ret, fw;
+
+	fd = open_group(I915_PMU_RC6_RESIDENCY, -1);
+	ret = perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd);
+	if (ret > 0) {
+		num_pmu++;
+		ret = perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd);
+		if (ret > 0)
+			num_pmu++;
+	}
+
+	igt_require(num_pmu == 3);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	pmu_read_multi(fd, num_pmu, prev);
+	slept = measured_usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, idle);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(idle[i] - prev[i], slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	pmu_read_multi(fd, num_pmu, prev);
+	usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, busy);
+
+	close(fw);
+	close(fd);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(busy[i] - prev[i], 0.0, tolerance);
+}
+
+igt_main
+{
+	const unsigned int num_other_metrics =
+				I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
+	unsigned int num_engines = 0;
+	int fd = -1;
+	const struct intel_execution_engine2 *e;
+	unsigned int i;
+
+	igt_fixture {
+		fd = drm_open_driver_master(DRIVER_INTEL);
+
+		igt_require_gem(fd);
+		igt_require(i915_type_id() > 0);
+
+		for_each_engine_class_instance(fd, e) {
+			if (gem_has_engine(fd, e->class, e->instance))
+				num_engines++;
+		}
+	}
+
+	/**
+	 * Test invalid access via perf API is rejected.
+	 */
+	igt_subtest("invalid-init")
+		invalid_init();
+
+	for_each_engine_class_instance(fd, e) {
+		/**
+		 * Test that a single engine metric can be initialized.
+		 */
+		igt_subtest_f("init-busy-%s", e->name)
+			init(fd, e, I915_SAMPLE_BUSY);
+
+		igt_subtest_f("init-wait-%s", e->name)
+			init(fd, e, I915_SAMPLE_WAIT);
+
+		igt_subtest_f("init-sema-%s", e->name)
+			init(fd, e, I915_SAMPLE_SEMA);
+
+		/**
+		 * Test that engines show no load when idle.
+		 */
+		igt_subtest_f("idle-%s", e->name)
+			single(fd, e, false);
+
+		/**
+		 * Test that a single engine reports load correctly.
+		 */
+		igt_subtest_f("busy-%s", e->name)
+			single(fd, e, true);
+
+		/**
+		 * Test that when one engine is loaded others report no load.
+		 */
+		igt_subtest_f("busy-check-all-%s", e->name)
+			busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that when all except one engine are loaded all loads
+		 * are correctly reported.
+		 */
+		igt_subtest_f("most-busy-check-all-%s", e->name)
+			most_busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that semaphore counters report no activity on idle
+		 * or busy engines.
+		 */
+		igt_subtest_f("idle-no-semaphores-%s", e->name)
+			no_sema(fd, e, false);
+
+		igt_subtest_f("busy-no-semaphores-%s", e->name)
+			no_sema(fd, e, true);
+
+		/**
+		 * Test that semaphore waits are correctly reported.
+		 */
+		igt_subtest_f("semaphore-wait-%s", e->name)
+			sema_wait(fd, e);
+
+		/**
+		 * Test that event waits are correctly reported.
+		 */
+		if (e->class == I915_ENGINE_CLASS_RENDER)
+			igt_subtest_f("event-wait-%s", e->name)
+				event_wait(fd, e);
+
+		/**
+		 * Check that two perf clients do not influence each other's
+		 * observations.
+		 */
+		igt_subtest_f("multi-client-%s", e->name)
+			multi_client(fd, e);
+	}
+
+	/**
+	 * Test that when all engines are loaded all loads are
+	 * correctly reported.
+	 */
+	igt_subtest("all-busy-check-all")
+		all_busy_check_all(fd, num_engines);
+
+	/**
+	 * Test that non-engine counters can be initialized and read. Apart
+	 * from the invalid metric which should fail.
+	 */
+	for (i = 0; i < num_other_metrics + 1; i++) {
+		igt_subtest_f("other-init-%u", i)
+			init_other(i, i < num_other_metrics);
+
+		igt_subtest_f("other-read-%u", i)
+			read_other(i, i < num_other_metrics);
+	}
+
+	/**
+	 * Test counters are not affected by CPU offline/online events.
+	 */
+	igt_subtest("cpu-hotplug")
+		cpu_hotplug(fd);
+
+	/**
+	 * Test GPU frequency.
+	 */
+	igt_subtest("frequency")
+		test_frequency(fd);
+
+	/**
+	 * Test interrupt count reporting.
+	 */
+	igt_subtest("interrupts")
+		test_interrupts(fd);
+
+	/**
+	 * Test RC6 residency reporting.
+	 */
+	igt_subtest("rc6")
+		test_rc6(fd);
+
+	/**
+	 * Test RC6p residency reporting.
+	 */
+	igt_subtest("rc6p")
+		test_rc6p(fd);
+
+	/**
+	 * Check render nodes are counted.
+	 */
+	igt_subtest_group {
+		int render_fd;
+
+		igt_fixture {
+			render_fd = drm_open_driver_render(DRIVER_INTEL);
+			igt_require_gem(render_fd);
+
+			gem_quiescent_gpu(fd);
+		}
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_f("render-node-busy-%s", e->name)
+				single(render_fd, e, true);
+		}
+
+		igt_fixture {
+			close(render_fd);
+		}
+	}
+}
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH i-g-t v3 1/9] intel-gpu-overlay: Move local perf implementation to a library
  2017-10-10  9:30 ` [PATCH i-g-t 1/9] intel-gpu-overlay: Move local perf implementation to a library Tvrtko Ursulin
@ 2017-11-21 19:34   ` Tvrtko Ursulin
  0 siblings, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-21 19:34 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The idea is to avoid duplicating this code across the multiple users
added in upcoming patches.

v2: Commit message; use a separate library instead of piggybacking
    on libintel_tools. (Chris Wilson)

v3: Add Petri's meson build recipe.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
---
 lib/Makefile.am                  | 6 +++++-
 overlay/perf.c => lib/igt_perf.c | 2 +-
 overlay/perf.h => lib/igt_perf.h | 2 ++
 lib/meson.build                  | 4 ++++
 overlay/Makefile.am              | 6 ++----
 overlay/gem-interrupts.c         | 3 ++-
 overlay/gpu-freq.c               | 3 ++-
 overlay/gpu-perf.c               | 3 ++-
 overlay/gpu-top.c                | 3 ++-
 overlay/meson.build              | 2 +-
 overlay/power.c                  | 3 ++-
 overlay/rc6.c                    | 3 ++-
 12 files changed, 27 insertions(+), 13 deletions(-)
 rename overlay/perf.c => lib/igt_perf.c (94%)
 rename overlay/perf.h => lib/igt_perf.h (99%)

diff --git a/lib/Makefile.am b/lib/Makefile.am
index 30ddb92bd0bc..30423dbc8c21 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -7,7 +7,11 @@ include Makefile.sources
 
 libintel_tools_la_SOURCES = $(lib_source_list)
 
-noinst_LTLIBRARIES = libintel_tools.la
+libigt_perf_la_SOURCES = \
+	igt_perf.c	 \
+	igt_perf.h
+
+noinst_LTLIBRARIES = libintel_tools.la libigt_perf.la
 noinst_HEADERS = check-ndebug.h
 
 if HAVE_LIBDRM_VC4
diff --git a/overlay/perf.c b/lib/igt_perf.c
similarity index 94%
rename from overlay/perf.c
rename to lib/igt_perf.c
index b8fdc675c587..45cccff0ae53 100644
--- a/overlay/perf.c
+++ b/lib/igt_perf.c
@@ -3,7 +3,7 @@
 #include <unistd.h>
 #include <stdlib.h>
 
-#include "perf.h"
+#include "igt_perf.h"
 
 uint64_t i915_type_id(void)
 {
diff --git a/overlay/perf.h b/lib/igt_perf.h
similarity index 99%
rename from overlay/perf.h
rename to lib/igt_perf.h
index c44e65f9734c..a80b311cd1d1 100644
--- a/overlay/perf.h
+++ b/lib/igt_perf.h
@@ -1,6 +1,8 @@
 #ifndef I915_PERF_H
 #define I915_PERF_H
 
+#include <stdint.h>
+
 #include <linux/perf_event.h>
 
 #define I915_SAMPLE_BUSY	0
diff --git a/lib/meson.build b/lib/meson.build
index ddf93ec6e350..a1c6eca014ee 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -181,4 +181,8 @@ lib_igt = declare_dependency(link_with : lib_igt_build,
 
 igt_deps = [ lib_igt ] + lib_deps
 
+lib_igt_perf = static_library('igt_perf',
+	['igt_perf.c']
+)
+
 subdir('tests')
diff --git a/overlay/Makefile.am b/overlay/Makefile.am
index 39fbcc4ec3cf..cefde2d040f8 100644
--- a/overlay/Makefile.am
+++ b/overlay/Makefile.am
@@ -4,8 +4,8 @@ endif
 
 AM_CPPFLAGS = -I.
 AM_CFLAGS = $(DRM_CFLAGS) $(PCIACCESS_CFLAGS) $(CWARNFLAGS) \
-	$(CAIRO_CFLAGS) $(OVERLAY_CFLAGS) $(WERROR_CFLAGS)
-LDADD = $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS) $(OVERLAY_LIBS)
+	$(CAIRO_CFLAGS) $(OVERLAY_CFLAGS) $(WERROR_CFLAGS) -I$(srcdir)/../lib
+LDADD = $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS) $(OVERLAY_LIBS) $(top_builddir)/lib/libigt_perf.la
 
 intel_gpu_overlay_SOURCES = \
 	chart.h \
@@ -29,8 +29,6 @@ intel_gpu_overlay_SOURCES = \
 	igfx.c \
 	overlay.h \
 	overlay.c \
-	perf.h \
-	perf.c \
 	power.h \
 	power.c \
 	rc6.h \
diff --git a/overlay/gem-interrupts.c b/overlay/gem-interrupts.c
index 0150a1d03825..7ba54fcd487d 100644
--- a/overlay/gem-interrupts.c
+++ b/overlay/gem-interrupts.c
@@ -31,9 +31,10 @@
 #include <string.h>
 #include <ctype.h>
 
+#include "igt_perf.h"
+
 #include "gem-interrupts.h"
 #include "debugfs.h"
-#include "perf.h"
 
 static int perf_open(void)
 {
diff --git a/overlay/gpu-freq.c b/overlay/gpu-freq.c
index 321c93882238..7f29b1aa986e 100644
--- a/overlay/gpu-freq.c
+++ b/overlay/gpu-freq.c
@@ -28,9 +28,10 @@
 #include <string.h>
 #include <stdio.h>
 
+#include "igt_perf.h"
+
 #include "gpu-freq.h"
 #include "debugfs.h"
-#include "perf.h"
 
 static int perf_i915_open(int config, int group)
 {
diff --git a/overlay/gpu-perf.c b/overlay/gpu-perf.c
index f557b9f06a17..3d4a9be91a94 100644
--- a/overlay/gpu-perf.c
+++ b/overlay/gpu-perf.c
@@ -34,7 +34,8 @@
 #include <fcntl.h>
 #include <errno.h>
 
-#include "perf.h"
+#include "igt_perf.h"
+
 #include "gpu-perf.h"
 #include "debugfs.h"
 
diff --git a/overlay/gpu-top.c b/overlay/gpu-top.c
index 891a7ea7c0b1..06f489dfdc83 100644
--- a/overlay/gpu-top.c
+++ b/overlay/gpu-top.c
@@ -31,7 +31,8 @@
 #include <errno.h>
 #include <assert.h>
 
-#include "perf.h"
+#include "igt_perf.h"
+
 #include "igfx.h"
 #include "gpu-top.h"
 
diff --git a/overlay/meson.build b/overlay/meson.build
index a92ef89542d3..ffc011cce998 100644
--- a/overlay/meson.build
+++ b/overlay/meson.build
@@ -10,7 +10,6 @@ gpu_overlay_src = [
 	'gpu-freq.c',
 	'igfx.c',
 	'overlay.c',
-	'perf.c',
 	'power.c',
 	'rc6.c',
 ]
@@ -56,5 +55,6 @@ if xrandr.found() and cairo.found()
 			include_directories : inc,
 			c_args : gpu_overlay_cflags,
 			dependencies : gpu_overlay_deps,
+			link_with : lib_igt_perf,
 			install : true)
 endif
diff --git a/overlay/power.c b/overlay/power.c
index 2f1521b82cd6..84d860cae40c 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -31,7 +31,8 @@
 #include <time.h>
 #include <errno.h>
 
-#include "perf.h"
+#include "igt_perf.h"
+
 #include "power.h"
 #include "debugfs.h"
 
diff --git a/overlay/rc6.c b/overlay/rc6.c
index d7047c2f4880..3175bb22308f 100644
--- a/overlay/rc6.c
+++ b/overlay/rc6.c
@@ -31,8 +31,9 @@
 #include <time.h>
 #include <errno.h>
 
+#include "igt_perf.h"
+
 #include "rc6.h"
-#include "perf.h"
 
 static int perf_i915_open(int config, int group)
 {
-- 
2.14.1


* [PATCH i-g-t v4 6/9] intel-gpu-overlay: Use RAPL PMU for power reading
  2017-10-10 12:05     ` [PATCH i-g-t v3 " Tvrtko Ursulin
  2017-10-10 12:25       ` Chris Wilson
@ 2017-11-21 19:35       ` Tvrtko Ursulin
  1 sibling, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-21 19:35 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Wire up to the RAPL PMU for GPU energy readings.

The only complication is that we have to add code to parse:

 # cat /sys/devices/power/events/energy-gpu.scale
 2.3283064365386962890625e-10

v2: Link with -lm.
v3: strtod can handle scientific notation, even though my initial
    reading of the man page did not spot that. (Chris Wilson)
v4: Meson fix.
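
For illustration only (not part of this patch): a minimal sketch of the
parsing described above, assuming the scale file holds a single number in
decimal or scientific notation. The helper names parse_scale and
scale_counter are hypothetical; only the use of strtod is taken from the
patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of a PMU ".scale" file.
 * strtod accepts scientific notation and stops at the trailing newline.
 * Returns NAN when no number could be parsed.
 */
static double parse_scale(const char *buf)
{
	char *end;
	double val;

	assert(buf);

	val = strtod(buf, &end);
	if (end == buf)
		return NAN;

	return val;
}

/*
 * Hypothetical helper: apply the scale to a raw counter value. The unit of
 * the result is whatever the PMU advertises for the event.
 */
static double scale_counter(uint64_t raw, double scale)
{
	return (double)raw * scale;
}
```

With the scale value quoted above, which is 2^-32, a raw counter of 1 << 32
scales to exactly 1.0. A parse failure has to be detected with isnan(),
since NAN never compares equal to anything.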

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
---
 lib/igt_perf.c      |  16 +++++--
 lib/igt_perf.h      |   1 +
 overlay/Makefile.am |   2 +-
 overlay/meson.build |   2 +-
 overlay/power.c     | 127 ++++++++++++++++++++++++++++++++++++----------------
 overlay/power.h     |   2 +
 6 files changed, 105 insertions(+), 45 deletions(-)

diff --git a/lib/igt_perf.c b/lib/igt_perf.c
index 208474302fcc..0221461e918f 100644
--- a/lib/igt_perf.c
+++ b/lib/igt_perf.c
@@ -27,11 +27,12 @@ uint64_t i915_type_id(void)
 	return strtoull(buf, NULL, 0);
 }
 
-static int _perf_open(uint64_t config, int group, uint64_t format)
+static int
+_perf_open(uint64_t type, uint64_t config, int group, uint64_t format)
 {
 	struct perf_event_attr attr = { };
 
-	attr.type = i915_type_id();
+	attr.type = type;
 	if (attr.type == 0)
 		return -ENOENT;
 
@@ -46,11 +47,18 @@ static int _perf_open(uint64_t config, int group, uint64_t format)
 
 int perf_i915_open(uint64_t config)
 {
-	return _perf_open(config, -1, PERF_FORMAT_TOTAL_TIME_ENABLED);
+	return _perf_open(i915_type_id(), config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
 }
 
 int perf_i915_open_group(uint64_t config, int group)
 {
-	return _perf_open(config, group,
+	return _perf_open(i915_type_id(), config, group,
 			  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP);
 }
+
+int igt_perf_open(uint64_t type, uint64_t config)
+{
+	return _perf_open(type, config, -1,
+			  PERF_FORMAT_TOTAL_TIME_ENABLED);
+}
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index db07b00a7b6b..938d548891c5 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -99,5 +99,6 @@ perf_event_open(struct perf_event_attr *attr,
 uint64_t i915_type_id(void);
 int perf_i915_open(uint64_t config);
 int perf_i915_open_group(uint64_t config, int group);
+int igt_perf_open(uint64_t type, uint64_t config);
 
 #endif /* I915_PERF_H */
diff --git a/overlay/Makefile.am b/overlay/Makefile.am
index cefde2d040f8..f49f54ac3590 100644
--- a/overlay/Makefile.am
+++ b/overlay/Makefile.am
@@ -63,7 +63,7 @@ intel_gpu_overlay_SOURCES += \
 
 intel_gpu_overlay_SOURCES += $(both_x11_sources)
 
-intel_gpu_overlay_LDADD = $(LDADD) -lrt
+intel_gpu_overlay_LDADD = $(LDADD) -lrt -lm
 
 EXTRA_DIST= \
 	README \
diff --git a/overlay/meson.build b/overlay/meson.build
index ffc011cce998..6b479eb89890 100644
--- a/overlay/meson.build
+++ b/overlay/meson.build
@@ -21,7 +21,7 @@ dri2proto = dependency('dri2proto', version : '>= 2.6', required : false)
 cairo_xlib = dependency('cairo-xlib', required : false)
 xrandr = dependency('xrandr', version : '>=1.3', required : false)
 
-gpu_overlay_deps = [ realtime, cairo, pciaccess, libdrm, libdrm_intel ]
+gpu_overlay_deps = [ realtime, math, cairo, pciaccess, libdrm, libdrm_intel ]
 
 both_x11_src = ''
 
diff --git a/overlay/power.c b/overlay/power.c
index 805f4ca7805c..9ac90fde8786 100644
--- a/overlay/power.c
+++ b/overlay/power.c
@@ -30,60 +30,107 @@
 #include <fcntl.h>
 #include <time.h>
 #include <errno.h>
+#include <ctype.h>
+#include <math.h>
 
 #include "igt_perf.h"
 
 #include "power.h"
 #include "debugfs.h"
 
-/* XXX Is this exposed through RAPL? */
-
-int power_init(struct power *power)
+static int
+filename_to_buf(const char *filename, char *buf, unsigned int bufsize)
 {
-	char buf[4096];
-	int fd, len;
-
-	memset(power, 0, sizeof(*power));
-
-	power->fd = -1;
+	int fd;
+	ssize_t ret;
 
-	sprintf(buf, "%s/i915_energy_uJ", debugfs_dri_path);
-	fd = open(buf, 0);
+	fd = open(filename, O_RDONLY);
 	if (fd < 0)
-		return power->error = errno;
+		return -1;
 
-	len = read(fd, buf, sizeof(buf));
+	ret = read(fd, buf, bufsize - 1);
 	close(fd);
+	if (ret < 1)
+		return -1;
 
-	if (len < 0)
-		return power->error = errno;
-
-	buf[len] = '\0';
-	if (strtoull(buf, 0, 0) == 0)
-		return power->error = EINVAL;
+	buf[ret] = '\0';
 
 	return 0;
 }
 
-static uint64_t file_to_u64(const char *name)
+static uint64_t filename_to_u64(const char *filename, int base)
 {
-	char buf[4096];
-	int fd, len;
+	char buf[64], *b;
 
-	sprintf(buf, "%s/%s", debugfs_dri_path, name);
-	fd = open(buf, 0);
-	if (fd < 0)
+	if (filename_to_buf(filename, buf, sizeof(buf)))
 		return 0;
 
-	len = read(fd, buf, sizeof(buf)-1);
-	close(fd);
+	/*
+	 * Handle both single integer and key=value formats by skipping
+	 * leading non-digits.
+	 */
+	b = buf;
+	while (*b && !isdigit(*b))
+		b++;
+
+	return strtoull(b, NULL, base);
+}
+
+static uint64_t debugfs_file_to_u64(const char *name)
+{
+	char buf[1024];
+
+	snprintf(buf, sizeof(buf), "%s/%s", debugfs_dri_path, name);
+
+	return filename_to_u64(buf, 0);
+}
+
+static uint64_t rapl_type_id(void)
+{
+	return filename_to_u64("/sys/devices/power/type", 10);
+}
+
+static uint64_t rapl_gpu_power(void)
+{
+	return filename_to_u64("/sys/devices/power/events/energy-gpu", 0);
+}
 
-	if (len < 0)
+static double filename_to_double(const char *filename)
+{
+	char buf[64];
+
+	if (filename_to_buf(filename, buf, sizeof(buf)))
 		return 0;
 
-	buf[len] = '\0';
+	return strtod(buf, NULL);
+}
+
+static double rapl_gpu_power_scale(void)
+{
+	return filename_to_double("/sys/devices/power/events/energy-gpu.scale");
+}
+
+int power_init(struct power *power)
+{
+	uint64_t val;
+
+	memset(power, 0, sizeof(*power));
+
+	power->fd = igt_perf_open(rapl_type_id(), rapl_gpu_power());
+	if (power->fd >= 0) {
+		power->rapl_scale = rapl_gpu_power_scale();
+
+		if (!isnan(power->rapl_scale)) {
+			power->rapl_scale *= 1e3; /* from nano to micro */
+			return 0;
+		}
+	}
+
+	val = debugfs_file_to_u64("i915_energy_uJ");
+	if (val == 0)
+		return power->error = EINVAL;
 
-	return strtoull(buf, 0, 0);
+	return 0;
 }
 
 static uint64_t clock_ms_to_u64(void)
@@ -93,30 +140,30 @@ static uint64_t clock_ms_to_u64(void)
 	if (clock_gettime(CLOCK_MONOTONIC, &tv) < 0)
 		return 0;
 
-	return (uint64_t)tv.tv_sec * 1000 + tv.tv_nsec / 1000000;
+	return (uint64_t)tv.tv_sec * 1e3 + tv.tv_nsec / 1e6;
 }
 
 int power_update(struct power *power)
 {
-	struct power_stat *s = &power->stat[power->count++&1];
-	struct power_stat *d = &power->stat[power->count&1];
+	struct power_stat *s = &power->stat[power->count++ & 1];
+	struct power_stat *d = &power->stat[power->count & 1];
 	uint64_t d_time;
 
 	if (power->error)
 		return power->error;
 
-	if (power->fd != -1) {
+	if (power->fd >= 0) {
 		uint64_t data[2];
 		int len;
 
 		len = read(power->fd, data, sizeof(data));
-		if (len < 0)
+		if (len != sizeof(data))
 			return power->error = errno;
 
-		s->energy = data[0];
-		s->timestamp = data[1] / (1000*1000);
+		s->energy = llround((double)data[0] * power->rapl_scale);
+		s->timestamp = data[1] / 1e6;
 	} else {
-		s->energy = file_to_u64("i915_energy_uJ");
+		s->energy = debugfs_file_to_u64("i915_energy_uJ") / 1e3;
 		s->timestamp = clock_ms_to_u64();
 	}
 
@@ -124,7 +171,9 @@ int power_update(struct power *power)
 		return EAGAIN;
 
 	d_time = s->timestamp - d->timestamp;
-	power->power_mW = (s->energy - d->energy) / d_time;
+	power->power_mW = round((double)(s->energy - d->energy) *
+				(1e3f / d_time));
 	power->new_sample = 1;
+
 	return 0;
 }
diff --git a/overlay/power.h b/overlay/power.h
index bf8346ce46b4..28abfc32234b 100644
--- a/overlay/power.h
+++ b/overlay/power.h
@@ -39,6 +39,8 @@ struct power {
 	int new_sample;
 
 	uint64_t power_mW;
+
+	double rapl_scale;
 };
 
 int power_init(struct power *power);
-- 
2.14.1


* [PATCH i-g-t v9 7/9] tests/perf_pmu: Tests for i915 PMU API
  2017-11-21 18:21               ` [PATCH i-g-t v8 " Tvrtko Ursulin
@ 2017-11-21 19:36                 ` Tvrtko Ursulin
  0 siblings, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-21 19:36 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A bunch of tests for the new i915 PMU feature.

Parts of the code were initially sketched by Dmitry Rogozhkin.

v2: (Most suggestions by Chris Wilson)
 * Add new class/instance based engine list.
 * Add gem_has_engine/gem_require_engine to work with class/instance.
 * Use the above two throughout the test.
 * Shorten tests to 100ms busy batches, seems enough.
 * Add queued counter sanity checks.
 * Use igt_nsec_elapsed.
 * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
 * Fix multi ordering for busy accounting.
 * Use new guaranteed_usleep when sleep time is asserted on.
 * Check for no queued when idle/busy.
 * Add queued counter init test.
 * Add queued tests.
 * Consolidate and increase multiple busy engines tests to most-busy and
   all-busy tests.
 * Guarantee interrupts by using fences.
 * Test RC6 via forcewake.

v3:
 * Tweak assert in interrupts subtest.
 * Sprinkle of comments.
 * Fix multi-client test which got broken in v2.

v4:
 * Measured instead of guaranteed sleep.
 * Missing sync in no_sema.
 * Log busyness before asserts for debug.
 * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
 * Test frequency reporting via min/max setting instead of assuming.
   ^^ All above suggested by Chris Wilson. ^^
 * Drop queued subtests to match i915.
 * Use long batches with fences to ensure interrupts.
 * Test render node as well.

v5:
 * Add to meson build. (Petri Latvala)
 * Use 1eN constants. (Chris Wilson)
 * Add tests for semaphore and event waiting.

v6:
 * Fix interrupts subtest by polling the fence from the "outside".
   (Chris Wilson)

v7:
 * Assert number of initialized engines matches the expectation.
   (Chris Wilson)
 * Warn instead of skipping if we couldn't restore the initial
   frequency. (Chris Wilson)
 * Move all asserts to after the test cleanup (just a tidy).
 * More 1eN notation for timeouts.
 * Bump the tolerance to 5% since I saw a few noisy runs with
   sampling counters.
 * Always start the PMU before submitting batches to lower
   reliance on i915 doing the delayed engine busy stats disable.

v8:
 * Update for upstream engine class enum.

v9:
 * Add meson build support.
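
For illustration only: the relative tolerance check that the 5% figure in
v7 applies to can be sketched as a plain function. It mirrors the
comparison made by the assert_within_epsilon macro in tests/perf_pmu.c;
the function name within_epsilon is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical function form of the check: true when x lies within
 * +/- tolerance (a fraction, e.g. 0.05 for 5%) of the reference value.
 */
static bool within_epsilon(double x, double ref, double tolerance)
{
	assert(tolerance >= 0.0);

	return x <= (1.0 + tolerance) * ref &&
	       x >= (1.0 - tolerance) * ref;
}
```

Because the band is relative to the reference, it collapses when the
reference is zero, so the idle checks only pass on a counter delta of
exactly zero.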

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v8)
---
 lib/igt_gt.c           |   50 ++
 lib/igt_gt.h           |   38 ++
 lib/igt_perf.h         |    9 +-
 tests/Makefile.am      |    1 +
 tests/Makefile.sources |    1 +
 tests/meson.build      |    6 +
 tests/perf_pmu.c       | 1242 ++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 1339 insertions(+), 8 deletions(-)
 create mode 100644 tests/perf_pmu.c

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 89727d22dd5c..63a06611f16d 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -608,3 +608,53 @@ bool gem_can_store_dword(int fd, unsigned int engine)
 
 	return true;
 }
+
+const struct intel_execution_engine2 intel_execution_engines2[] = {
+	{ "rcs0", I915_ENGINE_CLASS_RENDER, 0 },
+	{ "bcs0", I915_ENGINE_CLASS_COPY, 0 },
+	{ "vcs0", I915_ENGINE_CLASS_VIDEO, 0 },
+	{ "vcs1", I915_ENGINE_CLASS_VIDEO, 1 },
+	{ "vecs0", I915_ENGINE_CLASS_VIDEO_ENHANCE, 0 },
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance)
+{
+	if (class != I915_ENGINE_CLASS_VIDEO)
+		igt_assert(instance == 0);
+	else
+		igt_assert(instance >= 0 && instance <= 1);
+
+	switch (class) {
+	case I915_ENGINE_CLASS_RENDER:
+		return I915_EXEC_RENDER;
+	case I915_ENGINE_CLASS_COPY:
+		return I915_EXEC_BLT;
+	case I915_ENGINE_CLASS_VIDEO:
+		if (instance == 0) {
+			if (gem_has_bsd2(gem_fd))
+				return I915_EXEC_BSD | I915_EXEC_BSD_RING1;
+			else
+				return I915_EXEC_BSD;
+
+		} else {
+			return I915_EXEC_BSD | I915_EXEC_BSD_RING2;
+		}
+	case I915_ENGINE_CLASS_VIDEO_ENHANCE:
+		return I915_EXEC_VEBOX;
+	case I915_ENGINE_CLASS_INVALID:
+	default:
+		igt_assert(0);
+	};
+}
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance)
+{
+	return gem_has_ring(gem_fd,
+			    gem_class_instance_to_eb_flags(gem_fd, class,
+							   instance));
+}
diff --git a/lib/igt_gt.h b/lib/igt_gt.h
index 2579cbd37be7..48ed48af8117 100644
--- a/lib/igt_gt.h
+++ b/lib/igt_gt.h
@@ -25,6 +25,7 @@
 #define IGT_GT_H
 
 #include "igt_debugfs.h"
+#include "igt_core.h"
 
 void igt_require_hang_ring(int fd, int ring);
 
@@ -80,4 +81,41 @@ extern const struct intel_execution_engine {
 
 bool gem_can_store_dword(int fd, unsigned int engine);
 
+extern const struct intel_execution_engine2 {
+	const char *name;
+	int class;
+	int instance;
+} intel_execution_engines2[];
+
+#define for_each_engine_class_instance(fd__, e__) \
+	for ((e__) = intel_execution_engines2;\
+	     (e__)->name; \
+	     (e__)++)
+
+enum drm_i915_gem_engine_class {
+	I915_ENGINE_CLASS_RENDER 	= 0,
+	I915_ENGINE_CLASS_COPY		= 1,
+	I915_ENGINE_CLASS_VIDEO		= 2,
+	I915_ENGINE_CLASS_VIDEO_ENHANCE	= 3,
+
+	I915_ENGINE_CLASS_INVALID	= -1
+};
+
+unsigned int
+gem_class_instance_to_eb_flags(int gem_fd,
+			       enum drm_i915_gem_engine_class class,
+			       unsigned int instance);
+
+bool gem_has_engine(int gem_fd,
+		    enum drm_i915_gem_engine_class class,
+		    unsigned int instance);
+
+static inline
+void gem_require_engine(int gem_fd,
+			enum drm_i915_gem_engine_class class,
+			unsigned int instance)
+{
+	igt_require(gem_has_engine(gem_fd, class, instance));
+}
+
 #endif /* IGT_GT_H */
diff --git a/lib/igt_perf.h b/lib/igt_perf.h
index 938d548891c5..5428feb0c746 100644
--- a/lib/igt_perf.h
+++ b/lib/igt_perf.h
@@ -29,14 +29,7 @@
 
 #include <linux/perf_event.h>
 
-enum drm_i915_gem_engine_class {
-	I915_ENGINE_CLASS_RENDER 	= 0,
-	I915_ENGINE_CLASS_COPY		= 1,
-	I915_ENGINE_CLASS_VIDEO		= 2,
-	I915_ENGINE_CLASS_VIDEO_ENHANCE	= 3,
-
-	I915_ENGINE_CLASS_INVALID	= -1
-};
+#include "igt_gt.h"
 
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
diff --git a/tests/Makefile.am b/tests/Makefile.am
index db360523dad6..1bc1c57a7452 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -131,6 +131,7 @@ gen7_forcewake_mt_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gen7_forcewake_mt_LDADD = $(LDADD) -lpthread
 gem_userptr_blits_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_userptr_blits_LDADD = $(LDADD) -lpthread
+perf_pmu_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la
 
 gem_wait_LDADD = $(LDADD) -lrt
 kms_flip_LDADD = $(LDADD) -lrt -lpthread
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 2313c12b508c..2c22242e0113 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -215,6 +215,7 @@ TESTS_progs = \
 	kms_vblank \
 	meta_test \
 	perf \
+	perf_pmu \
 	pm_backlight \
 	pm_lpsp \
 	pm_rc6_residency \
diff --git a/tests/meson.build b/tests/meson.build
index 20ff79dcb15f..ece4ceaefbb6 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -195,6 +195,7 @@ test_progs = [
 	'kms_vblank',
 	'meta_test',
 	'perf',
+	'perf_pmu',
 	'pm_backlight',
 	'pm_lpsp',
 	'pm_rc6_residency',
@@ -266,9 +267,14 @@ endif
 libexecdir = join_paths(get_option('prefix'), get_option('libexecdir'), 'intel-gpu-tools')
 
 foreach prog : test_progs
+	link = []
+	if prog == 'perf_pmu'
+		link += lib_igt_perf
+	endif
 	executable(prog, prog + '.c',
 		   dependencies : test_deps,
 		   install_dir : libexecdir,
+		   link_with : link,
 		   install : true)
 endforeach
 
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
new file mode 100644
index 000000000000..8585ed7bcee8
--- /dev/null
+++ b/tests/perf_pmu.c
@@ -0,0 +1,1242 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/times.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <time.h>
+#include <poll.h>
+
+#include "igt.h"
+#include "igt_core.h"
+#include "igt_perf.h"
+#include "igt_sysfs.h"
+
+IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
+
+const double tolerance = 0.05f;
+const unsigned long batch_duration_ns = 100e6;
+
+static int open_pmu(uint64_t config)
+{
+	int fd;
+
+	fd = perf_i915_open(config);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static int open_group(uint64_t config, int group)
+{
+	int fd;
+
+	fd = perf_i915_open_group(config, group);
+	igt_require(fd >= 0 || (fd < 0 && errno != ENODEV));
+	igt_assert(fd >= 0);
+
+	return fd;
+}
+
+static void
+init(int gem_fd, const struct intel_execution_engine2 *e, uint8_t sample)
+{
+	int fd;
+
+	fd = open_pmu(__I915_PMU_ENGINE(e->class, e->instance, sample));
+
+	close(fd);
+}
+
+static uint64_t pmu_read_single(int fd)
+{
+	uint64_t data[2];
+
+	igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
+
+	return data[0];
+}
+
+static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
+{
+	uint64_t buf[2 + num];
+	unsigned int i;
+
+	igt_assert_eq(read(fd, buf, sizeof(buf)), sizeof(buf));
+
+	for (i = 0; i < num; i++)
+		val[i] = buf[2 + i];
+}
+
+#define assert_within_epsilon(x, ref, tolerance) \
+	igt_assert_f((double)(x) <= (1.0 + tolerance) * (double)ref && \
+		     (double)(x) >= (1.0 - tolerance) * (double)ref, \
+		     "'%s' != '%s' (%f not within %f%% tolerance of %f)\n",\
+		     #x, #ref, (double)x, tolerance * 100.0, (double)ref)
+
+/*
+ * Helper for cases where we assert on time spent sleeping (directly or
+ * indirectly), so make it more robust by ensuring the system sleep time
+ * is within test tolerance to start with.
+ */
+static unsigned int measured_usleep(unsigned int usec)
+{
+	uint64_t slept = 0;
+
+	while (usec > 0) {
+		struct timespec start = { };
+		uint64_t this_sleep;
+
+		igt_nsec_elapsed(&start);
+		usleep(usec);
+		this_sleep = igt_nsec_elapsed(&start);
+		slept += this_sleep;
+		if (this_sleep > usec * 1000)
+			break;
+		usec -= this_sleep / 1000;
+	}
+
+	return slept;
+}
+
+static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	return gem_class_instance_to_eb_flags(gem_fd, e->class, e->instance);
+}
+
+static void
+single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	double ref = busy ? batch_duration_ns : 0.0f;
+	igt_spin_t *spin;
+	uint64_t val;
+	int fd;
+
+	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	val = pmu_read_single(fd);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+}
+
+static void log_busy(int fd, unsigned int num_engines, uint64_t *val)
+{
+	char buf[1024];
+	int rem = sizeof(buf);
+	unsigned int i;
+	char *p = buf;
+
+	for (i = 0; i < num_engines; i++) {
+		int len;
+
+		len = snprintf(p, rem, "%u=%" PRIu64 "\n",  i, val[i]);
+		igt_assert(len > 0);
+		rem -= len;
+		p += len;
+	}
+
+	igt_info("%s", buf);
+}
+
+static void
+busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+	       const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin;
+	unsigned int busy_idx, i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+		else if (e == e_)
+			busy_idx = i;
+
+		fd[i++] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							  e_->instance),
+				     fd[0]);
+	}
+
+	igt_assert_eq(i, num_engines);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+
+	assert_within_epsilon(val[busy_idx], batch_duration_ns, tolerance);
+	for (i = 0; i < num_engines; i++) {
+		if (i == busy_idx)
+			continue;
+		assert_within_epsilon(val[i], 0.0f, tolerance);
+	}
+
+}
+
+static void
+most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
+		    const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e_;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int idle_idx, i;
+
+	gem_require_engine(gem_fd, e->class, e->instance);
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(gem_fd, e_) {
+		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e_->class,
+							e_->instance),
+				   fd[0]);
+
+		if (e == e_) {
+			idle_idx = i;
+		} else {
+			spin[i] = igt_spin_batch_new(gem_fd, 0,
+						     e2ring(gem_fd, e_), 0);
+			igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+		}
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			gem_sync(gem_fd, spin[i]->handle);
+	}
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i != idle_idx)
+			igt_spin_batch_free(gem_fd, spin[i]);
+	}
+	close(fd[0]);
+
+	for (i = 0; i < num_engines; i++) {
+		if (i == idle_idx)
+			assert_within_epsilon(val[i], 0.0f, tolerance);
+		else
+			assert_within_epsilon(val[i], batch_duration_ns,
+					      tolerance);
+	}
+}
+
+static void
+all_busy_check_all(int gem_fd, const unsigned int num_engines)
+{
+	const struct intel_execution_engine2 *e;
+	uint64_t val[num_engines];
+	int fd[num_engines];
+	igt_spin_t *spin[num_engines];
+	unsigned int i;
+
+	i = 0;
+	fd[0] = -1;
+	for_each_engine_class_instance(gem_fd, e) {
+		if (!gem_has_engine(gem_fd, e->class, e->instance))
+			continue;
+
+		fd[i] = open_group(I915_PMU_ENGINE_BUSY(e->class, e->instance),
+				   fd[0]);
+
+		spin[i] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin[i], batch_duration_ns);
+
+		i++;
+	}
+
+	for (i = 0; i < num_engines; i++)
+		gem_sync(gem_fd, spin[i]->handle);
+
+	pmu_read_multi(fd[0], num_engines, val);
+	log_busy(fd[0], num_engines, val);
+
+	for (i = 0; i < num_engines; i++)
+		igt_spin_batch_free(gem_fd, spin[i]);
+	close(fd[0]);
+
+	for (i = 0; i < num_engines; i++)
+		assert_within_epsilon(val[i], batch_duration_ns, tolerance);
+}
+
+static void
+no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
+{
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd;
+
+	fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
+	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
+
+	if (busy) {
+		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		igt_spin_batch_set_timeout(spin, batch_duration_ns);
+	} else {
+		usleep(batch_duration_ns / 1000);
+	}
+
+	if (busy)
+		gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, val);
+
+	if (busy)
+		igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val[0], 0.0f, tolerance);
+	assert_within_epsilon(val[1], 0.0f, tolerance);
+}
+
+#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
+#define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+
+static void
+sema_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_relocation_entry reloc = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	struct drm_i915_gem_exec_object2 obj[2];
+	uint32_t bb_handle, obj_handle;
+	unsigned long slept;
+	uint32_t *obj_ptr;
+	uint32_t batch[6];
+	uint64_t val[2];
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 8);
+
+	/**
+	 * Set up a batchbuffer with a polling semaphore wait command which
+	 * will wait on a value in a shared bo to change. This way we are able
+	 * to control how much time we will spend in this bb.
+	 */
+
+	bb_handle = gem_create(gem_fd, 4096);
+	obj_handle = gem_create(gem_fd, 4096);
+
+	obj_ptr = gem_mmap__wc(gem_fd, obj_handle, 0, 4096, PROT_WRITE);
+
+	batch[0] = MI_SEMAPHORE_WAIT |
+		   MI_SEMAPHORE_POLL |
+		   MI_SEMAPHORE_SAD_GTE_SDD;
+	batch[1] = 1;
+	batch[2] = 0x0;
+	batch[3] = 0x0;
+	batch[4] = MI_NOOP;
+	batch[5] = MI_BATCH_BUFFER_END;
+
+	gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+	reloc.target_handle = obj_handle;
+	reloc.offset = 2 * sizeof(uint32_t);
+	reloc.read_domains = I915_GEM_DOMAIN_RENDER;
+
+	memset(obj, 0, sizeof(obj));
+
+	obj[0].handle = obj_handle;
+
+	obj[1].handle = bb_handle;
+	obj[1].relocation_count = 1;
+	obj[1].relocs_ptr = to_user_pointer(&reloc);
+
+	eb.buffer_count = 2;
+	eb.buffers_ptr = to_user_pointer(obj);
+	eb.flags = e2ring(gem_fd, e);
+
+	/**
+	 * Start the semaphore wait PMU and after some known time let the above
+	 * semaphore wait command finish. Then check that the PMU is reporting
+	 * the expected time spent in semaphore wait state.
+	 */
+
+	fd = open_pmu(I915_PMU_ENGINE_SEMA(e->class, e->instance));
+
+	val[0] = pmu_read_single(fd);
+
+	gem_execbuf(gem_fd, &eb);
+
+	slept = measured_usleep(100e3);
+
+	*obj_ptr = 1;
+
+	gem_sync(gem_fd, bb_handle);
+
+	val[1] = pmu_read_single(fd);
+
+	munmap(obj_ptr, 4096);
+	gem_close(gem_fd, obj_handle);
+	gem_close(gem_fd, bb_handle);
+	close(fd);
+
+	assert_within_epsilon(val[1] - val[0], slept, tolerance);
+}
+
+#define   MI_WAIT_FOR_PIPE_C_VBLANK (1<<21)
+#define   MI_WAIT_FOR_PIPE_B_VBLANK (1<<11)
+#define   MI_WAIT_FOR_PIPE_A_VBLANK (1<<3)
+
+typedef struct {
+	igt_display_t display;
+	struct igt_fb primary_fb;
+	igt_output_t *output;
+	enum pipe pipe;
+} data_t;
+
+static void prepare_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	drmModeModeInfo *mode;
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+
+	/* create and set the primary plane fb */
+	mode = igt_output_get_mode(output);
+	igt_create_color_fb(fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888,
+			    LOCAL_DRM_FORMAT_MOD_NONE,
+			    0.0, 0.0, 0.0,
+			    &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, &data->primary_fb);
+
+	igt_display_commit(display);
+
+	igt_wait_for_vblank(fd, data->pipe);
+}
+
+static void cleanup_crtc(data_t *data, int fd, igt_output_t *output)
+{
+	igt_display_t *display = &data->display;
+	igt_plane_t *primary;
+
+	igt_remove_fb(fd, &data->primary_fb);
+
+	primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+	igt_plane_set_fb(primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+}
+
+static int wait_vblank(int fd, union drm_wait_vblank *vbl)
+{
+	int err;
+
+	err = 0;
+	if (igt_ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl))
+		err = -errno;
+
+	return err;
+}
+
+static void
+event_wait(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	struct drm_i915_gem_exec_object2 obj = { };
+	struct drm_i915_gem_execbuffer2 eb = { };
+	data_t data;
+	igt_display_t *display = &data.display;
+	const uint32_t DERRMR = 0x44050;
+	unsigned int valid_tests = 0;
+	uint32_t batch[8], *b;
+	igt_output_t *output;
+	uint32_t bb_handle;
+	uint32_t reg;
+	enum pipe p;
+	int fd;
+
+	igt_require(intel_gen(intel_get_drm_devid(gem_fd)) >= 6);
+	igt_require(intel_register_access_init(intel_get_pci_device(),
+					       false, gem_fd) == 0);
+
+	/**
+	 * We will use the display to render event forwarding, so we need to
+	 * program the DERRMR register and restore it at exit.
+	 *
+	 * We will emit a MI_WAIT_FOR_EVENT listening for vblank events,
+	 * have a background helper to indirectly enable vblank irqs, and
+	 * listen to the recorded time spent in engine wait state as reported
+	 * by the PMU.
+	 */
+	reg = intel_register_read(DERRMR);
+
+	kmstest_set_vt_graphics_mode();
+	igt_display_init(&data.display, gem_fd);
+
+	bb_handle = gem_create(gem_fd, 4096);
+
+	b = batch;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg & ~((1 << 3) | (1 << 11) | (1 << 21));
+	*b++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_PIPE_A_VBLANK;
+	*b++ = MI_LOAD_REGISTER_IMM;
+	*b++ = DERRMR;
+	*b++ = reg;
+	*b++ = MI_BATCH_BUFFER_END;
+
+	obj.handle = bb_handle;
+
+	eb.buffer_count = 1;
+	eb.buffers_ptr = to_user_pointer(&obj);
+	eb.flags = e2ring(gem_fd, e) | I915_EXEC_SECURE;
+
+	for_each_pipe_with_valid_output(display, p, output) {
+		struct igt_helper_process waiter = { };
+		const unsigned int frames = 3;
+		unsigned int frame;
+		uint64_t val[2];
+
+		batch[3] = MI_WAIT_FOR_EVENT;
+		switch (p) {
+		case PIPE_A:
+			batch[3] |= MI_WAIT_FOR_PIPE_A_VBLANK;
+			break;
+		case PIPE_B:
+			batch[3] |= MI_WAIT_FOR_PIPE_B_VBLANK;
+			break;
+		case PIPE_C:
+			batch[3] |= MI_WAIT_FOR_PIPE_C_VBLANK;
+			break;
+		default:
+			continue;
+		}
+
+		gem_write(gem_fd, bb_handle, 0, batch, sizeof(batch));
+
+		data.pipe = p;
+		prepare_crtc(&data, gem_fd, output);
+
+		fd = open_pmu(I915_PMU_ENGINE_WAIT(e->class, e->instance));
+
+		val[0] = pmu_read_single(fd);
+
+		igt_fork_helper(&waiter) {
+			const uint32_t pipe_id_flag =
+					kmstest_get_vbl_flag(data.pipe);
+
+			for (;;) {
+				union drm_wait_vblank vbl = { };
+
+				vbl.request.type = DRM_VBLANK_RELATIVE;
+				vbl.request.type |= pipe_id_flag;
+				vbl.request.sequence = 1;
+				igt_assert_eq(wait_vblank(gem_fd, &vbl), 0);
+			}
+		}
+
+		for (frame = 0; frame < frames; frame++) {
+			gem_execbuf(gem_fd, &eb);
+			gem_sync(gem_fd, bb_handle);
+		}
+
+		igt_stop_helper(&waiter);
+
+		val[1] = pmu_read_single(fd);
+
+		close(fd);
+
+		cleanup_crtc(&data, gem_fd, output);
+		valid_tests++;
+
+		igt_assert(val[1] - val[0] > 0);
+	}
+
+	gem_close(gem_fd, bb_handle);
+
+	intel_register_access_fini();
+
+	igt_require_f(valid_tests,
+		      "no valid crtc/connector combinations found\n");
+}
+
+static void
+multi_client(int gem_fd, const struct intel_execution_engine2 *e)
+{
+	uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
+	unsigned int slept;
+	igt_spin_t *spin;
+	uint64_t val[2];
+	int fd[2];
+
+	fd[0] = open_pmu(config);
+
+	/*
+	 * Second PMU client which is initialized after the first one,
+	 * and exits before it, should not affect accounting as reported
+	 * in the first client.
+	 */
+	fd[1] = open_pmu(config);
+
+	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	igt_spin_batch_set_timeout(spin, batch_duration_ns);
+
+	slept = measured_usleep(batch_duration_ns / 3000);
+	val[1] = pmu_read_single(fd[1]);
+	close(fd[1]);
+
+	gem_sync(gem_fd, spin->handle);
+
+	val[0] = pmu_read_single(fd[0]);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd[0]);
+
+	assert_within_epsilon(val[0], batch_duration_ns, tolerance);
+	assert_within_epsilon(val[1], slept, tolerance);
+}
+
+/**
+ * Tests that the i915 PMU correctly errors out on invalid initialization.
+ * The i915 PMU is an uncore PMU, thus:
+ *  - sampling period is not supported
+ *  - pid > 0 is not supported since we can't count per-process (we count
+ *    per whole system)
+ *  - cpu != 0 is not supported since i915 PMU exposes cpumask for CPU0
+ */
+static void invalid_init(void)
+{
+	struct perf_event_attr attr;
+	int pid, cpu;
+
+#define ATTR_INIT() \
+do { \
+	memset(&attr, 0, sizeof (attr)); \
+	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0); \
+	attr.type = i915_type_id(); \
+	igt_assert(attr.type != 0); \
+} while(0)
+
+	ATTR_INIT();
+	attr.sample_period = 100;
+	pid = -1;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = 0;
+	cpu = 0;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, EINVAL);
+
+	ATTR_INIT();
+	pid = -1;
+	cpu = 1;
+	igt_assert_eq(perf_event_open(&attr, pid, cpu, -1, 0), -1);
+	igt_assert_eq(errno, ENODEV);
+}
+
+static void init_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	close(fd);
+}
+
+static void read_other(unsigned int i, bool valid)
+{
+	int fd;
+
+	fd = perf_i915_open(__I915_PMU_OTHER(i));
+	igt_require(!(fd < 0 && errno == ENODEV));
+	if (valid) {
+		igt_assert(fd >= 0);
+	} else {
+		igt_assert(fd < 0);
+		return;
+	}
+
+	(void)pmu_read_single(fd);
+
+	close(fd);
+}
+
+static bool cpu0_hotplug_support(void)
+{
+	return access("/sys/devices/system/cpu/cpu0/online", W_OK) == 0;
+}
+
+static void cpu_hotplug(int gem_fd)
+{
+	struct timespec start = { };
+	igt_spin_t *spin;
+	uint64_t val, ref;
+	int fd;
+
+	igt_require(cpu0_hotplug_support());
+
+	fd = perf_i915_open(I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0));
+	igt_assert(fd >= 0);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+
+	igt_nsec_elapsed(&start);
+
+	/*
+	 * Toggle online status of all the CPUs in a child process and ensure
+	 * this has not affected busyness stats in the parent.
+	 */
+	igt_fork(child, 1) {
+		int cpu = 0;
+
+		for (;;) {
+			char name[128];
+			int cpufd;
+
+			sprintf(name, "/sys/devices/system/cpu/cpu%d/online",
+				cpu);
+			cpufd = open(name, O_WRONLY);
+			if (cpufd == -1) {
+				igt_assert(cpu > 0);
+				break;
+			}
+			igt_assert_eq(write(cpufd, "0", 2), 2);
+
+			usleep(1e6);
+
+			igt_assert_eq(write(cpufd, "1", 2), 2);
+
+			close(cpufd);
+			cpu++;
+		}
+	}
+
+	igt_waitchildren();
+
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
+
+	ref = igt_nsec_elapsed(&start);
+	val = pmu_read_single(fd);
+
+	igt_spin_batch_free(gem_fd, spin);
+	close(fd);
+
+	assert_within_epsilon(val, ref, tolerance);
+}
+
+static unsigned long calibrate_nop(int fd, const unsigned int calibration_us)
+{
+	const unsigned int cal_min_us = calibration_us * 3;
+	const unsigned int tolerance_pct = 10;
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int loops = 17;
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	struct timespec t_begin = { };
+	long size, last_size;
+	unsigned long ns;
+
+	igt_nsec_elapsed(&t_begin);
+
+	size = 256 * 1024;
+	do {
+		struct timespec t_start = { };
+
+		obj.handle = gem_create(fd, size);
+		gem_write(fd, obj.handle, size - sizeof(bbe), &bbe,
+			  sizeof(bbe));
+		gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		igt_nsec_elapsed(&t_start);
+
+		for (int loop = 0; loop < loops; loop++)
+			gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		ns = igt_nsec_elapsed(&t_start);
+
+		gem_close(fd, obj.handle);
+
+		last_size = size;
+		size = calibration_us * 1000 * size * loops / ns;
+		size = ALIGN(size, sizeof(uint32_t));
+	} while (igt_nsec_elapsed(&t_begin) / 1000 < cal_min_us ||
+		 abs(size - last_size) > (size * tolerance_pct / 100));
+
+	return size / sizeof(uint32_t);
+}
+
+static void exec_nop(int gem_fd, unsigned long sz)
+{
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	struct pollfd pfd;
+	int fence;
+
+	sz = ALIGN(sz, sizeof(uint32_t));
+
+	obj.handle = gem_create(gem_fd, sz);
+	gem_write(gem_fd, obj.handle, sz - sizeof(bbe), &bbe, sizeof(bbe));
+
+	eb.flags = I915_EXEC_RENDER | I915_EXEC_FENCE_OUT;
+
+	gem_execbuf_wr(gem_fd, &eb);
+	fence = eb.rsvd2 >> 32;
+
+	/*
+	 * Poll on the output fence to ensure user interrupts will be
+	 * generated and listened to.
+	 */
+	pfd.fd = fence;
+	pfd.events = POLLIN;
+	igt_assert_eq(poll(&pfd, 1, -1), 1);
+
+	close(fence);
+	gem_close(gem_fd, obj.handle);
+}
+
+static void
+test_interrupts(int gem_fd)
+{
+	const unsigned int calibration_us = 250000;
+	const unsigned int batch_len_us = 100000;
+	const unsigned int batch_count = 3e6 / batch_len_us;
+	uint64_t idle, busy, prev;
+	unsigned long cal, sz;
+	unsigned int i;
+	int fd;
+
+	cal = calibrate_nop(gem_fd, calibration_us);
+	sz = batch_len_us * cal / calibration_us;
+
+	fd = open_pmu(I915_PMU_INTERRUPTS);
+
+	gem_quiescent_gpu(gem_fd);
+
+	/* Wait for idle state. */
+	prev = pmu_read_single(fd);
+	idle = prev + 1;
+	while (idle != prev) {
+		usleep(1e6);
+		prev = idle;
+		idle = pmu_read_single(fd);
+	}
+
+	igt_assert_eq(idle - prev, 0);
+
+	/*
+	 * Send some no-op batches waiting on output fences to
+	 * ensure interrupts.
+	 */
+	for (i = 0; i < batch_count; i++)
+		exec_nop(gem_fd, sz);
+
+	/* Check at least as many interrupts have been generated. */
+	busy = pmu_read_single(fd) - idle;
+	close(fd);
+
+	igt_assert(busy >= batch_count);
+}
+
+static void
+test_frequency(int gem_fd)
+{
+	const uint64_t duration_ns = 2e9;
+	uint32_t min_freq, max_freq, boost_freq;
+	uint64_t min[2], max[2], start[2];
+	igt_spin_t *spin;
+	int fd, sysfs;
+
+	sysfs = igt_sysfs_open(gem_fd, NULL);
+	igt_require(sysfs >= 0);
+
+	min_freq = igt_sysfs_get_u32(sysfs, "gt_RPn_freq_mhz");
+	max_freq = igt_sysfs_get_u32(sysfs, "gt_RP0_freq_mhz");
+	boost_freq = igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz");
+	igt_require(min_freq > 0 && max_freq > 0 && boost_freq > 0);
+	igt_require(max_freq > min_freq);
+	igt_require(boost_freq > min_freq);
+
+	fd = open_group(I915_PMU_REQUESTED_FREQUENCY, -1);
+	open_group(I915_PMU_ACTUAL_FREQUENCY, fd);
+
+	/*
+	 * Set GPU to min frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == min_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, min);
+	min[0] -= start[0];
+	min[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	usleep(1e6);
+
+	/*
+	 * Set GPU to max frequency and read PMU counters.
+	 */
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_max_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_max_freq_mhz") == max_freq);
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", boost_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == boost_freq);
+
+	igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
+	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
+
+	pmu_read_multi(fd, 2, start);
+
+	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	igt_spin_batch_set_timeout(spin, duration_ns);
+	gem_sync(gem_fd, spin->handle);
+
+	pmu_read_multi(fd, 2, max);
+	max[0] -= start[0];
+	max[1] -= start[1];
+
+	igt_spin_batch_free(gem_fd, spin);
+
+	/*
+	 * Restore min/max.
+	 */
+	igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", min_freq);
+	if (igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") != min_freq)
+		igt_warn("Unable to restore min frequency to saved value [%u MHz], now %u MHz\n",
+			 min_freq, igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz"));
+	close(fd);
+
+	igt_assert(min[0] < max[0]);
+	igt_assert(min[1] < max[1]);
+}
+
+static void
+test_rc6(int gem_fd)
+{
+	int64_t duration_ns = 2e9;
+	uint64_t idle, busy, prev;
+	unsigned int slept;
+	int fd, fw;
+
+	fd = open_pmu(I915_PMU_RC6_RESIDENCY);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	prev = pmu_read_single(fd);
+	slept = measured_usleep(duration_ns / 1000);
+	idle = pmu_read_single(fd);
+
+	assert_within_epsilon(idle - prev, slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	prev = pmu_read_single(fd);
+	usleep(duration_ns / 1000);
+	busy = pmu_read_single(fd);
+
+	close(fw);
+	close(fd);
+
+	assert_within_epsilon(busy - prev, 0.0, tolerance);
+}
+
+static void
+test_rc6p(int gem_fd)
+{
+	int64_t duration_ns = 2e9;
+	unsigned int num_pmu = 1;
+	uint64_t idle[3], busy[3], prev[3];
+	unsigned int slept, i;
+	int fd, ret, fw;
+
+	fd = open_group(I915_PMU_RC6_RESIDENCY, -1);
+	ret = perf_i915_open_group(I915_PMU_RC6p_RESIDENCY, fd);
+	if (ret > 0) {
+		num_pmu++;
+		ret = perf_i915_open_group(I915_PMU_RC6pp_RESIDENCY, fd);
+		if (ret > 0)
+			num_pmu++;
+	}
+
+	igt_require(num_pmu == 3);
+
+	gem_quiescent_gpu(gem_fd);
+	usleep(1e6);
+
+	/* Go idle and check full RC6. */
+	pmu_read_multi(fd, num_pmu, prev);
+	slept = measured_usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, idle);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(idle[i] - prev[i], slept, tolerance);
+
+	/* Wake up device and check no RC6. */
+	fw = igt_open_forcewake_handle(gem_fd);
+	igt_assert(fw >= 0);
+
+	pmu_read_multi(fd, num_pmu, prev);
+	usleep(duration_ns / 1000);
+	pmu_read_multi(fd, num_pmu, busy);
+
+	close(fw);
+	close(fd);
+
+	for (i = 0; i < num_pmu; i++)
+		assert_within_epsilon(busy[i] - prev[i], 0.0, tolerance);
+}
+
+igt_main
+{
+	const unsigned int num_other_metrics =
+				I915_PMU_LAST - __I915_PMU_OTHER(0) + 1;
+	unsigned int num_engines = 0;
+	int fd = -1;
+	const struct intel_execution_engine2 *e;
+	unsigned int i;
+
+	igt_fixture {
+		fd = drm_open_driver_master(DRIVER_INTEL);
+
+		igt_require_gem(fd);
+		igt_require(i915_type_id() > 0);
+
+		for_each_engine_class_instance(fd, e) {
+			if (gem_has_engine(fd, e->class, e->instance))
+				num_engines++;
+		}
+	}
+
+	/**
+	 * Test invalid access via perf API is rejected.
+	 */
+	igt_subtest("invalid-init")
+		invalid_init();
+
+	for_each_engine_class_instance(fd, e) {
+		/**
+		 * Test that a single engine metric can be initialized.
+		 */
+		igt_subtest_f("init-busy-%s", e->name)
+			init(fd, e, I915_SAMPLE_BUSY);
+
+		igt_subtest_f("init-wait-%s", e->name)
+			init(fd, e, I915_SAMPLE_WAIT);
+
+		igt_subtest_f("init-sema-%s", e->name)
+			init(fd, e, I915_SAMPLE_SEMA);
+
+		/**
+		 * Test that engines show no load when idle.
+		 */
+		igt_subtest_f("idle-%s", e->name)
+			single(fd, e, false);
+
+		/**
+		 * Test that a single engine reports load correctly.
+		 */
+		igt_subtest_f("busy-%s", e->name)
+			single(fd, e, true);
+
+		/**
+		 * Test that when one engine is loaded others report no load.
+		 */
+		igt_subtest_f("busy-check-all-%s", e->name)
+			busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that when all except one engine are loaded all loads
+		 * are correctly reported.
+		 */
+		igt_subtest_f("most-busy-check-all-%s", e->name)
+			most_busy_check_all(fd, e, num_engines);
+
+		/**
+		 * Test that semaphore counters report no activity on idle
+		 * or busy engines.
+		 */
+		igt_subtest_f("idle-no-semaphores-%s", e->name)
+			no_sema(fd, e, false);
+
+		igt_subtest_f("busy-no-semaphores-%s", e->name)
+			no_sema(fd, e, true);
+
+		/**
+		 * Test that semaphore waits are correctly reported.
+		 */
+		igt_subtest_f("semaphore-wait-%s", e->name)
+			sema_wait(fd, e);
+
+		/**
+		 * Test that event waits are correctly reported.
+		 */
+		if (e->class == I915_ENGINE_CLASS_RENDER)
+			igt_subtest_f("event-wait-%s", e->name)
+				event_wait(fd, e);
+
+		/**
+		 * Check that two perf clients do not influence each other's
+		 * observations.
+		 */
+		igt_subtest_f("multi-client-%s", e->name)
+			multi_client(fd, e);
+	}
+
+	/**
+	 * Test that when all engines are loaded all loads are
+	 * correctly reported.
+	 */
+	igt_subtest("all-busy-check-all")
+		all_busy_check_all(fd, num_engines);
+
+	/**
+	 * Test that non-engine counters can be initialized and read. Apart
+	 * from the invalid metric which should fail.
+	 */
+	for (i = 0; i < num_other_metrics + 1; i++) {
+		igt_subtest_f("other-init-%u", i)
+			init_other(i, i < num_other_metrics);
+
+		igt_subtest_f("other-read-%u", i)
+			read_other(i, i < num_other_metrics);
+	}
+
+	/**
+	 * Test counters are not affected by CPU offline/online events.
+	 */
+	igt_subtest("cpu-hotplug")
+		cpu_hotplug(fd);
+
+	/**
+	 * Test GPU frequency.
+	 */
+	igt_subtest("frequency")
+		test_frequency(fd);
+
+	/**
+	 * Test interrupt count reporting.
+	 */
+	igt_subtest("interrupts")
+		test_interrupts(fd);
+
+	/**
+	 * Test RC6 residency reporting.
+	 */
+	igt_subtest("rc6")
+		test_rc6(fd);
+
+	/**
+	 * Test RC6p residency reporting.
+	 */
+	igt_subtest("rc6p")
+		test_rc6p(fd);
+
+	/**
+	 * Check render nodes are counted.
+	 */
+	igt_subtest_group {
+		int render_fd;
+
+		igt_fixture {
+			render_fd = drm_open_driver_render(DRIVER_INTEL);
+			igt_require_gem(render_fd);
+
+			gem_quiescent_gpu(fd);
+		}
+
+		for_each_engine_class_instance(fd, e) {
+			igt_subtest_f("render-node-busy-%s", e->name)
+				single(render_fd, e, true);
+		}
+
+		igt_fixture {
+			close(render_fd);
+		}
+	}
+}
-- 
2.14.1
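
For readers following the measured_usleep() helper in the patch above: the idea is to keep sleeping until the requested time has really elapsed and to return the measured duration, so that later assertions compare PMU counters against what was actually slept. A minimal standalone sketch of that idea, assuming plain POSIX clock_gettime() and nanosleep() instead of the IGT helpers; now_ns() and measured_sleep_ns() are hypothetical names and not part of the patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Sleep for at least @usec microseconds and return the time actually
 * slept, in nanoseconds. Hypothetical stand-in for measured_usleep():
 * the remaining time is tracked in nanoseconds throughout.
 */
static uint64_t measured_sleep_ns(unsigned int usec)
{
	const uint64_t target = (uint64_t)usec * 1000;
	const uint64_t start = now_ns();
	uint64_t slept = 0;

	while (slept < target) {
		const uint64_t left = target - slept;
		struct timespec req = {
			.tv_sec = left / 1000000000ull,
			.tv_nsec = left % 1000000000ull,
		};

		/* An early wake-up (e.g. EINTR) simply leads to another pass. */
		nanosleep(&req, NULL);
		slept = now_ns() - start;
	}

	return slept;
}
```

Keeping every quantity in nanoseconds inside the loop avoids mixing microsecond and nanosecond values when computing what is left to sleep.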

* [PATCH i-g-t v3 8/9] gem_wsim: Busy stats balancers
  2017-10-10  9:30 ` [PATCH i-g-t 8/9] gem_wsim: Busy stats balancers Tvrtko Ursulin
@ 2017-11-21 19:37   ` Tvrtko Ursulin
  0 siblings, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-21 19:37 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add busy and busy-avg balancers which make balancing decisions by looking
at engine busyness via the i915 PMU.

They are thus able to make decisions on the actual instantaneous load of
the system, and not use metrics that lag behind by a batch or two. In
doing so, each client should be able to greedily maximise their own
usage of the system, leading to improved load balancing even in the face
of other uncooperative clients. On the other hand, we are only using the
instantaneous load without coupling in the predictive factor for dispatch
and execution length.
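
As an illustration of the instantaneous load figure described above, busyness over an interval can be derived from two successive readings of a cumulative busy-time counter and their timestamps. A minimal sketch, assuming both are in nanoseconds; busy_percent() is a hypothetical helper and not the code in this patch:

```c
#include <stdint.h>

/*
 * Busyness of one engine over a sampling interval, as a percentage.
 * busy_prev/busy_now are cumulative busy-time readings and t_prev/t_now
 * the matching timestamps, all assumed to be in nanoseconds.
 */
static double busy_percent(uint64_t busy_prev, uint64_t busy_now,
			   uint64_t t_prev, uint64_t t_now)
{
	/* No time elapsed: report idle rather than divide by zero. */
	if (t_now <= t_prev)
		return 0.0;

	return (double)(busy_now - busy_prev) * 100.0 /
	       (double)(t_now - t_prev);
}
```

The first sample only establishes a baseline, so a balancer built on this would have no meaningful load figure until the second reading.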

v2:
 * Commit text. (Chris Wilson)
 * Rename get_stats to get_pmu_stats. (Chris Wilson)
 * Fix PMU readout in VCS remap mode.

v3:
 * Integrated Petri's meson build recipe.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Petri Latvala <petri.latvala@intel.com>
---
 benchmarks/Makefile.am |   2 +-
 benchmarks/gem_wsim.c  | 142 +++++++++++++++++++++++++++++++++++++++++++++++++
 benchmarks/meson.build |   7 ++-
 3 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/benchmarks/Makefile.am b/benchmarks/Makefile.am
index d066112a32a2..a81a55e01697 100644
--- a/benchmarks/Makefile.am
+++ b/benchmarks/Makefile.am
@@ -21,7 +21,7 @@ gem_latency_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_latency_LDADD = $(LDADD) -lpthread
 gem_syslatency_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_syslatency_LDADD = $(LDADD) -lpthread -lrt
-gem_wsim_LDADD = $(LDADD) -lpthread
+gem_wsim_LDADD = $(LDADD) $(top_builddir)/lib/libigt_perf.la -lpthread
 
 EXTRA_DIST= \
 	README \
diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c
index 82fe6ba9ec5f..8b2cd90659a9 100644
--- a/benchmarks/gem_wsim.c
+++ b/benchmarks/gem_wsim.c
@@ -50,6 +50,7 @@
 #include "intel_io.h"
 #include "igt_aux.h"
 #include "igt_rand.h"
+#include "igt_perf.h"
 #include "sw_sync.h"
 
 #include "ewma.h"
@@ -188,6 +189,16 @@ struct workload
 			uint32_t last[NUM_ENGINES];
 		} rt;
 	};
+
+	struct busy_balancer {
+		int fd;
+		bool first;
+		unsigned int num_engines;
+		unsigned int engine_map[NUM_ENGINES];
+		uint64_t t_prev;
+		uint64_t prev[5];
+		double busy[5];
+	} busy_balancer;
 };
 
 static const unsigned int nop_calibration_us = 1000;
@@ -993,6 +1004,8 @@ struct workload_balancer {
 	unsigned int flags;
 	unsigned int min_gen;
 
+	int (*init)(const struct workload_balancer *balancer,
+		    struct workload *wrk);
 	unsigned int (*get_qd)(const struct workload_balancer *balancer,
 			       struct workload *wrk,
 			       enum intel_engine_id engine);
@@ -1242,6 +1255,108 @@ context_balance(const struct workload_balancer *balancer,
 	return get_vcs_engine(wrk->ctx_list[w->context].static_vcs);
 }
 
+static unsigned int
+get_engine_busy(const struct workload_balancer *balancer,
+		struct workload *wrk, enum intel_engine_id engine)
+{
+	struct busy_balancer *bb = &wrk->busy_balancer;
+
+	if (engine == VCS2 && (wrk->flags & VCS2REMAP))
+		engine = BCS;
+
+	return bb->busy[bb->engine_map[engine]];
+}
+
+static void
+get_pmu_stats(const struct workload_balancer *b, struct workload *wrk)
+{
+	struct busy_balancer *bb = &wrk->busy_balancer;
+	uint64_t val[7];
+	unsigned int i;
+
+	igt_assert_eq(read(bb->fd, val, sizeof(val)),
+		      (2 + bb->num_engines) * sizeof(uint64_t));
+
+	if (!bb->first) {
+		for (i = 0; i < bb->num_engines; i++) {
+			double d;
+
+			d = (val[2 + i] - bb->prev[i]) * 100;
+			d /= val[1] - bb->t_prev;
+			bb->busy[i] = d;
+		}
+	}
+
+	for (i = 0; i < bb->num_engines; i++)
+		bb->prev[i] = val[2 + i];
+
+	bb->t_prev = val[1];
+	bb->first = false;
+}
+
+static enum intel_engine_id
+busy_avg_balance(const struct workload_balancer *balancer,
+		 struct workload *wrk, struct w_step *w)
+{
+	get_pmu_stats(balancer, wrk);
+
+	return qdavg_balance(balancer, wrk, w);
+}
+
+static enum intel_engine_id
+busy_balance(const struct workload_balancer *balancer,
+	     struct workload *wrk, struct w_step *w)
+{
+	get_pmu_stats(balancer, wrk);
+
+	return qd_balance(balancer, wrk, w);
+}
+
+static int
+busy_init(const struct workload_balancer *balancer, struct workload *wrk)
+{
+	struct busy_balancer *bb = &wrk->busy_balancer;
+	struct engine_desc {
+		unsigned class, inst;
+		enum intel_engine_id id;
+	} *d, engines[] = {
+		{ I915_ENGINE_CLASS_RENDER, 0, RCS },
+		{ I915_ENGINE_CLASS_COPY, 0, BCS },
+		{ I915_ENGINE_CLASS_VIDEO, 0, VCS1 },
+		{ I915_ENGINE_CLASS_VIDEO, 1, VCS2 },
+		{ I915_ENGINE_CLASS_VIDEO_ENHANCE, 0, VECS },
+		{ 0, 0, VCS }
+	};
+
+	bb->num_engines = 0;
+	bb->first = true;
+	bb->fd = -1;
+
+	for (d = &engines[0]; d->id != VCS; d++) {
+		int pfd;
+
+		pfd = perf_i915_open_group(I915_PMU_ENGINE_BUSY(d->class,
+							        d->inst),
+					   bb->fd);
+		if (pfd < 0) {
+			if (d->id != VCS2)
+				return -(10 + bb->num_engines);
+			else
+				continue;
+		}
+
+		if (bb->num_engines == 0)
+			bb->fd = pfd;
+
+		bb->engine_map[d->id] = bb->num_engines++;
+	}
+
+	if (bb->num_engines < 5 && !(wrk->flags & VCS2REMAP))
+		return -1;
+
+	return 0;
+}
+
 static const struct workload_balancer all_balancers[] = {
 	{
 		.id = 0,
@@ -1315,6 +1430,22 @@ static const struct workload_balancer all_balancers[] = {
 		.desc = "Static round-robin VCS assignment at context creation.",
 		.balance = context_balance,
 	},
+	{
+		.id = 9,
+		.name = "busy",
+		.desc = "Engine busyness based balancing.",
+		.init = busy_init,
+		.get_qd = get_engine_busy,
+		.balance = busy_balance,
+	},
+	{
+		.id = 10,
+		.name = "busy-avg",
+		.desc = "Average engine busyness based balancing.",
+		.init = busy_init,
+		.get_qd = get_engine_busy,
+		.balance = busy_avg_balance,
+	},
 };
 
 static unsigned int
@@ -2226,6 +2357,17 @@ int main(int argc, char **argv)
 				    (verbose > 0 && master_workload == i);
 
 		prepare_workload(i, w[i], flags_);
+
+		if (balancer && balancer->init) {
+			int ret = balancer->init(balancer, w[i]);
+			if (ret) {
+				if (verbose)
+					fprintf(stderr,
+						"Failed to initialize balancing! (%u=%d)\n",
+						i, ret);
+				return 1;
+			}
+		}
 	}
 
 	gem_quiescent_gpu(fd);
diff --git a/benchmarks/meson.build b/benchmarks/meson.build
index 9ab738f76588..fa7f07643a97 100644
--- a/benchmarks/meson.build
+++ b/benchmarks/meson.build
@@ -31,6 +31,11 @@ endif
 foreach prog : benchmark_progs
 	# FIXME meson doesn't like binaries with the same name
 	# meanwhile just suffix with _bench
+	link = []
+	if prog == 'gem_wsim'
+		link += lib_igt_perf
+	endif
 	executable(prog + '_bench', prog + '.c',
-			dependencies : test_deps)
+		   link_with : link,
+		   dependencies : test_deps)
 endforeach
-- 
2.14.1
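
For readers skimming the hunk above, the control flow of busy_init() can be
sketched in isolation. This is only an illustration under stated assumptions,
not the gem_wsim code: the enum, struct busy_map, map_engines() and the three
stub openers are hypothetical names, the real perf_i915_open_group() call is
replaced by a function pointer, and the sketch marks unmapped engines with -1,
which the quoted hunk does not do. What it tries to show is the
sentinel-terminated table, the "only VCS2 may be missing" rule, and the final
engine-count check.

```c
/* Hypothetical stand-ins for the engine ids; END plays the sentinel role. */
enum engine_id { RCS, BCS, VCS1, VCS2, VECS, NUM_ENGINES, END = NUM_ENGINES };

struct engine_desc {
	unsigned int engine_class, instance;
	enum engine_id id;
};

/* Hypothetical opener: returns a descriptor >= 0, or -1 if unavailable. */
typedef int (*open_fn)(unsigned int engine_class, unsigned int instance,
		       int group_fd);

struct busy_map {
	int fd;				/* group leader, -1 until one opens */
	int num_engines;
	int engine_map[NUM_ENGINES];	/* engine id -> slot, -1 if unmapped */
};

/* Sentinel-terminated table, mirroring the layout used in the patch. */
const struct engine_desc engines[] = {
	{ 0, 0, RCS }, { 1, 0, BCS }, { 2, 0, VCS1 },
	{ 2, 1, VCS2 }, { 3, 0, VECS }, { 0, 0, END }
};

int map_engines(struct busy_map *bb, const struct engine_desc *d,
		open_fn open_counter, int vcs2_remap)
{
	int i;

	bb->fd = -1;
	bb->num_engines = 0;
	for (i = 0; i < NUM_ENGINES; i++)
		bb->engine_map[i] = -1;

	for (; d->id != END; d++) {
		int pfd = open_counter(d->engine_class, d->instance, bb->fd);

		if (pfd < 0) {
			/* Only the second video engine is optional. */
			if (d->id != VCS2)
				return -(10 + bb->num_engines);
			continue;
		}

		/* The first counter opened becomes the group leader. */
		if (bb->num_engines == 0)
			bb->fd = pfd;

		bb->engine_map[d->id] = bb->num_engines++;
	}

	/* Fewer than five engines is only accepted with VCS2 remapping. */
	if (bb->num_engines < 5 && !vcs2_remap)
		return -1;

	return 0;
}

/* Stub openers standing in for real hardware, for illustration only. */
int open_all(unsigned int c, unsigned int i, int group_fd)
{
	(void)group_fd;
	return 3 + (int)(c * 2 + i);
}

int open_no_vcs2(unsigned int c, unsigned int i, int group_fd)
{
	return (c == 2 && i == 1) ? -1 : open_all(c, i, group_fd);
}

int open_no_bcs(unsigned int c, unsigned int i, int group_fd)
{
	return c == 1 ? -1 : open_all(c, i, group_fd);
}
```

Calling map_engines(&bb, engines, open_no_vcs2, 1) models a part with a single
video engine when remapping is requested, while open_no_bcs models a mandatory
engine failing after one counter was already opened.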

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✓ Fi.CI.BAT: success for IGT PMU support (rev18)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (16 preceding siblings ...)
  2017-10-11 20:16 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2017-11-22 11:41 ` Patchwork
  2017-11-22 11:57   ` Tvrtko Ursulin
  2017-11-22 14:31 ` ✓ Fi.CI.IGT: " Patchwork
  18 siblings, 1 reply; 46+ messages in thread
From: Patchwork @ 2017-11-22 11:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev18)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
f8f6db9ced0061229018fa658cf1c80c56464686 tools/intel_watermark: Try not to dump nonexistent planes on SKL+

with latest DRM-Tip kernel build CI_DRM_3371
d8251dc669fe drm-tip: 2017y-11m-22d-10h-27m-38s UTC integration manifest

Testlist changes:
+igt@perf_pmu@all-busy-check-all
+igt@perf_pmu@busy-bcs0
+igt@perf_pmu@busy-check-all-bcs0
+igt@perf_pmu@busy-check-all-rcs0
+igt@perf_pmu@busy-check-all-vcs0
+igt@perf_pmu@busy-check-all-vcs1
+igt@perf_pmu@busy-check-all-vecs0
+igt@perf_pmu@busy-no-semaphores-bcs0
+igt@perf_pmu@busy-no-semaphores-rcs0
+igt@perf_pmu@busy-no-semaphores-vcs0
+igt@perf_pmu@busy-no-semaphores-vcs1
+igt@perf_pmu@busy-no-semaphores-vecs0
+igt@perf_pmu@busy-rcs0
+igt@perf_pmu@busy-vcs0
+igt@perf_pmu@busy-vcs1
+igt@perf_pmu@busy-vecs0
+igt@perf_pmu@cpu-hotplug
+igt@perf_pmu@event-wait-rcs0
+igt@perf_pmu@frequency
+igt@perf_pmu@idle-bcs0
+igt@perf_pmu@idle-no-semaphores-bcs0
+igt@perf_pmu@idle-no-semaphores-rcs0
+igt@perf_pmu@idle-no-semaphores-vcs0
+igt@perf_pmu@idle-no-semaphores-vcs1
+igt@perf_pmu@idle-no-semaphores-vecs0
+igt@perf_pmu@idle-rcs0
+igt@perf_pmu@idle-vcs0
+igt@perf_pmu@idle-vcs1
+igt@perf_pmu@idle-vecs0
+igt@perf_pmu@init-busy-bcs0
+igt@perf_pmu@init-busy-rcs0
+igt@perf_pmu@init-busy-vcs0
+igt@perf_pmu@init-busy-vcs1
+igt@perf_pmu@init-busy-vecs0
+igt@perf_pmu@init-sema-bcs0
+igt@perf_pmu@init-sema-rcs0
+igt@perf_pmu@init-sema-vcs0
+igt@perf_pmu@init-sema-vcs1
+igt@perf_pmu@init-sema-vecs0
+igt@perf_pmu@init-wait-bcs0
+igt@perf_pmu@init-wait-rcs0
+igt@perf_pmu@init-wait-vcs0
+igt@perf_pmu@init-wait-vcs1
+igt@perf_pmu@init-wait-vecs0
+igt@perf_pmu@interrupts
+igt@perf_pmu@invalid-init
+igt@perf_pmu@most-busy-check-all-bcs0
+igt@perf_pmu@most-busy-check-all-rcs0
+igt@perf_pmu@most-busy-check-all-vcs0
+igt@perf_pmu@most-busy-check-all-vcs1
+igt@perf_pmu@most-busy-check-all-vecs0
+igt@perf_pmu@multi-client-bcs0
+igt@perf_pmu@multi-client-rcs0
+igt@perf_pmu@multi-client-vcs0
+igt@perf_pmu@multi-client-vcs1
+igt@perf_pmu@multi-client-vecs0
+igt@perf_pmu@other-init-0
+igt@perf_pmu@other-init-1
+igt@perf_pmu@other-init-2
+igt@perf_pmu@other-init-3
+igt@perf_pmu@other-init-4
+igt@perf_pmu@other-init-5
+igt@perf_pmu@other-init-6
+igt@perf_pmu@other-read-0
+igt@perf_pmu@other-read-1
+igt@perf_pmu@other-read-2
+igt@perf_pmu@other-read-3
+igt@perf_pmu@other-read-4
+igt@perf_pmu@other-read-5
+igt@perf_pmu@other-read-6
+igt@perf_pmu@rc6
+igt@perf_pmu@rc6p
+igt@perf_pmu@render-node-busy-bcs0
+igt@perf_pmu@render-node-busy-rcs0
+igt@perf_pmu@render-node-busy-vcs0
+igt@perf_pmu@render-node-busy-vcs1
+igt@perf_pmu@render-node-busy-vecs0
+igt@perf_pmu@semaphore-wait-bcs0
+igt@perf_pmu@semaphore-wait-rcs0
+igt@perf_pmu@semaphore-wait-vcs0
+igt@perf_pmu@semaphore-wait-vcs1
+igt@perf_pmu@semaphore-wait-vecs0

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (fi-snb-2520m) fdo#103713

fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:452s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:455s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:385s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:546s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:280s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:512s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:512s
fi-byt-n2820     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:497s
fi-cfl-s2        total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:607s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:431s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:265s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:543s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:430s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:440s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:432s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:490s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:464s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:480s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:534s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:476s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:538s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:581s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:456s
fi-skl-6600u     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:546s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:572s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:523s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:495s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:468s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:562s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:435s
Blacklisted hosts:
fi-cnl-y         total:253  pass:227  dwarn:0   dfail:0   fail:0   skip:25 
fi-glk-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:506s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_520/

* Re: ✓ Fi.CI.BAT: success for IGT PMU support (rev18)
  2017-11-22 11:41 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev18) Patchwork
@ 2017-11-22 11:57   ` Tvrtko Ursulin
  2017-11-22 12:47     ` Petri Latvala
  0 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-22 11:57 UTC (permalink / raw)
  To: intel-gfx, Patchwork, Tvrtko Ursulin, Tomi Sarvela, Arkadiusz Hiler


Hi guys,

On 22/11/2017 11:41, Patchwork wrote:

[snip]

> Testlist changes:
> +igt@perf_pmu@all-busy-check-all
> +igt@perf_pmu@busy-bcs0
> +igt@perf_pmu@busy-check-all-bcs0
> +igt@perf_pmu@busy-check-all-rcs0
> +igt@perf_pmu@busy-check-all-vcs0
> +igt@perf_pmu@busy-check-all-vcs1
> +igt@perf_pmu@busy-check-all-vecs0
> +igt@perf_pmu@busy-no-semaphores-bcs0
> +igt@perf_pmu@busy-no-semaphores-rcs0
> +igt@perf_pmu@busy-no-semaphores-vcs0
> +igt@perf_pmu@busy-no-semaphores-vcs1
> +igt@perf_pmu@busy-no-semaphores-vecs0
> +igt@perf_pmu@busy-rcs0
> +igt@perf_pmu@busy-vcs0
> +igt@perf_pmu@busy-vcs1
> +igt@perf_pmu@busy-vecs0
> +igt@perf_pmu@cpu-hotplug
> +igt@perf_pmu@event-wait-rcs0
> +igt@perf_pmu@frequency
> +igt@perf_pmu@idle-bcs0
> +igt@perf_pmu@idle-no-semaphores-bcs0
> +igt@perf_pmu@idle-no-semaphores-rcs0
> +igt@perf_pmu@idle-no-semaphores-vcs0
> +igt@perf_pmu@idle-no-semaphores-vcs1
> +igt@perf_pmu@idle-no-semaphores-vecs0
> +igt@perf_pmu@idle-rcs0
> +igt@perf_pmu@idle-vcs0
> +igt@perf_pmu@idle-vcs1
> +igt@perf_pmu@idle-vecs0
> +igt@perf_pmu@init-busy-bcs0
> +igt@perf_pmu@init-busy-rcs0
> +igt@perf_pmu@init-busy-vcs0
> +igt@perf_pmu@init-busy-vcs1
> +igt@perf_pmu@init-busy-vecs0
> +igt@perf_pmu@init-sema-bcs0
> +igt@perf_pmu@init-sema-rcs0
> +igt@perf_pmu@init-sema-vcs0
> +igt@perf_pmu@init-sema-vcs1
> +igt@perf_pmu@init-sema-vecs0
> +igt@perf_pmu@init-wait-bcs0
> +igt@perf_pmu@init-wait-rcs0
> +igt@perf_pmu@init-wait-vcs0
> +igt@perf_pmu@init-wait-vcs1
> +igt@perf_pmu@init-wait-vecs0
> +igt@perf_pmu@interrupts
> +igt@perf_pmu@invalid-init
> +igt@perf_pmu@most-busy-check-all-bcs0
> +igt@perf_pmu@most-busy-check-all-rcs0
> +igt@perf_pmu@most-busy-check-all-vcs0
> +igt@perf_pmu@most-busy-check-all-vcs1
> +igt@perf_pmu@most-busy-check-all-vecs0
> +igt@perf_pmu@multi-client-bcs0
> +igt@perf_pmu@multi-client-rcs0
> +igt@perf_pmu@multi-client-vcs0
> +igt@perf_pmu@multi-client-vcs1
> +igt@perf_pmu@multi-client-vecs0
> +igt@perf_pmu@other-init-0
> +igt@perf_pmu@other-init-1
> +igt@perf_pmu@other-init-2
> +igt@perf_pmu@other-init-3
> +igt@perf_pmu@other-init-4
> +igt@perf_pmu@other-init-5
> +igt@perf_pmu@other-init-6
> +igt@perf_pmu@other-read-0
> +igt@perf_pmu@other-read-1
> +igt@perf_pmu@other-read-2
> +igt@perf_pmu@other-read-3
> +igt@perf_pmu@other-read-4
> +igt@perf_pmu@other-read-5
> +igt@perf_pmu@other-read-6
> +igt@perf_pmu@rc6
> +igt@perf_pmu@rc6p
> +igt@perf_pmu@render-node-busy-bcs0
> +igt@perf_pmu@render-node-busy-rcs0
> +igt@perf_pmu@render-node-busy-vcs0
> +igt@perf_pmu@render-node-busy-vcs1
> +igt@perf_pmu@render-node-busy-vecs0
> +igt@perf_pmu@semaphore-wait-bcs0
> +igt@perf_pmu@semaphore-wait-rcs0
> +igt@perf_pmu@semaphore-wait-vcs0
> +igt@perf_pmu@semaphore-wait-vcs1
> +igt@perf_pmu@semaphore-wait-vecs0

Would it be possible to have a test run of these new tests on the shards?

If successful, we can then add them to the testlist. Total runtime should 
be up to 30 seconds.

But I wouldn't be surprised if there are issues, since I was only 
able to test on SKL during development.

Regards,

Tvrtko

* Re: ✓ Fi.CI.BAT: success for IGT PMU support (rev18)
  2017-11-22 11:57   ` Tvrtko Ursulin
@ 2017-11-22 12:47     ` Petri Latvala
  0 siblings, 0 replies; 46+ messages in thread
From: Petri Latvala @ 2017-11-22 12:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Tomi Sarvela, intel-gfx

On Wed, Nov 22, 2017 at 11:57:19AM +0000, Tvrtko Ursulin wrote:
> 
> Hi guys,
> 
> On 22/11/2017 11:41, Patchwork wrote:
> 
> [snip]
> 
> Would it be possible to have a test run of these new tests on the shards?

The shard run will run them automatically; you just have to check the
results manually and look at shards-all.html instead of shards.html.



-- 
Petri Latvala

* ✓ Fi.CI.IGT: success for IGT PMU support (rev18)
  2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
                   ` (17 preceding siblings ...)
  2017-11-22 11:41 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev18) Patchwork
@ 2017-11-22 14:31 ` Patchwork
  2017-11-22 14:39   ` Petri Latvala
  18 siblings, 1 reply; 46+ messages in thread
From: Patchwork @ 2017-11-22 14:31 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: IGT PMU support (rev18)
URL   : https://patchwork.freedesktop.org/series/28253/
State : success

== Summary ==

Test kms_flip:
        Subgroup plain-flip-fb-recreate-interruptible:
                pass       -> FAIL       (shard-hsw) fdo#100368

fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368

shard-hsw        total:2667 pass:1473 dwarn:2   dfail:0   fail:11  skip:1181 time:9509s
shard-snb        total:2649 pass:1241 dwarn:1   dfail:0   fail:11  skip:1395 time:7972s
Blacklisted hosts:
shard-apl        total:2631 pass:1591 dwarn:2   dfail:0   fail:24  skip:1012 time:12761s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_520/shards.html

* Re: ✓ Fi.CI.IGT: success for IGT PMU support (rev18)
  2017-11-22 14:31 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-11-22 14:39   ` Petri Latvala
  2017-11-22 14:50     ` Petri Latvala
  2017-11-22 17:09     ` Tvrtko Ursulin
  0 siblings, 2 replies; 46+ messages in thread
From: Petri Latvala @ 2017-11-22 14:39 UTC (permalink / raw)
  To: Tomi Sarvela, Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Nov 22, 2017 at 02:31:08PM +0000, Patchwork wrote:
> == Series Details ==
> 
> Series: IGT PMU support (rev18)
> URL   : https://patchwork.freedesktop.org/series/28253/
> State : success
> 
> == Summary ==
> 
> Test kms_flip:
>         Subgroup plain-flip-fb-recreate-interruptible:
>                 pass       -> FAIL       (shard-hsw) fdo#100368


Tomi, why doesn't this diff show igt@drv_selftest@live_hangcheck
pass -> incomplete on snb?



> 
> fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
> 
> shard-hsw        total:2667 pass:1473 dwarn:2   dfail:0   fail:11  skip:1181 time:9509s
> shard-snb        total:2649 pass:1241 dwarn:1   dfail:0   fail:11  skip:1395 time:7972s
> Blacklisted hosts:
> shard-apl        total:2631 pass:1591 dwarn:2   dfail:0   fail:24  skip:1012 time:12761s
> 
> == Logs ==
> 
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_520/shards.html


shards-all.html shows perf_pmu tests getting a SKIP across the board,
on SNB, HSW and APL. igt@perf_pmu@multi-client-vcs1 on APL is the
only exception, but that's just a notrun (due to a hang earlier in that
particular shard).



-- 
Petri Latvala

* Re: ✓ Fi.CI.IGT: success for IGT PMU support (rev18)
  2017-11-22 14:39   ` Petri Latvala
@ 2017-11-22 14:50     ` Petri Latvala
  2017-11-22 17:09     ` Tvrtko Ursulin
  1 sibling, 0 replies; 46+ messages in thread
From: Petri Latvala @ 2017-11-22 14:50 UTC (permalink / raw)
  To: Petri Latvala; +Cc: Tomi Sarvela, intel-gfx

On Wed, Nov 22, 2017 at 04:39:49PM +0200, Petri Latvala wrote:
> On Wed, Nov 22, 2017 at 02:31:08PM +0000, Patchwork wrote:
> > == Series Details ==
> > 
> > Series: IGT PMU support (rev18)
> > URL   : https://patchwork.freedesktop.org/series/28253/
> > State : success
> > 
> > == Summary ==
> > 
> > Test kms_flip:
> >         Subgroup plain-flip-fb-recreate-interruptible:
> >                 pass       -> FAIL       (shard-hsw) fdo#100368
> 
> 
> Tomi, why doesn't this diff show igt@drv_selftest@live_hangcheck
> pass -> incomplete on snb?

Mystery solved, cibuglog has that test suppressed.

-- 
Petri Latvala

* Re: ✓ Fi.CI.IGT: success for IGT PMU support (rev18)
  2017-11-22 14:39   ` Petri Latvala
  2017-11-22 14:50     ` Petri Latvala
@ 2017-11-22 17:09     ` Tvrtko Ursulin
  1 sibling, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2017-11-22 17:09 UTC (permalink / raw)
  To: Petri Latvala, Tomi Sarvela, Tvrtko Ursulin; +Cc: intel-gfx


On 22/11/2017 14:39, Petri Latvala wrote:
> On Wed, Nov 22, 2017 at 02:31:08PM +0000, Patchwork wrote:

[snip]

>>
>> fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
>>
>> shard-hsw        total:2667 pass:1473 dwarn:2   dfail:0   fail:11  skip:1181 time:9509s
>> shard-snb        total:2649 pass:1241 dwarn:1   dfail:0   fail:11  skip:1395 time:7972s
>> Blacklisted hosts:
>> shard-apl        total:2631 pass:1591 dwarn:2   dfail:0   fail:24  skip:1012 time:12761s
>>
>> == Logs ==
>>
>> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_520/shards.html
> 
> 
> shards-all.html shows perf_pmu tests getting a SKIP across the board,
> on SNB, HSW and APL. igt@perf_pmu@multi-client-vcs1 on APL is the
> only exception, but that's just a notrun (due to a hang earlier in that
> particular shard).

After some head scratching, Chris suggested that drm-tip, at the time of 
this run, did not yet contain the PMU patches.

However, on subsequent runs some lockdep issue surfaced, so it's not like 
all is good after all.
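
As an aside, one way to tell whether a given kernel registers the i915 PMU at
all is to look for its entry under the perf event_source bus in sysfs. The
sketch below is a minimal illustration and makes assumptions: read_pmu_type()
and i915_pmu_present() are hypothetical helper names, and this is not
necessarily how perf_pmu itself decides to skip. It relies only on the perf
core exposing each registered PMU's type id in a "type" file.

```c
#include <stdio.h>

/*
 * Returns the perf PMU type id stored in a sysfs "type" file, or -1 if the
 * file is missing or does not contain a non-negative integer.
 */
int read_pmu_type(const char *path)
{
	FILE *f = fopen(path, "r");
	int type = -1;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &type) != 1 || type < 0)
		type = -1;

	fclose(f);
	return type;
}

/* Non-zero if the running kernel exposes an i915 PMU to perf. */
int i915_pmu_present(void)
{
	return read_pmu_type("/sys/bus/event_source/devices/i915/type") >= 0;
}
```

On a kernel without the PMU patches the file would not exist, so
i915_pmu_present() would return 0 and a test could reasonably skip.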

Regards,

Tvrtko

Thread overview: 46+ messages
2017-10-10  9:29 [PATCH v4 i-g-t 0/7] IGT PMU support Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 1/9] intel-gpu-overlay: Move local perf implementation to a library Tvrtko Ursulin
2017-11-21 19:34   ` [PATCH i-g-t v3 " Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 2/9] intel-gpu-overlay: Consolidate perf PMU access to library Tvrtko Ursulin
2017-10-10 12:21   ` Chris Wilson
2017-10-10  9:30 ` [PATCH i-g-t 3/9] lib/perf: Fix data types and general tidy Tvrtko Ursulin
2017-10-10 12:22   ` Chris Wilson
2017-10-10  9:30 ` [PATCH i-g-t 4/9] intel-gpu-overlay: Fix interrupts PMU readout Tvrtko Ursulin
2017-10-10 12:23   ` Chris Wilson
2017-10-10 14:17     ` [PATCH i-g-t v2 " Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 5/9] intel-gpu-overlay: Catch-up to new i915 PMU Tvrtko Ursulin
2017-11-21 18:20   ` Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 6/9] intel-gpu-overlay: Use RAPL PMU for power reading Tvrtko Ursulin
2017-10-10 11:30   ` [PATCH i-g-t v2 " Tvrtko Ursulin
2017-10-10 12:05     ` [PATCH i-g-t v3 " Tvrtko Ursulin
2017-10-10 12:25       ` Chris Wilson
2017-11-21 19:35       ` [PATCH i-g-t v4 " Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 7/9] tests/perf_pmu: Tests for i915 PMU API Tvrtko Ursulin
2017-10-10 12:37   ` Chris Wilson
2017-10-10 13:38     ` Tvrtko Ursulin
2017-10-10 13:46       ` Chris Wilson
2017-10-10 14:17         ` [PATCH i-g-t v6 " Tvrtko Ursulin
2017-10-10 16:39           ` Chris Wilson
2017-10-11 12:54             ` [PATCH i-g-t v7 " Tvrtko Ursulin
2017-11-21 11:50               ` Chris Wilson
2017-11-21 18:21               ` [PATCH i-g-t v8 " Tvrtko Ursulin
2017-11-21 19:36                 ` [PATCH i-g-t v9 " Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 8/9] gem_wsim: Busy stats balancers Tvrtko Ursulin
2017-11-21 19:37   ` [PATCH i-g-t v3 " Tvrtko Ursulin
2017-10-10  9:30 ` [PATCH i-g-t 9/9] media-bench.pl: Add busy balancers to the list Tvrtko Ursulin
2017-11-21 11:51   ` Chris Wilson
2017-10-10  9:42 ` ✗ Fi.CI.BAT: failure for IGT PMU support (rev7) Patchwork
2017-10-10 12:06 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev8) Patchwork
2017-10-10 13:48 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev9) Patchwork
2017-10-10 15:19 ` ✗ Fi.CI.IGT: failure for IGT PMU support (rev8) Patchwork
2017-10-10 18:42 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev11) Patchwork
2017-10-11  1:28 ` ✓ Fi.CI.IGT: " Patchwork
2017-10-11 14:09 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev12) Patchwork
2017-10-11 20:16 ` ✗ Fi.CI.IGT: failure " Patchwork
2017-11-22 11:41 ` ✓ Fi.CI.BAT: success for IGT PMU support (rev18) Patchwork
2017-11-22 11:57   ` Tvrtko Ursulin
2017-11-22 12:47     ` Petri Latvala
2017-11-22 14:31 ` ✓ Fi.CI.IGT: " Patchwork
2017-11-22 14:39   ` Petri Latvala
2017-11-22 14:50     ` Petri Latvala
2017-11-22 17:09     ` Tvrtko Ursulin
